Elmord's Magic Valley

Computers, languages, and computer languages. Às vezes em Português, sometimes in English.

Hel is now Fenius, and other notes

2019-05-13 16:41 -0300. Tags: hel, fenius, in-english

After feedback from a number of people (1) in the last post, and consulting privately with a number of other people (1), and some consideration of the merits of each naming option (including the possibility of confusion with other projects and searchability in your favorite search engine), I'm going with Fenius as the new name for Hel. That will also give me an excuse to get more acquainted with Irish mythology and legends, so I can give cool names to Fenius-related tools.

As for the question of licensing, I'll keep everything under the GPLv3 for now; I can worry about library licensing after I actually have libraries to license. But I have reached the following conclusions about the matter:

Stay tuned for a post about persistent hashmaps Real Soon Now™.

P.S.: My coding activity is still a bit limited right now due to RSI, which is getting gradually better.


A name change?

2019-05-07 21:49 -0300. Tags: hel, fenius, in-english

The release name for Hel 0.3 is "Syntactic Mead". It's a play on syntactic sugar, a reference to the honey-based drink which plays a role in Norse mythology, and motivated by the change from a Lisp-like syntax to the current, more complex syntax.

I've been thinking of changing the name of the language itself from Hel to Mead. Hel is a cool name, but perhaps not the most positive of names. (Although if Inferno could get away with that name, perhaps Hel is not that bad.) Moreover, Hel stands for "huangho's Experimental Language"; and while it is pretty experimental (and pretty huangho's) now, I expect it to be less so at some point (though that point may be very far away, if it ever gets there). The qualifier 'experimental' made more sense when I had no idea where I was going with this project, but now I have a somewhat clearer idea of where I want to get. Hel 0.3 is basically the incarnation of Hel that thrived; it could use a name of its own. (Of course, I could just change the meaning of the acronym instead.)

One minor problem with the name Mead is that it might be seen as having some thematic relationship to (a.k.a. being a ripoff of) Elixir; but that's a mostly harmless coincidence. There are also a number of projects called "Mead", most of them abandoned, but some active. There is also a company called MeadCo. There are other Hel projects around too; finding good unique names is hard.

Other names I have considered are Eris, which would give me an excuse to throw in lots of Discordian references in the documentation, but there are even more active projects with that name around (including, unsurprisingly, a Discord library); and Fenius, after the legendary guy who created Irish (Gaelic) out of the best parts of all languages after the confusion of tongues at the Tower of Babel. That name seems to be relatively free from conflicts, but I'm not sure it sounds as cool.

(In the Mead of Poetry theme, I also thought of Kvasir, but of course that's also taken by a programming language.)

What do you think? Do you like (or dislike) any of these names? Should I stick to Hel? Have other suggestions? Feel free to comment.

In other news, Hel (Mead? Fenius?) got immutable dictionaries (persistent hashmaps) this week. But I'll write about that later.


German declension: modifiers

2019-05-03 22:13 -0300. Tags: lang, german, in-english

I've been dabbling in German again, and trying to learn the declensions for the articles, adjectives and other modifiers. These are some notes I made in the process.

The tables below are colorized with JavaScript. If they don't get colorized properly for you, please notify me. If you find mistakes in the text, please notify me too.

Genders, cases and numbers

German has three genders (masculine, feminine, neuter) and two numbers (singular and plural). There is a single plural declension for all genders, so, for didactic purposes, we can think of the plural as a fourth gender.

German has four cases (nominative, accusative, dative, genitive), which indicate the role of a noun phrase in the sentence. In broad terms:

  • Nominative is used for the subject of the sentence (the dog in "the dog sees the cat"). It is also the 'default' form of nouns, the one you will find in the dictionary.
  • Accusative is used for the direct object of the sentence (the cat in "the dog sees the cat").
  • Dative is used for the indirect object of the sentence (the girl in "the boy gave the girl a book").
  • Genitive is used for possessives (the girl's in "the girl's book") and similar situations where nouns modify nouns.

All cases but the nominative are also used as objects of certain prepositions. Each preposition defines which case it wants its object to appear in. Spatial prepositions typically take the dative to indicate place and the accusative to indicate movement:

  • Dative: im Wald (= in dem Wald) "in the forest"
  • Accusative: in den Wald "into the forest"

In the tables in this text, genders will appear in the order masculine, neuter, feminine, plural. This is not the usual order they are presented, but masculine and neuter often have similar forms, and so do feminine and plural. I chose this order to leave similar forms close to each other. The order of cases was also chosen for similar reasons.

The definite article

The definite article forms in the nominative are: masculine der, neuter das, feminine die, plural die. It took me a while to memorize which gender is which form, until I realized:

  • der is similar to er (he).
  • die is similar to sie (she).
  • die can also be plural, and so can sie (they).
  • das ends in -s like es (it). It is also cognate to English that and the Icelandic third person neuter pronoun það. (I realize that a comparison to Icelandic is not exactly the best mnemonic device in the world, but it works for me.)

The declined forms are:

        Masc.   Neut.   Fem.    Plural
Nom.    der     das     die     die
Acc.    den     das     die     die
Dat.    dem     dem     der     den
Gen.    des     des     der     der

There are a lot of patterns to observe here, and they will often apply to other parts of the declension system too:

  • Masculine is the only gender that distinguishes nominative from accusative; the other genders always have the same form for both.
  • Masculine and neuter have the same dative and the same genitive.
  • Feminine uses one single form for the nominative and accusative, and another single form for dative and genitive.
  • German has a general fixation with the idea of having an -n in the dative plural. This applies to nouns too (e.g., den Kindern "to the children"), except nouns whose plural ends in -s like Autos. Other than that, the article is the same for feminine and for plural.

The indefinite article (ein)

At this point I would like to present the declension for the indefinite article (ein). The problem is that it does not have a plural, so the table would not show all the forms I want to show. Instead, I will present the declension table for kein, which is the same except it has a plural.

        Masc.    Neut.    Fem.     Plural
Nom.    kein     kein     keine    keine
Acc.    keinen   kein     keine    keine
Dat.    keinem   keinem   keiner   keinen
Gen.    keines   keines   keiner   keiner

Many of the patterns repeat themselves here.

  • Masculine is the only gender with distinct nominative and accusative forms.
  • The masculine nominative has no ending, but the endings for the other cases are the same as those of the definite article (accusative -en, dative -em, genitive -es).
  • The neuter nominative has no ending either (and neither does the accusative; remember that the genders other than masculine don't distinguish nominative and accusative), but the dative and genitive are again like the definite article, and the same as the masculine.
  • The endings for feminine and plural are also similar to the definite article: feminine has -e in nominative and accusative, -er in dative and genitive. Dative plural again has its characteristic -n, but otherwise it's the same as the feminine.

There are three places in the table where kein has no ending: the masculine nominative and the neuter nominative and accusative. These three places will be important later.

Adjectives

German adjectives have three different kinds of declension:

  • The strong declension is used when the adjective is not preceded by an article or other determiner.
  • The weak declension is used when the adjective is preceded by the definite article.
  • The mixed declension is used when the adjective is preceded by the indefinite article and the possessive determiners (mein, etc.).

In the following tables, we will use the adjective groß (big, large) as an example.

Strong declension

        Masc.    Neut.    Fem.     Plural
Nom.    großer   großes   große    große
Acc.    großen   großes   große    große
Dat.    großem   großem   großer   großen
Gen.    großen   großen   großer   großer

The endings look a lot like those of the indefinite article, with some important differences.

  • Remember those three places where kein had no ending? In these places, the strong adjective declension has the same endings as the definite article:
    • großer (like der) in the masculine nominative,
    • großes (like das) in the neuter nominative/accusative.
  • The other difference is that the masculine and neuter genitive ending is -en, not -es.

Otherwise, all the patterns repeat themselves here.

Weak declension

The weak declension is used with the definite article. Actually, to quote Wikipedia, "weak declension is used when the article itself clearly indicates case, gender, and number". It is used not only with the definite article, but also with other determiners like welcher (which), solcher (such), dieser (this), aller (all).

        Masc.        Neut.        Fem.         Plural
Nom.    der große    das große    die große    die großen
Acc.    den großen   das große    die große    die großen
Dat.    dem großen   dem großen   der großen   den großen
Gen.    des großen   des großen   der großen   der großen

Most of the endings are -en in the weak declension. The exceptions (which have -e instead) are:

  • The same three places where kein has no ending: the masculine nominative and neuter nominative and accusative.
    • Masculine always has distinct forms for nominative and accusative, so that's not much of a surprise.
  • The feminine nominative and accusative. The feminine always has one form for nominative and accusative, and one form for the dative and genitive.

Mixed declension

The mixed declension is used with the indefinite article and the possessive determiners.

        Masc.          Neut.          Fem.           Plural
Nom.    ein großer     ein großes     eine große     keine großen
Acc.    einen großen   ein großes     eine große     keine großen
Dat.    einem großen   einem großen   einer großen   keinen großen
Gen.    eines großen   eines großen   einer großen   keiner großen

The mixed declension is identical to the weak declension, except at those three places where kein has no ending: masculine nominative and neuter nominative and accusative. In those places, it has the strong declension endings instead (-er for masculine nominative, and -es for neuter nominative and accusative).

You can think of it as the adjective taking the strong endings to compensate for the article's lack of an ending in those cases: because the masculine nominative ein has no ending, the großer following it carries the distinctive ending instead.


Elmord looks at licenses: MPL 2.0

2019-04-30 15:30 -0300. Tags: comp, copyright, hel, fenius, in-english

In the previous post, I analyzed the LGPLv3 in the context of looking for a license for the Hel standard libraries. In this post, I'm going to analyze the Mozilla Public License 2.0, or MPL for short. The MPL is not as well known as other free software licenses, but it's an interesting license, so it's worth taking a look at it.

Wikipedia actually has a pretty good summary of the license, and Mozilla has an FAQ about it, but here we go.

[Disclaimer: I am not a lawyer, this is not legal advice, etc.]

File-level copyleft

The most interesting aspect of the MPL is that it applies copyleft at the file level. Section 1 defines "Covered Software" as:

[…] Source Code Form to which the initial Contributor has attached the notice in Exhibit A, the Executable Form of such Source Code Form, and Modifications of such Source Code Form, in each case including portions thereof.

and "Larger Work" as:

[…] a work that combines Covered Software with other material, in a separate file or files, that is not Covered Software.

Section 3.3 allows distributing a Larger Work under terms of your choice, provided that the distribution of the Covered Software follows the requirements of the license.

In other words, the boundary between software covered by the MPL and other software is defined at the file level, and the license allows distributing a combination of MPL and non-MPL code under another license, provided that the MPL parts still remain under the MPL. Modified versions of the MPL-covered parts, if distributed, must be available in source-code form, but this requirement does not apply to files that were not originally part of the MPL-covered software.

One consequence of this is that one might take a library under the MPL, put all substantial changes in separate files, and release the resulting code in object form, but not the source code for the new files. Contrast this with the LGPL, which has terms specifically to prevent additions to the library from being 'isolated' from the LGPL by distributing them as part of the application rather than the library: as we have seen in the previous post, Section 2 of the LGPL requires that the library does not depend on functions and data provided by the application, unless you switch to the GPL (thus requiring the application to be GPL-compatible too).

Executables need not be under the MPL

Section 3.2(a) allows distributing the code in executable form, provided that the source code for the Covered Software is available, and recipients of the executable are informed how they can obtain it "by reasonable means in a timely manner, at a charge no more than the cost of distribution to the recipient".

Section 3.2(b) states that the executable may be distributed under the MPL or under different terms, provided that the new terms do not limit the access to the source code form of the Covered Software (i.e., the files originally under the MPL).

This means the MPL imposes no restrictions on static linking, other than that the MPL-covered source code remains available, and you tell users how to get it.

(L)GPL compatibility

Section 1 defines "Secondary License" as one of the GPLv2, the LGPLv2.1, the AGPLv3, or any later versions of those licenses.

Section 3.3 provides that if the software is distributed as part of a Larger Work which combines MPL-covered software and software under any of the Secondary Licenses, the MPL allows the MPL-covered software to be additionally distributed under the terms of that Secondary License.

An important point here is the "additionally" part: the Covered Software is to be distributed under the *GPL license in addition to the MPL, i.e., the resulting code is effectively dual-licensed. Recipients of the larger work may, at their option, choose to redistribute the originally MPL-covered part of the work under either the MPL or the Secondary License(s).

This provision makes the MPL GPL-compatible: you can incorporate MPL-covered code in GPL projects.

The author of an MPL-covered work can opt out of GPL compatibility by adding a specific note saying the code is "Incompatible With Secondary Licenses" (Exhibit B).

Patents and trademarks

Section 2.1(b) ensures that all contributors automatically grant a license for any patents they may hold to use, modify, distribute, etc., the covered software. Section 11 of GPLv3 has similar terms.

Section 2.3 is very careful to state that each contributor grants all and only those patent rights necessary for the use, distribution, etc., of their contributor version. It does not cover, for example, licenses for code a contributor has removed from their contributor version, or for infringements caused by further modification of the software by third parties (GPLv3, again, has similar wording).

Section 2.3 also explicitly states that the license does not grant rights in trademarks or logos. This makes sense in light of Mozilla's fierce hold onto its trademarks and logos, which in the past led to the rebranding of Firefox as Iceweasel in Debian until an agreement was reached between Debian and Mozilla.

Like the Apache License, the MPL has a patent retaliation clause (Section 5.2): it states that if a patent holder sues someone alleging that the Covered Software infringes a patent of theirs, they lose the rights granted by the license to use the Covered Software. This is meant to discourage recipients of the software from suing the authors for patent infringement.

Conclusions

MPL is a weak copyleft license, providing a middle ground between the liberal MIT/BSD/Apache licenses and the *GPL licenses. It makes it really easy to incorporate the code into larger works, regardless of whether this is done via static or dynamic linking. On the other hand, the fact that it does not automatically extend to other files within the same project makes it easy to extend a library without releasing the relevant additions as free software.

I might use the MPL in the future for libraries in situations where the most convenient way to use the library is to just copy the damn files into your codebase (e.g., portable Scheme code). For larger libraries and projects where I want to ensure contributions remain free, I'm not so sure.

I would like a license that's midway between the MPL and the LGPL, allowing generation of statically-linked executables distributable under different licenses like the MPL, but with the boundary between the copylefted and non-copylefted parts defined more like the LGPL (though I'm sure the devil is in the details when crafting a license like this). If you know some license with terms closer to this, please mention it in the comments.

Addendum

So far the interwebs have pointed me to:

Addendum [2]

The MPL states:

1.10. “Modifications”

means any of the following:

  (a) any file in Source Code Form that results from an addition to, deletion from, or modification of the contents of Covered Software; or

  (b) any new file in Source Code Form that contains any Covered Software.

A possible solution would be to use a license exactly like the MPL, except with an extra item like:

  (c) any new file in Source Code Form that other Modifications in senses (a) or (b) depend on.

The exact wording (to pin down the meaning of "depend on") would have to be figured out.


Elmord looks at licenses: LGPLv3

2019-04-29 14:38 -0300. Tags: comp, copyright, hel, fenius, in-english

Hel is distributed under the GPLv3. The license of the interpreter does not impose any restriction on the licenses of the programs it runs, so there is no requirement for Hel programs to be GPL'd too. However, Hel will (I hope) soon get libraries, and the licenses of the libraries may affect the licenses of the programs importing them. So I have to decide: which license to use for the Hel standard libraries?

The obvious choice would be the LGPL (the GNU Lesser General Public License), a license meant for libraries to allow linking to proprietary programs. I don't think I have ever released anything under the LGPL before. While I have a pretty good idea of what the GPL means, the LGPL was not so clear to me, especially about what it means for statically vs. dynamically linked libraries, and how this distinction applies to languages other than C/C++. This post is my attempt to understand it.

The other option I have thought about is the MPL (Mozilla Public License). However, I think the MPL's copyleft may be way too weak for my tastes. I intend to write about it in the future.

[Disclaimer: I am not a lawyer, this is not legal advice, etc.]

Digression: (L)GPL 2 vs. 3

Most licenses are written in a hard-to-read legalese. I find the GPLv3 and LGPLv3 particularly hard to read; this is partly because the FSF has tried to make definitions more precise and to avoid terminology which could be interpreted in different ways in different countries.

By contrast, the GPLv2 and LGPLv2 are some of the most "written by/for humans" licenses around; they are truly a pleasure to read, so much so that, were it not for some important guarantees added in GPLv3, I would be tempted to keep using the GPLv2 for my software.

These important guarantees include:

For all of these reasons, I prefer to use the (L)GPL version 3 nowadays. End of digression.

The LGPLv3

The LGPLv3 is defined by inclusion of the terms of the GPLv3 plus a number of overrides. For this reason, it is relatively short compared to the GPLv3. I will try to summarize each section as I understand it below.

Section 0 defines a bunch of terms. "The Library" refers to the work released under the LGPLv3. An "Application" is an application which uses interfaces provided by the Library but is not itself derived from the Library. A "Combined Work" is the work produced by combining or linking the Application with the Library. (Some more terms are defined, but we'll see them later.)

Section 1 allows distributing the library without being bound by Section 3 of the GPL. That's the anti-anti-circumvention clause, and I find it somewhat surprising to see it waived here, but there it is.

Section 2 states that if you modify the library in such a way that it depends on data or functions provided by the application (other than as arguments passed to the library), you must either make sure the library still works in the absence of such data/functions, or release the modified version under the GPL (without the extra permissions granted by the LGPL).

The idea here is to preclude a legal trick. Someone might want to extend the library with proprietary code, and escape LGPL's copyleft by leaving the proprietary code as part of the application, and then making the library call the application's proprietary extensions. Section 2 requires that either the external application code is inessential for the library's operation, or else that the library be distributed under the GPL (and thus the application would also have to be distributed under a GPL-compatible license to be usable with the library).

Section 3 states that if your compiled program incorporates significant portions of header files from the library, it has to carry a notice saying that it uses the library and that the library is covered by the LGPL, and the program must be accompanied by a copy of the text of the GPL and LGPL.

One problem here is that "header file" is not defined in the license. It works for C/C++, but how it applies to other languages is debatable. Does calling a syntax-rules macro from the library trigger this clause?

Section 4 covers the distribution of a combined work consisting of the application plus the library. Besides the usual requirements to carry a notice and accompany the work with copies of the licenses, it also requires that you either (1) use a shared library mechanism to link with the library, so that the user can use modified versions of the library with the program; or (2) distribute the source code for the library and the source or object code for the application in such a way that the user can recombine a modified version of the library with the application object code.

The goal here is to ensure that the user still has all the freedoms to change the library (which is free), even if the application is proprietary. That's great in principle, but again it raises the question of what this means for macro expansion (which happens at compile time); there is no 'uncombined' object code when the application invokes macros from the library.

It also complicates static linking as a form of deployment; you still can do it (macros notwithstanding), but the packaging tool must also generate an uncombined application object code to be distributed with the fully statically-linked version.

Section 5 deals with combining multiple libraries (with similar requirements to provide an uncombined version too).

Section 6 deals with future versions of the LGPL, allowing the author to choose to release the code under LGPL "version 3 or later", and also allowing the author to specify a proxy who can decide whether later versions apply.

Possible solutions

The difficulties of applying the LGPL to Lisp-like languages have been recognized before. Franz Inc. created a Lisp Lesser General Public License, which adds a preamble to the LGPL (v2.1, not v3) overriding some definitions of the LGPL with terms more appropriate to Common Lisp, and instructing that static linking is to be treated as a "work that uses the Library", not a "derivative of the Library" (i.e., an Application and not a Combined Work, in LGPLv3's terms), effectively treating static linking the same way as dynamic linking.

Interestingly, the LLGPL is not a simple weakening of LGPL's copyleft: it also states, for example, that redefinitions of functions of the library (something you can do in Common Lisp) do constitute modification of the library itself, and therefore the new definitions are subject to copyleft.

The LLGPL uses definitions that are appropriate to Common Lisp, but a similar set of definitions and exceptions could be crafted for Hel (although I would prefer to avoid language-specific terminology as much as possible).

In a future post, I intend to have a look at the Mozilla Public License and evaluate if it may be a good idea for Hel libraries.


Modules in Hel (and Chez Scheme)

2019-04-18 22:41 -0300. Tags: comp, prog, pldesign, lisp, scheme, hel, fenius, in-english

Today I implemented a simple mechanism for importing modules in Hel. Basically, you can write:

import foo/bar/baz

and it will look for a file named foo/bar/baz.hel, load it as a module containing the bindings defined in the file, and expose the module as baz to the calling code. So if foo/bar/baz.hel defines a function hello, then after you import foo/bar/baz, you can call the function as baz.hello(). Alternatively, you can import the module with a different name using:

import foo/bar/baz as whatever

and access the bindings like whatever.hello().

That's all there is to it, which means there are a lot of things missing from the system. But I decided it was best to do the simple thing[1] and leave the more complex details of the module system for a later phase.

Oh, one more trick: if the module file is not found, Hel tries to import a Scheme module with the given name. So you can actually say import chezscheme and get all the bindings from the host Scheme. (Except some of the names are inaccessible, since there is currently no way to use ? or - in Hel identifiers. We'll see how to fix that in the future.)

Modularizing the interpreter

Incidentally, I also started an attempt to split the interpreter (a single 1420-line Scheme file) into modules. I still haven't merged those changes into the master repository, and I'm still not sure if it's a good idea. In principle it'd be great for organization. The problem is that the RnRS/Chez module system is annoying to use.

For instance, you have to list all bindings you want to export in the library declaration. This is especially annoying for records. For example, if you declare a record:

(define-record-type Foo (fields x y))

this will generate a record descriptor Foo, a constructor make-Foo, a type predicate Foo?, and two accessors Foo-x and Foo-y. That's all nice and fun, but you have to export each of the generated identifiers individually if you want to use them in other modules.
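
For instance, a library wrapping the record above has to name every generated identifier in its export list by hand. A minimal sketch (the library name here is invented for the example):

(library (example)
  ;; Every identifier generated by define-record-type must be
  ;; exported explicitly: the record descriptor, the constructor,
  ;; the predicate, and one accessor per field.
  (export Foo make-Foo Foo? Foo-x Foo-y)
  (import (rnrs))
  (define-record-type Foo
    (fields x y)))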

Another annoyance is that Chez does not seem to provide a mechanism to run the REPL from the environment of the module. You can switch the interaction environment to the exported bindings of a module, but there does not seem to be a way to switch to the environment within the module, to call non-exported functions, etc. The workaround I found was to split all modules into a library definition file (say, utils.sls), containing just:

(library (utils)
  (export binding1 binding2 binding3 ...)
  (import (chezscheme))
  (include "utils.scm"))

and the module code proper, in a separate file utils.scm. This way, I can load the module code directly in Geiser or in the REPL, outside the module system. Note also that the library definition file only imports (chezscheme) (so we can use the include form); all other imports are directly in the .scm file, so the .scm file will load properly by itself.

Even so, it is annoying to reload libraries, because you have to reload the users of each library manually too.

A smaller annoyance is that the code takes longer to compile when split into libraries. This is not a problem if you compile before execution, but makes running the code without a separate compilation step (chezscheme --script hel.scm) slower to start.

Yet another annoyance is that in the R6RS library syntax, all definitions must precede all expressions, so you have to move initialization code to the end of the module. [Addendum: this (mis)feature does not seem to be shared by R7RS library syntax. As much as I've learned to appreciate many good aspects of R6RS, it seems to me that library syntax is just better in R7RS.]

In the end, a middle-ground solution may be to avoid R6RS libraries entirely, and just create a single main file which includes the others, all in the same namespace. It's not as elegant, but it makes development easier. [Addendum: one benefit of the .sls/.scm split is that it's easy to switch to the non-library organization by just including all .scm files directly and ignoring the .sls files.]

Todos and remarks

When you write import foo/bar, where should the implementation look for the foo/bar.hel file? Currently the interpreter looks in the current directory, but that's far from ideal. A better option would be to search relative to the file where the import occurs.

That would be an improvement, but still somewhat annoying: if I have a project with files foo.hel, dir1/bar.hel and dir2/baz.hel, I want to be able to load any of these files individually in the REPL, and for each module to be able to import any other module in the project using the same name. What I really want is a notion of a project root to search from.

One possibility would be to have a __project__.hel file (or something similar) at the project root. When looking for imports, the implementation tries to find the __project__.hel file up the directory hierarchy. If it's found, the directory containing it is the project root. If not, the project root is the directory where the importing file is. This is vaguely similar to Python's __main__.py, except there would be only one project file per project (not per directory, which would destroy the idea of a single project root).

There is still no syntax to import individual bindings from a module, i.e., the equivalent of Python's from mod import foo, bar. Maybe we can just use Python's syntax (except (foo, bar) would have to be in parentheses, due to restrictions of Hel's syntax).

There is also no syntax to import all bindings from a module, i.e., Python's from foo import *; we can't use * because that's an infix operator, and the syntax doesn't and won't special-case individual commands (remember that one of the goals of the syntax is not to have hardcoded keywords). Maybe from foo import all(), and also things like from foo import except(foo, bar). I don't know.

Why foo/bar instead of foo.bar? Because I thought import foo.bar might give the impression that the bindings are to be accessed as foo.bar.hello() (as in Python) instead of just bar.hello() (as Hel does). And why did I want these semantics? Because I was unsure what foo would be when you import foo.bar: a module containing just bar? What if I import foo later? What if foo contains a binding bar itself? Does the module foo have to exist for me to be able to import foo.bar? To avoid all these questions ("do the simple thing"), I decided it would be simpler to import the module without having to deal with the whole hierarchy, and make it available as just bar; and I thought the syntax with / suggested that better.

_____

1 "When in doubt, do the simple thing" has been a sort of mantra in this project. This has helped me avoid analysis paralysis and keep making progress, even though I know eventually I will have to go back and change/improve things. (Note though that the mantra is "do the simple thing", not "do the simplest thing".)


Português ou English?

2019-04-15 00:36 -0300. Tags: about, in-english, em-portugues

[This post is also available in English.]

Faz alguns anos que eu comecei a postar algumas coisas em inglês neste blog. Inicialmente, os posts em inglês eram limitados primariamente a tópicos em que eu julgava que seria mais útil escrever para uma audiência internacional, especialmente os posts sobre Lisp e design de linguagens de programação. De uns tempos para cá, entretanto, os posts em inglês têm sido maioria – em parte porque o tópico principal do blog ultimamente tem sido Lisp e design de linguagens de programação, mas eu tenho escrito uma porção de posts sobre outros tópicos em inglês também.

De uns tempos para cá eu vejo nos logs do blog acessos regulares de outros países (incluindo de clientes de RSS, i.e., leitores regulares). Eu me pergunto: seria o caso de 'oficialmente' passar a postar primariamente em inglês? É uma questão complicada. Por um lado, creio que os meus leitores regulares do Brasil (os que eu conheço, anyway) lêem em inglês sem problemas. Por outro lado, eu posso acabar deixando de atingir leitores em potencial que não saibam inglês mas que se interessariam pelo blog. (Eu também posso ter leitores regulares que eu não conheço e que só lêem em português.)

No final das contas, eu vou continuar publicando em ambas as línguas, com a escolha dependendo do tópico. A dúvida é em que língua publicar quando se tratam de posts pessoais, ou sobre assuntos em que eu não vejo uma clara vantagem de publicar em uma língua ou outra. Uma possibilidade é escrever tudo em ambas as línguas, mas isso produz uma fadiga que eu gostaria de evitar. (Uma vantagem de escrever em português é a quantidade de referências não-traduzíveis que eu posso espalhar nos textos.)

Assim, pergunto a vós, queridos leitouros e leitouras: em que língua preferis que eu escreva? Deixe sua opinião nos comentários, e vamos ver no que isso dá.

[English version follows.]

It's been a few years since I started writing some posts in English on this blog. Initially, the posts in English were limited primarily to topics where I judged it would be more useful to write for an international audience, especially posts about Lisp and programming language design. For a while now, however, most of the recent posts have been in English – in part because the main topic of the blog as of late has been Lisp and programming language design, but I have been writing about other topics in English too.

Lately I have been observing regular accesses from other countries in the blog logs (including from RSS clients, i.e., regular readers). I wonder: might it be the case to start 'officially' posting primarily in English? It's a complicated matter. On the one hand, I believe my regular Brazilian readers (the ones I know anyway) can read English too. On the other hand, I may end up leaving out some potential readers who don't know English but would find the blog interesting. (I may also have regular readers I don't know who can only read Portuguese, but that does not seem likely.)

In the end, I will keep publishing in both languages, choosing language according to the topic. The question is which language to use in personal posts, or posts on subjects where I don't see a clear advantage of publishing in one language or the other. One possibility is to write everything in both languages, but that's a bit more work than I would like to have.

So I ask of ye, dear readers: which language do you prefer me to write in? Leave your opinion in the comments, and let's see how it goes.


Loops and blocks in Hel

2019-04-11 17:11 -0300. Tags: comp, prog, pldesign, hel, fenius, in-english

Hel got a preliminary version of its main looping and non-local control flow primitives today: do, redo and return. They have characteristics similar to Common Lisp's block/return-from, Scheme's named let, and Clojure's loop/recur (and, one might say, Java's labeled blocks, continues and breaks), though they are not exactly the same as any of these. In this post, I will describe how they work.

do

do creates a labeled block with parameters and initial values. It has the syntax:

do name(var1=value1, var2=value2, ...) {
    body
}

What this does is to evaluate body in an environment containing the specified variables with the given values. The variable declaration section is like a function parameter declaration where all parameters are given default values. For example:

do block(x=1, y=2) {
    x+y
}

will return 3.

Within the block, name is bound to a tag for the block. This tag can be used with the redo and return commands.

redo

redo can be used to repeat the named block, with new values for the block variables. For example:

# Print all integers from 1 to `limit`.
let count_until(limit) = {
    do block(i=1) {             # First iteration will run with `i` = 1.
        if i <= limit {         # If we have not reached the limit yet...
            print(i)            # Print the current value of `i`...
            redo block(i=i+1)   # And repeat the block, with a new value of `i`.
        }
    }
}

This is analogous to Clojure's recur, except it does not have to be in tail position, and you can specify the label of the block you want to repeat (so you can have nested blocks and jump back to an outer one).

This is also similar to Scheme's named let, except that the new execution of the block replaces the current one, rather than behaving like a regular function call.
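
For comparison, here is roughly how the count_until example above would look with a named let in Scheme. Note that the call to loop is an ordinary (recursive) call, whereas Hel's redo replaces the current execution of the block:

(define (count-until limit)
  (let loop ((i 1))          ; first iteration runs with i = 1
    (when (<= i limit)
      (display i)
      (newline)
      (loop (+ i 1)))))      ; call the block again with a new value of i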

The names of the parameters in the redo call are optional; we could have written redo block(i+1) instead of redo block(i=i+1). This is analogous to the function call syntax.

return

return can be used to return a value immediately from a block. For example, suppose we have a foreach function which takes a list and a function, and applies the function to each element of the list in order:

let foreach(list, f) = {
    do block(list=list) {                # Start iterating with the full list
        if list != [] {                  # If the list is not empty yet...
            f(list.first)                # Apply the function to the first element...
            redo block(list=list.rest)   # And repeat the block for the remaining ones.
        }
    }
}

Now we want to write a function to test if a given element is in a list. We want to reuse foreach to do the iteration, but we want to stop the iteration (and get out of foreach immediately) when we find the element in the list. We can do this with return:

let is_member(searched, list) = {
    do out() {
        # Call `foreach` with an anonymous function, which will be called
        # for each element of the list.
        foreach(list, fn (item) {
            if item == searched {        # If the current element is the one we are searching...
                return true from out     # Return `true` immediately from the `out` block.
            }
        })
        # If we got here, it's because the element was not found.
        return false from out
    }
}

These constructs can be compared to Java's labeled blocks, continues and breaks. However, Hel blocks take parameters, which must be specified when redoing them (the equivalent of continue); and Hel blocks return a value, which must be specified when returning from them (the equivalent of break).
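
The escape performed by return is essentially that of Common Lisp's return-from, or of a one-shot escape continuation in Scheme. A rough Scheme sketch of the is_member example above:

(define (is-member? searched lst)
  (call-with-current-continuation
    (lambda (return)                      ; `return` escapes the whole function
      (for-each (lambda (item)
                  (when (equal? item searched)
                    (return #t)))         ; found it: jump out immediately
                lst)
      #f)))                               ; list exhausted, element not found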

To do / open questions

Common Lisp has the notion of a default block (which is the block labelled nil). Some constructs, like return, return from the default block, so you can avoid naming the block if you are only using one. It would be nice to have something similar in Hel.

Currently the parameter/argument binding logic for blocks is the same one used for functions. This means that if one of the block's arguments is omitted from the redo call, it will acquire the initial value specified at the beginning of the block! This is most likely not what you want. Alternative behaviours would be to forbid omitting block arguments, or to reuse the value from the current iteration rather than the initial value.

Perhaps instead of using special forms for redo and return, these could be methods of the tag object, so we would write, for example, block.redo(i=i+1) instead of redo block(i=i+1), and block.return(42) rather than return 42 from block. I like the special form better, especially for redo because the block name stays together with the arguments just like in the block declaration. It also allows the possibility of omitting the block name if we get default blocks in the future.


Object model and dot syntax in Hel

2019-04-01 20:43 -0300. Tags: comp, prog, pldesign, hel, fenius, in-english

[Despite the date, this is not an April Fool's joke. This is mostly a mind dump for future reference.]

I have written about noun-centric vs. verb-centric OO before (in Portuguese), but the question surfaces now in the context of Hel's design.

Most mainstream OO programming languages are noun-centric: methods (verbs) belong to objects (nouns). When calling x.foo(y), the method to be called is determined by the (dynamic) class of x; the call can be conceptualized as sending the message foo(y) to the object x.

By contrast, Common Lisp and other languages influenced by the Common Lisp Object System (CLOS) are verb-centric: methods (verbs) are entities in their own right, which can be applied to objects (nouns). Methods of the same name are grouped under a generic function. The method calling syntax is typically the same as the regular function call syntax: (foo x y). The method invoked by a call to a generic function is determined by the (dynamic) classes of all arguments, not just x. New methods can be defined at any time, since they are independent from the class. The class definition, on the other hand, contains just the fields and the superclasses (and metaclass, and those sorts of thing), but no methods.

In Dylan, x.foo(y) is syntactic sugar for foo(x, y). This way you can have both the familiar method call notation and the verb-centric nature of CLOS.

Now, everything in language design is tradeoffs, and here we have some.

Namespacing

One of the main differences between the noun-centric and verb-centric models is in how they define namespaces for methods.

Suppose we define a File class in a module, with a method size() returning the file's size in bytes. In another module, we define a Circle class, with a method size() returning the circle's size in pixels. (Okay, we could have called the circle method radius or diameter(), but let's suppose the module was written by someone else and we don't control the name.)

In noun-centric OO, the class creates a namespace for its methods. someFile.size() and someCircle.size() are entirely different methods, because someFile and someCircle belong to different classes. By contrast, in verb-centric OO, these calls would be syntactic sugar for size(someFile) and size(someCircle); this would only work if there was a single generic function size encompassing both methods, which does not make much sense in this example (since size means something completely different in each class).

Common Lisp solves this problem by having the names belong to packages: variable names are symbols, and each symbol belongs to a package. In this case, each module/package would have its own symbol size, and there would be two distinct generic functions, both named size, but each by a distinct size. Due to the way the package system works in Common Lisp, you would not be able to import both at the same time: you would have to use a fully qualified symbol name to refer to at least one of them.

Guile does something different: if you import two generic functions with the same name into a module, they are merged into a new generic function combining the methods of both. In this case, even though each module defines its own generic function size, a module importing both would see a single generic function size which would accept both files and circles. May seem a bit weird from a conceptual standpoint, but it works nicely. Without Guile's trickery, the Schemely solution would be to rename one (or both) of the functions when importing (the equivalent of Python's from file import size as file_size). I don't know how Dylan handles this situation.
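
As an illustration of this verb-centric style of dispatch, here is a small sketch using Guile's GOOPS (class and slot names invented for the example): two unrelated classes, each with its own size method hanging off the same generic function.

(use-modules (oop goops))

(define-class <file> ()
  (bytes #:init-keyword #:bytes))

(define-class <circle> ()
  (radius #:init-keyword #:radius))

;; Two methods on the single generic function `size`; the
;; applicable method is chosen by the class of the argument.
(define-method (size (f <file>))
  (slot-ref f 'bytes))

(define-method (size (c <circle>))
  (slot-ref c 'radius))

(size (make <file> #:bytes 1024))    ; => 1024
(size (make <circle> #:radius 10))   ; => 10

When the two size generics live in different modules, it is (if I recall Guile's machinery correctly) the #:duplicates (merge-generics ...) module option that produces the merged generic described above.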

The flip side is that noun-centric OO provides a single namespace for all of a class' methods. This means that you have to be careful about overriding methods in subclasses. Suppose someone defines a class A and I create a subclass B inheriting from A and define a method foo on it. In the future, the author of class A decides to add a method foo to A. Now my class B inadvertently overrides the foo method of the superclass, just because it happens to have the same name as A's new foo method. Some noun-centric OO languages like C# require the explicit use of an override keyword on overriding methods to avoid this kind of accidental override. By contrast, in the CLOS world, my definition of the generic function foo would be unrelated to the new foo created by class A's author, so no conflict would ensue. (A package import conflict might happen, though. And if all of the symbols in the package where A is defined are imported into the package where B is defined, you might end up using the same symbol for both foos without even realizing. Yeah, packages are fun like that. But at least it's possible to have two different, non-conflicting foo methods.)

Another way in which noun-centric OO provides a namespace for methods is by separating method names from regular variables. This means I can write let size = file.size() without losing access to the size method. In Common Lisp this problem does not arise because functions/methods live in a different namespace from regular variables anyway, but I'm not willing to go that route. In Scheme, the local size would shadow the global method. (Again, I don't know how Dylan handles this.)

Yet another consequence of the namespacing thing is that, in the noun-centric model, you don't have to import a class' methods individually: if you have access to the class, you have access to all of its (public) methods. In the verb-centric model, generic functions are independent entities, and would have to be imported individually (or else you import all of them at once by importing the whole module (the equivalent of Python's from foo import *), thus polluting your module's namespace).

A possible counter-argument against the noun-centric model is that importing all of a class' methods is kind of an illusion: there are typically functions taking objects of a given class as arguments which are not methods of the class, and those would have to be imported manually anyway. In practice, though, the most common operations on a given object will be methods of the object, so this argument may not be very strong.

The last point brings an advantage of the verb-centric model: you can 'add' methods to a class without modifying its source, since the methods are independent entities that can be defined anywhere, just like regular functions. Some languages, such as Ruby, have "open classes" to which methods can be added at any time. One problem with this is that no matter where the method definitions are for a given class, they all share the same method namespace, so conflicts may happen more often. The other problem is that the set of methods available in a class depends on which modules have been loaded. This is also the case in the verb-centric model, but at least it's completely explicit: you only have access to a method if you import it. In the Ruby model, you see every method in a class regardless of where it was defined, which may create implicit module dependencies (i.e., I use a method defined elsewhere, but I don't import the defining module explicitly, it just happens to be available by the time my code runs).

If I understand correctly, Haskell's typeclasses offer an alternative model: you can instantiate a typeclass (i.e., implement an interface) anywhere, and even implement the same interface multiple times in different ways, but you only see the implementations if you import the implementing module. Transplanting this model to class definition, you might be able to add methods to a class anywhere, but would only see the new methods if you import the defining module. I'm not sure this would work; it seems plausible in a static world, but not really when you can obtain an object from anywhere and call a method on it without knowing its type (or worse, via reflection).

Conclusion

I intend to implement a rudimentary object model for Hel soon. I'm leaning towards plain old noun-centric OO, if only because it's easier to reach a class' methods (you don't have to import each method individually), and because it limits conflicts between local variables and method names. Let's see how it goes.


Named parameters in Hel

2019-03-28 21:21 -0300. Tags: comp, prog, pldesign, hel, fenius, in-english

Hel acquired Python-like named parameters yesterday. This means that if you declare a function like:

let f(x, y) = x+y

you can call it as f(2, 3), or f(x=2, y=3), or f(2, y=3). It also got (also Python-like) rest parameters, i.e., you can declare a parameter like *args to collect all positional (non-named) arguments not captured by a previous parameter, and **kwargs to capture all named arguments not captured by a previous parameter.

(Unlike Python, the resulting kwargs variable is a list of (name, value) tuples, but that's because Hel does not have dictionaries yet. Also, I still have to implement support for the *x and **x syntax at the call site, rather than just at the function declaration site.)

But I wonder if this is the best approach to named parameters in Hel:

So, are there alternatives for handling named parameters better suited to Hel's goals out there?

What other languages do?

Plenty of languages get by without named parameters at all, but that's not really what I'm after.

Common Lisp, Dylan, Scheme

In Common Lisp, functions have positional and keyword (named) parameters, but any given parameter is either positional or keyword: if you declare a function like (defun f (x y &key z) ...), you can call it like (f 1 2 :z 3), but not like (f :x 1 :y 2 :z 3). This means the function controls whether the name of a parameter is exposed or not (and actually the keyword exposed by the function need not be the same as the variable name used internally to store its value).

This makes the calling convention simpler. Conceptually, a function receives a list of arguments; keywords like :x are just values, and keyword arguments are just extra keyword value sequences in the list. An argument list can be assembled programmatically and passed to a function via apply. The call site (from an implementation point of view) does not need to know the function signature beforehand to call it. (Of course, performance is usually better when it does know the signature beforehand.)

One downside of this is that because keywords are plain values, it is easy to pass one as a positional argument by mistake, especially if the function supports both optional and keyword arguments. For example, if a function is declared (defun f (a &optional b c &key d) ...), calling (f 1 :c 2) will actually pass :c as the value for b, and 2 as the value for c. For this reason, it is considered good practice[by whom?] not to use both optional and keyword arguments in the same function.

The other downside is that sometimes we do want to be able to pass the same arguments either with or without names. I feel this is especially the case with constructors, where I want to be able to call either Person(name="Hildur", age=23) or Person("Hildur", 23). I don't know. Constructors also have the characteristic that the parameter names are usually part of the interface anyway, because they are the same as the names of the object accessors.

Dylan seems to use the same scheme (heh) as Common Lisp.

Standard Scheme only supports positional parameters and a mechanism to collect rest arguments (like Python's *args) in a list. The various Scheme implementations tend to support variations of Common Lisp style argument lists.
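
Guile, for one, provides define* with #:optional and #:key parameters that behave much like the Common Lisp lambda lists above: the function decides which parameters are keywords, and positional parameters cannot be passed by name. A minimal sketch (function and parameter names invented for the example):

(define* (f x y #:key (z 0))
  (list x y z))

(f 1 2)          ; => (1 2 0)
(f 1 2 #:z 3)    ; => (1 2 3)
;; Unlike Python, x and y cannot be passed by name:
;; the function decides which parameters are keywords.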

Smalltalk, Objective-C, Swift

In Smalltalk and Objective-C, the parameter names are part of the name of a method. Using an example from Wikipedia, when you write:

'hello world' indexOf: $o startingAt: 6

the method is actually called indexOf:startingAt:, with the arguments interspersed with the name. This means all arguments are named, and it also means they cannot be reordered or omitted (though you can define a different method with different arguments, for example a separate indexOf: method, thus simulating optional arguments).

Swift is somewhat similar: by default, all parameters have a label, which must be used when calling the function; however, in Swift you can specify _ as the label to omit it. Arguments also have a fixed order. The parameter labels appear to be considered part of the function name too, so you can have different declarations of the same function name with different parameter labels. Argument names are not part of the type. I'm not sure how you specify which of multiple functions with different argument labels you want to refer to when using a function as a value.

Elixir, Ruby 1.x, Clojure

In Elixir, passing the last arguments of a function call in the form key: value, key: value, ... is syntactic sugar for passing a list [key: value, key: value, ...], which is itself syntactic sugar for a list of tuples [{:key, value}, {:key, value}, ...]. By the magic of pattern matching, if you do the same thing in the function parameter declaration, it will turn into a pattern that will match the list of tuples passed in as argument. But this also means that the list must be in the same order in the declaration and the call, and also means that the keywords are not optional. Alternatively, one can receive the whole list and parse it manually (or semi-manually with the help of a dictionary).

Ruby pre-2.0 seems to work similarly, except you get a dictionary instead of a list of pairs. Ruby 2.0 and after has actual keyword parameters. Unlike Python, a parameter is either positional or keyword; it cannot be called both ways.

Clojure's approach is a mix of Common Lisp and Elixir: to support keyword parameters, you declare a rest parameter which will collect the sequence of :keyword value items, but instead of specifying a variable as the parameter to receive the list, you can specify a dictionary pattern to destructure the list. The syntax is not exactly awesome, especially when declaring default values for the keys, but it works.

Ada

Ada is like Python in allowing any parameter to be passed by name or by position, as the caller desires. The names don't seem to be part of the type, so I don't know how the language handles named arguments when using a function as a value.

Conclusion

There is no real conclusion here. I will keep the Python-style calls for now, but I have to think more about this.



Copyright © 2010-2024 Vítor De Araújo
The content of this blog, unless otherwise specified, may be used under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

Powered by Blognir.