Impressions on R7RS-small: libraries, records, exceptions

2017-10-03 15:10 -0300. Tags: comp, prog, lisp, scheme, in-english

In the last post, I wrote a little bit about the historical context in which R7RS came about. In this post, I will comment on my impressions about specific features of the R7RS-small language.

First of all, I'd like to note that if you are going to read the R7RS-small report, you should also read the unofficial errata. As I read the document I spotted a few other errors not mentioned in the errata, but unfortunately I did not keep notes as I was reading. I'm not sure why a, um, revised version of the report is not published with the known errors corrected (a Revised^7.01 Report?), but alas, it isn't, so that's something to keep in mind.

In this post, I will talk mainly about the differences between R5RS and R7RS-small, since R7RS-small is more of an incremental extension of R5RS, rather than R6RS. This is not intended as a complete or exhaustive description of each feature; for that, consult the report.

Libraries/modules

R7RS introduced the concept of libraries (what some other systems call modules; I suppose R6RS and R7RS chose the name "library" to avoid conflict with the concept of modules in existing implementations). Library names are lists of symbols and non-negative integers, such as (scheme base), or (srfi 69). A library has a set of imports, a set of exported symbols, and code that constitutes the library. A library definition looks like this:

(define-library (foo bar)
  (import (scheme base)
          (scheme write))
  (export hello)
  (begin
    (define (hello)
      (display "Hello, world!\n"))))

Instead of putting the library code directly in the define-library form (inside a begin clause), it is also possible to include code from a different file with the (include "filename") directive (or include-ci for parsing the file case-insensitively; standard Scheme was case-insensitive until R5RS (inclusive)). This makes it easier to package R5RS code as an R7RS library, by including the older code within a library declaration. It's also a way to avoid writing all library code with two levels of indentation.

Imports can be specified or modified in a number of ways:

One can import a whole library: (import (scheme base))
Or just some identifiers from it: (import (only (some lib) someid1 someid2))
Or all but some identifiers: (import (except (some lib) unwanted-id1 unwanted-id2))
Or rename some identifiers: (import (rename (some lib) (original1 new1) (original2 new2)))
Or add a prefix to all identifiers: (import (prefix (some lib) somelib/))

These forms can be combined (e.g., you can import only some identifiers and add a prefix to them).

Exports just list all identifiers to be exported, but you can also write (rename internal-name exported-name) to export identifiers with a different name than they have within the library body.

Unlike R6RS, all library code directly embedded in the define-library form must be written within a begin clause. At first I found this kinda weird, but it has an interesting consequence: the library definition sublanguage does not have to know anything about the rest of the programming language. There is only a limited number of kinds of subforms that can appear within define-library, and parsing it does not require knowing about the values of any identifiers. This means that define-library can be more easily processed as data. One can imagine useful tools which read library definitions from files and, say, compute the dependencies of a program, among other possibilities.

In fact, R7RS does not classify define-library or its subforms as syntax forms, i.e., they are something apart from Scheme expressions. This also resolves a problem that would occur if define-library were an expression. The report specifies that the initial environment of a program is empty. So, how would I use import declarations before importing the library where import declaration syntax is defined? Of course one way around this would be to make (scheme base) available by default rather than start with the empty environment. But the solution adopted by R7RS means we don't have to import (scheme base) if we don't want to (for example, if we want to import (scheme r5rs) instead to package R5RS code as an R7RS library). (The report does define for convenience some procedures and syntax forms with the same name as corresponding library subforms, e.g., include.)

R7RS also standardized cond-expand (extended from SRFI 0). cond-expand is a mechanism somewhat like ifdefs in C for conditionally including code depending on whether the implementation defines specific feature symbols, or whether some library is available. This makes it possible to provide different implementations of a procedure (or a whole library) depending on the current implementation. One way we could use it is to write shims, or compatibility layer libraries to provide an uniform interface for features that are implemented differently by various implementations. For example, in Common Lisp, many implementations support threads, but they provide different interfaces. Bordeaux Threads is a library which provides a uniform API and maps those to the corresponding functions in each implementation it supports. Now we can do similar things in R7RS for those features that are supported everywhere but in incompatible ways (e.g., for networking).

Libraries and cond-expand are by far the most important addition in R7RS relative to R5RS. Even if we did not have any of the other features, we could package them as libraries and provide implementation-specific code for them via cond-expand.

Missing things

The report does not specify a mapping between library names and file names. I realize that it would be kinda hard to make everyone agree on this, but it is somewhat of a hurdle in distributing programs organized into libraries. Some implementations, such as Chibi, will look up a library named (foo bar) in a file named foo/bar.sld (where .sld stands for Scheme library definition), whereas CHICKEN will look it up at foo.bar.*. There is a project of a portable package manager for R7RS called Snow, which I think takes care of mapping packaged library files to implementation-specific names, but I haven't taken the time to check it out yet.

R7RS takes the excellent step of specifying that library names whose first component is the symbol srfi are reserved for implementing SRFIs, but then fails to specify how to name a specific SRFI. In practice, the few implementations I checked all agree on using (srfi n) as the name of the library implementing SRFI number n (i.e., I can write (import (srfi 69)) and remain reasonably portable), so this may turn out not to be a problem in practice.

Records

R7RS incorporates the define-record-type form from SRFI 9, for defining new record/struct types. It is a somewhat verbose form, which requires specifying the record constructor arguments and the names for each field accessor/getter and (optional) mutator/setter, but it's basically the least common denominator that any implementation which has some form of records can easily support. It looks like this:

(define-record-type <person> (make-person name age) person?
  (name person-name person-name-set!)
  (age person-age person-age-set!))

Here:

<person> is the record type name. R7RS-small does not specify any particular use for this name, but R7RS-large or some other SRFI may specify uses for it (e.g., to allow querying information about the record type).
(make-person name age) defines the constructor procedure and order of arguments for this record.
person? defines the name of a predicate which will answer true when given records created with make-person, and false for other Scheme values.
The clause (name person-name person-name-set!) declares a field called name, associated with a getter procedure person-name to extract the name field from a given person record, and a setter procedure person-name-set! for mutating the name field in a person record. The next clause does the same for the field age.

R5RS did not have any way to define new types disjunct from existing types. R6RS provides a more complex records facility, including both a syntactic and a procedural layer allowing reflection, but I don't know it well enough to comment. (I have read some comments on problems in the interaction between syntactically- and procedually-defined records, but I don't know the nature of the problems or how serious they are.)

Missing things

Reflection would be nice, or at least a way to convert a record into a vector or something like this (though I realize this might displease some people), but we could make libraries for that. Another thing that would be nice is for records to have a standard printed representation which could be printed out and read back again, but I realize there is a slightly complicated interaction here with the module system (the printed representation should be tagged with the record type in a way that will work regardless of which module it is read back in), and this might not even be desirable for implementation-internal types which happen to be defined in terms of define-record-type.

Exceptions

R7RS incorporates the exception handling mechanisms from R6RS, but not the condition types. Any value can be raised in an exception. The raise procedure raises a value as an exception object, or condition, to be caught by an exception handler. The guard form can be used to install an exception handler to be active during the evaluation of its body. The guard form names a variable to hold the captured condition, a sequence of cond-like clauses to determine what action to take given the condition, and a body to be executed. It looks like this:

(guard (err
        ((file-error? err) (display "Error opening file!\n"))
        ((read-error? err) (display "Error reading file!\n"))
        (else (display "Some other error\n")))
  (call-with-input-file "/etc/passwd"
    (lambda (file)
      (read-line file))))

If an else clause is not provided and no other clause of the guard form matches, the exception propagates up the stack until some handler catches it. If an exception is raised and caught by a guard clause, the value returned by the guard form is whatever is returned by the body of that clause.

Beside raise, R7RS also defines a procedure (error message irritants...), which raises an error object (satisfying the error-object? predicate) encapsulating an error message and a sequence of objects somehow related to the error (called "irritants"). It also defines the procedures error-object-mesage and error-object-irritants to extract the components of the error object.

R7RS does not define specific object types to represent errors; it only says that objects satisfying a given predicate must be raised in some circumstances. An implementation might define a record type for that, or just use lists where the first element represents the error type, or whatever is appropriate for that implementation.

At first I did not think exceptions were that important in the grand scheme of things (heh), since you can implement them on the top of continuations. (And indeed, exceptions in R6RS are in a separate library rather than the base language, although this does not mean much in R6RS because, if I understand correctly, all libraries are mandatory for R6RS implementations.) However, I then realized that until R5RS (inclusive), there was no standard way to signal an error in Scheme code, and perhaps more importantly, no standard way of catching errors. If portable libraries are to become more prominent, we will need a standard way of signalling and catching errors across code from different projects, so exceptions are a good add-on.

Beside raise, R7RS also defines raise-continuable, which raises an exception but, if the ~~guard~~ exception handler returns, it returns to the point where the exception was raised rather than exiting from the ~~guard~~ handler form. [Correction: this is how raise-continuable interacts with with-exception-handler, not guard. I'm still figuring how guard acts with respect to continuable exceptions.] On the top of this, something like Common Lisp's restarts can be implemented.

One side effect of having guard in the language is that now you can do control flow escapes without using call-with-current-continuation (call/cc for short). In theory this could be more efficient than capturing the fully general continuation just to escape from it once; in practice, some implementations may rely on call/cc to implement guard (the example implementation provided in the report does), so this performance advantage may not realize. But just having a construct to express a one-shot escape is already great, because it allows expressing this intent in the program, and potentially allows implementations to emit more efficient code when a full continuation is not required.

I was wondering if one could implement unwind-protect (a.k.a. try/finally) in terms of guard, and so avoid dynamic-wind for error-handling situations. Alas, I don't think this is possible in general, because the presence of raise-continuable means an error handler may execute even though control may still return to the guard body. I wish to write more about continuations in a future post.

Conclusion

Libraries (plus cond-expand), records and exceptions are the most important additions in R7RS-small relative to R5RS, and they are all a great step towards enabling code reuse and portability across implementations, while not constraining Scheme implementors unnecessarily. I am particularly happy about libraries and cond-expand, because this means we can start writing compatibility libraries to bridge differences between implementations without having to rely on a standardization process.

I have some other comments to make on I/O, bytevectors, and other parts of the standard library, but they can wait for a future post.

Comentários / Comments (1)

Domn Toño, 2017-10-18 15:04:04 -0200 #

Talvez fosse mais fácil dividir para conquistar, de tal maneira que a comunidade pudesse se tornar inteiramente individual. Após essa separação em diversas vertentes de Scheme, uma por pessoa, poder-se-ia proceder de uma dentre estas duas maneiras:
1) Em espírito de comunidade e fraternidade, os indivíduos poderiam se agrupar gradualmente, selecionando os melhores aspectos de suas respectivas variações de Scheme, até que surgisse um novo padrão mundial, que aqui chamaremos de R&InfinityRS.
2) Em espírito de humanidade tal como realmente é, cada um iria eliminando os outros até que apenas o seu padrão restasse. O chamado ciclo Highlander, aka "Só pode haver um".

Obrigado pela atenção!

Elmord's Magic Valley

Computers, languages, and computer languages. Às vezes em Português, sometimes in English.