Elmord's Magic Valley

Posts com a tag: `prog`

Why (not) boxes?

2018-08-28 23:28 -0300. Tags: comp, prog, pldesign, lisp, in-english

In the previous post I discussed an idea for dealing with mutable data in a Lisp-like programming language by using mutable boxes and immutable everything-else, with a bunch of optimizations. One of the usual suspects asked me what was the advantage of this scheme over just declaring things const as one would in a language like C/C++. At first I did not have an answer ready. This is one of those situations where you are so stuck in your own perspective that some questions don't even occur to you.¹ The immediate answer was that I was thinking in the context of a dynamically-typed language, so an immutability declaration like const² was out of the picture. But there is more to it.

If I were to give a really complete answer, I would have to begin answering why I want dynamic rather than static typing. I started this post originally by trying to explain exactly that, but there is way more to say about it than I have the energy to do right now. For now let's just take for granted that I'm designing this framework in the context of a dynamically-typed languaged.

But I could still have a dynamically-typed equivalent of the const declaration: just shift the constness to the dynamic type of the object. So vectors and other composite objects would have a flag indicating whether they are mutable or not. I discussed this possibility at the end of the previous post, but I also commented I didn't find that solution as satisfying. But why not?

Semantic clarity

One thing I like about the mutability-as-boxes model is that it seems to makes it easier to think "equationally" about mutability: instead of mutability being an inherent property of vectors (and other data structures), mutability is an 'embellishment' which can be added to any data structure (by putting it into a box), and it seems more or less obvious (to me, anyway) what will be the expected behavior when adding or removing mutability from something (or rather, when adding or removing something from mutability). For example, if vectors were inherently mutable or immutable, then I have to know what operations exist to convert one type into the other, and what happens to the original (if I make an immutable vector out of the mutable one, will the new vector reflect further changes in the original (like a C const reference), or is it an independent, never-changing copy?). Of course, when you learned the programming language you would learn about those details and be done with it, but the boxes model seems to suggest the answers by itself: if I extract a vector (immutable, like all vectors) from a mutable box, I would expect that further changes to the box contents won't affect the vector I just extracted (because changing the box contents means replacing one vector with another, not changing the vector itself); and if I have an (immutable) vector v around and I put it inside a box, I would expect that further changes to the box contents won't affect my original v (for the same reason: if I change the box contents, I'm replacing v with a new vector, so it's not v anymore). In fact, in the previous post I have mostly wondered about how to implement the model efficiently, rather than what the correct behavior of each operation should be, because that part did not seem to raise any questions.

There is a flip side to this: although it is easier (to me, anyway) to think about the semantics of the mutability operations, the optimizations required to make it work well make it harder to think about the performance of the written code. That's the sufficiently smart compiler problem: a sufficiently smart compiler (or runtime) can turn something that would in principle be expensive into something fast, but then you change a small thing in your code in a way that the optimization cannot handle, and suddenly the performance of your program drastically changes. You end up having to know which cases the implementation can optimize, which makes up for the semantic simplicity. Unless you can make sure the optimization will handle all 'reasonable' cases (for varying values of 'reasonable'), this can be a problem.

Equality

Object equality is a more complicated concept than one might expect. There are multiple notions of equality around – some languages have multiple operators for different kinds of equality (for example, == and === in JavaScript, or eq?, eqv? and equal? in Scheme). One type of equality that's given particular prominence in Scheme is the idea of object equivalence, embodied in the eqv? predicate: two objects are equivalent iff no operation (other than the equality predicates themselves) can tell them apart. Mutability is particularly important for object equivalence: two mutable objects (say, two vectors [1 2 3] and [1 2 3]), unless they are one and the same object in memory, are never equivalent, even if they have the same contents, because you can tell them apart by modifying one and seeing if the other changes as well (i.e., they might cease from having the same contents in the future). On the other hand, two immutable objects of the same type and with the same contents are equivalent, because there is in principle no operation that can tell them apart. (Of course the implementation might provide a function to return the address of each object in memory, which would allow us to tell the objects apart. But let's not concern ourselves with that.) Another notion of equality is that of equality of contents, embodied in the equal? predicate: two objects are equal if they are of the same type and have the same contents, even if they are mutable.

When you have a lookup data structure such as a dictionary, you have to decide which kind of equality you will use to compare the keys in the data structure with the key being looked up. Scheme hash table implementations typically require one to specify the equality operator explicitly, because strings are mutable, so you want equal? if your keys are strings, but in other cases you may want to distinguish objects that are not equivalent in the above sense, so you want eqv?.

But if you make strings and vectors immutable, you can compare them with eqv?, and the cases where you want to actually use equal? for hash table lookup mostly go away. And you generally don't want mutable keys in your hash tables anyway (because if you mutate the object that was the original key, typically your hash table stops working because now the key changed but is still hashed under the old key's hash); we tolerate that in Scheme only because strings (and lists, and vectors) are mutable and we want to be able to use them as keys. So if mutability is isolated by boxes, now we can make hash table lookup use object equivalence (eqv?) by default and not worry about explicitly choosing the right predicate for hash table lookup. (Having a sensible default predicate for hash table lookup is important, among other cases, if you want to have literal syntax for hash tables, i.e., if you want to be able to write a literal hash table like {"foo": 1, "bar": 2} in your code without having to say "hey, by the way, the keys are compared by equal? in this case".)

You can still use boxes as keys in a hash table. But since boxes are mutable, a box is only eqv? to the very same object in memory, so you have to use the same box object as the key when you store a value in the table and when you look the value up. This is actually useful if you want to store information about the box itself rather than the contents. But what if I want to look up based on the box contents? Well, then you unbox the contents and look it up! Which expresses intent far better, if you ask me. (This is not entirely equivalent to an equal?-based hash table lookup because you may have boxes inside boxes which would all have to be unboxed to achieve the same effect. Not that this is a very common use case for hash table keys.)

Could we not do the same thing with the mutability flag model? In that model, eqv? would check the mutability flag; objects with the mutability flag on would only be equivalent if they were one and the same, and objects with the mutability flag off would be compared for contents. It would work, but would not be as pretty, if you ask me. However, as long as mutability is easily visible (for example, mutable objects would be printed differently, say like ~[1 2 3] if the mutability flag is on), it could work fine.

Mutable boxes are useful on their own

In Scheme, variables are mutable: you can use (set! var value) on them to change their values. The problem is, variables are not first-class entities in Scheme, so you cannot pass them around directly. So if you want to share mutable state across functions, you have to put the variable in some place where all the interested functions can see it; I remember once having moved subfunctions into another function just so they could all share the same mutable variable. Alternatively, you can create a mutable data structure and pass it around to the relevant functions – and the box is the simplest mutable data structure you can use, if all you want is to share one single mutable cell around. So mutable boxes are useful even if you don't intend to make them the one single source of mutation in the language. And since they are already there, why not just go ahead and do just that? (I am aware that "why not?" is not exactly the most compelling argument out there.)

Another case: cons-cell based lists are somewhat annoying to use with mutation. Suppose you have a list in a variable x, and you pass it around and it ends up in a variable y in another part of the program. If you append things to the tail of x by mutating the tail, both x and y will see the new items, because the tail is reachable from both x and y. But if you append things to the front of x, y won't see the new items in the front, because the new elements are not reachable from the old list tail. If you put the whole mutable list inside a box and passed the box around, both x and y would have the same view of the mutable object. And if you took the list out of the box and put it on another variable z, it would become immutable, so either you see the same changes to the list as everyone else, or you isolate yourself from all subsequent mutations, but it will not happen that you will see some changes to the (tail of the) list and not others (to the front).

Conclusion

I hope I have been able to show why I find the mutability-as-boxes model appealing. I'm not saying it does not have problems (on the contrary, I have already said it has problems), I'm just trying to show what is the point of the whole thing.

_____

1 This is kinda disturbing when you think about it. How many other questions am I not asking?

2 Well, const is not really about the immutability of the data, it's more about the permission to modify a piece of data from a given reference. That is, if a function is declared as taking a const char * argument, that means that the function is not supposed to modify the data pointed to, but it does not mean that the region will not be changed through other references. In other words, it's about requiring something from the user of the reference, but not about providing a guarantee to the user of the reference. A true immutability declaration would both forbid the user from modifying the data and ensure to them that the data will not change during use. Immutability in a language like Rust works like this (except immutable is the default, and mutability is explicitly declared).

Comentários / Comments

An approach to mutation

2018-08-01 22:19 -0300. Tags: comp, prog, lisp, pldesign, in-english

I've been around lately with an idea for handling mutation in a new Lisp-like programming language. Most of these ideas are probably not new – in fact, while doing a little research I've found out about Clojure's transients, which embody some of the same ideas – and the parts that are possibly new are not necessary good. But I want to write this down for future reference, so there we go.

DISCLAIMER: This is one of those programming language design posts exploring a bunch of ideas and reaching no conclusion. Read at your own peril.

Some context

The problem with mutation is when the mutable data is shared with other parts of the program – especially when you don't know what parts of the program share the same data. For example, suppose you call a method blog_post.get_tags(), and it returns you a list ["comp", "prog", "lisp"] – can you mutate this list? For example, if I were to sort it, or remove elements from it, can I do it in-place, or I would be mutating a list used internally by the blog_post object and thus inadvertently affecting other parts of the program? Without looking at the method's source code, we don't know. If we wanted to be sure not to break anything, we would have to make a copy of the list and change the copy instead.

What if I am the person writing the get_tags() method? Should I always return a new copy of the list, wasting some memory and cycles but ensuring that whoever calls my function won't be able to affect the internal fields of the blog_post? Or should I always return the same list object, thus avoiding a new allocation but relying on the caller to do the right thing?

Now consider the strings inside that list. If I were to convert them to upper-case, should I do it in-place, or copy them first? In a language like C, this is the same problem as before: you have to know whether get_tags() gives you a copy of the original strings (which you can freely modify), or the internal strings used by blog_post (which you should not modify). But in a language like Java or Python, this problem does not come up: since strings are immutable in those languages, the only way to 'change' them is by making a new string, so modifying them in-place is not an option. On the other hand, the writer of the get_tags() method can now happily return the internal strings of the blog_post object, since they can be sure the strings cannot be modified by external code.

If you make all data structures immutable, you eliminate this problem – and that's the purely-functional approach, taken by languages like Haskell. Clojure is similar in making the core data types (such as lists, vectors and hashmaps) immutable, and having controlled forms of mutability. In traditional Lisps like Scheme and Common Lisp, on the other hand, most composite data types (including lists, vectors and strings) are mutable. The standards of those languages are careful in describing which functions always return freshly allocated data and which return values that may share parts with the function's arguments. This is basically part of the contract of those functions, which you have to know whenever you want to mutate values generated by them. The situation in traditional Lisps is somewhat aggravated by the fact that linked lists may share a tail: two lists (1 3 7 5) and (2 7 5) may actually share the same cons cells for the (7 5) part. In a mostly functional setting, this is okay, but if we want to mutate anything, we have to be extra careful not to be inadvertently changing something else. In this example, sorting the second list in place may end up messing the first list.

What I'm interested in is finding a middle ground between full immutability and full mutability. I want to be able to return immutable data from functions, so I can know the consumers of that data won't inadvertently change it, and I also want to be able to create mutable data which can be modified in place. It would also be nice to be able to use mutable data for temporary processing and make it immutable after we are ready. And I want to be able to tell at a glance if I'm dealing with mutable or immutable data.

So here is the idea…

The idea

First, we make all basic composite data types (lists, vectors, dictionaries, strings, etc.) immutable. Then we add a single mutable box type. Values of the box type have a single mutable field. This idea is at least as old as ML's ref type, so nothing new so far. I will use the notation &val to mean a box containing val, and the expression (set! box val) changes the contents of the box box to val. I will also use @ (read at) for the indexing function, so (@ vec idx) means the idxth element (starting at 0) from vector vec. (@ box) with no indices means to extract the box's contents.

So now we can make a mutable cell with an immutable vector inside, e.g., &[1 2 3]. We cannot mutate the vector directly, but we can replace the whole vector with another immutable vector. That may be elegant and all, but it's not as convenient as a mutable array, nor as efficient. There are some tricks we can play here, though.

The first trick is to make the assignment operator (set!) recognize vector indexing as its first argument, so if v is the vector-containing box &[1 2 3], we can write (set! (@ v 0) 42) to replace the vector [1 2 3] with the vector [42 2 3] inside the box. It looks like we are mutating the vector's first element, but actually we are replacing the whole original immutable vector with a new immutable vector with a different element at position 0.

This gets us convenience, but it's still inefficient: if I write a loop to mutate all elements of the vector, it will generate a fresh new vector on each iteration. But then comes the second trick: how can we tell the difference between a mutable cell with an immutable vector inside from an actual mutable vector? If we make the difference invisible to the programmer, then we can mutate the vector in-place as an optimization. So (set! (@ v 0) 42) syntactically looks like mutating a vector element, semantically means replacing the whole vector with a new one, but implementationally actually works by mutating the vector element anyway. I'm not sure about the wisdom of this double layer of self-cancelling illusions, but let's explore this idea further.

Let's call the naive implementation using a mutable box with an immutable vector inside, well, the naive implementation. And let's call the implementation which underlyingly uses a mutable vector to represent the box+vector combination the smart implementation (with the full understanding that it may actually be too smart for its own good, or maybe not smart enough to make this idea work well).

The most basic operation you can do with a box is extracting the contents. In the naive implementation, that's just returning the value inside. In the smart implementation, we must simulate this by copying the current contents of the mutable vector into a freshly allocated immutable vector and returning that. A user can then observe the difference between the two implementations by taking the contents of the same box twice and checking whether the results are eq? to each other, i.e., whether they are the same object in memory.

It seems to me that the solution to this problem is to ditch object identity for immutable objects from the language, i.e., get rid of the lower-level eq? operation (Python's is), or at least relegate it to a library of lower-level operations. Immutable objects should only be compared by its contents, not by identity: if I compare [1 2 3] with [1 2 3], it should not matter whether they are separate objects in memory or not: they have the same contents and that's what matters. The only way to tell two distinct objects with the same contents from each other (apart from eq?) is by mutating one of them and seeing if the other changes as well; but if the objects are immutable, this distinction disappears.

A possible optimization to reduce the amount of copying when extracting a box's contents is to return the actual underlying vector, but mark the box as copy-on-write, i.e., we postpone the copy to the next time we need to mutate the vector inside the box. If the box is not mutated afterwards, the vector is never copied. The problem with this may be performance: we need to check the copy-on-write flag before every change to the vector, and the whole point of these optimizations is performance. Sure, we avoid a copy, but we slow down every write to the vector. This is aggravated by the fact that this flag must be synchronized across threads, lest we end up with two threads making new copies and clobbering each others' view of the box.

Speaking of which, doesn't thread synchronization throw this whole idea out of the window anyway? Extracting the contents of a box must be an atomic operation, but someone might be mutating the underlying vector while we are copying it into a new immutable vector to return it. This is okay as long as we can guarantee that the resulting copy represents one possible atomic state of the box at the time of the extraction, but consider the following scenario:

The box b originally contains &[1 2 3 4 5].
Thread 1 evaluates (def v (@ b)). The implementation starts to copy the elements into a new immutable vector.
Right after thread 1 copies 1, thread 2 evaluates (set! (@ b 0) 10). Now b is &[10 2 3 4 5]. Thread 1 keeps on with its copying.
Thread 2 evaluates (set! (@ b 4) 50). Now b is &[10 2 3 4 50].
Thread 1 finishes copying. Because the copy saw the second mutation but not the first one, the resulting copy has [1 2 3 4 50].
However, [1 2 3 4 50] has never been a state of b: the only states b had were [1 2 3 4 5], [10 2 3 4 5], and [10 2 3 4 50]. Thus, the illusion that the box contains an immutable vector is broken.

To avoid this problem, the implementation would have to acquire a lock (or use some other form of thread synchronization) when extracting the contents of a box, thus slowing down what should be a cheap operation. The copy-on-write solution avoids the copy incoherence problem, because the copying happens from the now-immutable vector to the new mutable one, and not the opposite, so we know that the origin will not change mid-copy. But as we have seen, we need to ensure synchronization of the copy-on-write flag, so it's pretty much the same.

Maybe this synchronization requirement is a good thing: maybe we want copies to be atomic anyway, and this way the semantics of the language guarantees that. But maybe this is an unnecessary overhead most of the time.

Even if we go with the copy-on-extract solution, we can avoid copying in the case of extracting the object from the box and then discarding the box (e.g., if we want to create a mutable vector, do a bunch of mutating operations on it, and then make it immutable) by providing an (unbox! b) operation which returns the contents and sets the box contents to nil (or whatever other value to indicate that the box is "empty"). Because we know the vector will not be mutated again, we can just return the underlying vector and call it immutable. This is basically what Clojure's persistent! operation does (though I didn't know that when I had this idea).

Let's consider some other problems and optimizations.

Putting it back in the box

What about sequences of transformations? For example, suppose I implement a filter function, which takes a predicate function and a vector and returns a new vector containing only the elements for which the predicate function returns true, like this:

(def (filter pred vec)
  (let ([result &[]])
    ;; Collect satisfying items in the mutable result vector...
    (for [item in vec]
      (when (pred item)
        (push! item result)))
    ;; And then return the contents as an immutable vector.
    (unbox! result)))

What if I want to, say, filter a vector and then reverse it? If filter is written like this, I get an immutable vector back, so I would have to copy it into a mutable vector again just so I can reverse it. If only filter had not called unbox! at the end, I could have reversed it in-place without a new allocation! But if I don't unbox!, I will have to always manually unbox the result when I want to, and most of the time I do want an immutable result.

There is a possible trick to help us here: if we unbox a value just to immediately box it back again, we can actually keep using the same underlying storage with no copying. The problems with this optimization are: (1) We must be able to know that no other references to the object have been created between the unboxing and the re-boxing, and basically the only way to do this is with some sort of reference counting. Reference counting has its share of problems (cyclic data structures never reach count zero, we need to synchronize count updates across threads), so relying on an optimization which requires us to use reference counting is not good. (2) We need to make sure the reference count does not inadvertently rise above 1, which would preclude the optimization. Since there may be more going on under the scenes in the compiler/interpreter than reaches the eye, this would be an unreliable optimization, that sometimes does not trigger for non-obvious reasons.

An alternative to use reference counting would be to do this analysis at compile-time, either through some form of escape analysis (which is hard to do across functions), or some crazy type system with uniqueness types (like Rust's borrow checker), which does not mesh well with my goal of a dynamically typed language.

Nested data structures and sharing

What if I put nested data inside a box, like &[[1 2] [3 4]]? Should the sub-vectors become underlyingly mutable too? If so, when should we stop recursing through the structure, which could contain other composite data types in it? Should we do it lazily, using copy-on-write as we mutate the inner vectors? The implementation of this can get tricky. If I access (@ b 0), should I get an immutable [1 2], or should I get a mutable &[1 2] which shares the same memory as the original element, so that mutations on the &[1 2] vector are reflected on the &[[1 2] [3 4]] one?

I'm too tired to even analyse the possibilities right now.

Refactoring code to change data mutability

Suppose I have a data structure with a bunch of immutable fields, say, (Person name age) and I decide I want to make the age field mutable. In our conceptual framework, this means wrapping the value of the field in a mutable box, e.g., (Person "Hildur" &22), i.e., the field itself remains immutable, but its value is a mutable cell. That looks nice, and makes all mutability readily visible, but it also means that we have to change all code using the age field to extract the value from the box, even code that does not mutate the value.

Maybe this is a good thing: if the code was written under the assumption that the value does not change, maybe it is good that we have to revise everything when we turn it mutable. On the other hand, this makes it harder to try things out in code and run it to see what happens, and for I long time I have defended the ability to run incomplete (and even wrong) programs while prototyping. However, I also want to be able to run optional static checks, and it's easier to do so when code is explicit about its intentions. So there we go.

An alternative: mutability as a flag

An alternative to the mutability-as-boxes approach is just to make all composite data structures carry a 'mutable' flag. We can still use the notation &[1 2 3] to mean a vector with the mutable flag on. We can provide an operation like Clojure's persistent! which turns the mutability flag of an object off. An operation to turn it back on would be more dangerous, but might be useful for debugging purposes (though it's the kind of thing you can be certain will be abused if added to the language (though Lisps have traditionally always preferred to empower the user rather than impose decisions on them)).

In this scenario, the semantics of (set! (@ v 0) 42) is to actually modify the vector in-place, so we don't need the double illusion trick. If we want to return an immutable version of a mutable data structure without losing the mutability, we have to explicitly copy it. Perhaps more descriptive of intention, we may have a non-destructive persistent operation which returns an immutable version of a value, which may or may not actually involve a copy (it may actually use copy-on-write behind the scenes). Thread synchronization has to be done explicitly, otherwise you assume the risks of getting a partially-modified copy. This is somewhat unsatisfying, but inconsistencies across threads could happen with boxes anyway whenever you had to work with more than one box, so a better solution to synchronization is needed anyway.

(In)conclusion

This idea of using mutable boxes + immutable data structures + optimization tricks had been haunting me for a week and seemed really appealing at first, but thinking more deeply about it, it does have its share of problems. Maybe it's a cool idea anyway, maybe not. I have to think more about it. Said research will require more and abundant funding.

1 comentário / comment

Doing web stuff with Guile

2018-02-19 22:35 -0300. Tags: comp, prog, lisp, scheme, web, in-english

A few days ago I started working on Parenthetical Blognir, a rewrite of Blognir in Guile Scheme. In this post I'd like to talk a bit about some things I learned in the process.

Interactive development

I did the development using Geiser, an Emacs package for interactive Scheme development. It can connect with a running Scheme, and you can evaluate code from within Emacs, have completion based on the currently available bindings, etc.

The coolest part of this is being able to reevaluate function and class definitions while the program is running, and seeing the effects immediately. In this sense, GOOPS (the Guile Object Oriented Programming System, inspired by the Common Lisp Object System) is really cool in that you can redefine a class and the existing instances will automatically be updated to reflect the new class definitions (unlike, say, Python, in which if you redefine a class, it's an entirely new class, and instances of the old class will have no relationship with the new class).

One thing I realized is that you have to adapt your program a bit if you want to make the most of interactive development. For example, in Guile's web framework, there is a procedure (run-server handler), which starts up the web server and calls handler to serve each request. My request handler was a procedure named handle-request, so my code called (run-server handle-request). The problem is that this way run-server will be called with the value of handle-request at the time we started the server, and subsequent redefinitions of handle-request while the server is running will have no effect on the running server. Instead, I ended up writing something like:

(start-server (lambda (request body)
                (handle-request request body)))

I.e., instead of calling handle-request directly, the server will call the anonymous function which, when called, will call handle-request. In this way, it will use the value of handle-request at each time the anonymous function is called, so it will see changes to the definition.

Another thing to take into account is that there may be some bindings you don't want to redefine when you reload a module, e.g., variables holding program state (in my case, a variable holding the blog object). For that, Guile provides a define-once form, which only defines a variable if it doesn't already exist.

One gotcha I encountered was when using parameters, the Scheme world equivalent of dynamically-scoped variables. Parameters in Guile have per-thread independent values, and since the web server and the REPL run in different threads, they may see different values for the same parameter. I ended up not using parameters anyway for other reasons (more on that later).

`(web server)`

Guile comes with a web framework of sorts, though it is pretty bare-bones. (Actually the main thing I missed in it was having to parse the request query and POST data by hand. At least it does provide a function to percent-decode URL components.) It has a philosophy of pre-parsing all headers into structured data types as a way of avoiding programming errors. It's an interesting idea; I have mixed feelings about it, but I think it's a valid idea to build a framework on (after all, if you're going the trouble of making a new framework, you might as well try some new ideas rather than creating yet another run-of-the-mill web framework).

You start the server by calling run-server with a callback (as mentioned above). Whenever a new request comes, the callback will be called with a request object and the request body as a bytevector. The callback must return (at least¹) two values: a response object and a response body. Guile allows some shortcuts to be taken: Instead of a response object, you can pass an association list of response headers and the framework will automatically make a response object out of it. The response body may be either a string (which will be automatically encoded to the proper encoding, usually UTF-8), a bytevector, or a procedure; in the latter case, the procedure will be invoked with a port as an argument, and whatever you print to that port will be sent as the response body.

Rendering HTML

Guile comes with support for SXML, an S-expression based tree representation of XML. This means you can write things like:

(sxml->xml `(div (@ (class "foo"))
                 "Hello, world"))

and it will emit <div class="foo">Hello, world</div>. The nice thing is that strings appearing in the tree will be automatically escaped approprietely, so you don't have to worry about escaping (or forgetting to escape) data that may contain special characters, such as <, > or &.

That very feature was at first what led me not to want to use SXML, appealing though it was, to render Blognir pages. The reason is that post contents in Blognir come raw from a post file; I didn't want to parse the file HTML contents into SXML just to dump it again as HTML in the output², and I saw no way to insert a raw string in the middle of an SXML tree bypassing the escaping in the output. So I began this adventure by printing chunks of HTML by hand. At some points I needed to escape strings to insert them in the HTML, so I wrote a small wrapper function to call sxml->xml on a single string and return the escaped string (by default sxml->xml prints to a port rather than returning a string).

When I got to the post comments form, where I have to do a lot of escaping (because all field values have to be escaped), I decided to use sxml->xml for once, for the whole form, rather than escaping the individual strings. I found it so nice to use that I decided to look up the source code for sxml->xml to see if there wasn't a way to insert raw data in the SXML tree without escaping, so I could use it for the whole page, not just the form. And sure enough, I found out that if you put a procedure in the tree, sxml->xml will call that procedure and whatever it prints is emitted raw in the result. This feature does not seem to be documented anywhere. (In fact, the SXML overview Info page says (This section needs to be written; volunteers welcome.). Maybe that's up to me!) By that point I had already written most of the rest of the page by printing HTML chunks, and I did not go back and change everything to use SXML, but I would like to do so. I did use SXML afterwards for generating the RSS feeds though, with much rejoicing.

Parameters – or maybe not

Parameters are used for dynamically scoped values. They are used like this:

;; 23 is the initial value of the parameter.
(define current-value (make-parameter 23))

(define (print-value)
  (display (current-value))
  (newline))

(print-value)                           ;; prints 23

(parameterize ([current-value 42])
  (print-value))                        ;; prints 42

(print-value)                           ;; prints 23 again

My original plan was to create a bunch of parameters for holding information about the current request (current-query, current-post-data and the like), so I wouldn't have to pass them as arguments to every helper request handling function; I would just bind the parameters at the main handle-request function, and all functions called from handle-request would be able to see the parameterized values.

The problem with my plan is that instead of returning the response body as a string from handle-request, I was returning a procedure for the web framework to call. By the time the procedure was called, handle-request had already finished, and the parameterize form was not in effect anymore. Therefore the procedure saw the parameters with their initial value rather than the value they had when the procedure was created. Oops!

Because closures don't close over their dynamic scope (that's kinda the whole point of dynamic scope), parameters ended up not being very useful for me in this case. I just passed everything as, ahem, parameters (the conventional kind) to the subfunctions.

Performance tuning

Despite its crude design, the original Blognir is pretty fast; it takes around 1.7ms to generate the front page in my home machine. I got Parenthetical Blognir at around 3.3ms for now. I'm sure there are still optimizations that can be done, and I may still try some things out, but right now I don't have any pressing need to make things faster than that.

I did learn a few things about optimizing Guile programs in the process, though. I used the ab utility (package apache2-utils on Debian) to measure response times, and Guile's statistical profiler to see where the bottlenecks were. I did not keep notes on how much impact each change I did had on performance (and in many cases I changed multiple things at the same time, so I don't know the exact impact of each change), but I can summarize some of the things I learned.

This is standard advice and applies to every programming language, but do use the profiler to find out performance bottlenecks. They often come from unexpected places, and it is far better to see the numbers than trying to guess where the bottlenecks might be.
Some things are slow because they take up CPU themselves, and some things are slow because they generate garbage, so your program spends more time in the garbage collector. The procedure currently taking the most time in Parenthetical Blognir is %after-gc-thunk, and I still have not analysed it more carefully to determine where this is coming from.
Guile has two versions of format (the Guile counterpart to printf): one from the (ice-9 format) module, written in Scheme, which supports various format specifiers; and a bare-bones one, written in C, available under the name simple-format, which only supports the ~a and ~s format specifiers with no qualifiers.³ simple-format is much faster than the format from (ice-9 format), so if you only need the bare ~a and ~s formats and your program makes heavy use of them, I recommend using simple-format.
My program was spending quite a bit of time on read-string to read post contents. It only reads the contents of a post to dump it into the response, and read-string will waste some time decoding the input from UTF-8 to Guile's internal string type, dynamically growing a string to the right size to fit the post contents, just to immediately decode it back to UTF-8 and discard the string as garbage. I found it more efficient to write a little function which takes two ports (the open post file and the open response port) and just copy the raw bytes in fixed-size chunks from one port to the other:
```
(use-modules (rnrs bytevectors)
             (ice-9 binary-ports))

(define (pump-port in-port out-port)
  (let ([buffer (make-bytevector 65536)])
    (let loop ()
      (let ([count (get-bytevector-n! in-port buffer 0 65536)])
        (when (not (eof-object? count))
          (put-bytevector out-port buffer 0 count)
          (loop))))))
```
For extra garbage reduction, you can reuse the buffer across calls (but be careful if the code may be called from multiple threads).

Conclusion

In general, I liked the experience of rewriting the blog in Guile. It was the first time I did interactive Scheme development with Emacs (previously I had only used the REPL directly), and it was pretty cool. Some things could be better, but I see this more as an opportunity to improve things (whether by contributing to existing projects, by writing libraries to make some things easier, or just as things to take into account if/when I decide to have a go again at trying to make my own Lisp), rather than reason for complaining.

There are still a few features missing from the new blog system for feature parity with the current one, but it already handles all the important stuff (posts, comments, list of recent comments, filtering by tag, RSS feeds). I hope to be able to replace the current system with the new one Real Soon Now™.

_____

¹ If you return more than two values, the extra values will be passed back to the callback as arguments on the next call. You can use it to keep server state. If you want to use this feature, you can also specify in the call to run-server the initial state arguments to be passed in the first call to the callback.

² And my posts are not valid XML anyway; I don't close my <p> tags when writing running text, for instance.

³ There is also a format binding in the standard environment. It may point to either simple-format or the (ice-9 format) format, depending on whether (ice-9 format) has been loaded or not.

Comentários / Comments

Impressions on R7RS-small: libraries, records, exceptions

2017-10-03 15:10 -0300. Tags: comp, prog, lisp, scheme, in-english

In the last post, I wrote a little bit about the historical context in which R7RS came about. In this post, I will comment on my impressions about specific features of the R7RS-small language.

First of all, I'd like to note that if you are going to read the R7RS-small report, you should also read the unofficial errata. As I read the document I spotted a few other errors not mentioned in the errata, but unfortunately I did not keep notes as I was reading. I'm not sure why a, um, revised version of the report is not published with the known errors corrected (a Revised^7.01 Report?), but alas, it isn't, so that's something to keep in mind.

In this post, I will talk mainly about the differences between R5RS and R7RS-small, since R7RS-small is more of an incremental extension of R5RS, rather than R6RS. This is not intended as a complete or exhaustive description of each feature; for that, consult the report.

Libraries/modules

R7RS introduced the concept of libraries (what some other systems call modules; I suppose R6RS and R7RS chose the name "library" to avoid conflict with the concept of modules in existing implementations). Library names are lists of symbols and non-negative integers, such as (scheme base), or (srfi 69). A library has a set of imports, a set of exported symbols, and code that constitutes the library. A library definition looks like this:

(define-library (foo bar)
  (import (scheme base)
          (scheme write))
  (export hello)
  (begin
    (define (hello)
      (display "Hello, world!\n"))))

Instead of putting the library code directly in the define-library form (inside a begin clause), it is also possible to include code from a different file with the (include "filename") directive (or include-ci for parsing the file case-insensitively; standard Scheme was case-insensitive until R5RS (inclusive)). This makes it easier to package R5RS code as an R7RS library, by including the older code within a library declaration. It's also a way to avoid writing all library code with two levels of indentation.

Imports can be specified or modified in a number of ways:

One can import a whole library: (import (scheme base))
Or just some identifiers from it: (import (only (some lib) someid1 someid2))
Or all but some identifiers: (import (except (some lib) unwanted-id1 unwanted-id2))
Or rename some identifiers: (import (rename (some lib) (original1 new1) (original2 new2)))
Or add a prefix to all identifiers: (import (prefix (some lib) somelib/))

These forms can be combined (e.g., you can import only some identifiers and add a prefix to them).

Exports just list all identifiers to be exported, but you can also write (rename internal-name exported-name) to export identifiers with a different name than they have within the library body.

Unlike R6RS, all library code directly embedded in the define-library form must be written within a begin clause. At first I found this kinda weird, but it has an interesting consequence: the library definition sublanguage does not have to know anything about the rest of the programming language. There is only a limited number of kinds of subforms that can appear within define-library, and parsing it does not require knowing about the values of any identifiers. This means that define-library can be more easily processed as data. One can imagine useful tools which read library definitions from files and, say, compute the dependencies of a program, among other possibilities.

In fact, R7RS does not classify define-library or its subforms as syntax forms, i.e., they are something apart from Scheme expressions. This also resolves a problem that would occur if define-library were an expression. The report specifies that the initial environment of a program is empty. So, how would I use import declarations before importing the library where import declaration syntax is defined? Of course one way around this would be to make (scheme base) available by default rather than start with the empty environment. But the solution adopted by R7RS means we don't have to import (scheme base) if we don't want to (for example, if we want to import (scheme r5rs) instead to package R5RS code as an R7RS library). (The report does define for convenience some procedures and syntax forms with the same name as corresponding library subforms, e.g., include.)

R7RS also standardized cond-expand (extended from SRFI 0). cond-expand is a mechanism somewhat like ifdefs in C for conditionally including code depending on whether the implementation defines specific feature symbols, or whether some library is available. This makes it possible to provide different implementations of a procedure (or a whole library) depending on the current implementation. One way we could use it is to write shims, or compatibility layer libraries to provide an uniform interface for features that are implemented differently by various implementations. For example, in Common Lisp, many implementations support threads, but they provide different interfaces. Bordeaux Threads is a library which provides a uniform API and maps those to the corresponding functions in each implementation it supports. Now we can do similar things in R7RS for those features that are supported everywhere but in incompatible ways (e.g., for networking).

Libraries and cond-expand are by far the most important addition in R7RS relative to R5RS. Even if we did not have any of the other features, we could package them as libraries and provide implementation-specific code for them via cond-expand.

Missing things

The report does not specify a mapping between library names and file names. I realize that it would be kinda hard to make everyone agree on this, but it is somewhat of a hurdle in distributing programs organized into libraries. Some implementations, such as Chibi, will look up a library named (foo bar) in a file named foo/bar.sld (where .sld stands for Scheme library definition), whereas CHICKEN will look it up at foo.bar.*. There is a project of a portable package manager for R7RS called Snow, which I think takes care of mapping packaged library files to implementation-specific names, but I haven't taken the time to check it out yet.

R7RS takes the excellent step of specifying that library names whose first component is the symbol srfi are reserved for implementing SRFIs, but then fails to specify how to name a specific SRFI. In practice, the few implementations I checked all agree on using (srfi n) as the name of the library implementing SRFI number n (i.e., I can write (import (srfi 69)) and remain reasonably portable), so this may turn out not to be a problem in practice.

Records

R7RS incorporates the define-record-type form from SRFI 9, for defining new record/struct types. It is a somewhat verbose form, which requires specifying the record constructor arguments and the names for each field accessor/getter and (optional) mutator/setter, but it's basically the least common denominator that any implementation which has some form of records can easily support. It looks like this:

(define-record-type <person> (make-person name age) person?
  (name person-name person-name-set!)
  (age person-age person-age-set!))

Here:

<person> is the record type name. R7RS-small does not specify any particular use for this name, but R7RS-large or some other SRFI may specify uses for it (e.g., to allow querying information about the record type).
(make-person name age) defines the constructor procedure and order of arguments for this record.
person? defines the name of a predicate which will answer true when given records created with make-person, and false for other Scheme values.
The clause (name person-name person-name-set!) declares a field called name, associated with a getter procedure person-name to extract the name field from a given person record, and a setter procedure person-name-set! for mutating the name field in a person record. The next clause does the same for the field age.

R5RS did not have any way to define new types disjunct from existing types. R6RS provides a more complex records facility, including both a syntactic and a procedural layer allowing reflection, but I don't know it well enough to comment. (I have read some comments on problems in the interaction between syntactically- and procedually-defined records, but I don't know the nature of the problems or how serious they are.)

Missing things

Reflection would be nice, or at least a way to convert a record into a vector or something like this (though I realize this might displease some people), but we could make libraries for that. Another thing that would be nice is for records to have a standard printed representation which could be printed out and read back again, but I realize there is a slightly complicated interaction here with the module system (the printed representation should be tagged with the record type in a way that will work regardless of which module it is read back in), and this might not even be desirable for implementation-internal types which happen to be defined in terms of define-record-type.

Exceptions

R7RS incorporates the exception handling mechanisms from R6RS, but not the condition types. Any value can be raised in an exception. The raise procedure raises a value as an exception object, or condition, to be caught by an exception handler. The guard form can be used to install an exception handler to be active during the evaluation of its body. The guard form names a variable to hold the captured condition, a sequence of cond-like clauses to determine what action to take given the condition, and a body to be executed. It looks like this:

(guard (err
        ((file-error? err) (display "Error opening file!\n"))
        ((read-error? err) (display "Error reading file!\n"))
        (else (display "Some other error\n")))
  (call-with-input-file "/etc/passwd"
    (lambda (file)
      (read-line file))))

If an else clause is not provided and no other clause of the guard form matches, the exception propagates up the stack until some handler catches it. If an exception is raised and caught by a guard clause, the value returned by the guard form is whatever is returned by the body of that clause.

Beside raise, R7RS also defines a procedure (error message irritants...), which raises an error object (satisfying the error-object? predicate) encapsulating an error message and a sequence of objects somehow related to the error (called "irritants"). It also defines the procedures error-object-mesage and error-object-irritants to extract the components of the error object.

R7RS does not define specific object types to represent errors; it only says that objects satisfying a given predicate must be raised in some circumstances. An implementation might define a record type for that, or just use lists where the first element represents the error type, or whatever is appropriate for that implementation.

At first I did not think exceptions were that important in the grand scheme of things (heh), since you can implement them on the top of continuations. (And indeed, exceptions in R6RS are in a separate library rather than the base language, although this does not mean much in R6RS because, if I understand correctly, all libraries are mandatory for R6RS implementations.) However, I then realized that until R5RS (inclusive), there was no standard way to signal an error in Scheme code, and perhaps more importantly, no standard way of catching errors. If portable libraries are to become more prominent, we will need a standard way of signalling and catching errors across code from different projects, so exceptions are a good add-on.

Beside raise, R7RS also defines raise-continuable, which raises an exception but, if the ~~guard~~ exception handler returns, it returns to the point where the exception was raised rather than exiting from the ~~guard~~ handler form. [Correction: this is how raise-continuable interacts with with-exception-handler, not guard. I'm still figuring how guard acts with respect to continuable exceptions.] On the top of this, something like Common Lisp's restarts can be implemented.

One side effect of having guard in the language is that now you can do control flow escapes without using call-with-current-continuation (call/cc for short). In theory this could be more efficient than capturing the fully general continuation just to escape from it once; in practice, some implementations may rely on call/cc to implement guard (the example implementation provided in the report does), so this performance advantage may not realize. But just having a construct to express a one-shot escape is already great, because it allows expressing this intent in the program, and potentially allows implementations to emit more efficient code when a full continuation is not required.

I was wondering if one could implement unwind-protect (a.k.a. try/finally) in terms of guard, and so avoid dynamic-wind for error-handling situations. Alas, I don't think this is possible in general, because the presence of raise-continuable means an error handler may execute even though control may still return to the guard body. I wish to write more about continuations in a future post.

Conclusion

Libraries (plus cond-expand), records and exceptions are the most important additions in R7RS-small relative to R5RS, and they are all a great step towards enabling code reuse and portability across implementations, while not constraining Scheme implementors unnecessarily. I am particularly happy about libraries and cond-expand, because this means we can start writing compatibility libraries to bridge differences between implementations without having to rely on a standardization process.

I have some other comments to make on I/O, bytevectors, and other parts of the standard library, but they can wait for a future post.

1 comentário / comment

R5RS, R6RS, R7RS

2017-10-01 22:11 -0300. Tags: comp, prog, lisp, scheme, in-english

Over the last few days I have skimmed over R7RS, the Revised⁷ Report on [the Algorithmic Language] Scheme. I thought I'd write up some of my impressions about it, but I decided first to write a bit about the history and the context in which R7RS came about and the differing opinions in the Scheme community about R6RS and R7RS. In a future post, I intend to write up about my impressions of specific features of the standard itself.

The Scheme language was first described in a document named the "Report on the Algorithmic Language Scheme". Afterwards, a second version, called the "Revised Report on the Algorithmic Language Scheme", came out. The following version of the standard was called the "Revised Revised Report …", or "Revised² Report …" for short. Subsequent versions have kept this naming tradition, and the abbreviation RnRS (for some n) is used to refer to each version of the standard.

Up to (and including) R5RS, all versions of the standard were ratified only by unanimous approval of the Scheme Steering Committee. As a result, each iteration of the standard was a conservative extension of the previous version. R5RS defines a very small language: the whole document is just 50 pages. The defined language is powerful and elegant, but it lacks many functions that are typically expected from the standard library of a modern language and necessary for many practical applications. As a result, each Scheme implementation extended the standard in various ways to provide those features, but they did so in incompatible ways with each other, which made it difficult to write programs portable across implementations.

To amend this situation a bit, the Scheme community came up with the Scheme Requests for Implementation (SRFI) process. SRFIs are somewhat like RFCs (or vaguely like Python's PEPs): they are a way to propose new individual features that can be adopted by the various implementations, in a way orthogonal to the RnRS standardization process. A large number of SRFIs have been proposed and approved, and some are more or less widely supported by the various implementations.

R6RS attempted to address the portability problem by defining a larger language than the previous reports. As part of this effort, the Steering Committee broke up with the tradition of requiring unanimous approval for ratification, instead requring a 60% majority of approval votes. R6RS defines a much larger language than R5RS. The report was split into a 90 page report on the language plus a 71 page report on standard libraries (plus non-normative appendices and a rationale document). The report was ratified with 67 yes votes (65.7%) and 35 no votes (34.3%).

The new report caused mixed feelings in the community. Some people welcomed the new standard, which defined a larger and more useful language than the minimalistic R5RS. Others felt that the report speficied too much, reinvented features in ways incompatible with existing SRFIs, and set some things in stone too prematurely, among other issues.

In response to this divide, the Scheme Steering Committee decided to split the standard into a small language, more in line with the minimalistic R5RS tradition, and a large language, intended to provide, well, a larger language standardizing a larger number of useful features. The R7RS-small report was completed in 2013. The R7RS-large process is still ongoing, being developed in a more incremental way rather than as one big thing to be designed at once.

I think that the R6RS/R7RS divide in part reflects not only differing views on the nature of the Scheme language, but also differing views on the role of the RnRS standards, the Steering Committee, and the SRFI process. In a discussion I read these days, a person was arguing that R6RS was a more useful standard to them, because for most practical applications they needed hashtables, which R6RS standardized but R7RS did not. My first thought was "why should hashtables be included in the standard, if they are already provided by SRFI 69?". This person probably does not consider SRFIs to be enough to standardize a feature; if something is to be portable across implementations, it should go in the RnRS standard. In my (current) view, the RnRS standard should be kept small, and SRFIs are the place to propose non-essential extensions to the language. My view may be colored by the fact that I started using Scheme "for real" with CHICKEN, an implementation which not only supports a large number of SRFIs, but embraces SRFIs as the way various features are provided. For example, whereas many implementations provide SRFI 69 alongside their own hashtable functions, CHICKEN provides SRFI 69 as the one way of using hashtables. So, CHICKEN users may be more used to regard SRFIs as a natural place to get language extensions from, whereas users of some other implementations may see SRFIs as something more abstract and less directly useful.

I have already expressed my view on Scheme's minimalism here, so it's probably no surprise that I like R7RS better than R6RS. I don't necessarily think R6RS is a bad language per se (and I still have to stop and read the whole R6RS report some day), I just have a preference for the standardized RnRS language to be kept small. (I'm okay with a larger standard a la R7RS-large, as long as it remains separate from the small language standard, or at least that the components of the large language remain separate and optional.) I also don't like every feature of R7RS-small, but overall I'm pretty satisfied with it.

Comentários / Comments

On Scheme's minimalism

2017-09-14 19:34 -0300. Tags: comp, prog, lisp, scheme, pldesign, ramble, in-english

[This post started as a toot, but grew slightly larger than 500 characters.]

I just realized something about Scheme.

There are dozens, maybe hundreds, of Scheme implementations out there. It's probably one of the languages with the largest number of implementations. People write Schemes for fun, and/or to learn more about language implementations, or whatever. The thing is, if Scheme did not exist, those people would probably still be writing small Lisps, they would just not be Scheme. The fact that Scheme is so minimal means that the jump from implementing an ad-hoc small Lisp to implementing Scheme is not that much (continuations notwithstanding). So even though Scheme is so minimal that almost everything beyond the basics is different in each implementation, if there were not Scheme, those Lisps would probably still exist and not have even that core in common. From this perspective, Scheme's minimalism is its strength, and possibly one of the reasons it's still relevant today and not some forgotten Lisp dialect from the 1970s. It's also maybe one of the reasons R6RS, which departed from the minimalist philosophy, was so contentious.

Plus, that core is pretty powerful and well-designed. It spares each Lisp implementor from part of the work of designing a new language, by providing a solid basis (lexical scoping, proper closures, hygienic macros, etc.) from which to grow a Lisp. I'm not one hundred percent sold on the idea of first class continuations and multiple values as part of this core*, and I'm not arguing that every new Lisp created should be based on Scheme, but even if you are going to depart from that core, the core itself is a good starting point to depart from.

[* Though much of the async/coroutine stuff that is appearing in modern languages can be implemented on the top of continuations, so maybe their placement in that core is warranted.]

2 comentários / comments

Guile: primeiras impressões

2017-01-02 22:54 -0200. Tags: comp, prog, scheme, lisp, em-portugues

Até agora, as únicas implementações de Scheme com as quais eu tinha tido um contato mais extensivo foram o Chicken e, em menor grau, o Racket. Semana passada eu comecei a dar uma olhada no manual do Guile, o Scheme do Projeto GNU. So far, o Guile pareceu um Scheme bem bacaninha. Neste post, deixo registradas algumas das minhas impressões iniciais do Guile e coisas que eu achei interessantes até agora, com o caveat de que eu ainda não usei o Guile para nada na prática além de testar meia dúzia de coisas no REPL e escrever um ou outro script de meia dúzia de linhas.

Bytecode

Diferente do Chicken, o Guile não gera executáveis nativos; ao invés disso, ele compila para um bytecode próprio. Na verdade, a VM do Guile suporta não apenas Scheme, como também possui suporte preliminar a Emacs Lisp e ECMAScript (!), mas ainda não sei como essas coisas se integram. Em termos de performance, o Guile não parece ser nem lá nem cá, e imagino que seja comparável a outras linguagens interpretadas, como Python. Eu experimentei fazer uns benchmarks toscos, mas os resultados foram inconclusivos e requererão uma análise mais aprofundada, que eu não hei de fazer tão cedo.

Debugabilidade

Em termos de debugabilidade, o Guile ganha bonito do Chicken. Para começar, o Guile imprime (pasmem!) um stack trace quando ocorre um erro. O Chicken não imprime um stack trace pelo simples fato de que ele não usa uma pilha de chamadas da maneira convencional; quando ocorre um erro, o Chicken imprime um "histórico de chamadas", i.e., uma lista das últimas N chamadas, na ordem em que ocorreram, mas sem representar o aninhamento das chamadas, o que torna a depuração mais complicada. Além de mostrar uma pilha, o Guile ainda mostra os valores dos argumentos em cada chamada empilhada (algo cuja falta me incomoda bastante em Python) e, quando executado em modo interativo, cai num debugger que permite, entre outras coisas, inspecionar os valores das variáveis locais. Também é possível definir breakpoints e essas coisas que se espera de um debugger, mas não cheguei a olhar essa parte com calma.

Além disso, o Guile tende a detectar mais erros do que o Chicken. Por exemplo, o Chicken não reporta um erro se uma função é declarada com múltiplos parâmetros com o mesmo nome, ou se uma função é chamada com um keyword argument que ela não suporta.

(Não-)minimalismo

No Chicken há uma separação maior entre uma linguagem core pequena e extensões, que têm que ser importadas explicitamente em programas que as usam. (Por exemplo, no programa de adivinhações de um post anterior, foi necessário dar um (use extras) para ter acesso à função random.) No Guile, uma quantidade bem maior de funcionalidades (incluindo expressões regulares e a API POSIX) já está disponível mesmo sem fazer nenhum import. Nesse quesito, o Guile tem um feel um pouco mais "Common-Líspico" do que o Chicken. (Mas não muito; coisas como orientação a objetos requerem um import explícito.)

Um outro sentido em que o Guile é não-minimalista é que freqüentemente há multiplas APIs para fazer a mesma coisa. Em muitos casos, isso se deve ao fato de que uma API nova foi introduzida (freqüentemente uma SRFI, o que é um ponto positivo), mas a antiga foi mantida por compatibilidade. Por exemplo, para a definição de estruturas, o Guile suporta a SRFI-9, as APIs tradicionais do Guile (anteriores à SRFI-9) e a API de records do R6RS. Da mesma forma, o Guile suporta escopo dinâmico tanto por meio de fluids (a interface histórica) quanto por parameters (SRFI-39). (Os parameters são implementados em termos de fluids no Guile.)

O Guile parece ser bastante comprometido com compatibilidade com versões anteriores, o que tem o ponto bem positivo de que seu código provavelmente vai continuar funcionando nas versões futuras, mas isso vem com o custo de ter múltiplas APIs para as mesmas funcionalidades hanging around.

Módulos

Enquanto o Chicken faz uma distinção entre units (que são usadas para compilação separada de partes de um programa) e módulos (que são usados para isolar namespaces), no Guile um módulo serve a ambos os propósitos. Na verdade eu acho essa distinção que o Chicken faz bastante annoying (e aparentemente há quem queira deprecar as units no Chicken 5), e mui me alegrou saber que o Guile (1) possui um sistema de módulos; (2) que não é cheio de frescura (ou pelo menos as frescuras são opcionais); e (3) é fácil de usar.

O nome de um módulo em Guile é uma lista de símbolos, e um módulo de nome (foo bar) é procurado no arquivo load_path/foo/bar.scm. O load path default pode ser alterado através de um parâmetro da linha de comando, ou de uma variável de ambiente, ou setando %load-path e %load-compiled-path explicitamnte.

Não sei qual é a maneira convencional de escrever programas separados em múltiplos arquivos sem ter que instalá-los no load path. Imagino que uma maneira seja escrever um arquivo main que sete o load path para incluir o diretório do programa, e depois importar os demais componentes do programa. Outra maneira é dar include nos arquivos, mas isso não cria módulos com namespaces separados.

Threads

O Guile suporta threads nativas do sistema operacional, diferentemente do Chicken, que suporta apenas "green threads" (uma thread nativa rodando múltiplas threads lógicas cooperativamente). Além das APIs comuns para criação de threads, mutexes e toda essa bagulheira, o Guile também suporta uma API de futures, mantendo automaticamente uma pool de threads cujo tamanho é determinado por padrão pelo número de cores da máquina, e uma macro (parallel exp1 exp2 ...) que roda todas as expressões em paralelo e retorna o valor de cada uma, e um letpar, um "let paralelo" que avalia o valor de todas as variáveis em paralelo. Não sei quão útil isso é na prática, mas que é bem legal, é.

Orientação a objetos

O Guile vem com um sistema de orientação a objetos baseado em generic functions e multiple dispatch a la CLOS, chamado GOOPS. Ainda não olhei o GOOPS com calma, mas ele parece não ter todas as coisas que o CLOS tem (por exemplo, before, after e around methods), mas ele permite redefinir classes em tempo de execução (com atualização automática das instâncias existentes da classe), e parece ter algumas coisinhas a mais (e.g., provisões para mergear métodos de mesmo nome herdados de módulos diferentes).

Uma coisa muito legal do GOOPS em comparação com o CLOS é que ele permite transformar transparentemente uma função comum em uma função genérica. Por exemplo, você pode adicionar um método à função builtin +:

(define-method (+ (a <string>) (b <string>))
  (string-append a b))

Feito isso, agora você pode escrever (+ "a" "b"), e o resultado será "ab". O interessante disso é o define-method não sobrepõe o + existente com um + novo: ele modifica o + existente, e agora todo o código que usava + antes vai passar a usar esse + aumentado. Aparentemente isso só funciona para substituir funcionalidades não-padrão dos operadores; se você definir um método (+ (a <number>) (b <number>)) e tentar somar dois números, o Guile vai continuar usando a soma padrão ao invés da sua definição, acredito eu que porque a chamada a + é compilada para a instrução add da VM, que vai ignorar o método caso os argumentos sejam números. (O que de certa forma torna o fato de o + usar o método definido pelo usuário quando os argumentos não são números ainda mais impressive, pois, eu suponho, eles tiveram que "go out of their way" para fazer a instrução add da VM verificar se houve a adição de métodos ao + pelo usuário quando os argumentos não são números. Mas não sei o suficiente sobre a implementação do Guile para saber realmente o que está acontecendo por baixo dos panos.)

Setters

Uma coisa que eu achei meio chata no Guile com relação ao Chicken é que o Guile não suporta "de fábrica" usar set! em coisas que não sejam identificadores. Por exemplo, no Chicken é possível escrever coisas como (set! (car x) 42) ao invés de (set-car! x 42); o Guile, por padrão, não tem suporte a isso. O Guile tem suporte a "procedures with setters", através de uma API tradicional e da API da SRFI-17, e ao importar o módulo (srfi srfi-17) o set! passa a ser usável com car, cdr e vector-ref, mas tem um zilhão de outras funções similares (como hash-ref, array-ref, etc.) que não têm setters definidos. Nada fatal, e nada lhe impede de definir os setters para essas funções, mas seria legal se houvesse suporte nativo a setters para todas as funções em que faz sentido ter um setter, como é o caso no Chicken.

Bibliotecas

O Guile parece ter bem menos bibliotecas do que o Chicken, e certamente não possui um repositório centralizado de bibliotecas, como é o caso dos eggs do Chicken. (A documentação do guild, a interface para os utilitários de linha de comando do Guile, tais como guild compile, menciona planos de permitir instalar pacotes da Internet através do guild no futuro. Não sei como eles pretendem realizar isso, mas, da minha parte, eu acho que mais importante do que um repositório centralizado é uma maneira padronizada de empacotar programas/bibliotecas e descrever dependências de uma maneira que permita sua resolução automática na instalação. But I digress.)

Por outro lado, o Guile vem com uma porção de módulos de fábrica, e possui bindings para a Gtk e o GNOME. Ainda não as olhei com calma, mas pode ser uma solução interessante para criar aplicações com interface gráfica.

Unicode

No Chicken, por padrão, todas as strings são strings de bytes. Há um módulo/extensão/unit/library/whatever chamada utf8, que reimplementa diversas funções de manipulação de strings para assumirem que as strings estão codificadas em UTF-8 (as strings continuam sendo strings de bytes por baixo dos panos). Importar o utf8 não substitui, mas sim redefine, as funções padrão, então, pelo que eu entendo, importar utf8 no seu módulo não vai fazer os outros módulos do sistema que não importaram explicitamente utf8 passarem a funcionar magicamente com strings UTF-8.

No Guile, strings são Unicode nativamente (há um tipo separado para "byte vectors", que pode ser usado para guardar bytes literais não interpretados como caracteres). Portas (arquivos abertos) possuem um encoding associado, e o Guile faz a conversão de Unicode para o encoding da porta automaticamente. Não sei se isso não pode acabar incomodando na prática (o encoding default é determinado pelo locale, e modo de abertura de arquivos que depende do locale me dá um certo medo, mas talvez seja por trauma dos UnicodeDecodeError do Python 2, o que não é a mesma situação do Guile porque no Guile todas as strings são Unicode por padrão; e nada impede de setar o encoding manualmente ao abrir um arquivo).

Conclusão

No geral, o Guile me pareceu uma implementação bem legal de Scheme, e tem um monte de outros aspectos interessantes que eu não cheguei a mencionar nesse post (por exemplo, que ele foi feito para ser embutível em programas C, e que a API C é documentada juntamente com as funções correspondentes em Scheme, e que no geral a documentação do Guile é bem boa). Quero ver se o uso em projetos no futuro para ter uma experiência mais prática com ele.

2 comentários / comments

Some things I know about Clang and LLVM, part 1

2016-06-17 04:10 -0300. Tags: comp, prog, llvm, mestrado, in-english

In this post I'm going to talk about a few things about Clang and LLVM which I learned during my Master's and which might be useful to people new to Clang/LLVM.

The LLVM Project

According to its website, "The LLVM Project is a collection of modular and reusable compiler and toolchain technologies." The LLVM Project encompasses a number of sub-projects, the main ones being LLVM and Clang. Basically, LLVM is an infrastructure for code compilation, analysis and transformation. LLVM originally stood for "Low Level Virtual Machine", but it is not really a virtual machine, so nowadays "LLVM" is not considered an acronym anymore, it's just the name of the project. Clang (which is pronounced "clang", by the way, not "C-lang") is a C/C++/Objective C compiler which uses LLVM for code generation. The great things about Clang and LLVM are:

Unlike traditional compilers, they are designed as reusable components. You can use Clang and LLVM components as libraries to write your own program analysis/transformation tools, for instance, or you might extend Clang and LLVM with new passes/plugins, or use LLVM as a backend to generate machine code for your compiler. People have implemented JIT compilers (such as this) using LLVM as a basis for code generation, for example.
LLVM is designed around a well-defined intermediate language, called LLVM IR (Intermediate Representation), which is sort of an assembly-like language for an abstract machine. All code transformations at the LLVM level (apart from machine code generation) are implemented as transformations taking an LLVM IR program as input and producing a modified LLVM IR program as output. This makes it easier to add new passes, use them individually, combine them, inspect what each pass does, etc. LLVM IR has a textual representation (which you can print out or feed as input to LLVM), a bitcode representation which is more space-efficient, and an in-memory data structure representation (which is what the LLVM tools use internally).

Nowadays LLVM is quite popular as a compiler backend for various languages, such as Rust. The great thing about targeting LLVM for code generation is that it implements a large number of code optimizations. In fact, when Clang compiles a C/C++/ObjC program, it emits very naive, unoptimized code – most optimizations happen at the LLVM level. Because of this, any compiler targeting LLVM is able to use those same optimizations without having to do anything in particular (other than emitting code which LLVM is able to optimize – LLVM can't do magic, after all).

An example of a weirder project using Clang/LLVM is Emscripten, which implements an LLVM backend, the part which translates LLVM IR to machine code – except in this case the machine code is JavaScript.

Documentation, caveat, and scope of this post

The LLVM Project website has plenty of documentation (for suitable values of "plenty"), for LLVM and Clang. You should consult those for reference. The mailing lists (there are separate ones for the various projects) also have plenty of useful information (although I usually end up there by searching stuff on ~~Google~~ StartPage rather than going directly to the mailing list). As far as I can tell, people are quite helpful if you ask questions there (but I have never personally asked anything).

It's probably a good moment to warn that LLVM and Clang development moves quite fast, so it's probable some (or most) things in this post will be out of date sooner or later. So, when in doubt, consult the documentation. I will not attempt to duplicate the information in the documentation here, but rather will try to provide an overview of things I had to learn and some gotchas I found during the process. As of now, the current stable version of LLVM is 3.8, although I used initially 3.6 and later 3.7 for most of my Master's (which were the most current stable versions at the times).

Compiling LLVM and Clang

If you want to compile LLVM and Clang from source, you can find information in Getting Started with the LLVM System, Building LLVM with CMake, and Clang – Getting Started.

The main gotchas I found in the process were:

If you compile the source from the tarball (rather than the version on SVN/Git) using the standard ./configure; make; make install, it will compile a release build, not a debug build, even if you specify you want debug symbols to ./configure (and if you intend to write new passes/plugins or modify Clang/LLVM, you probably want a debug build). The solution is to compile with CMake instead of the standard Makefiles which come with the distribution. According to my notes here, the commands I used were:
```
tar -xvf llvm-3.6.1.src.tar.xz 

tar -xvf cfe-3.6.1.src.tar.xz 
mv cfe-3.6.1.src llvm-3.6.1.src/tools/clang

tar -xvf compiler-rt-3.6.1.src.tar.xz
mv compiler-rt-3.6.1.src llvm-3.6.1.src/projects/compiler-rt

mkdir build
cd build

cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ../llvm-3.6.1.src
ninja     # or ninja -j1
```
ninja (package ninja-build in Debian) is a build program similar to make. You can pass cmake the argument -G"Unix Makefiles" instead of -GNinja, and then it will generate classical Makefiles instead of Ninja files and you can run make instead of ninja, but then you don't get to think of this video every time you type ninja in the terminal.
By default, ninja will run multiple compilation jobs in parallel. This is all fine and dandy, and you probably want do to that if you have a multi-core machine, except linking Clang eats a lot of memory, and with four linkers in parallel it will likely eat up all your RAM (I froze my 8GB RAM machine a couple times due to that). This only happens during linking, at the very end of the compilation process. A couple of things help here:
- I very much recommend disabling swap space (sudo swapoff -a) during compilation. This way the compilation process will die when it eats up all your memory instead of freezing your system. After compilation you can reenable swap with sudo swapon -a.
- After the compilation process dies from lack of memory (or alternatively, if you interrupt it manually after your system starts to freeze), you can restart it with ninja -j1, which tells ninja to run a single compilation job. This way, you can run most of the process in parallel, and then restart it with a single job for the memory intensive last part.
- Another thing that may help is using the gold linker, which consumes somewhat less memory than the standard ld linker. It comes with recent versions of GNU Binutils, so you probably already have it on your system under the name ld.gold, but it is not used by default. You can set up a symlink in /usr/bin from ld to ld.gold to use it (you'll probably want to rename the old ld first).

If you just want to use the LLVM/Clang infrastructure, rather than modifying it, you may not need to compile it from source; you can install your distribution's development packages for LLVM and Clang instead (e.g., llvm-3.7-dev and libclang-3.7-dev on Debian). Then you can compile your pass/plugin/whatever against those.

Interfacing with Clang/LLVM

As far as I can tell, the primary interface with LLVM is the C++ API. There are C bindings to it too, but I don't know how common it is to use them. Besides C, there are bindings for OCaml, Python and Go, as well as third-party ones for Haskell, Rust, and maybe others. I can't attest to their stability or completeness (I remember trying to compile the OCaml bindings and failing miserably, but I didn't really try hard enough).

For Clang, there is a number of interfaces, the most stable of which (as in "the one that changes the least across Clang versions") is the C LibClang. There is also the Plugin interface and LibTooling, both of which are based on C++ and provide finer-grained control over the generated AST.

LLVM IR as an interface

If you want to use LLVM from a language for which there are no bindings (and you don't want to write the bindings yourself), an alternative is to communicate with LLVM by parsing and emitting LLVM IR directly, rather than using LLVM's APIs. This is what I did for my Master's software, which I wrote in Scheme. If you intend to take LLVM IR code as input (e.g., for writing a code analysis/transformation), you will have to write an LLVM IR parser, which is somewhat annoying (LLVM IR syntax could be quite a bit more regular, if you ask me), but is not particularly hard. If you don't need to read LLVM IR code, but only emit it (for example, if you are using LLVM as a backend for a compiler), then you don't need a parser, you just need to be able to print valid LLVM IR code. The drawback of this approach rather than using a binding is that you will have an extra overhead from converting your data structures to textual LLVM IR, and then feeding it to LLVM (typically invoked as a separate program (usually the opt tool)), which will then reconstruct it as the in-memory LLVM IR representation, rather than generating the in-memory representation directly and running the LLVM routines as library calls in the same process. On the other hand, that's exactly what a traditional compiler (such as GCC) does when calling the assembler, which takes textual assembly code as input (usually piped into it), so it's not like you're necessarily going to have an unacceptable overhead from this.

If you are writing an LLVM IR transformation in this way, and you want to run it as if it were a pass during compilation of a C/C++ program, you'll have to do some tricks. If you want to run your transformation after all other LLVM IR passes, then your life is simple: you can run clang -S -emit-llvm -o - (your normal arguments) to tell Clang to generate "assembly" code rather than an executable (-S), to emit LLVM IR rather than assembly, to output to stdout rather than a file (-o -), and use your normal compilation flags and arguments. Then you can pipe the LLVM IR output into your program (or make your program call clang and read its output via a pipe), transform it as you wish, and then pipe the result back into Clang with clang -x ir - (more arguments) to finish compilation, where -x ir - tells Clang to read code in LLVM IR language from stdin, and (more arguments) will typically include -o executable-name.

If you need to take the output from Clang before any optimization passes are run, things are slightly more tricky. Even if you run Clang with -O0 some LLVM passes may still run. Worse, if you do that, Clang will not include within the LLVM IR code information needed by the optimization passes, such as type information used by type-based alias analysis (TBAA), which means that if you try to do something like clang -O0 ... | your-pass | clang -O3 ..., the result won't be as optimized as if you had directly run clang -O3 on the source, because clang -O0 will lose information which is needed by some of the optimizations performed by clang -O3. The solution is:

clang -S -emit-llvm -Xclang -disable-llvm-optzns -o - -O3 (your normal arguments)

This will make sure Clang includes all information required by optimizations, but stops Clang from invoking the optimizations themselves. Then you can feed this into clang -x ir - -O3 later and optimizations will work properly. (-Xname option passes the option to the compilation subprocess name. Note also that -x ir will apply to all inputs specified afterwards in the command line, not just the -; if you need to pass, say, an extra C file to be combined with the result of your transformation, then you have to specify -x c filename.)

As far as I know, there is no way to simply intercalate a new external pass (i.e., one implemented as an external program) into the process, like "I just want to run:

clang -O3 -lsomelibrary -o hello hello.c

but with this new pass intercalated"; if you want your "compiler+pass" to accept the same arguments as the standard compiler, you'll have to write a routine or script to do some juggling of the arguments passed to each call to the compiler, to get something like:

clang -S -emit-llvm -Xclang -disable-llvm-optzns -o - -O3 hello.c |
     your-pass |
     clang -x ir - -O3 -l somelibrary -o hello

This is another drawback of using an external program and communicating purely via the IR, rather than writing a real LLVM IR pass (which I guess you could intercalate with some -Xclang option or something, I don't really know).

If you need to run specific LLVM passes on an LLVM IR program, you can use the opt tool. For example, if you want to run the reg2mem pass, you can add opt -S -reg2mem in the pipeline. You can run opt -help for a list of available passes. (-S tells opt to emit textual LLVM IR, rather than bitcode.)

End of Transmission

That's it for today. In the next post, I intend to talk a bit about the LLVM IR language itself.

4 comentários / comments

Lisp without cons cells

2016-05-28 13:14 -0300. Tags: comp, prog, pldesign, lisp, ramble, in-english

Okay, I'm gonna write this down now to distract myself for a while before I get back to Master's stuff.

In a recent post I talked about the problem of cross-process garbage collection, and suggested wrapping objects in a reference-counted container when crossing process boundaries as a possible solution, but I remarked that this would have a large overhead when passing many small objects. The prime example would be passing a linked list, as (at least naively) every node of the list would get wrapped as the elements of the list are accessed.

Now, I particularly cared about this case because the linked list (based on cons cells) is a very prominent data structure in Lisp. And although they have some nice properties (they are conceptually simple, you can insert and remove elements into the middle/end of a list by mutating the cdrs), they also are not exactly the most efficient data structure in the world: half the memory they use is just for storing the "next" pointer (which fills processor cache), whereas in a vector you just need a header of constant size (indicating the vector size and other metadata) and the rest of the memory used is all payload. Also, vectors have better locality. On the other hand, "consing" (i.e., nondestructively inserting) an element into a vector is O(n), because you have to copy the whole vector, and even destructive insertion may require a whole copy every once in a while (when you exceed the current capacity of the vector). I've been wondering for a long time: could you make a Lisp based on a data structure that is halfway between a linked list and a vector?

If we are to allow the common Lisp idioms with this new kind of list, it has to support consing and taking the tail of the list efficiently. (Another possibility is to replace the common idioms with something else. That is much more open-ended and requires more thought.)

What I've been thinking of as of late is roughly a linked list of vectors, with some bells and whistles; each vector would be a chunk of the list. Each vector/chunk would have a header containing: (1) the number of elements in the chunk; (2) a link to the next chunk; (3) an index into the next chunk. Then comes the payload. So, for example, if you have the list (w x y z), and you want to append the list (a b c) on the front of it, you'd get a structure like this (the | separates graphically the header from the payload; it does not represent anything in memory):

[3 * 0 | a b c]
   |
   `->[4 * 0 | w x y z]
         |
         `-> ø

The reason for the index is that now you can return the tail of a list lst without the first n elements by returning a vector chunk with 0 length and a pointer into lst with index n: [0 lst n | ]. If the n is greater than the size of the first chunk (e.g., if you want to drop 5 elements from the (a b c w x y z) list above), we must follow the "next" pointers until we find the chunk where the desired tail begins. This is likely to be more efficient than the cons cell case, because instead of following n "next" pointers, you follow the number of chunks, subtracting the length of the skipped chunk from n each time. In the worst case, where there is one chunk for each element, the performance is the same as for cons cells, at least in number of pointers traversals. (We must only allow empty chunks, like the [0 lst n | ] example, at the beginning of a list, never in the middle of a chunk sequence. This ensures worst-case cons-like behavior. If we allowed empty chunks anywhere, reaching the nth element of a list could require arbitrarily many chunk traversals.)

One problem with this is that now (cdr lst) allocates memory (it creates a [0 lst 1 | ] chunk and returns it), unlike the cons cell case, where cdr never allocates memory (it just returns the value of the cell's "next" pointer). One possible solution is to try to make the result of cdr go in the stack rather than being heap-allocated, to reduce the overhead (the compiler could special-case cdr somehow to make it return multiple values rather than a new chunk, and build the chunk on the fly in the caller if it turns out to be necessary.) Another way around this would be to return a pointer into the middle of a chunk instead of a new chunk. I see two ways of achieving this:

Change the layout of the vector such that it begins with the payload, followed by a special value that marks the end of the payload, followed by the pointer to the next chunk. Now we don't need an index into the next chunk anymore, because we can just make the "next" pointer point into the middle of the next chunk. The drawback is that now finding the end of the chunk requires traversing the whole chunk, rather than looking at the length at the header. This would still be faster than with cons cells, because you're traversing adjacent elements in memory.
Require all chunks to have a fixed size and fixed memory alignment (say, every chunk has 64 bytes and is allocated at an address that is a multiple of 64). Then you can always find the header by zeroing out the last bits of the address.
Third idea that occurred to me now: if all chunks are, say, 16-byte aligned, you can encode an index 0~15 into the lower-order bits of the pointer.

All these have drawbacks. First, you need to know that the pointer you have is a pointer to a cons cell to be able to safely do the pointer arithmetic. (The fixed-size chunks case is simpler to solve: you zero out the pointer and see if it points to a chunk type tag.) Also, pointers into the middle of objects complicate garbage collection (and even more reference counting, I think). Finally, if you fix the size of chunks some of the advantages of using chunks in first place go away; if I allocate a 1000-element list at once, that should get me a single 1000-element chunk.

Or should it? Another problem here is that now garbage collection / reference counting can only collect whole chunks. If you choose your chunks badly, you may end up holding memory for longer than necessary. For instance, if you have a 1000-element list and at some point your program takes tails until it only remains with a reference to the last three elements, and the list was made out of a single 1000-element chunk, now you're stuck with a huge chunk most of which is unused – and more, all the elements in it are held from being collected too. Maybe we'd need a heuristic: if the tail size you want is less than some threshold size of the chunk, the system would return a copy of the tail rather than the tail. This would mess with mutability (you'd never know if the tail list you got shares storage with the original), but maybe immutable lists are the way to go anyway.

The other problem to solve is how to make cons efficient: the classical Lisp cons adds (non-destructively) one element to the front of an existing list, and we don't want to create a new chunk per cons invocation, otherwise the chunks just degenerate into cons cells. One idea I had is to allocate chunks with a least a certain amount of elements. For example, if you create a list with just a, you'd get a chunk with a few blank spaces (and enough metadata to know what is blank and what isn't; this could be an extra header element, or just a distinguished value meaning "blank"): [4 ø 0 | _ _ _ a]. Now, when you cons a new element x into that list, cons would check if there is a space immediately before the a in the existing chunk, and mutate it in place: [4 ø 0 | _ _ x a]. This won't mess with the program's view of the list because so far it only had references to the already filled part of the list. The problem with this is if you have multiple threads wanting to cons onto the same list at the same time: we must ensure only one of them gets to mutate the chunk. For example, say one thread want to cons x onto the list (a), and another thread wants to cons y onto the same list (a). We must make sure that only one gets to mutate the chunk in place ([4 ø 0 | _ _ x a]), and the other one will fail and fall back to either by copying the chunk and then mutating the copy, or by creating a new chunk that points to the old one ([4 [4 ø _ _ x a] 3 | _ _ _ y]; note that outer chunk points into the inner chunk with an index 3, skipping the first 3 elements, including the x added by the other thread). This could have a synchronization overhead. I'm not sure if it would be significant, though, because all you need is a compare-and-swap: "try to write into this space if it is blank". You don't need a lock because you don't need to wait anyone: if this first try fails (i.e., if the other thread got the space first), the space won't be available anymore, so you must immediately fall back to creating a new chunk rather than waiting for anything.

A possible side-effect of all of this is that now vectors as a separate data structure may not be necessary: you just allocate an n-element list at once, and it will largely have the same performance as an n-element vector. Well, unless we make lists immutable, then we may need (mutable) vectors. And lists still have some arithmetic overhead to find the position of the element (because in general we don't know that the list is a single chunk when performing an access, we have to find that out), so vectors may still be advantageous in many circumstances.

Now, back to (trying to) work.

[Update: Apparently I reinvented a half-hearted version of VLists. Also, I didn't mention that, but the Lisp Machine had a feature similar in spirit (but not in implementation) called CDR coding, which used a special tag in cons cells to mean that the rest of the list itself rather than a pointer to it was stored at the cdr place, thus saving one pointer and gaining locality. In the Lisp Machine, every memory object was tagged, so this special tag came more or less for free, which is generally not the case for modern architectures.]

Comentários / Comments

C is in an identity crisis, and some thoughts on undefined behavior

2016-05-19 23:11 -0300. Tags: comp, prog, c, pldesign, ramble, in-english

So, stories about undefined behavior have been making rounds again in my Twitter and RSS feeds (two things I was supposed not to be using, but anyway), which brought me some new thoughts and some other thoughts I meant to blog about ages ago but forgot about them.

The most recent one was this comment on Hacker News (via @pcwalton, via @jamesiry), which presents the following code, which is supposed to take a circular linked list, take note of the head of the list, and walk around the list freeing each node until it finds the node that points back to the head (and thus the end of the list):

void free_circularly_linked_list(struct node *head) {
  struct node *tmp = head;
  do {
    struct node *next = tmp->next;
    free(tmp);
    tmp = next;
  } while (tmp != head);
}

This looks (to me) as good C code as it gets. However, this code triggers undefined behavior: after the first iteration of the loop frees the node pointed to by head, it is undefined behavior to perform the tmp != head comparison, even though head is not dereferenced.

I don't know what is the rationale behind this. Maybe that would make it possible to run C in a garbage-collected environment where as soon as an object is freed, all references to it are zeroed out. (The fact that no one has ever done this (as far as I know) is mere detail. The fact that in a garbage-collected environment free would likely be a no-op is a mere detail too.)

The feeling I had after I read this is that C is in a kind of identity crisis: C allows you to do all sorts of unsafe operations because (I'd assume) it's supposed to let you do the kind of bit-bashing you often want to do in low-level code; at the same time, modern standards forbid that very bit-bashing. What is the point of programming in C anymore?

[Addendum: To be more clear, what is the purported goal of the C language? The feeling I have is that it has moved from its original function as a "higher-level assembly" that is good for systems programming, and is trying to serve a wider audience more preoccupied with performance, but in doing so it is not serving either audience very well.]

And the standards forbid these operations in the worst possible way: by claiming that the behavior is undefined, i.e., claiming that compilers are free to do whatever the hell they please with code perfoming such operations. Compilers keep becoming better and better at exploiting this sort of undefinedness to better "optimize" code (for speed, anyway). Meanwhile, they keep breaking existing code, and opening new and shiny security vulnerabilities in programs. The NSA probably loves this.

At this point, I'm beginning to think that C does not serve its purpose well anymore. The problem is that there seems to be no real alternative available. Maybe Rust can be it, although I don't really know how well Rust does in the bit-twiddling camp (e.g., can you easily perform bitwise and/or with a pointer in Rust? Well, come to think of it, even C does not allow that; you have to cast to an integer first.)

* * *

The other undefined behavior I've been reading about lately is signed overflow. In C, signed overflow is undefined, which means that code like:

if (value + increment < value) {
    printf("Overflow occurred! Aborting!\n");
    exit(1);
}
else {
    printf("No overflow; proceeding normally\n");
    value += increment;
}

is broken, because the compiler is likely to optimize the overflow check and the then branch away and just leave the else branch. I have seen two rationales given for that:

Pointer arithmetic. In the good old times, an int and a pointer used to have the same size. People happily used ints as array indices. Array indexing is just pointer arithmetic, and in some architectures (like x86), you can often perform the pointer arithmetic plus load in a single instruction.

Then came 64-bit architectures. For reasons I don't really get (compatibility?), on x86-64 and other 64-bit architectures ints remained 32-bit even though pointers became 64-bit. The problem now is that transformations that assumed integers and pointers to be the same size don't work anymore, because now their point of overflow is different. For example, suppose you had code like:

void walk_string(char *s) {
    for (int i=0; s[i]; i++) {
        do_something(s[i]);
    }
}

Usually, the compiler would be able to replace this with:

void walk_string(char *s) {
    for (; *s; s++) {
        do_something(*s);
    }
}

which is potentially more efficient. If ints and pointers have the same size, then this transformation is okay regardless of overflow, because the int would only overflow at the same point the pointer would anyway. Now, if ints are supposed to wrap at 32-bits but pointers wrap at 64-bits, then this transformation is not valid anymore, because the pointer version does not preserve the overflow behavior of the original. By making signed overflow undefined, the problem is sidestepped entirely, because now at the point of overflow the compiler is free to do whatever the hell it pleases, so the fact that the overflow behavior of the original is not preserved does not matter.

Now, there is a number of things wrong in this scenario:

Now ints cannot be used to index into an arbitrarily-sized array, only 2³²-element ones at most. In any sane language (I'd argue), the standard integer type would be one encompassing the full range of valid array indices. (Common Lisp explicitly requires fixnums to have at least that range, for instance.)
Unsigned overflow is defined. That's good, but at the same time it seems kinda strange that we have been using signed integers for array indices all this time, to the point that the undefinedness was chosen precisely to enable optimization of this usage, even though, come to think of it, it seems kinda wrong to use a signed integer in this circumstance (if your array takes more than half the memory space, this would be wrong, for instance).
[Update: Speaking of the devil...]

Optimizations based on "real math". The other reason I am aware of for making signed overflow undefined is to enable optimizations based on the mathematical properties of actual mathematical integers. An example is assuming that x+1 > x, for instance (which is what breaks the overflow test mentioned before). Another example is assuming that in a loop like:

for (i=0; i<=limit; i++) { ... }

the halting condition i<=limit will eventually be true, and therefore the loop will finish; if i were defined to overflow, then this loop would be infinite when limit == INT_MAX. Knowing that a loop terminates enables some optimizations. The linked article mentions enabling use of loop-specific instructions which assume termination in some architectures. Another advantage of knowing that a loop terminates is enabling moving code around, because non-termination is an externally-visible effect and you may not be able to move code across the boundaries of an externally-visible event ^{[please clarify]}. Now, something that occurred to me back when I read that post is that it assumes a dichotomy between either treating overflow as undefined, or defining it to wrap around. But there are other possibilities not explored here. I don't necessarily claim that they are feasible or better, but it's interesting to think on what optimizations they would enable or preclude. For instance:

One possibility is to actually make integers behave like their mathematical counterparts. This is what Common Lisp and Python do, for instance, by automatically promoting fixnums to bignums when they exceed the representation capacity of fixnums. In this case, all optimizations based on actual arithmetic are not only possible but safe, because the language integers actually respect those properties. The drawback is that now you have to deal with bignums, and check whether a number is a fixnum or a bignum before performing arithmetic.
Another possibility is to make integer overflow trap, i.e., raise an exception, abort execution, or something along those lines. In this case, an assumption like x+1 > x is true as long as x+1 does not trap. This might not enable the same amount of optimization as claiming undefined behavior for overflow, but at least it enables knowing that a loop terminates (either by hitting the termination condition or by aborting on overflow). Although aborting on overflow is an externally-visible event too, so I don't know if you would be able to perform the aforementioned code moving. Actually, I'd need to see an example of code moving that would be benefited by knowing about termination. I need to research this. [I rememeber seeing some code involving multiple threads and the observable behavior of one thread from another, but (1) I'd have to find it; (2) I'd have to be convinced that the mentioned optimization would be a good idea anyway.]

Now, the problem with the trapping semantics is that you have to check for overflow on every operation. This could be costly, but there are people working on making it more efficient. Seriously, this is the kind of thing that would be trivial if only architectures would help a little bit. Having the processor trap on overflow (either by having special trapping arithmetic instructions, or by having a special mode/flag which would enable trapping) would make this essentially costless, I think. Another nice-to-have would be a set of arithmetic instructions which treated the lower bits of words specially as flags, and trapped when the flags were not, say, all zeros. This could drastically reduce the cost of having fixnums and bignums in the language; the instructions would trap on non-fixnums and invoke a handler to perform the bignum arithmetic (or even find out that the operands are not numbers at all, and signal a dynamic type error), and perform the fast integer arithmetic when the operands had the fixnum flag. Alas, unfortunately we cannot just invent our own instructions, as we typically want to be able to use our programming languages on existing platforms. We could try to lobby Intel/AMD, though (as if).

(At this point I'm not really thinking about semantics for C anymore, just about the possible semantics for integers in a new programming language. Even if, say, Clang incorporated an efficient mechanism for trapping integer overflow, the standard would still say that signed overflow is undefined, so I'm not sure there is much hope for C in this situation.)

Computers, languages, and computer languages. Às vezes em Português, sometimes in English.

Posts com a tag: prog

2018-08-28 23:28 -0300. Tags: comp, prog, pldesign, lisp, in-english

Semantic clarity

Equality

Mutable boxes are useful on their own

Conclusion

2018-08-01 22:19 -0300. Tags: comp, prog, lisp, pldesign, in-english

Some context

The idea

Putting it back in the box

Nested data structures and sharing

Refactoring code to change data mutability

An alternative: mutability as a flag

(In)conclusion

2018-02-19 22:35 -0300. Tags: comp, prog, lisp, scheme, web, in-english

Interactive development

(web server)

Rendering HTML

Parameters – or maybe not

Performance tuning

Conclusion

2017-10-03 15:10 -0300. Tags: comp, prog, lisp, scheme, in-english

Libraries/modules

Missing things

Records

Missing things

Exceptions

Conclusion

2017-10-01 22:11 -0300. Tags: comp, prog, lisp, scheme, in-english

2017-09-14 19:34 -0300. Tags: comp, prog, lisp, scheme, pldesign, ramble, in-english

2017-01-02 22:54 -0200. Tags: comp, prog, scheme, lisp, em-portugues

Bytecode

Debugabilidade

(Não-)minimalismo

Módulos

Threads

Orientação a objetos

Setters

Bibliotecas

Unicode

Conclusão

2016-06-17 04:10 -0300. Tags: comp, prog, llvm, mestrado, in-english

The LLVM Project

Documentation, caveat, and scope of this post

Compiling LLVM and Clang

Interfacing with Clang/LLVM

LLVM IR as an interface

End of Transmission

2016-05-28 13:14 -0300. Tags: comp, prog, pldesign, lisp, ramble, in-english

2016-05-19 23:11 -0300. Tags: comp, prog, c, pldesign, ramble, in-english

Main menu

Recent posts

Recent comments

Tags

Elsewhere

Quod vide

Posts com a tag: `prog`

`(web server)`