Why (not) boxes? :: Elmord's Magic Valley

Why (not) boxes?

2018-08-28 23:28 -0300. Tags: comp, prog, pldesign, lisp, in-english

In the previous post I discussed an idea for dealing with mutable data in a Lisp-like programming language by using mutable boxes and immutable everything-else, with a bunch of optimizations. One of the usual suspects asked me what was the advantage of this scheme over just declaring things const as one would in a language like C/C++. At first I did not have an answer ready. This is one of those situations where you are so stuck in your own perspective that some questions don't even occur to you.¹ The immediate answer was that I was thinking in the context of a dynamically-typed language, so an immutability declaration like const² was out of the picture. But there is more to it.

If I were to give a really complete answer, I would have to begin by answering why I want dynamic rather than static typing. I started this post originally by trying to explain exactly that, but there is way more to say about it than I have the energy to do right now. For now let's just take for granted that I'm designing this framework in the context of a dynamically-typed language.

But I could still have a dynamically-typed equivalent of the const declaration: just shift the constness to the dynamic type of the object. So vectors and other composite objects would have a flag indicating whether they are mutable or not. I discussed this possibility at the end of the previous post, but I also commented I didn't find that solution as satisfying. But why not?

Semantic clarity

One thing I like about the mutability-as-boxes model is that it seems to make it easier to think "equationally" about mutability: instead of mutability being an inherent property of vectors (and other data structures), mutability is an 'embellishment' which can be added to any data structure (by putting it into a box), and it seems more or less obvious (to me, anyway) what the expected behavior will be when adding or removing mutability from something (or rather, when adding or removing something from mutability). For example, if vectors were inherently mutable or immutable, then I would have to know what operations exist to convert one type into the other, and what happens to the original: if I make an immutable vector out of a mutable one, will the new vector reflect further changes in the original (like a C const reference), or is it an independent, never-changing copy? Of course, when you learned the programming language you would learn about those details and be done with it, but the boxes model seems to suggest the answers by itself: if I extract a vector (immutable, like all vectors) from a mutable box, I would expect that further changes to the box contents won't affect the vector I just extracted (because changing the box contents means replacing one vector with another, not changing the vector itself); and if I have an (immutable) vector v around and I put it inside a box, I would expect that further changes to the box contents won't affect my original v (for the same reason: if I change the box contents, I'm replacing v with a new vector, so it's not v anymore). In fact, in the previous post I mostly wondered about how to implement the model efficiently, rather than what the correct behavior of each operation should be, because that part did not seem to raise any questions.
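To make the expected behavior concrete, here is a minimal sketch in Python rather than in the Lisp-like language under discussion. It assumes that tuples stand in for immutable vectors; the names `Box` and `vector_set` are hypothetical helpers invented for this illustration, not part of any proposed design.

```python
class Box:
    """A mutable cell holding one (immutable) value. Hypothetical helper."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value

    def set(self, value):
        self._value = value


def vector_set(vec, index, item):
    """Return a new tuple equal to vec except at index; vec is untouched."""
    return vec[:index] + (item,) + vec[index + 1:]


# Extracting a vector from a box: later changes to the box do not reach it.
box = Box((1, 2, 3))
extracted = box.get()
box.set(vector_set(box.get(), 0, 99))

# Putting an existing vector into a box: changing the box replaces the
# contents with a new vector, so the original v is unaffected.
v = (4, 5, 6)
other = Box(v)
other.set(vector_set(other.get(), 1, 0))
```

After this runs, `extracted` is still `(1, 2, 3)` and `v` is still `(4, 5, 6)`, while the boxes hold the updated vectors. "Mutating the vector in the box" is always "replacing the box contents", which is the whole point of the model.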

There is a flip side to this: although it is easier (to me, anyway) to think about the semantics of the mutability operations, the optimizations required to make it work well make it harder to think about the performance of the written code. That's the sufficiently smart compiler problem: a sufficiently smart compiler (or runtime) can turn something that would in principle be expensive into something fast, but then you change a small thing in your code in a way that the optimization cannot handle, and suddenly the performance of your program drastically changes. You end up having to know which cases the implementation can optimize, which offsets the semantic simplicity. Unless you can make sure the optimization will handle all 'reasonable' cases (for varying values of 'reasonable'), this can be a problem.

Equality

Object equality is a more complicated concept than one might expect. There are multiple notions of equality around – some languages have multiple operators for different kinds of equality (for example, == and === in JavaScript, or eq?, eqv? and equal? in Scheme). One type of equality that's given particular prominence in Scheme is the idea of object equivalence, embodied in the eqv? predicate: two objects are equivalent iff no operation (other than the equality predicates themselves) can tell them apart. Mutability is particularly important for object equivalence: two mutable objects (say, two vectors [1 2 3] and [1 2 3]), unless they are one and the same object in memory, are never equivalent, even if they have the same contents, because you can tell them apart by modifying one and seeing if the other changes as well (i.e., they might cease to have the same contents in the future). On the other hand, two immutable objects of the same type and with the same contents are equivalent, because there is in principle no operation that can tell them apart. (Of course the implementation might provide a function to return the address of each object in memory, which would allow us to tell the objects apart. But let's not concern ourselves with that.) Another notion of equality is that of equality of contents, embodied in the equal? predicate: two objects are equal if they are of the same type and have the same contents, even if they are mutable.
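The "modify one and see if the other changes" test can be sketched in Python, assuming lists stand in for mutable vectors and tuples for immutable ones. The function `same_object_by_mutation` is a hypothetical name made up for this example; Python's `==` plays the role of equal? here.

```python
def same_object_by_mutation(x, y):
    """Tell two mutable lists apart by mutating x and checking whether y follows.

    The probe element is removed again before returning.
    """
    probe = object()          # a fresh value that cannot already be in y
    x.append(probe)
    shared = bool(y) and y[-1] is probe
    x.pop()
    return shared


a = [1, 2, 3]
b = [1, 2, 3]      # same contents, different object
alias = a          # one and the same object

contents_equal = (a == b)                              # like equal?
a_b_same = same_object_by_mutation(a, b)               # distinguishable
a_alias_same = same_object_by_mutation(a, alias)       # not distinguishable

# Immutable tuples offer no mutating operation to run such a test with,
# so comparing contents is all that is left.
t1 = (1, 2, 3)
t2 = tuple([1, 2, 3])
tuples_equal = (t1 == t2)
```

Here `a` and `b` have equal contents but can be told apart, so they would not be eqv?; `a` and `alias` cannot be told apart. For the tuples, only the implementation-level identity check mentioned in the parenthesis above could distinguish them.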

When you have a lookup data structure such as a dictionary, you have to decide which kind of equality you will use to compare the keys in the data structure with the key being looked up. Scheme hash table implementations typically require one to specify the equality operator explicitly, because strings are mutable, so you want equal? if your keys are strings, but in other cases you may want to distinguish objects that are not equivalent in the above sense, so you want eqv?.

But if you make strings and vectors immutable, you can compare them with eqv?, and the cases where you want to actually use equal? for hash table lookup mostly go away. And you generally don't want mutable keys in your hash tables anyway (because if you mutate the object that was the original key, typically your hash table stops working, since the key has changed but is still stored under the hash of its old contents); we tolerate that in Scheme only because strings (and lists, and vectors) are mutable and we want to be able to use them as keys. So if mutability is isolated by boxes, now we can make hash table lookup use object equivalence (eqv?) by default and not worry about explicitly choosing the right predicate for hash table lookup. (Having a sensible default predicate for hash table lookup is important, among other cases, if you want to have literal syntax for hash tables, i.e., if you want to be able to write a literal hash table like {"foo": 1, "bar": 2} in your code without having to say "hey, by the way, the keys are compared by equal? in this case".)
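The mutable-key hazard described above can be reproduced in Python. This is only a sketch: `MutableKey` is a hypothetical class written for this example (Python's built-in lists refuse to be dict keys precisely to avoid the problem), and its deliberately simple hash is an assumption chosen to keep the behavior easy to follow.

```python
class MutableKey:
    """Hypothetical mutable vector that hashes and compares by current contents."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.items == other.items

    def __hash__(self):
        # Deliberately simple content-based hash for the illustration.
        return sum(self.items)


key = MutableKey([1, 2])
table = {key: "value"}
found_before = key in table

# Mutating the key changes its hash, but the table still files the entry
# under the hash it had at insertion time.
key.items.append(3)
found_after = key in table
found_by_old_contents = MutableKey([1, 2]) in table

# With immutable keys compared by contents, the problem cannot arise:
# any tuple with the same contents finds the entry.
safe_table = {(1, 2): "value"}
safe_lookup = safe_table[tuple([1, 2])]
```

In CPython the entry becomes unreachable after the mutation: looking it up with the mutated key probes the wrong slot, and looking it up with the old contents finds the slot but fails the equality check. The tuple-keyed table has no such failure mode, which is what makes a content-based default reasonable once keys are immutable.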

You can still use boxes as keys in a hash table. But since boxes are mutable, a box is only eqv? to the very same object in memory, so you have to use the same box object as the key when you store a value in the table and when you look the value up. This is actually useful if you want to store information about the box itself rather than the contents. But what if I want to look up based on the box contents? Well, then you unbox the contents and look it up! Which expresses intent far better, if you ask me. (This is not entirely equivalent to an equal?-based hash table lookup because you may have boxes inside boxes which would all have to be unboxed to achieve the same effect. Not that this is a very common use case for hash table keys.)
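A small Python sketch of both kinds of lookup, under the assumption that a plain object with default identity-based hashing models a box and that tuples model immutable vectors. `Box` is again a hypothetical helper.

```python
class Box:
    """Hypothetical mutable cell; hashed and compared by identity (Python's default)."""

    def __init__(self, value):
        self.value = value


box_a = Box((1, 2))
box_b = Box((1, 2))   # same contents, different box

# Keyed by the box itself: information about the box, not its contents.
about_boxes = {box_a: "notes about box_a"}
a_found = box_a in about_boxes
b_found = box_b in about_boxes

# Changing the contents does not disturb the entry, because the key is
# the box's identity.
box_a.value = (3, 4)
still_found = about_boxes[box_a]

# Keyed by contents: unbox first, then look up.
about_vectors = {(1, 2): "notes about the vector (1, 2)"}
by_contents = about_vectors[box_b.value]
```

The explicit `box_b.value` at the lookup site is what expresses the intent: the reader can see that the contents, not the box, are being used as the key.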

Could we not do the same thing with the mutability flag model? In that model, eqv? would check the mutability flag; objects with the mutability flag on would only be equivalent if they were one and the same, and objects with the mutability flag off would be compared for contents. It would work, but would not be as pretty, if you ask me. However, as long as mutability is easily visible (for example, mutable objects would be printed differently, say like ~[1 2 3] if the mutability flag is on), it could work fine.
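For comparison, the flag-based eqv? described in this paragraph might look roughly like the following Python sketch. `FlaggedVector` and `eqv` are hypothetical names, and the sketch assumes the elements are simple atoms such as numbers.

```python
class FlaggedVector:
    """Hypothetical vector carrying a mutability flag in its dynamic type."""

    def __init__(self, items, mutable=False):
        self.items = list(items)
        self.mutable = mutable

    def set(self, index, item):
        if not self.mutable:
            raise TypeError("vector is immutable")
        self.items[index] = item

    def __repr__(self):
        body = "[" + " ".join(repr(item) for item in self.items) + "]"
        # Mutable vectors print with a leading ~ so the flag is visible.
        return "~" + body if self.mutable else body


def eqv(a, b):
    """Flag-aware equivalence: identity if either side is mutable, contents otherwise."""
    if a is b:
        return True
    if a.mutable or b.mutable:
        return False
    return a.items == b.items   # assumes elements are atoms


frozen_1 = FlaggedVector([1, 2, 3])
frozen_2 = FlaggedVector([1, 2, 3])
live_1 = FlaggedVector([1, 2, 3], mutable=True)
live_2 = FlaggedVector([1, 2, 3], mutable=True)
```

With these definitions, `eqv(frozen_1, frozen_2)` holds while `eqv(live_1, live_2)` does not, and `repr(live_1)` gives `~[1 2 3]`. The extra branch on the flag is the part that feels less pretty than letting the box type carry the distinction.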

Mutable boxes are useful on their own

In Scheme, variables are mutable: you can use (set! var value) on them to change their values. The problem is, variables are not first-class entities in Scheme, so you cannot pass them around directly. So if you want to share mutable state across functions, you have to put the variable in some place where all the interested functions can see it; I remember once having moved subfunctions into another function just so they could all share the same mutable variable. Alternatively, you can create a mutable data structure and pass it around to the relevant functions – and the box is the simplest mutable data structure you can use, if all you want is to share one single mutable cell around. So mutable boxes are useful even if you don't intend to make them the one single source of mutation in the language. And since they are already there, why not just go ahead and do just that? (I am aware that "why not?" is not exactly the most compelling argument out there.)
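The same situation shows up in Python, where a function cannot rebind its caller's variable either. The sketch below, with a hypothetical `Box` class and made-up function names, shows a box being passed around as the single shared mutable cell.

```python
class Box:
    """Hypothetical mutable cell holding a single value."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value

    def set(self, value):
        self._value = value


def broken_increment(count):
    # Rebinds only the local name; the caller's variable is unaffected.
    count = count + 1


def increment(counter):
    # Updates the shared cell, so every holder of the box sees the change.
    counter.set(counter.get() + 1)


def report(counter):
    return "count = " + str(counter.get())


plain = 0
broken_increment(plain)

shared = Box(0)
increment(shared)
increment(shared)
summary = report(shared)
```

After this runs, `plain` is still 0, while `summary` is `"count = 2"`: `increment` and `report` share state without having to be nested inside a common enclosing function.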

Another case: cons-cell based lists are somewhat annoying to use with mutation. Suppose you have a list in a variable x, and you pass it around and it ends up in a variable y in another part of the program. If you append things to the tail of x by mutating the tail, both x and y will see the new items, because the tail is reachable from both x and y. But if you append things to the front of x, y won't see the new items in the front, because the new elements are not reachable from the old list tail. If you put the whole mutable list inside a box and passed the box around, both x and y would have the same view of the mutable object. And if you took the list out of the box and put it in another variable z, it would become immutable, so either you see the same changes to the list as everyone else, or you isolate yourself from all subsequent mutations, but you will never see some changes to the list (to the tail) and not others (to the front).
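Both halves of this scenario can be sketched in Python. `Cons`, `to_list` and `Box` are hypothetical helpers written for this example, with tuples standing in for the immutable list kept inside the box.

```python
class Cons:
    """Hypothetical mutable cons cell."""

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail


def to_list(cell):
    """Collect the elements reachable from cell into a Python list."""
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out


class Box:
    """Hypothetical mutable cell holding one immutable value."""

    def __init__(self, value):
        self.value = value


# Mutable cons cells: tail changes are shared, front changes are not.
x = Cons(1, Cons(2))
y = x
x.tail.tail = Cons(3)      # append at the tail by mutation
x = Cons(0, x)             # "append" at the front
x_view = to_list(x)
y_view = to_list(y)

# Boxed immutable list: every holder of the box sees every change,
# and a value taken out of the box sees none of the later ones.
shared = Box((1, 2))
x_box = shared
y_box = shared
x_box.value = x_box.value + (3,)
z = y_box.value            # snapshot taken out of the box
x_box.value = (0,) + x_box.value
```

With cons cells, `x_view` is `[0, 1, 2, 3]` but `y_view` is `[1, 2, 3]`: y saw the tail change and missed the front one. With the box, `y_box.value` is `(0, 1, 2, 3)` just like `x_box.value`, and the snapshot `z` stays at `(1, 2, 3)`.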

Conclusion

I hope I have been able to show why I find the mutability-as-boxes model appealing. I'm not saying it does not have problems (on the contrary, I have already said it has problems); I'm just trying to show what the point of the whole thing is.

_____

1 This is kinda disturbing when you think about it. How many other questions am I not asking?

2 Well, const is not really about the immutability of the data, it's more about the permission to modify a piece of data from a given reference. That is, if a function is declared as taking a const char * argument, that means that the function is not supposed to modify the data pointed to, but it does not mean that the region will not be changed through other references. In other words, it's about requiring something from the user of the reference, but not about providing a guarantee to the user of the reference. A true immutability declaration would both forbid the user from modifying the data and ensure to them that the data will not change during use. Immutability in a language like Rust works like this (except immutable is the default, and mutability is explicitly declared).
