I've been around lately with an idea for handling mutation in a new Lisp-like programming language. Most of these ideas are probably not new – in fact, while doing a little research I've found out about Clojure's transients, which embody some of the same ideas – and the parts that are possibly new are not necessary good. But I want to write this down for future reference, so there we go.
DISCLAIMER: This is one of those programming language design posts exploring a bunch of ideas and reaching no conclusion. Read at your own peril.
The problem with mutation is when the mutable data is shared with other parts of the program – especially when you don't know what parts of the program share the same data. For example, suppose you call a method
blog_post.get_tags(), and it returns you a list
["comp", "prog", "lisp"] – can you mutate this list? For example, if I were to sort it, or remove elements from it, can I do it in-place, or I would be mutating a list used internally by the
blog_post object and thus inadvertently affecting other parts of the program? Without looking at the method's source code, we don't know. If we wanted to be sure not to break anything, we would have to make a copy of the list and change the copy instead.
What if I am the person writing the
get_tags() method? Should I always return a new copy of the list, wasting some memory and cycles but ensuring that whoever calls my function won't be able to affect the internal fields of the
blog_post? Or should I always return the same list object, thus avoiding a new allocation but relying on the caller to do the right thing?
Now consider the strings inside that list. If I were to convert them to upper-case, should I do it in-place, or copy them first? In a language like C, this is the same problem as before: you have to know whether
get_tags() gives you a copy of the original strings (which you can freely modify), or the internal strings used by
blog_post (which you should not modify). But in a language like Java or Python, this problem does not come up: since strings are immutable in those languages, the only way to 'change' them is by making a new string, so modifying them in-place is not an option. On the other hand, the writer of the
get_tags() method can now happily return the internal strings of the
blog_post object, since they can be sure the strings cannot be modified by external code.
If you make all data structures immutable, you eliminate this problem – and that's the purely-functional approach, taken by languages like Haskell. Clojure is similar in making the core data types (such as lists, vectors and hashmaps) immutable, and having controlled forms of mutability. In traditional Lisps like Scheme and Common Lisp, on the other hand, most composite data types (including lists, vectors and strings) are mutable. The standards of those languages are careful in describing which functions always return freshly allocated data and which return values that may share parts with the function's arguments. This is basically part of the contract of those functions, which you have to know whenever you want to mutate values generated by them. The situation in traditional Lisps is somewhat aggravated by the fact that linked lists may share a tail: two lists
(1 3 7 5) and
(2 7 5) may actually share the same cons cells for the
(7 5) part. In a mostly functional setting, this is okay, but if we want to mutate anything, we have to be extra careful not to be inadvertently changing something else. In this example, sorting the second list in place may end up messing the first list.
What I'm interested in is finding a middle ground between full immutability and full mutability. I want to be able to return immutable data from functions, so I can know the consumers of that data won't inadvertently change it, and I also want to be able to create mutable data which can be modified in place. It would also be nice to be able to use mutable data for temporary processing and make it immutable after we are ready. And I want to be able to tell at a glance if I'm dealing with mutable or immutable data.
So here is the idea…
First, we make all basic composite data types (lists, vectors, dictionaries, strings, etc.) immutable. Then we add a single mutable box type. Values of the box type have a single mutable field. This idea is at least as old as ML's
ref type, so nothing new so far. I will use the notation
&val to mean a box containing
val, and the expression
(set! box val) changes the contents of the box
val. I will also use
@ (read at) for the indexing function, so
(@ vec idx) means the
idxth element (starting at 0) from vector
(@ box) with no indices means to extract the box's contents.
So now we can make a mutable cell with an immutable vector inside, e.g.,
&[1 2 3]. We cannot mutate the vector directly, but we can replace the whole vector with another immutable vector. That may be elegant and all, but it's not as convenient as a mutable array, nor as efficient. There are some tricks we can play here, though.
The first trick is to make the assignment operator (
set!) recognize vector indexing as its first argument, so if
v is the vector-containing box
&[1 2 3], we can write
(set! (@ v 0) 42) to replace the vector
[1 2 3] with the vector
[42 2 3] inside the box. It looks like we are mutating the vector's first element, but actually we are replacing the whole original immutable vector with a new immutable vector with a different element at position
This gets us convenience, but it's still inefficient: if I write a loop to mutate all elements of the vector, it will generate a fresh new vector on each iteration. But then comes the second trick: how can we tell the difference between a mutable cell with an immutable vector inside from an actual mutable vector? If we make the difference invisible to the programmer, then we can mutate the vector in-place as an optimization. So
(set! (@ v 0) 42) syntactically looks like mutating a vector element, semantically means replacing the whole vector with a new one, but implementationally actually works by mutating the vector element anyway. I'm not sure about the wisdom of this double layer of self-cancelling illusions, but let's explore this idea further.
Let's call the naive implementation using a mutable box with an immutable vector inside, well, the naive implementation. And let's call the implementation which underlyingly uses a mutable vector to represent the box+vector combination the smart implementation (with the full understanding that it may actually be too smart for its own good, or maybe not smart enough to make this idea work well).
The most basic operation you can do with a box is extracting the contents. In the naive implementation, that's just returning the value inside. In the smart implementation, we must simulate this by copying the current contents of the mutable vector into a freshly allocated immutable vector and returning that. A user can then observe the difference between the two implementations by taking the contents of the same box twice and checking whether the results are
eq? to each other, i.e., whether they are the same object in memory.
It seems to me that the solution to this problem is to ditch object identity for immutable objects from the language, i.e., get rid of the lower-level
eq? operation (Python's
is), or at least relegate it to a library of lower-level operations. Immutable objects should only be compared by its contents, not by identity: if I compare
[1 2 3] with
[1 2 3], it should not matter whether they are separate objects in memory or not: they have the same contents and that's what matters. The only way to tell two distinct objects with the same contents from each other (apart from
eq?) is by mutating one of them and seeing if the other changes as well; but if the objects are immutable, this distinction disappears.
A possible optimization to reduce the amount of copying when extracting a box's contents is to return the actual underlying vector, but mark the box as copy-on-write, i.e., we postpone the copy to the next time we need to mutate the vector inside the box. If the box is not mutated afterwards, the vector is never copied. The problem with this may be performance: we need to check the copy-on-write flag before every change to the vector, and the whole point of these optimizations is performance. Sure, we avoid a copy, but we slow down every write to the vector. This is aggravated by the fact that this flag must be synchronized across threads, lest we end up with two threads making new copies and clobbering each others' view of the box.
Speaking of which, doesn't thread synchronization throw this whole idea out of the window anyway? Extracting the contents of a box must be an atomic operation, but someone might be mutating the underlying vector while we are copying it into a new immutable vector to return it. This is okay as long as we can guarantee that the resulting copy represents one possible atomic state of the box at the time of the extraction, but consider the following scenario:
&[1 2 3 4 5].
(def v (@ b)). The implementation starts to copy the elements into a new immutable vector.
1, thread 2 evaluates
(set! (@ b 0) 10). Now
&[10 2 3 4 5]. Thread 1 keeps on with its copying.
(set! (@ b 4) 50). Now
&[10 2 3 4 50].
[1 2 3 4 50].
[1 2 3 4 50]has never been a state of
b: the only states
[1 2 3 4 5],
[10 2 3 4 5], and
[10 2 3 4 50]. Thus, the illusion that the box contains an immutable vector is broken.
To avoid this problem, the implementation would have to acquire a lock (or use some other form of thread synchronization) when extracting the contents of a box, thus slowing down what should be a cheap operation. The copy-on-write solution avoids the copy incoherence problem, because the copying happens from the now-immutable vector to the new mutable one, and not the opposite, so we know that the origin will not change mid-copy. But as we have seen, we need to ensure synchronization of the copy-on-write flag, so it's pretty much the same.
Maybe this synchronization requirement is a good thing: maybe we want copies to be atomic anyway, and this way the semantics of the language guarantees that. But maybe this is an unnecessary overhead most of the time.
Even if we go with the copy-on-extract solution, we can avoid copying in the case of extracting the object from the box and then discarding the box (e.g., if we want to create a mutable vector, do a bunch of mutating operations on it, and then make it immutable) by providing an
(unbox! b) operation which returns the contents and sets the box contents to
nil (or whatever other value to indicate that the box is "empty"). Because we know the vector will not be mutated again, we can just return the underlying vector and call it immutable. This is basically what Clojure's
persistent! operation does (though I didn't know that when I had this idea).
Let's consider some other problems and optimizations.
What about sequences of transformations? For example, suppose I implement a
filter function, which takes a predicate function and a vector and returns a new vector containing only the elements for which the predicate function returns true, like this:
(def (filter pred vec) (let ([result &]) ;; Collect satisfying items in the mutable result vector... (for [item in vec] (when (pred item) (push! item result))) ;; And then return the contents as an immutable vector. (unbox! result)))
What if I want to, say, filter a vector and then reverse it? If
filter is written like this, I get an immutable vector back, so I would have to copy it into a mutable vector again just so I can reverse it. If only
filter had not called
unbox! at the end, I could have reversed it in-place without a new allocation! But if I don't
unbox!, I will have to always manually unbox the result when I want to, and most of the time I do want an immutable result.
There is a possible trick to help us here: if we unbox a value just to immediately box it back again, we can actually keep using the same underlying storage with no copying. The problems with this optimization are: (1) We must be able to know that no other references to the object have been created between the unboxing and the re-boxing, and basically the only way to do this is with some sort of reference counting. Reference counting has its share of problems (cyclic data structures never reach count zero, we need to synchronize count updates across threads), so relying on an optimization which requires us to use reference counting is not good. (2) We need to make sure the reference count does not inadvertently rise above 1, which would preclude the optimization. Since there may be more going on under the scenes in the compiler/interpreter than reaches the eye, this would be an unreliable optimization, that sometimes does not trigger for non-obvious reasons.
An alternative to use reference counting would be to do this analysis at compile-time, either through some form of escape analysis (which is hard to do across functions), or some crazy type system with uniqueness types (like Rust's borrow checker), which does not mesh well with my goal of a dynamically typed language.
What if I put nested data inside a box, like
&[[1 2] [3 4]]? Should the sub-vectors become underlyingly mutable too? If so, when should we stop recursing through the structure, which could contain other composite data types in it? Should we do it lazily, using copy-on-write as we mutate the inner vectors? The implementation of this can get tricky. If I access
(@ b 0), should I get an immutable
[1 2], or should I get a mutable
&[1 2] which shares the same memory as the original element, so that mutations on the
&[1 2] vector are reflected on the
&[[1 2] [3 4]] one?
I'm too tired to even analyse the possibilities right now.
Suppose I have a data structure with a bunch of immutable fields, say,
(Person name age) and I decide I want to make the
age field mutable. In our conceptual framework, this means wrapping the value of the field in a mutable box, e.g.,
(Person "Hildur" &22), i.e., the field itself remains immutable, but its value is a mutable cell. That looks nice, and makes all mutability readily visible, but it also means that we have to change all code using the
age field to extract the value from the box, even code that does not mutate the value.
Maybe this is a good thing: if the code was written under the assumption that the value does not change, maybe it is good that we have to revise everything when we turn it mutable. On the other hand, this makes it harder to try things out in code and run it to see what happens, and for I long time I have defended the ability to run incomplete (and even wrong) programs while prototyping. However, I also want to be able to run optional static checks, and it's easier to do so when code is explicit about its intentions. So there we go.
An alternative to the mutability-as-boxes approach is just to make all composite data structures carry a 'mutable' flag. We can still use the notation
&[1 2 3] to mean a vector with the mutable flag on. We can provide an operation like Clojure's
persistent! which turns the mutability flag of an object off. An operation to turn it back on would be more dangerous, but might be useful for debugging purposes (though it's the kind of thing you can be certain will be abused if added to the language (though Lisps have traditionally always preferred to empower the user rather than impose decisions on them)).
In this scenario, the semantics of
(set! (@ v 0) 42) is to actually modify the vector in-place, so we don't need the double illusion trick. If we want to return an immutable version of a mutable data structure without losing the mutability, we have to explicitly copy it. Perhaps more descriptive of intention, we may have a non-destructive
persistent operation which returns an immutable version of a value, which may or may not actually involve a copy (it may actually use copy-on-write behind the scenes). Thread synchronization has to be done explicitly, otherwise you assume the risks of getting a partially-modified copy. This is somewhat unsatisfying, but inconsistencies across threads could happen with boxes anyway whenever you had to work with more than one box, so a better solution to synchronization is needed anyway.
This idea of using mutable boxes + immutable data structures + optimization tricks had been haunting me for a week and seemed really appealing at first, but thinking more deeply about it, it does have its share of problems. Maybe it's a cool idea anyway, maybe not. I have to think more about it. Said research will require more and abundant funding.
Today the acquisition of GitHub by Microsoft was announced. Lots of people (myself included) have expressed their concerns about the consequences of this acquisition. Some people worried that Microsoft would end up ruining GitHub the same way it did to Skype. Some worried that it could be part of a traditional embrace, extend, extinguish move to kill off competition. Some worried that they could change the terms of service to grant themselves more rights over the code that is hosted on GitHub (that was my first worry, and I've seen it reflected elsewhere on Mastodon). At the same time, many people have countered that the new GitHub CEO will be a person friendly to open-source (Nat Friedman, founder of Xamarin), that Microsoft's stance towards open source has changed a lot in the last years, and that probably not much will change in the GitHub service.
I have reflected for a while about what should I do, and I decided that I will move my projects away from GitHub.
Here is my take on this: It does not matter whether Microsoft will ruin GitHub or not. To keep my personal projects on GitHub would be endorsing Microsoft, and I don't want to do this. It's not (only) that the company has a terrible track of mistreating the FOSS community. It's not (only) about what the company has done in the past. It's that in the present Microsoft still regularly mistreats its users, by pushing spyware on them (in the form of Windows 10 telemetry), by pushing ads on them (in the Windows 10 equivalent of the start menu), by de-decentralizing Skype and thus facilitating surveillance, by taking control away from users over their own systems (e.g., by forcing updates upon users and making it really difficult to disable automatic reboot), and so on. (This article has a nice list of bad things Microsoft has done in the past, which should be kept in mind too, but my point is that even if you ignore the past and look just at the present, there's plenty of reasons to dislike Microsoft.)
I don't want to endorse or support this company in any way, and I feel that keeping my projects there (and by implication requiring other people to use their services if they want to create issues, contribute code, etc.) is a form of endorsement. Therefore my decision is to move away.
I won't do this immediately, though. Although I have already decided that I will move away from GitHub, I still haven't decided where I will move to. The acquisition is not yet complete, and I will take my time to consider my options – whether I will move to some other code hosting service such as GitLab or BitBucket, or whether I will host my own server (and if so, which server software I will use), or whether I will go bare-bones and just have plain Git repositories and home pages for each project. I'm also wondering if this might be a good time to move away from Git and give Fossil a try.
I also may keep my GitHub account (assuming they don't make radical changes to the terms of service or the nature of the service), so I can open and comment on issues of projects still hosted there, and so I can point to whatever new place I decide to host my stuff on in my profile.
Leitouros e leitouras,
Assim como eu me livrei da UFRGS, a UFRGS se livrou de mim, e nas próximas semanas a minha página pessoal antiga (inf.ufrgs.br/~vbuaraujo) deixará de funcionar.
Peço humildemente àqueles que têm links para este blog que os atualizem para o endereço atual (https://elmord.org/blog/).
Meu e-mail da INF seguirá funcionando (ou assim me hão prometido).
Modern Android versions come with a security (?) mechanism which requires the user to log in with a Google account previously used on the phone after a factory reset.
Picture source: Softpedia
There are dozens of different instructions for how to bypass this protection on the web, but none of the ones I tried to follow worked on the phone in question, an Asus ZenFone Go (ZB452KG). There are probably multiple firmware versions around for this phone model alone, so what works on one phone may not work on another. However, the various tutorials I found helped me discover a method that did work on my phone.
Ingredients: You will need a non-Google e-mail account to add to the phone, as well as the IMAP and STMP settings for your e-mail provider. You can remove this account after we're finished.
The goal: The goal of this exercise is to disable the Google apps in the phone settings: if they are disabled, the setup wizard will skip the account validation step. The trick is how to get to the phone settings before unlocking it.
Follow the setup wizard until you get to the account validation screen. I recommend that, at the beginning of the wizard, you choose English as language, so that the Google app names all begin with "Google" and so are listed all together and are easier to find.
Select the e-mail field to open up the keyboard, tap the "⋮" button, select "Share", and "GMail".
Click "Skip" at the initial GMail screen. GMail will ask you to add an account. Configure your non-Google account here.
When you get to the compose message screen, tap the "⋮" button on the top right, choose "Settings", hit "⋮" again, and "Manage accounts". It will say "You're about to go to the Settings app …"; hit "Continue". Now we're at the phone settings!
Select "Apps", go to the "All" tab, and select the "Google Account Manager" app. Select "Force stop", and then "Disable". (You'll be asked to confirm these actions.)
Note: I actually disabled a bunch of other apps with names beginning with "Google" (such as "Google Services Framework") when I did this; I think this is not strictly necessary, but you can try it if disabling just "Google Account Manager" does not work.
Go back to the beginning of the setup wizard and follow the normal setup steps. Now at the point it would ask you for a Google account, it should ask for your name and surname instead. If this happens, it worked! Just finish the wizard and you'll have your phone unlocked.
If it does not work, try to go to "Settings > Apps > All" again and force stop the "Setup Wizard" applications (there are two of them on my phone), and then try to restart the wizard (you may have to turn your phone off and on again for that).
If you wish, go back to Settings and re-enable the apps you disabled. You can now also remove your IMAP e-mail account if desired.
I did the development using Geiser, an Emacs package for interactive Scheme development. It can connect with a running Scheme, and you can evaluate code from within Emacs, have completion based on the currently available bindings, etc.
The coolest part of this is being able to reevaluate function and class definitions while the program is running, and seeing the effects immediately. In this sense, GOOPS (the Guile Object Oriented Programming System, inspired by the Common Lisp Object System) is really cool in that you can redefine a class and the existing instances will automatically be updated to reflect the new class definitions (unlike, say, Python, in which if you redefine a class, it's an entirely new class, and instances of the old class will have no relationship with the new class).
One thing I realized is that you have to adapt your program a bit if you want to make the most of interactive development. For example, in Guile's web framework, there is a procedure (run-server handler), which starts up the web server and calls handler to serve each request. My request handler was a procedure named
handle-request, so my code called
(run-server handle-request). The problem is that this way
run-server will be called with the value of
handle-request at the time we started the server, and subsequent redefinitions of
handle-request while the server is running will have no effect on the running server. Instead, I ended up writing something like:
(start-server (lambda (request body) (handle-request request body)))
I.e., instead of calling
handle-request directly, the server will call the anonymous function which, when called, will call
handle-request. In this way, it will use the value of
handle-request at each time the anonymous function is called, so it will see changes to the definition.
Another thing to take into account is that there may be some bindings you don't want to redefine when you reload a module, e.g., variables holding program state (in my case, a variable holding the blog object). For that, Guile provides a
define-once form, which only defines a variable if it doesn't already exist.
One gotcha I encountered was when using parameters, the Scheme world equivalent of dynamically-scoped variables. Parameters in Guile have per-thread independent values, and since the web server and the REPL run in different threads, they may see different values for the same parameter. I ended up not using parameters anyway for other reasons (more on that later).
Guile comes with a web framework of sorts, though it is pretty bare-bones. (Actually the main thing I missed in it was having to parse the request query and POST data by hand. At least it does provide a function to percent-decode URL components.) It has a philosophy of pre-parsing all headers into structured data types as a way of avoiding programming errors. It's an interesting idea; I have mixed feelings about it, but I think it's a valid idea to build a framework on (after all, if you're going the trouble of making a new framework, you might as well try some new ideas rather than creating yet another run-of-the-mill web framework).
You start the server by calling
run-server with a callback (as mentioned above). Whenever a new request comes, the callback will be called with a request object and the request body as a bytevector. The callback must return (at least1) two values: a response object and a response body. Guile allows some shortcuts to be taken: Instead of a response object, you can pass an association list of response headers and the framework will automatically make a response object out of it. The response body may be either a string (which will be automatically encoded to the proper encoding, usually UTF-8), a bytevector, or a procedure; in the latter case, the procedure will be invoked with a port as an argument, and whatever you print to that port will be sent as the response body.
Guile comes with support for SXML, an S-expression based tree representation of XML. This means you can write things like:
(sxml->xml `(div (@ (class "foo")) "Hello, world"))
and it will emit
<div class="foo">Hello, world</div>. The nice thing is that strings appearing in the tree will be automatically escaped approprietely, so you don't have to worry about escaping (or forgetting to escape) data that may contain special characters, such as
That very feature was at first what led me not to want to use SXML, appealing though it was, to render Blognir pages. The reason is that post contents in Blognir come raw from a post file; I didn't want to parse the file HTML contents into SXML just to dump it again as HTML in the output2, and I saw no way to insert a raw string in the middle of an SXML tree bypassing the escaping in the output. So I began this adventure by printing chunks of HTML by hand. At some points I needed to escape strings to insert them in the HTML, so I wrote a small wrapper function to call
sxml->xml on a single string and return the escaped string (by default
sxml->xml prints to a port rather than returning a string).
When I got to the post comments form, where I have to do a lot of escaping (because all field values have to be escaped), I decided to use
sxml->xml for once, for the whole form, rather than escaping the individual strings. I found it so nice to use that I decided to look up the source code for
sxml->xml to see if there wasn't a way to insert raw data in the SXML tree without escaping, so I could use it for the whole page, not just the form. And sure enough, I found out that if you put a procedure in the tree,
sxml->xml will call that procedure and whatever it prints is emitted raw in the result. This feature does not seem to be documented anywhere. (In fact, the SXML overview Info page says (This section needs to be written; volunteers welcome.). Maybe that's up to me!) By that point I had already written most of the rest of the page by printing HTML chunks, and I did not go back and change everything to use SXML, but I would like to do so. I did use SXML afterwards for generating the RSS feeds though, with much rejoicing.
Parameters are used for dynamically scoped values. They are used like this:
;; 23 is the initial value of the parameter. (define current-value (make-parameter 23)) (define (print-value) (display (current-value)) (newline)) (print-value) ;; prints 23 (parameterize ([current-value 42]) (print-value)) ;; prints 42 (print-value) ;; prints 23 again
My original plan was to create a bunch of parameters for holding information about the current request (
current-post-data and the like), so I wouldn't have to pass them as arguments to every helper request handling function; I would just bind the parameters at the main
handle-request function, and all functions called from
handle-request would be able to see the parameterized values.
The problem with my plan is that instead of returning the response body as a string from
handle-request, I was returning a procedure for the web framework to call. By the time the procedure was called,
handle-request had already finished, and the
parameterize form was not in effect anymore. Therefore the procedure saw the parameters with their initial value rather than the value they had when the procedure was created. Oops!
Because closures don't close over their dynamic scope (that's kinda the whole point of dynamic scope), parameters ended up not being very useful for me in this case. I just passed everything as, ahem, parameters (the conventional kind) to the subfunctions.
Despite its crude design, the original Blognir is pretty fast; it takes around 1.7ms to generate the front page in my home machine. I got Parenthetical Blognir at around 3.3ms for now. I'm sure there are still optimizations that can be done, and I may still try some things out, but right now I don't have any pressing need to make things faster than that.
I did learn a few things about optimizing Guile programs in the process, though. I used the
ab utility (package apache2-utils on Debian) to measure response times, and Guile's statistical profiler to see where the bottlenecks were. I did not keep notes on how much impact each change I did had on performance (and in many cases I changed multiple things at the same time, so I don't know the exact impact of each change), but I can summarize some of the things I learned.
This is standard advice and applies to every programming language, but do use the profiler to find out performance bottlenecks. They often come from unexpected places, and it is far better to see the numbers than trying to guess where the bottlenecks might be.
Some things are slow because they take up CPU themselves, and some things are slow because they generate garbage, so your program spends more time in the garbage collector. The procedure currently taking the most time in Parenthetical Blognir is
%after-gc-thunk, and I still have not analysed it more carefully to determine where this is coming from.
Guile has two versions of
format (the Guile counterpart to
printf): one from the
(ice-9 format) module, written in Scheme, which supports various format specifiers; and a bare-bones one, written in C, available under the name
simple-format, which only supports the
~s format specifiers with no qualifiers.3
simple-format is much faster than the
(ice-9 format), so if you only need the bare
~s formats and your program makes heavy use of them, I recommend using
My program was spending quite a bit of time on
read-string to read post contents. It only reads the contents of a post to dump it into the response, and
read-string will waste some time decoding the input from UTF-8 to Guile's internal string type, dynamically growing a string to the right size to fit the post contents, just to immediately decode it back to UTF-8 and discard the string as garbage. I found it more efficient to write a little function which takes two ports (the open post file and the open response port) and just copy the raw bytes in fixed-size chunks from one port to the other:
(use-modules (rnrs bytevectors) (ice-9 binary-ports)) (define (pump-port in-port out-port) (let ([buffer (make-bytevector 65536)]) (let loop () (let ([count (get-bytevector-n! in-port buffer 0 65536)]) (when (not (eof-object? count)) (put-bytevector out-port buffer 0 count) (loop))))))
For extra garbage reduction, you can reuse the buffer across calls (but be careful if the code may be called from multiple threads).
In general, I liked the experience of rewriting the blog in Guile. It was the first time I did interactive Scheme development with Emacs (previously I had only used the REPL directly), and it was pretty cool. Some things could be better, but I see this more as an opportunity to improve things (whether by contributing to existing projects, by writing libraries to make some things easier, or just as things to take into account if/when I decide to have a go again at trying to make my own Lisp), rather than reason for complaining.
There are still a few features missing from the new blog system for feature parity with the current one, but it already handles all the important stuff (posts, comments, list of recent comments, filtering by tag, RSS feeds). I hope to be able to replace the current system with the new one Real Soon Now™.
If you return more than two values, the extra values will be passed back to the callback as arguments on the next call. You can use it to keep server state. If you want to use this feature, you can also specify in the call to
run-server the initial state arguments to be passed in the first call to the callback.
And my posts are not valid XML anyway; I don't close my
<p> tags when writing running text, for instance.
There is also a
format binding in the standard environment. It may point to either
simple-format or the
format, depending on whether
(ice-9 format) has been loaded or not.
EXWM is a window manager written in Emacs Lisp. I think this is the craziest thing I've seen written in Emacs Lisp so far, and yet it moves. It basically turns Emacs into a tiling window manager. Your windows become Emacs buffers, and you can manage them with the usual Emacs commands for splitting windows, changing focus, switching buffers, and so on. (I learned about it here.)
As a window manager, I don't think it does anything very interesting compared to other tiling window managers. Its real power comes from being integrated into Emacs. This means I can always use Emacs commands no matter what window I am in (for example, if I want to open a file, I can hit
C-x C-f no matter what program currently has focus).
This also means it can be customized and scripted in Emacs Lisp. For example, one thing I did with it is make it display "urgent" windows (those that would usually blink in the taskbar) in my Emacs mode line. So far, that's not very interesting, because with a conventional desktop environment I would already have windows highlighted in the taskbar. But what I have also done is customize it so that some windows are detected as "urgent" even though they don't set the urgency window manager hint. For example, I have windows with titles like
(1) Skype (i.e., windows with unread messages) tagged as urgent as well.
Another nice trick you can do with EXWM is send fake keypresses to windows. For example, one thing I did was to make a variant of Emacs'
insert-char command (which allows entering characters by their Unicode name or hex codepoint) which can be called from any window, by asking for the character name, putting it into the clipboard, and then sending a fake
C-v to the application.
My EXWM config file is here. It has grown a bit complex, and some things are still a bit kludgy/glitchy, but I've been using it for some 3-4 months for now. Take the parts you like from it.
Org is an Emacs mode for managing structured data in plain-text format, though that description doesn't really do justice to the thing. It can manage to-do lists, agendas, handle tables in awesome ways, and many more things. It can also export files to various formats, including Beamer presentations, which I've written about before. I'm still learning how to use it and all of its features, and I'm trying to use it for more things, including blogging. (I wrote some kludgy code to export Org files to blog posts, but I found out afterwards that it would be better to create a new export backend inheriting from the built-in HTML export backend. Still have to learn more about this though.)
One cool feature of Org mode is
org-capture, a command for taking notes with little flow interruption from what you are currently doing. Once you have it all configured, you can hit something like
C-c c j to create an entry using the journal template. The entry will be pre-filled according to the template, and can include, for example, a link to the place you called the command from. For example, if you call it from a
w3m buffer, the new note will contain a link to the web page you were visiting. If you call it from some source code, it will create a link to the place you were in the source code file. Org can recognize a variety of different buffer types, and create links appropriate to the context you called it from. You can easily make it recognize new kinds of context by defining new functions and adding them to
The most immediately observable advantage of using Org-capture in conjunction with EXWM is that you can call it from anywhere, not just regular Emacs buffers, because now Emacs commands work from any window. No matter whether you are seeing a file or reading something in Firefox, you can just type
C-c c j to take a note. I find this really nice.
Another advantage is that because you can add new functions to
org-store-link-functions, and all your windows are now Emacs buffers, you can actually make
org-capture recognize the context of non-Emacs windows too. This is especially useful for browser windows: you can make the link inserted in the note reflect the page you are visiting. Although I'm not aware of a clean way to extract the current URL from a browser window, you can make do by faking the keypresses of
C-l (to select the address bar) followed by
C-c (to copy the contents to the clipboard), and then reading the clipboard contents from Emacs. Like this:
;; Grab address from the browser. (defun elmord-exwm-get-firefox-url () (exwm-input--fake-key ?\C-l) (sleep-for 0.05) ; Wait a bit for the browser to respond. (exwm-input--fake-key ?\C-c) (sleep-for 0.05) (gui-backend-get-selection 'CLIPBOARD 'STRING)) ;; org-store-link functions must either return nil (if they don't recognize ;; the context), or call `org-store-link-props' with the appropriate link ;; properties and return non-nil. (defun elmord-exwm-org-store-link () (when (and (equal major-mode 'exwm-mode) (member exwm-class-name '("Firefox" "Firefox-esr"))) (org-store-link-props :type "http" :link (elmord-exwm-get-firefox-url) :description exwm-title))) ; Use window title as link description. ;; Finally, we add the new function to the list of known store-link functions. (add-to-list 'org-store-link-functions 'elmord-exwm-org-store-link)
I find this really cool.
I have more things I'd like to write about Emacs, but that's it for now.
Gandalf: "Excellent work, Frodo. You defeated Sauron and now... hm! Now everyone can have happy lives forever. Thanks to you."
Frodo: "I know. I just wish... I could forget."
Leitouros e leitouras,
Pous que o infindável mestrado findou-se. Ainda falta a homologação da secretaria e da comissão da pós (e é por isso que é um finish-output e não um close), mas, tendo recebido o joinha da banca, orientadores e biblioteca, creio que só o que me resta agora é começar a rodar o garbage collector mental para desentupir o cérebro, e fazer a dança da vitória.
A monografia, para quem (insanamente) quiser ver, está aqui.
Eu não sou muito fã de resoluções de ano novo, por uma variedade de motivos, mas como resolução tentativa de ano novo eu escolho a seguinte:
Preocupar-se com as coisas que importam, e não se preocupar com as coisas que não importam.
Obviamente, o que importa e o que não importa é inteiramente subjetivo. Mas isso não importa.
Desejo a todos um feliz ano novo, e espero em breve voltar a escrever mais por aqui.
First of all, I'd like to note that if you are going to read the R7RS-small report, you should also read the unofficial errata. As I read the document I spotted a few other errors not mentioned in the errata, but unfortunately I did not keep notes as I was reading. I'm not sure why a, um, revised version of the report is not published with the known errors corrected (a Revised7.01 Report?), but alas, it isn't, so that's something to keep in mind.
In this post, I will talk mainly about the differences between R5RS and R7RS-small, since R7RS-small is more of an incremental extension of R5RS, rather than R6RS. This is not intended as a complete or exhaustive description of each feature; for that, consult the report.
R7RS introduced the concept of libraries (what some other systems call modules; I suppose R6RS and R7RS chose the name "library" to avoid conflict with the concept of modules in existing implementations). Library names are lists of symbols and non-negative integers, such as (scheme base), or (srfi 69). A library has a set of imports, a set of exported symbols, and code that constitutes the library. A library definition looks like this:
(define-library (foo bar) (import (scheme base) (scheme write)) (export hello) (begin (define (hello) (display "Hello, world!\n"))))
Instead of putting the library code directly in the define-library form (inside a begin clause), it is also possible to include code from a different file with the (include "filename") directive (or include-ci for parsing the file case-insensitively; standard Scheme was case-insensitive until R5RS (inclusive)). This makes it easier to package R5RS code as an R7RS library, by including the older code within a library declaration. It's also a way to avoid writing all library code with two levels of indentation.
Imports can be specified or modified in a number of ways:
These forms can be combined (e.g., you can import only some identifiers and add a prefix to them).
Exports just list all identifiers to be exported, but you can also write (rename internal-name exported-name) to export identifiers with a different name than they have within the library body.
Unlike R6RS, all library code directly embedded in the define-library form must be written within a begin clause. At first I found this kinda weird, but it has an interesting consequence: the library definition sublanguage does not have to know anything about the rest of the programming language. There is only a limited number of kinds of subforms that can appear within define-library, and parsing it does not require knowing about the values of any identifiers. This means that define-library can be more easily processed as data. One can imagine useful tools which read library definitions from files and, say, compute the dependencies of a program, among other possibilities.
In fact, R7RS does not classify define-library or its subforms as syntax forms, i.e., they are something apart from Scheme expressions. This also resolves a problem that would occur if define-library were an expression. The report specifies that the initial environment of a program is empty. So, how would I use import declarations before importing the library where import declaration syntax is defined? Of course one way around this would be to make (scheme base) available by default rather than start with the empty environment. But the solution adopted by R7RS means we don't have to import (scheme base) if we don't want to (for example, if we want to import (scheme r5rs) instead to package R5RS code as an R7RS library). (The report does define for convenience some procedures and syntax forms with the same name as corresponding library subforms, e.g., include.)
R7RS also standardized cond-expand (extended from SRFI 0). cond-expand is a mechanism somewhat like ifdefs in C for conditionally including code depending on whether the implementation defines specific feature symbols, or whether some library is available. This makes it possible to provide different implementations of a procedure (or a whole library) depending on the current implementation. One way we could use it is to write shims, or compatibility layer libraries to provide an uniform interface for features that are implemented differently by various implementations. For example, in Common Lisp, many implemenetation support threads, but they provide different interfaces. Bordeaux Threads is a library which provides a uniform API and maps those to the corresponding functions in each implementation it supports. Now we can do similar things in R7RS for those features that are supported everywhere but in incompatible ways (e.g., for networking).
Libraries and cond-expand are by far the most important addition in R7RS relative to R5RS. Even if we did not have any of the other features, we could package them as libraries and provide implementation-specific code for them via cond-expand.
The report does not specify a mapping between library names and file names. I realize that it would be kinda hard to make everyone agree on this, but it is somewhat of a hurdle in distributing programs organized into libraries. Some implementations, such as Chibi, will look up a library named (foo bar) in a file named foo/bar.sld (where .sld stands for Scheme library definition), whereas CHICKEN will look it up at foo.bar.*. There is a project of a portable package manager for R7RS called Snow, which I think takes care of mapping packaged library files to implementation-specific names, but I haven't taken the time to check it out yet.
R7RS takes the excellent step of specifying that library names whose first component is the symbol srfi are reserved for implementing SRFIs, but then fails to specify how to name a specific SRFI. In practice, the few implementations I checked all agree on using (srfi n) as the name of the library implementing SRFI number n (i.e., I can write (import (srfi 69)) and remain reasonably portable), so this may turn out not to be a problem in practice.
R7RS incorporates the define-record-type form from SRFI 9, for defining new record/struct types. It is a somewhat verbose form, which requires specifying the record constructor arguments and the names for each field accessor/getter and (optional) mutator/setter, but it's basically the least common denominator that any implementation which has some form of records can easily support. It looks like this:
(define-record-type <person> (make-person name age) person? (name person-name person-name-set!) (age person-age person-age-set!))
R5RS did not have any way to define new types disjunct from existing types. R6RS provides a more complex records facility, including both a syntactic and a procedural layer allowing reflection, but I don't know it well enough to comment. (I have read some comments on problems in the interaction between syntactically- and procedually-defined records, but I don't know the nature of the problems or how serious they are.)
Reflection would be nice, or at least a way to convert a record into a vector or something like this (though I realize this might displease some people), but we could make libraries for that. Another thing that would be nice is for records to have a standard printed representation which could be printed out and read back again, but I realize there is a slightly complicated interaction here with the module system (the printed representation should be tagged with the record type in a way that will work regardless of which module it is read back in), and this might not even be desirable for implementation-internal types which happen to be defined in terms of define-record-type.
R7RS incorporates the exception handling mechanisms from R6RS, but not the condition types. Any value can be raised in an exception. The raise procedure raises a value as an exception object, or condition, to be caught by an exception handler. The guard form can be used to install an exception handler to be active during the evaluation of its body. The guard form names a variable to hold the captured condition, a sequence of cond-like clauses to determine what action to take given the condition, and a body to be executed. It looks like this:
(guard (err ((file-error? err) (display "Error opening file!\n")) ((read-error? err) (display "Error reading file!\n")) (else (display "Some other error\n"))) (call-with-input-file "/etc/passwd" (lambda (file) (read-line file))))
If an else clause is not provided and no other clause of the guard form matches, the exception propagates up the stack until some handler catches it. If an exception is raised and caught by a guard clause, the value returned by the guard form is whatever is returned by the body of that clause.
Beside raise, R7RS also defines a procedure (error message irritants...), which raises an error object (satisfying the error-object? predicate) encapsulating an error message and a sequence of objects somehow related to the error (called "irritants"). It also defines the procedures error-object-mesage and error-object-irritants to extract the components of the error object.
R7RS does not define specific object types to represent errors; it only says that objects satisfying a given predicate must be raised in some circumstances. An implementation might define a record type for that, or just use lists where the first element represents the error type, or whatever is appropriate for that implementation.
At first I did not think exceptions were that important in the grand scheme of things (heh), since you can implement them on the top of continuations. (And indeed, exceptions in R6RS are in a separate library rather than the base language, although this does not mean much in R6RS because, if I understand correctly, all libraries are mandatory for R6RS implementations.) However, I then realized that until R5RS (inclusive), there was no standard way to signal an error in Scheme code, and perhaps more importantly, no standard way of catching errors. If portable libraries are to become more prominent, we will need a standard way of signalling and catching errors across code from different projects, so exceptions are a good add-on.
Beside raise, R7RS also defines raise-continuable, which raises an exception but, if the
guard exception handler returns, it returns to the point where the exception was raised rather than exiting from the guard handler form. [Correction: this is how raise-continuable interacts with with-exception-handler, not guard. I'm still figuring how guard acts with respect to continuable exceptions.] On the top of this, something like Common Lisp's restarts can be implemented.
One side effect of having guard in the language is that now you can do control flow escapes without using call-with-current-continuation (call/cc for short). In theory this could be more efficient than capturing the fully general continuation just to escape from it once; in practice, some implementations may rely on call/cc to implement guard (the example implementation provided in the report does), so this performance advantage may not realize. But just having a construct to express a one-shot escape is already great, because it allows expressing this intent in the program, and potentially allows implementations to emit more efficient code when a full continuation is not required.
I was wondering if one could implement unwind-protect (a.k.a. try/finally) in terms of guard, and so avoid dynamic-wind for error-handling situations. Alas, I don't think this is possible in general, because the presence of raise-continuable means an error handler may execute even though control may still return to the guard body. I wish to write more about continuations in a future post.
Libraries (plus cond-expand), records and exceptions are the most important additions in R7RS-small relative to R5RS, and they are all a great step towards enabling code reuse and portability across implementations, while not constraining Scheme implementors unnecessarily. I am particularly happy about libraries and cond-expand, because this means we can start writing compatibility libraries to bridge differences between implementations without having to rely on a standardization process.
I have some other comments to make on I/O, bytevectors, and other parts of the standard library, but they can wait for a future post.
Over the last few days I have skimmed over R7RS, the Revised⁷ Report on [the Algorithmic Language] Scheme. I thought I'd write up some of my impressions about it, but I decided first to write a bit about the history and the context in which R7RS came about and the differing opinions in the Scheme community about R6RS and R7RS. In a future post, I intend to write up about my impressions of specific features of the standard itself.
The Scheme language was first described in a document named the "Report on the Algorithmic Language Scheme". Afterwards, a second version, called the "Revised Report on the Algorithmic Language Scheme", came out. The following version of the standard was called the "Revised Revised Report …", or "Revised² Report …" for short. Subsequent versions have kept this naming tradition, and the abbreviation RnRS (for some n) is used to refer to each version of the standard.
Up to (and including) R5RS, all versions of the standard were ratified only by unanimous approval of the Scheme Steering Committee. As a result, each iteration of the standard was a conservative extension of the previous version. R5RS defines a very small language: the whole document is just 50 pages. The defined language is powerful and elegant, but it lacks many functions that are typically expected from the standard library of a modern language and necessary for many practical applications. As a result, each Scheme implementation extended the standard in various ways to provide those features, but they did so in incompatible ways with each other, which made it difficult to write programs portable across implementations.
To amend this situation a bit, the Scheme community came up with the Scheme Requests for Implementation (SRFI) process. SRFIs are somewhat like RFCs (or vaguely like Python's PEPs): they are a way to propose new individual features that can be adopted by the various implementations, in a way orthogonal to the RnRS standardization process. A large number of SRFIs have been proposed and approved, and some are more or less widely supported by the various implementations.
R6RS attempted to address the portability problem by defining a larger language than the previous reports. As part of this effort, the Steering Committee broke up with the tradition of requiring unanimous approval for ratification, instead requring a 60% majority of approval votes. R6RS defines a much larger language than R5RS. The report was split into a 90 page report on the language plus a 71 page report on standard libraries (plus non-normative appendices and a rationale document). The report was ratified with 67 yes votes (65.7%) and 35 no votes (34.3%).
The new report caused mixed feelings in the community. Some people welcomed the new standard, which defined a larger and more useful language than the minimalistic R5RS. Others felt that the report speficied too much, reinvented features in ways incompatible with existing SRFIs, and set some things in stone too prematurely, among other issues.
In response to this divide, the Scheme Steering Committee decided to split the standard into a small language, more in line with the minimalistic R5RS tradition, and a large language, intended to provide, well, a larger language standardizing a larger number of useful features. The R7RS-small report was completed in 2013. The R7RS-large process is still ongoing, being developed in a more incremental way rather than as one big thing to be designed at once.
I think that the R6RS/R7RS divide in part reflects not only differing views on the nature of the Scheme language, but also differing views on the role of the RnRS standards, the Steering Committee, and the SRFI process. In a discussion I read these days, a person was arguing that R6RS was a more useful standard to them, because for most practical applications they needed hashtables, which R6RS standardized but R7RS did not. My first thought was "why should hashtables be included in the standard, if they are already provided by SRFI 69?". This person probably does not consider SRFIs to be enough to standardize a feature; if something is to be portable across implementations, it should go in the RnRS standard. In my (current) view, the RnRS standard should be kept small, and SRFIs are the place to propose non-essential extensions to the language. My view may be colored by the fact that I started using Scheme "for real" with CHICKEN, an implementation which not only supports a large number of SRFIs, but embraces SRFIs as the way various features are provided. For example, whereas many implementations provide SRFI 69 alongside their own hashtable functions, CHICKEN provides SRFI 69 as the one way of using hashtables. So, CHICKEN users may be more used to regard SRFIs as a natural place to get language extensions from, whereas users of some other implementations may see SRFIs as something more abstract and less directly useful.
I have already expressed my view on Scheme's minimalism here, so it's probably no surprise that I like R7RS better than R6RS. I don't necessarily think R6RS is a bad language per se (and I still have to stop and read the whole R6RS report some day), I just have a preference for the standardized RnRS language to be kept small. (I'm okay with a larger standard a la R7RS-large, as long as it remains separate from the small language standard, or at least that the components of the large language remain separate and optional.) I also don't like every feature of R7RS-small, but overall I'm pretty satisfied with it.
Copyright © 2010-2018 Vítor Bujés Ubatuba De Araújo
O conteúdo deste blog, a menos que de outra forma especificado, pode ser utilizado segundo os termos da licença Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
Powered by Blognir.