Elmord's Magic Valley

Software, linguistics, and rock'n'roll. Sometimes in Portuguese, sometimes in English.

Chez Scheme vs. SBCL: a comparison

2019-11-14 11:06 -0300. Tags: comp, prog, lisp, scheme, in-english

Back at the beginning of the year, when I started working on what would become Fenius (which I haven't worked on for a good while now; sorry about that), I was divided between two languages/platforms for writing the implementation: Chez Scheme (a Scheme implementation) and SBCL (a Common Lisp implementation). I ended up choosing Chez Scheme, since I like Scheme better. After using Chez for a few months now, however, I've been thinking about SBCL again. In this post, I ponder the two options.

Debugging and interactive development

The main reason I've been considering a switch is this: my experience with interactive development with Chez has been less than awesome. The stack traces are uninformative: they don't show the line of code corresponding to each frame (rather, they show the line of code of the entire function, and only after you ask to enter debug mode, inspect the raise continuation, and print the stack frames), and you can't always inspect the values of parameters/local variables. The recommended way to debug seems to be to trace the functions you want to inspect; this will print each call to the function (with arguments) and the return value of each call. But you must do it before executing the function; it won't help you interpret the stack trace of an exception after the fact.

The interaction between Chez and Geiser (an Emacs mode for interactive development with Scheme) often breaks down too: sometimes, trying to tab-complete an identifier will hang Emacs. From my investigation, it seems that what happens is that the Chez process will enter the debugger, but Geiser is unaware of that and keeps waiting for the normal > prompt to appear. Once that happens, it's pretty much stuck forever (you can't tab-complete anymore) until you restart Chez. There is probably a solution to this; I just don't know what it is.

As I have mentioned before, Chez has no concept of running the REPL from inside a module (library in Scheme terminology), which means you can't call the private functions of a module from the REPL. The solution is… not to use modules, or to export everything, or to split the code so you can load the module code without the module-wrapping form.

By contrast, SBCL works with SLIME, the Superior Lisp Interaction Mode for Emacs. SLIME lets you navigate the stack trace, see the values of local variables by pressing TAB on a frame, and press v to jump to the code corresponding to a stack frame (right to the corresponding expression, not just the line), among other features. Common Lisp is more committed to interactive development than Scheme in general, so this point is a clear win for SBCL.

(To be fair, Guile Scheme has pretty decent support for interactive development. However, Guile + Geiser cannot do to stack traces what SBCL + SLIME can.)

Execution speed

In my experience, SBCL and Chez are both pretty fast – not at the "as fast as hand-crafted C" level, but pretty much as fast as I could desire for a dynamically-typed, garbage-collected, safe language. In their default settings, Chez actually often beats SBCL, but SBCL by default generates more debugger-friendly code. With all optimizations enabled, Chez and SBCL seem to be pretty much the same in terms of performance.

One advantage SBCL has is that you can add type annotations to code to make it faster. Be careful with your optimization settings, though: if you compile with (optimize (safety 0)), "[a]ll declarations are believed without assertions", i.e., the compiler will generate code that assumes your types are correct, and will produce undefined behavior (a.k.a. nasal demons) in case they are not.

Startup time and executable size

This one is complicated. On my machine, Chez compiles a "hello world" program to an 837-byte .so file, which takes about 125ms to run – a small but noticeable startup time. A standalone binary compiled with chez-exe weighs in at 2.7MB and takes 83ms – just barely noticeable.

As for SBCL, a "hello world" program compiles to a 228-byte .fasl file, which runs in 15ms, which is really good. The problem is if the file loads libraries. For instance, if I add this to the beginning of the "hello world":

(require 'asdf)        ;; to be able to load ASDF packages
(require 'cl-ppcre)    ;; a popular regex library

…now the script takes 422ms to run, which is very noticeable.

SBCL can also generate standalone executables, which are actually dumps of the whole running SBCL image: you can load all the libraries you want and generate an executable with all of them preloaded. If we do that, we're back to the excellent 15ms startup time, but the executable weighs 45MB, because it contains a full-fledged SBCL in it (plus libraries). It's a bit of a bummer if you intend to create multiple independent command-line utilities, for example. Also, I guess it's easier to convince people to download a 2.7MB file than a 45MB one when you want them to try out your fancy new application, though that may not be that much of a concern these days. (The binary compresses down to 12MB with gzip, and 7.6MB with xz.)

Another worry I have is memory consumption (which is a concern in cheap VPSes such as the one running this blog, for instance): running a 45MB binary will use at least 45MB of RAM, right? Well, not necessarily. When you run an executable, the system does not really load all of the executable's contents into memory: it maps the code (and other) sections of the executable into memory, but they will actually only be loaded from the disk to RAM as the memory pages are touched by the process. This means that most of those 45MB might never actually take up RAM.

In fact, using GNU time (not the shell time builtin, the one in /usr/bin, package time on Debian) to measure maximum memory usage, the SBCL program uses 19MB of RAM, while the Chez program uses 27MB. So the 45MB SBCL binary is actually more memory-friendly than the 2.7MB Chez one. Who'd guess?
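For anyone who wants to repeat this kind of measurement without GNU time, here is a rough sketch in Python rather than the exact method used above. It runs a command once and reports wall-clock time together with the peak resident set size of the child process, using the standard resource module. The function name measure is made up for illustration, and the demo measures a trivial Python child only so that the snippet is self-contained; the idea would be to substitute the path of the compiled program.

```python
import resource
import subprocess
import sys
import time


def measure(argv):
    """Run argv once and return (elapsed_seconds, peak_rss_of_children)."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    # ru_maxrss is the largest peak among all children waited for so far,
    # so measure one program per Python process to get a clean number.
    # Units differ by platform: kilobytes on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak


# Demo on a trivial child process; replace the list with the program
# to be measured, e.g. ["./hello"] (hypothetical path).
elapsed, peak = measure([sys.executable, "-c", "pass"])
print(f"elapsed: {elapsed * 1000:.0f} ms, peak RSS: {peak} (kB on Linux)")
```

A single run is noisy, especially for startup times in the tens of milliseconds, so it is probably worth running each program several times and looking at the spread.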

Available libraries

Common Lisp definitely has the edge here, with over a thousand libraries (of varying quality) readily available via Quicklisp. There is no central repository or catalog of Chez (or Scheme) libraries, and there are not many Chez libraries that I'm aware of (although I wish I had learned about Thunderchez earlier).

[Addendum (2019-11-16): @Caonima67521344 on Twitter points out there is the Akku package manager for Chez and other Schemes.]

The language itself

This one is a matter of personal taste, but I just like Scheme better than Common Lisp. I like having a single namespace for functions and variables (which is funny considering I was a big fan of Common Lisp's two namespaces back in the day), and not having to say funcall to call a function stored in a variable. I like false being distinct from the empty list, and for cdr of the empty list to be an error rather than nil. I like Scheme's binding-based modules better than Common Lisp's symbol-based packages (although Chez modules are annoying to use, as I mentioned before; Guile is better in this regard). Common Lisp's case insensitivity by default plus all standard symbols being internally uppercase is a bit annoying too. Scheme has generally more consistent names for things as well. I used to dislike hygienic macros back in the day, but nowadays, having syntax-case available to break hygiene when necessary, I prefer hygienic macros as well.

And yet… Common Lisp and Scheme aren't that different. Most of those things don't have a huge impact on the way I code. (Well, macros are very different, but anyway.) One thing that does have an impact is using named let and recursion in Scheme vs. loops in Common Lisp: named let (similar to Clojure's loop/recur) is one of my favorite Scheme features, and I use it all the time. However, it is not difficult to implement named let as a macro in Common Lisp, and if you only care about tail-recursive named let (i.e., Clojure's loop/recur), it's not difficult to implement an efficient version of it in Common Lisp as a macro. Another big difference is call/cc (first class continuations) in Scheme, but I pretty much never use call/cc in my code, except possibly as escape continuations (which are equivalent to Common Lisp's return).

On the flip side, Common Lisp has CLOS (the Common Lisp Object System) in all its glory, with generic functions and class redefinition powers and much more. Guile has GOOPS, which provides many of the same features, but I'm not aware of a good equivalent for Chez.

Conclusion

As is usually the case when comparing programming languages/platforms, none of the options is an absolute winner in all regards. Still, for my use case and for the way I like to program, SBCL looks like a compelling option. I'll have to try it for a while and see how it goes, and tell you all about it in a future post.


A bunch of things I learned while fighting androids

2019-11-10 23:06 -0300. Tags: comp, android, in-english

I recently had to bypass Android's Factory Reset Protection again, this time for a Samsung Galaxy J4. The procedure at the end turned out to be relatively simple (find a way to get to a browser from the initial screen, download a pair of APKs, finish the Google account login with a random Google account, uninstall the APKs). However, due to the circumstances I was operating in, I spent a lot of time figuring out how to share an internet connection from my laptop with a second Android phone so I could share it with the J4 using the second phone as a wi-fi hotspot. This post documents what I learned.

Bypassing FRP on the Samsung Galaxy J4

There are dozens of videos on YouTube explaining how to do it. I will summarize the information here.

Part 1: Getting to a browser

Part 2: Installing a bunch of APKs

That's it.

Internet sharing shenanigans

Sharing your Android phone's internet connection with your computer is pretty easy: you just enable USB tethering on the phone, and everything magically works (at worst, you have to call dhclient YOUR-USB-INTERFACE on Linux if you don't have NetworkManager running). Doing the opposite, i.e., sharing your (Linux) computer connection with the phone, has to be done manually. Here is how it goes (I'm assuming a rooted phone; mine runs LineageOS 14.1 (Android 7)):

Now, to share the internet connection:

We still have to set DNS. Android does not seem to have a resolv.conf file; I found multiple ways you can set DNS (using 1.0.0.1 and 1.1.1.1 as the DNS servers in the examples):

The last one is the only one that worked for me – and it requires two DNS servers as arguments.

By now, you should have a working internet connection on your phone (you can try it in the browser, for example).

If you want to share it with other devices via wi-fi, you can now enable Wi-Fi Hotspot on the phone. However, there is another weird thing here: for some reason, my phone would reject all DNS queries coming from the other devices. The 'solution' I ended up using was to redirect all requests to port 53 (DNS) coming from other devices to the DNS server I wanted to use:

iptables -t nat -A PREROUTING -p tcp --dport 53 -j DNAT --to-destination 1.0.0.1:53
iptables -t nat -A PREROUTING -p udp --dport 53 -j DNAT --to-destination 1.0.0.1:53

This will skip the Android builtin DNS server entirely, and send DNS requests directly to the desired DNS server.


Random remarks on Old Chinese type-A/type-B syllables

2019-09-25 22:06 -0300. Tags: lang, old-chinese, in-english

Every now and then Academia.edu throws an interesting paper suggestion into my inbox. Today I got a paper titled A Hypothesis on the origin of Old Chinese pharyngealization, by Laurent Sagart and William Baxter (2016). [Note: the linked paper is a draft, not the final published article. I don't know if there is any difference in the content between the draft and the final version.]

This post contains some observations and impressions about the paper. I should note as a disclaimer that I'm not a specialist in Old Chinese at all; I'm just this random person on the internet who has read a bunch of papers and watched the videos of Baxter & Sagart's 2007 workshop I mentioned in a previous post. This post should be seen as my personal notes from trying to understand and think about this subject.

I have to say that despite being a huge fan of Baxter & Sagart's work, this paper did not convince me. In fact, it actually weakened my previous belief in the B&S pharyngeal hypothesis a bit. Anyway, here we go.

[P.S.: By the end of the post, I get re-convinced about the pharyngeal hypothesis. This post ended up very rambly.]

* * *

Old Chinese is traditionally reconstructed as having two types of syllables, called type A and type B. Type B syllables are characterized by having a /-j-/ medial in Middle Chinese; type A ones are characterized by the lack of such a palatal medial. Traditional reconstructions (e.g., Karlgren's) reconstruct this /-j-/ back into Old Chinese. More recent reconstructions have put this in doubt, though.

These are some known facts about type A and type B syllables:

These are some post-Karlgren theories about the type A/B distinction:

Back in 2007 (at the B&S workshop), B&S notated type A syllables by doubling the initial, as a way to indicate them without committing to any particular realization (pharyngeal or otherwise), or to whether this was a feature of the initial or of the whole syllable. Norman considered pharyngealization to be a feature of the whole syllable, rather than the initial. B&S seem to have shifted towards considering it a feature of the initial; the paper under consideration here explicitly argues for pharyngealization to be a feature of the initial, coming from previous /Cʕ/ (consonant + pharyngeal) clusters. (The paper argues that "type-A and type-B syllables seem to rhyme with each other freely in Old Chinese poetry, which would be unexpected if pharyngealization was a feature of the rhyme as well as the onset". That is a good point, though it might just be that pharyngealization was not considered relevant in rhyming.)

In that system, every consonant has a plain and a pharyngealized version. At first sight, this looks a bit crazy, but that's not much different from, say, Irish having a palatalized and a velarized version of every consonant. There are some suspicious combinations, though; in particular, the pharyngealized glottal stop /ʔˤ/ does not look very convincing. It would not seem so problematic to me if pharyngealization were a feature of the syllable as a whole, since it would still be clearly articulated in the vowel; as a feature of the initial, it does not seem very likely.

The paper argues that these pharyngealized consonants come from pre-Old-Chinese /Cʕ/ clusters. The paper says these are clusters with a "pharyngeal fricative". One thing I just learned is that the /ʕ/ symbol can be used for either a pharyngeal fricative or an approximant; I only knew the approximant usage.

The paper further argues that these clusters come from previous CVʕV- syllables, i.e., the development was of the form:

nuʕup > nʕup > nˤup

It argues then that the corresponding long/short distinction in Lushai comes from the loss of the middle /ʕ/ and subsequent fusion of the identical vowels.

This is motivated by parallel developments in Austronesian and Austroasiatic. Proto-Austronesian seems to have had a constraint against single-syllable words: whenever a single-syllable CVC root would appear by itself (without affixes), it would surface as CV(ʔ)VC instead, with a long vowel 'interrupted' by a glottal stop, as a way to enforce the two-syllable constraint. The paper proposes the same constraint was present in a language ancestral to Proto-Sino-Tibetan. (It does not explicitly claim that this ancestral language would be the parent of Proto-Sino-Tibetan and Austronesian and Austroasiatic.) Syllables of the CV(ʔ)VC or CV(ʕ)VC type would then lead to pharyngealized, type A syllables in Old Chinese on the one hand, and long vowels after loss of the mid-consonant in Lushai.

I'm not very convinced by this idea. For one, there is no direct evidence for the /ʕ/ phoneme in Sino-Tibetan as far as I know. Of course, this was the same argument used against the laryngeal hypotheses in Proto-Indo-European during Saussure's lifetime, until Hittite was discovered, which did partially preserve a laryngeal phoneme. The same could be true of the posited /ʕ/ phoneme. I'm not sure the case for /ʕ/ in Proto-Sino-Tibetan is as strong, though.

We are trying to account for a length distinction on one side, and type A/B on the other. Paper footnote 4 says: "Starostin accounted for the correlation by reconstructing a parallel length distinction for Old Chinese long vowels in type A and short vowels in type B. While this reconstruction makes sense of the apparent correlation with Lushai, there is no direct Chinese evidence for it, and it does not help explain the Hàn-time sound changes described above, which affected type A and type B differently."

The first thing to note is that there is no direct Chinese evidence for pharyngealized consonants either. Now for the Hàn-time sound changes referenced. I quote:

Inspired by the treatment in Norman (1994), Baxter and Sagart (2014) assign pharyngealization to OC type-A words, and absence of pharyngealization to type-B words. The main argument for reconstructing pharyngealization is a set of sound changes that occurred during the Hàn period (206 BCE – 220 CE), which affected type-A syllables and type-B syllables differently: original high vowels remain high in type-B syllables, but are lowered in type-A syllables; and original low vowels, which are raised in certain environments in type-B syllables, remain low in type-A syllables. Also, initial consonants often underwent palatalization in type-B syllables, but escaped such palatalization in type-A syllables. Reconstructing pharyngealization in the onset of type-A syllables seems to provide a plausible explanation for these differences, more so than any of the alternative proposals.[1]

Let's summarize the first part:

If we interpret A/B type as length, we would say that long vowels are lowered, and short vowels are raised. Would it be too crazy to consider length itself as the influencing factor? Vulgar Latin has also shifted vowels based on length; however, Vulgar Latin shows the opposite development: it is the short vowels that get lowered. Moreover, type A (i.e., long) prevents palatalization. Here we could perhaps make a better parallel to Vulgar Latin, where short /e/, /o/ become diphthongized /je/, /we/ (< /wo/) in Spanish. However, type B syllables palatalize regardless of the vowel, so the parallel breaks down again. Maybe B&S is right about pharyngealization after all.

The above quote has a footnote:

Moreover, at least one Hàn-dynasty commentator describes the difference between a type-A syllable and a type-B syllable by stating that the type-A syllable is “inside and deep” (nèi ér shēn 內而深), while the type-B syllable is “outside and shallow” (wài ér qiǎn 外而淺); “inside and deep” seems a natural way to describe the retraction of the tongue root that characterizes pharyngealization. See Baxter & Sagart (2014:72–73).

At the same time, Wikipedia has the following quote:

Pulleyblank initially proposed that type B syllables had longer vowels.[89] Later, citing cognates in other Sino-Tibetan languages, Starostin and Zhengzhang independently proposed long vowels for type A and short vowels for type B.[90][91][92] The latter proposal might explain the description in some Eastern Han commentaries of type A and B syllables as huǎnqì 緩氣 'slow breath' and jíqì 急氣 'fast breath' respectively.[93]

It is hard to make sense of these ancient quotes. It also makes one contemplate how much information, small clues and indirect evidence is out there for reconstructing Old Chinese, and wonder how much an amateur like me can hope to grasp about this subject.

The paper finishes with a discussion of the correlation between Lushai length and Chinese A/B type. The whole argument of the paper hinges on there being such a correlation, so they decided to check how much evidence there is for the correlation. After filtering candidates to avoid problematic cases, they get to 43 comparanda in Proto-Kuki-Chin and Old Chinese, and present the following table:

                 PKC long   PKC short
Chinese type A          6           6
Chinese type B          5          26

They conclude that the correlation is statistically significant. One thing stands out to me, though: although PKC short and Chinese type B seem to strongly correlate, there does not seem to be a strong correlation at all between PKC long and Chinese type A, which is a bit disturbing. While this may be an effect of the small sample and the fact that there are more short (32) than long (11) words in the sample, and more type B (31) than type A (12), there may also be something meaningful going on here.
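To make the numbers a bit more concrete, the table can be checked with a few lines of Python. This is only a sketch: I don't know which test the paper itself used, so the Pearson chi-square statistic and the one-sided Fisher exact test below are my own choice of conventional tests for a 2×2 table, not a reproduction of the authors' calculation. Since one of the expected counts is small, the exact test is probably the more trustworthy of the two.

```python
from math import comb, erfc, sqrt

# Counts from the table above: rows are Chinese type A / type B,
# columns are PKC long / PKC short.
table = [[6, 6],
         [5, 26]]

row_totals = [sum(row) for row in table]        # [12, 31]
col_totals = [sum(col) for col in zip(*table)]  # [11, 32]
n = sum(row_totals)                             # 43

# Counts we would expect if syllable type and vowel length were independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (no continuity correction), 1 degree of freedom.
chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))
# With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_chi2 = erfc(sqrt(chi2 / 2))

# One-sided Fisher exact test: probability of seeing 6 or more
# type A / long pairs given the row and column totals (hypergeometric tail).
a_total, long_total = row_totals[0], col_totals[0]
p_fisher = sum(
    comb(a_total, k) * comb(n - a_total, long_total - k)
    for k in range(table[0][0], min(a_total, long_total) + 1)
) / comb(n, long_total)

print("expected counts:", expected)
print("chi-square:", chi2, "p:", p_chi2)
print("Fisher one-sided p:", p_fisher)
```

Both p-values come out below the conventional 0.05 threshold, which is consistent with the paper's claim of significance, although neither test says anything about which cells are driving the association.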

Let's interpret this table:

One way to interpret this is that there is a feature in Proto-Sino-Tibetan (PST) whose presence triggers type B in Old Chinese and short vowels in PKC, but whose absence does not influence the syllable's type. This would make type B the marked element again, which is unsatisfying, and would also make short vowels the marked element, which is even less satisfying.

The correlation may be less direct. For example, it might be that PST had both length and pharyngealization (or something that yields length and pharyngealization as reflexes), but only long syllables could be pharyngealized. Then short PST syllables would yield PKC short and Chinese type B, but long syllables could get either type A or B. However, this would imply that long PST syllables don't always yield long PKC syllables. It might just be so, or it might be that the PST feature that enabled length and pharyngealization (or whatever was type A/B) distinctions was a third one, say, only syllables of a certain kind could carry those distinctions. The absence of that feature would yield type B and PKC short, but its presence enabled syllables to go either way.

Conclusions

The main aim of the paper is to show that long vowels interrupted by a pharyngeal element were the origin of both Old Chinese type A syllables (argued to have pharyngealized initials) and Lushai long vowels (after loss of the pharyngeal element). The fact that the correlation only seems to appear between short vowels and type B, but not long vowels and type A, suggests that long vowels and type A do not share a common origin, only perhaps a common enabling environment (i.e., they can occur in the same environments, but are distinct features, in Proto-Sino-Tibetan). In my opinion, this undermines the motivation for reconstructing /CVʕV-/ roots for Sino-Tibetan.

Pharyngealization still seems a compelling explanation for the phenomena observed with type A syllables. However, it is not clear to me there is any good reason to consider it a feature of the initial (like the article proposes) rather than the whole syllable (as in Norman's original proposal).


From Thunderbird to Liferea as a feed reader

2019-09-20 18:04 -0300. Tags: comp, unix, mundane, in-english

I've recently switched from Thunderbird to Liferea as my RSS feed reader. Thunderbird was randomly failing to update feeds at times*, and I thought it might be a good idea to use separate programs for e-mail and RSS for a change, so I went for Liferea. (I considered Elfeed too, but Elfeed does not support folders, only tags. In principle, tags can do everything folders can and more; the problem is that Elfeed cannot show a pane with all tags and the number of unread articles with each tag, the way Thunderbird or Liferea (or your average mail client) can do with folders.)

Liferea is pretty good, although I miss some shortcuts from Thunderbird, and sometimes shortcuts don't work (because focus is on some random widget). Here are some tips and tricks.

Importing feeds from Thunderbird to Liferea

Thunderbird can export the feed list in OPML format (right click on the feed folder, click Subscribe…, then Export). You can then import that on Liferea (Subscriptions > Import Feed List). No surprises here.

The tray icon

Liferea comes with a number of plugins (Tools > Plugins). By default, it comes with the Tray Icon (GNOME Classic) plugin enabled, which, unsurprisingly, creates a tray icon for Liferea. The problem with this for me is that whenever the window is 'minimized', Liferea hides the window entirely; you can only bring it back by clicking on the tray icon. I believe the idea is so that the window does not appear in the taskbar and the tray, but this interacts badly with EXWM, where switching workspaces or replacing Liferea with another buffer in the same Emacs 'window' counts as minimizing it, and after that it disappears from the EXWM buffer list. The solution I used is to disable the tray icon plugin.

Playing media

Liferea has a Media Player plugin to play media attachments/enclosures (such as in podcast feeds). To use it on Debian, you must have the gir1.2-gstreamer-1.0 package installed (it is a 'Recommends' dependency, not a mandatory one).

Alternatively, you can set Liferea to run an arbitrary command to open a media enclosure; the command will receive the enclosure URL as an argument. You can use VLC for that. The good thing about it is that VLC will start playing the stream immediately; you don't have to wait for it to download completely before playing it. The bad thing is that once it finishes playing the stream, the stream is gone; if you play it again, it will start downloading again. Maybe there is a way to configure this in VLC, but the solution I ended up using was to write a small script to start the download, wait a bit, and start VLC on the partially downloaded file. This way, the file will be fully downloaded and can be replayed (and moved elsewhere if you want to preserve it), but you don't have to wait for the download to finish.

#!/bin/bash
# download-and-play-media.sh

# Save file in a temporary place.
file="/tmp/$(date "+%Y%m%d-%H%M%S").media"
# Start download in a terminal so we can see the progress.
x-terminal-emulator -e wget "$1" -O "$file" &
# Wait for the file to be non-empty (i.e., for the download to start).
until [[ -s "$file" ]]; do
    sleep 1
done
# Wait a bit for the file to fill.
sleep 2
# Play it.
vlc "$file"

Miscellaneous tips

Caveats

So far I had two UI-related problems with Liferea:

Conclusion

Overall, I'm pretty satisfied with Liferea. There are a few problems, but so far I like it better than Thunderbird for feed reading.

_____

* I suspect the problem was that Thunderbird was trying to DNS-resolve the domains for a huge number (perhaps all) of feeds at the same time, and some of the requests were being dropped by the network. I did not do a very deep investigation, though.


Emacs performance, profiling, and garbage collection

2019-09-13 00:13 -0300. Tags: comp, emacs, in-english

This week I finally got around to upgrading my system and my Emacs packages, including EXWM. Everything went fine, except for one problem: every time I loaded an XKB keymap, EXWM would hang for 10–20 seconds, with CPU usage going up. I opened an issue on the EXWM repository, but I decided to investigate a bit more.

After learning the basic commands for profiling Emacs Lisp code, I started the profiler (M-x profiler-start), loaded a new keymap, and generated a report (M-x profiler-report). It turned out that 73% of the CPU time during the hangup was spent on garbage collection. I tried the profiler again, now starting it in cpu+mem mode rather than the standard cpu mode. From the memory report, I learned that Emacs/EXWM was allocating around 500MB of memory during the keyboard loading (!), apparently handling X MapNotify events.

I did not go far enough to discover why so much memory was being allocated. What I did discover though is that Emacs has a couple of variables that control the behavior of the garbage collector.

gc-cons-threshold determines how many bytes can be allocated without triggering a garbage collection. The default value is 800000 (i.e., ~800kB). For testing, I set it to 100000000 (i.e., ~100MB). After doing that, the keyboard loading freeze fell from 10–20s to about 2–3s. Not only that, but after setting it near the top of my init.el, Emacs startup time fell by about half.

Now, I've seen people warn that if you set gc-cons-threshold too high, Emacs will garbage collect less often, but each garbage collection will take longer, so it may cause some lag during usage, whereas the default setting will cause more frequent, but less noticeable garbage collections (unless you run code causing an unusually large number of allocations, as in this case with EXWM). However, I have been using it set to 100MB for a couple of days now, and I haven't noticed any lag; I just got a faster startup and less EXWM hangup. It may well depend on your Emacs usage patterns; you may try different values for this setting and see how it works for you.

Another recommendation I have seen elsewhere is to set gc-cons-threshold high and then set an idle timer to run garbage-collect, so Emacs would run it when idle rather than when you're using it, or to set a hook so it would run when unfocused. I did not try that, and I suspect it wouldn't work for my use case: since Emacs runs my window manager, I'm pretty much always using it, and it's never unfocused anyway. Yet another recommendation is to bind gc-cons-threshold temporarily around the allocation-intensive code (that comes from the variable's own documentation), or to set it high on startup and back to the original value after startup is finished. Those don't work easily for the XKB situation, since Emacs does not know when an XKB keymap change will happen (unless I wrote some Elisp to raise gc-cons-threshold, call XKB, and set it back after a while, which is more complicated than necessary).


More blogging, less twittering

2019-09-09 22:33 -0300. Tags: life, mind, about, in-english

It's been a while since I last posted on this blog. I've been travelling, and also interviewing for a new job; I'll have more to say about that in the future. Now that things have settled down a bit, I intend to start blogging more again.

Aside from that, I intend to start spending less time on Twitter. I've long been in a love/hate relationship with Twitter, and the hate side of it is starting to win out, for a variety of reasons:

Mastodon isn't much better. It does not have some of Twitter's misfeatures, such as trying to push a non-chronological timeline onto users, and it has the advantages of being decentralized and community-oriented, but the experience is very similar to that of Twitter. I think these problems are related to the medium of microblogging, rather than to a specific platform.

Traditional blogs can be addictive too. I follow hundreds of blogs via RSS, and sometimes I feel like refreshing my feeds every now and then to see if there is something new. But blog posts typically take more time to write, so updates are not so frequent, so the incentive to keep refreshing them several times a day is not so strong. It's also easier to keep track of unread posts and read them at a later time. Finally, blog posts tend to be much more informationally nutritious than tweets or toots.

At the same time as I start using Twitter less, I would like to start posting more here. I still have to figure out what to do with content that seems too short for a blog post – for example, if I just want to share a link to a video, or a little command I learned. One solution is to accumulate those and post them in weekly installments. Another is just to go ahead and post short posts, but I don't really want to pollute the blog with a plethora of micro-posts. Yet another is to create a separate page for the micro-posts. I'm currently more inclined towards the first option.

That's it, folks. Have a nice week!


The flutes of Heaven

2019-07-07 02:25 -0300. Tags: philosophy, translation, em-portugues

The other day I translated a passage from the Zhuangzi (the first story of the second chapter), combining elements from several English translations. I am posting it here for posterity.

Zi-Qi of the Southern Border was sitting, leaning on his low table. He looked up at the sky and exhaled slowly – absent and distant, as if he had lost his companion. Yan Cheng Zi-You, who was standing in attendance before him, asked: "What is this? Can the body become like dry wood? Can the mind become like dead ashes? The man leaning on the table now is not the one who was leaning on it before!"

Zi-Qi said: "You do well to ask, Yan. I have just lost myself. Do you understand that? You may have heard the flutes of men, but you have not heard the flutes of Earth; you may have heard the flutes of Earth, but you have not heard the flutes of Heaven."

Zi-You said: "I venture to ask the meaning of this."

Zi-Qi said: "The Great Mass [of nature] emits a vital breath, and its name is wind. If it does not blow, nothing happens; but when it blows, the ten thousand crevices resound fiercely. Have you not heard it blowing and blowing? In a mountain forest that sways and shakes, there are immense trees a hundred spans wide with cavities and openings, like noses, like mouths, like ears, like jugs, like cups, like vases, like fissures, like furrows. The wind that blows through them roars like waves, whistles like arrows, bellows, sighs, screams, snores, moans, howls. The wind in front sings "yiii", the wind that follows sings "wuuu". A light breeze produces a small response; a gale produces a great response. And when the storm dies down, the multitude of crevices becomes empty. Have you not seen this swaying, this shaking?"

Zi-You said: "The flutes of Earth are those you have just described; the flutes of men are bamboo tubes arranged side by side. I venture to ask about the flutes of Heaven."

Zi-Qi said: "The sounds of the breath upon the ten thousand things are different, but it merely brings out the natural propensities of the things themselves, each one taking for itself what is appropriate to it; who is it that blows them?"


Functional record updates in Fenius, and other stories

2019-06-16 17:33 -0300. Tags: comp, prog, pldesign, fenius, in-english

Fenius now has syntax for functional record updates! Records now have a with(field=value, …) method, which allows creating a new record from an existing one with only a few fields changed. For example, if you have a record:

fenius> record Person(name, age)
<class `Person`>
fenius> let p = Person("Hildur", 22)
Person("Hildur", 22)

You can now write:

fenius> p.with(age=23)
Person("Hildur", 23)

to obtain a record just like p but with a different value for the age field. The update is functional in the sense that p is not mutated; a new record is created instead. This is similar to the with() method in dictionaries.
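For readers more familiar with Python, a rough analogue of this behaviour is dataclasses.replace, which also builds a new record instead of mutating the old one. This is only a comparison sketch and says nothing about how Fenius implements with(); the Person class below simply mirrors the record above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: instances cannot be mutated in place
class Person:
    name: str
    age: int


p = Person("Hildur", 22)
older = replace(p, age=23)  # builds a new Person; p is left untouched

print(older)  # Person(name='Hildur', age=23)
print(p)      # Person(name='Hildur', age=22)
```

As in the Fenius example, the original value remains available after the update, which is what makes the update functional.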

Another new trick is that now records can have custom printers. Printing is now performed by calling the repr_to_port(port) method, which can be overridden by any class. Fenius doesn't yet have much of an I/O facility, but we can cheat a bit by importing the functions from the host Scheme implementation:

fenius> record Point(x, y)
<class `Point`>
fenius> import chezscheme

# Define a new printing function for points.
fenius> method Point.repr_to_port(port) = {
            chezscheme.fprintf(port, "<~a, ~a>", this.x, this.y)
        }

# Now points print like this:
fenius> Point(1, 2)
<1, 2>

A native I/O API is coming Real Soon Now™.
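The idea of a printer that writes to a port, and that any class may override, can be sketched in Python as well. The repr_to_port method name is borrowed from the post; wiring it into Python's __repr__ is an assumption made purely for illustration, with io.StringIO standing in for a port.

```python
import io


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Analogue of the overridable printer: write the representation
    # to a file-like "port" instead of returning a string.
    def repr_to_port(self, port):
        port.write(f"<{self.x}, {self.y}>")

    # Hook the port-based printer into Python's own printing machinery.
    def __repr__(self):
        port = io.StringIO()
        self.repr_to_port(port)
        return port.getvalue()


print(Point(1, 2))  # <1, 2>
```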

Comentários / Comments

Questions, exclamations, and binaries

2019-06-03 21:39 -0300. Tags: comp, prog, pldesign, fenius, in-english

I'm a bit tired today, so the post will be short.

ready? go!

In Scheme, it is conventional for procedures returning booleans to have names ending in ? (e.g., string?, even?), and for procedures which mutate their arguments to have names ending in ! (e.g., set-car!, reverse!). This convention has also been adopted by other languages, such as Ruby, Clojure and Elixir.

I like this convention, and I've been thinking of using it in Fenius too. The problem is that ? and ! are currently operator characters. ? does not pose much of a problem: I don't use it for anything right now. !, however, is a bit of a problem: it is part of the != (not-equals) operator. So if you write a!=b, it would be ambiguous whether the ! should be interpreted as part of an identifier a!, or as part of the operator !=. So my options are:

What do you think? Which of these do you like best? Do you have other ideas? Feel free to comment.
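The a!=b ambiguity described above can be made concrete with a toy tokenizer. The Python sketch below is not Fenius's lexer; both rule sets are invented for illustration. One allows a trailing ! or ? in identifiers, the other does not.

```python
import re

# Rule set A (hypothetical): an identifier may end in `!` or `?`.
IDENT_WITH_SUFFIX = re.compile(r"[A-Za-z_][A-Za-z0-9_]*[!?]?|!=|==|=|\S")
# Rule set B (hypothetical): plain identifiers; `!` only occurs in operators.
IDENT_PLAIN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|!=|==|=|\S")


def tokenize(rules, source):
    # findall scans left to right and skips unmatched whitespace.
    return rules.findall(source)


print(tokenize(IDENT_WITH_SUFFIX, "a!=b"))    # ['a!', '=', 'b']
print(tokenize(IDENT_PLAIN, "a!=b"))          # ['a', '!=', 'b']
print(tokenize(IDENT_WITH_SUFFIX, "a != b"))  # ['a', '!=', 'b']
```

Under rule set A the same characters silently turn a comparison into an assignment-like token sequence unless the operator is surrounded by whitespace.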

Binaries available

In other news, I have started making a precompiled Fenius binary (amd64 Linux) available, which you can try out without having to install Chez Scheme first. You should be aware that the interpreter is very brittle at this stage, and most error messages are in terms of the underlying implementation rather than something meaningful for the end user, so use it at your own peril. But it does print stack traces in terms of the Fenius code, so it's not all hopeless.


Pattern matching and AST manipulation in Fenius

2019-05-30 19:40 -0300. Tags: comp, prog, pldesign, fenius, in-english

Fenius has pattern matching! This means you can now write code like this:

record Rectangle(width, height)
record Triangle(base, height)
record Circle(radius)

let pi = 355/113    # We don't have float syntax yet :(

let area(shape) = {
    match shape {
        Rectangle(width, height) => width * height
        Triangle(base, height) => base * height / 2
        Circle(radius) => pi * radius * radius
    }
}

print(area(Rectangle(4, 5)))
print(area(Triangle(3, 4)))
print(area(Circle(10)))

More importantly, you can now pattern match over ASTs (abstract syntax trees). This is perhaps the most significant addition to Fenius so far. It means that the code for the for macro from this post becomes:

# Transform `for x in items { ... }` into `foreach(items, fn (x) { ... })`.
let for = Macro(fn (ast) {
    match ast {
        ast_match(for _(var) in _(items) _(body)) => {
            ast_gen(foreach(_(items), fn (_(var)) _(body)))
        }
    }
})

This is a huge improvement over manually taking apart the AST and putting a new one together, and it basically makes macros usable.

It still does not handle hygiene: it won't prevent inserted variables from shadowing bindings in the expansion site, and will break if you shadow the AST constructors locally. But that will come later. (The AST constructors will move to their own module eventually, too.)

The _(var) syntax is a bit of a hack. I wanted to use some operator, like ~var or $var, but the problem is that all operators in Fenius can be interpreted as either infix or prefix depending on context, so in for $var would be interpreted as an infix expression for $ var, and you would have to parenthesize everything. One solution to this is to consider some operators (like $) as exclusively prefix. I will think about that.

How does it work?

I spent a good while hitting my head against the whole meta-ness of the ast_match/ast_gen macros. In fact I'm still hitting my head against it even though I have already implemented them. I'll try to explain them here (to you and to myself).

ast_match(x) is a macro that generates a pattern that would match the AST for x. So, for example, ast_match(f(x)) generates a pattern that would match the AST for f(x). Which pattern is that? Well, it's:

Call(_, Identifier(_, `f`), [Identifier(_, `x`)])

That's what you would have to write on the left-hand side of the => in a match clause to match the AST for f(x). (The _ patterns are to discard the location information, which is the first field of every AST node. ast_gen is just like ast_match but does not discard location information.) So far, so good.

But here's the thing: that's not what the macro has to output. That's what you would have to write in the source code. The macro has to output the AST for the pattern. This means that where the pattern has, say, Identifier, the macro actually has to output the AST for that, i.e., Identifier(nil, `Identifier`). And for something like:

Identifier(_, `f`)

i.e., a call to the Identifier constructor, the macro has to output:

Call(nil, Identifier(nil, `Identifier`),
          [Identifier(nil, `_`), Constant(nil, `f`)])

and for the whole AST of f(x), i.e.:

Call(_, Identifier(_, `f`), [Identifier(_, `x`)])

the macro has to output this monstrosity:

Call(nil, Identifier(nil, `Call`),
     [Identifier(nil, `_`),
      Call(nil, Identifier(nil, `Identifier`),
                [Identifier(nil, `_`), Constant(nil, `f`)]),
      Array(nil, [Call(nil, Identifier(nil, `Identifier`),
                            [Identifier(nil, `_`), Constant(nil, `x`)])])])

All of this is to match f(x). It works, is all encapsulated inside the ast_* macros (so the user doesn't have to care about it), and the implementation is not even that much code, but it's shocking how much complexity is behind it.

Could it have been avoided? Perhaps. I could have added a quasiquote pattern of sorts, which would be treated specially by match; when matching quasiquote(ast), the matching would happen against the constructors of ast itself, rather than the code it represents. Then I would have to implement separate logic for quasiquote outside of a pattern (e.g., on the right-hand side). In the end, I think it would require much more code. ast_match/ast_gen actually share all the code (they call the same internal meta-expand function, with a different value for a "keep location information" boolean argument), and require no special-casing in the match form: from match's perspective, it's just a macro that expands to a pattern. You can write macros that expand to patterns and use them in the left-hand side of match too.

(I think I'll have some observations on how all of this relates/contrasts to Lisp in the future, but I still have not finished digesting them, and I'm tracking down some papers/posts I read some time ago which were relevant to this.)

Missing things

The current pattern syntax has no way of matching against a constant. That is:

match false {
    true => "yea"
    false => "nay"
}

binds true (as a variable) to false and returns "yea". I still haven't found a satisfactory way of distinguishing variables from constants (which are just named by identifiers anyway). Other languages do various things:

One thing that occurred to me is to turn all constructors into calls (i.e., you'd write true() and false(), not only in patterns but everywhere), which would make all patterns unambiguous, but that seems a bit annoying.

Rust's solution seems the least intrusive, but Fenius does not really have a syntactically separate class of "constructors" (as opposed to just variables bound to a constant value), and considering all bound variables as constants in patterns makes patterns too fragile (if you happen to add a global variable – or worse, a new function in the base library – with the same name as a variable currently in use in a pattern, you break the pattern). I'll have to think more about it. Suggestions and comments, as always, are welcome.

Another missing thing is a way to debug patterns: I would like to be able to activate some kind of 'debug mode' for match which showed why a pattern did not match. I think this is feasible, but we'll see in the future.



Copyright © 2010-2019 Vítor De Araújo
The content of this blog, unless otherwise specified, may be used under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

Powered by Blognir.