Elmord's Magic Valley

Computers, languages, and computer languages. Às vezes em Português, sometimes in English.

Pattern matching and AST manipulation in Fenius

2019-05-30 19:40 -0300. Tags: comp, prog, pldesign, fenius, in-english

Fenius has pattern matching! This means you can now write code like this:

record Rectangle(width, height)
record Triangle(base, height)
record Circle(radius)

let pi = 355/113    # We don't have float syntax yet :(

let area(shape) = {
    match shape {
        Rectangle(width, height) => width * height
        Triangle(base, height) => base * height / 2
        Circle(radius) =>  pi * radius * radius
    }
}

print(area(Rectangle(4, 5)))
print(area(Triangle(3, 4)))
print(area(Circle(10)))

More importantly, you can now pattern match over ASTs (abstract syntax trees). This is perhaps the most significant addition to Fenius so far. It means that the code for the for macro from this post becomes:

# Transform `for x in items { ... }` into `foreach(items, fn (x) { ... })`.
let for = Macro(fn (ast) {
    match ast {
        ast_match(for _(var) in _(items) _(body)) => {
            ast_gen(foreach(_(items), fn (_(var)) _(body)))
        }
    }
})

This is a huge improvement over manually taking apart the AST and putting a new one together, and it basically makes macros usable.

It still does not handle hygiene: it won't prevent inserted variables from shadowing bindings in the expansion site, and will break if you shadow the AST constructors locally. But that will come later. (The AST constructors will move to their own module eventually, too.)

The _(var) syntax is a bit of a hack. I wanted to use some operator, like ~var or $var, but the problem is that all operators in Fenius can be interpreted as either infix or prefix depending on context, so in for $var would be interpreted as an infix expression for $ var, and you would have to parenthesize everything. One solution to this is to consider some operators (like $) as exclusively prefix. I will think about that.

How does it work?

I spent a good while hitting my head against the whole meta-ness of the ast_match/ast_gen macros. In fact I'm still hitting my head against it even though I have already implemented them. I'll try to explain them here (to you and to myself).

ast_match(x) is a macro that generates a pattern that would match the AST for x. So, for example, ast_match(f(x)) generates a pattern that would match the AST for f(x). Which pattern is that? Well, it's:

Call(_, Identifier(_, `f`), [Identifier(_, `x`)])

That's what you would have to write on the left-hand side of the => in a match clause to match the AST for f(x). (The _ patterns are to discard the location information, which is the first field of every AST node. ast_gen is just like ast_match but does not discard location information.) So far, so good.

But here's the thing: that's not what the macro has to output. That's what you would have to write in the source code. The macro has to output the AST for the pattern. This means that where the pattern has, say, Identifier, the macro actually has to output the AST for that, i.e., Identifier(nil, `Identifier`). And for something like:

Identifier(_, `f`)

i.e., a call to the Identifier constructor, the macro has to output:

Call(nil, Identifier(nil, `Identifier`),
          [Identifier(nil, `_`), Constant(nil, `f`)])

and for the whole AST of f(x), i.e.:

Call(_, Identifier(_, `f`), [Identifier(_, `x`)])

the macro has to output this monstrosity:

Call(nil, Identifier(nil, `Call`),
     [Identifier(nil, `_`),
      Call(nil, Identifier(nil, `Identifier`),
                [Identifier(nil, `_`), Constant(nil, `f`)]),
      Array(nil, [Call(nil, Identifier(nil, `Identifier`),
                            [Identifier(nil, `_`), Constant(nil, `x`)])])])

All of this is to match f(x). It works, is all encapsulated inside the ast_* macros (so the user doesn't have to care about it), and the implementation is not even that much code, but it's shocking how much complexity is behind it.

Could it have been avoided? Perhaps. I could have added a quasiquote pattern of sorts, which would be treated especially by match; when matching quasiquote(ast), the matching would happen against the constructors of ast itself, rather than the code it represents. Then I would have to implement separate logic for quasiquote outside of a pattern (e.g., on the right-hand side). In the end, I think it would require much more code. ast_match/ast_gen actually share all the code (they call the same internal meta-expand function, with a different value for a "keep location information" boolean argument), and requires no special-casing in the match form: from match's perspective, it's just a macro that expands to a pattern. You can write macros that expand to patterns and use them in the left-hand side of match too.

(I think I'll have some observations on how all of this relates/contrasts to Lisp in the future, but I still have not finished digesting them, and I'm tracking down some papers/posts I read some time ago which were relevant to this.)

Missing things

The current pattern syntax has no way of matching against a constant. That is:

match false {
    true => "yea"
    false => "nay"
}

binds true (as a variable) to false and returns "yea". I still haven't found a satisfactory way of distinguishing variables from constants (which are just named by identifiers anyway). Other languages do various things:

One thing that occurred to me is to turn all constructors into calls (i.e., you'd write true() and false(), not only in patterns but everywhere), which would make all patterns unambiguous, but that seems a bit annoying.

Rust's solution seems the least intrusive, but Fenius does not really have a syntactically separate class of "constructors" (as opposed to just variables bound to a constant value), and considering all bound variables as constants in patterns makes patterns too fragile (if you happen to add a global variable – or worse, a new function in the base library – with the same name as a variable currently in use in a pattern, you break the pattern). I'll have to think more about it. Suggestions and comments, as always, are welcome.

Another missing thing is a way to debug patterns: I would like to be able to activate some kind of 'debug mode' for match which showed why a pattern did not match. I think this is feasible, but we'll see in the future.

Comentários / Comments (0)

Deixe um comentário / Leave a comment

Main menu

Recent posts

Recent comments

Tags

em-portugues (213) comp (148) prog (71) in-english (62) life (49) unix (38) pldesign (37) lang (32) random (28) about (28) mind (26) lisp (25) fenius (22) mundane (22) web (20) ramble (18) img (13) rant (12) hel (12) scheme (10) privacy (10) freedom (8) esperanto (7) music (7) lash (7) bash (7) academia (7) copyright (7) home (6) mestrado (6) shell (6) android (5) conlang (5) misc (5) emacs (5) latex (4) editor (4) etymology (4) php (4) worldly (4) book (4) politics (4) network (3) c (3) tour-de-scheme (3) security (3) kbd (3) film (3) wrong (3) cook (2) treta (2) poem (2) physics (2) x11 (2) audio (2) comic (2) lows (2) llvm (2) wm (2) philosophy (2) perl (1) wayland (1) ai (1) german (1) en-esperanto (1) golang (1) translation (1) kindle (1) pointless (1) old-chinese (1)

Elsewhere

Quod vide


Copyright © 2010-2024 Vítor De Araújo
O conteúdo deste blog, a menos que de outra forma especificado, pode ser utilizado segundo os termos da licença Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.

Powered by Blognir.