Elmord's Magic Valley

Posts com a tag: `prog`

Some notes on Ruby #3: blocks

2024-06-17 15:52 +0100. Tags: comp, prog, pldesign, ruby, in-english

[This post is part of a series on Ruby semantics.]

In the third installment of this series, we are going to have a look on one of Ruby’s most prominent features: blocks.

A block is a piece of code that can be invoked with arguments and produce a value. Blocks can be written like this:

{|arg1, arg2, ...| body}

Or like this:

do |arg1, arg2, ...|
  body
end

These forms are equivalent, except for precedence: f g { block } is interpreted as f(g { block }) (the block is passed to g), while f g do block end is interpreted as f(g) { block } (the block is passed to f). The |arguments| can be omitted if the block takes no arguments. My impression is that the do syntax is preferred for multi-line blocks.

Blocks are kind of like anonymous functions, but they are not really first-class: a bare block like { puts 42 } on its own is a syntax error, and {} is interpreted as an empty dictionary (hash in Ruby terminology). The only place a block can appear is at the end of a method call, like f(x, y) { puts 42 } or f x, y do puts 42 end. This will make the block available to the method, which can use it in a number of ways.

Within the method, yield(arg1, arg2, ...) will invoke the block with the given arguments; whatever the block returns is the result of the yield call. The number of passed arguments generally does not have to match the number of arguments expected by the block: extra arguments are ignored, and missing arguments are assigned nil. (The only exception seems to be keyword arguments declared by the block without a default value; these will raise an error if not passed.)

def one_two_three
  yield 1
  yield 2
  yield 3
end

one_two_three {|x| puts x*10}  # prints 10, 20, 30


# We can also use the value produced by the block within the method.
def map_one_two_three
  [yield(1), yield(2), yield(3)]
end

map_one_two_three {|x| x*10}   # => [10, 20, 30]

The method does not have to declare that it accepts a block: you can pass a block to any method, it’s just that some will do something useful with it and some won’t. Within the method, you can test if it was called with a block by calling the block_given? predicate. Many methods from the Ruby standard library can be called with or without a block and adapt their behavior accordingly. For example, open("somefile") returns a file object, but open("somefile") {|f| ...} opens the file, passes the file object as an argument to the block, and closes the file when the block finishes (analogous to using with in Python). Another example is the Array constructor:

Array.new with no arguments returns an empty array;
Array.new(5) returns a 5-element array initialized to nil;
Array.new(5) {|i| i*i} returns a 5-element array, calling the block with each array index to initialize the corresponding array position, in this case resulting in [0, 1, 4, 9, 16].

Yet another example is the times method of the Integer class. With a block, it calls the block n times (where n is the integer), passing an iteration counter to the block as an argument:

irb(main):024:0> 5.times {|i| puts "Hello number #{i}!" }
Hello number 0!
Hello number 1!
Hello number 2!
Hello number 3!
Hello number 4!
=> 5

If you don’t need the iteration counter, you can just pass a block taking no arguments (and now we can see why Ruby allows block arguments not to match exactly with the values they are invoked with):

irb(main):025:0> 5.times { puts "Hello!" }
Hello!
Hello!
Hello!
Hello!
Hello!
=> 5

And finally, if you don’t pass it any block, it returns an Enumerator instance, which supports a bunch of methods, such as map or sum:

irb(main):035:0> 5.times.map {|x| x*x}
=> [0, 1, 4, 9, 16]


irb(main):036:0> 5.times.sum   # 0 + 1 + 2 + 3 + 4
=> 10

Another way a method can use a block is by declaring an &argument in its argument list: in this case, the block will be reified into a Proc object and will be available as a regular object to the method:

# This is equivalent to the `yield` version.
def one_two_three(&block)
  block.call(1)
  block.call(2)
  block.call(3)
end

Conversely, if you have a Proc object and you want to pass it to a method expecting a block, you can use the & syntax in the method call:

# Make a Proc out of a block...
tenfold = proc {|x| puts x*10}

# ...and pass it to a procedure expecting a block.
# This works with either version of one_two_three.
one_two_three(&tenfold)  # prints 10, 20, 30

In the above example, we also see another way we can turn a block into a Proc object: by passing it to the builtin proc method.

Block scope

Blocks can see the local variables that were defined at the time the block was created. Assignment to such variables modify the variable outside the block. Assignment to any other variable creates a local variable visible within the block and any nested blocks, but not outside.

x = 1

1.times {
    puts x      # this is the outer x
    x = 2       # this is still the outer x
    y = 3       # this is local to the block
    1.times {
        puts y  # this is the outer y
        y = 4   # this is still the outer y
    }
    puts y      # prints 4
}
puts x          # prints 2
puts y          # error: undefined variable

An exception to this are the block parameters: a block parameter is always a fresh variable, even if a local variable with the same name already exists. (Before Ruby 1.9, this was not the case: a block parameter with the same name as a local variable would overwrite the variable when the block was called.)

You can explicitly ask for a fresh variable to be created by declaring them in the parameter list after a semicolon:

x = 1
y = 2

1.times {|i; x, y| # i is the block argument; x and y are fresh variables
  x = 10
  y = 20
  puts x       # prints 10
  puts y       # prints 20
}

puts x         # prints 1
puts y         # prints 2

Block control flow

Within the block, next can be used to return from the block back to the yield that invoked it. If an argument is passed to next, it will become the value returned by the yield call:

def do_stuff
  result = yield 1
  puts "The result is #{result}"
end

do_stuff {
  puts "I'm here"
  next 42
  puts "This line will never run"
}

# prints:
#   I'm here
#   The result is 42

This construct is analogous to a continue in Python loop. For example:

5.times {|i|
    if i%2 == 0
      next  # skip even numbers
    end

    puts i
}

# prints:
#   1
#   3

Although it is more idiomatic to use the postfixed if in this case:

5.times {|i|
   next if i%2 == 0  # skip even numbers
   puts i
}

break within the block can be used to return from the method that called the block. Again, if an argument is passed to break, it becomes the return value of the method. For example:

def do_stuff
  result = yield 1
  puts "The result is #{result}"
end

x = do_stuff {
  puts "I'm here"
  break 42
  puts "This line will never run"
}

In this code, do_stuff invokes the block, which prints I'm here and causes do_stuff to return 42 immediately. Nothing else is printed; the The result is ... line won’t run. The return value (42) is assigned to x.

redo jumps back to the beginning of the block. It accepts no arguments. I’m sure this is useful in some circumstance, though my creativity partly fails me, and partly does not see why this would be useful only within blocks. But now you know it exists.

return within a block returns from the method the block is in. For example:

def do_stuff
  result = yield 1
  puts "The result is #{result}"
end

def foo
  do_stuff {
    puts "I'm here"
    return 42  # returns from `foo`
    puts "This line will never run"
  }
  puts "And neither will this"
end

foo  # prints "I'm here" and returns 42

proc vs. lambda

lambda is similar to proc: it takes a block and returns a Proc object. Unlike proc:

The resulting procedure checks that the number of arguments passed to it matches the number of arguments expected by the block;
return within the lambda returns from the block itself, not the enclosing method.

The shortcut syntax -> x, y { body } is equivalent to lambda {|x, y| body }.

Miscellanea

There are many equivalent ways of calling a Proc object:

irb(main):001:0> p = proc {|x| x+1}
=> #<Proc:0x00007f9c4b7b1698 (irb):1>
irb(main):002:0> p.call(5)
=> 6
irb(main):003:0> p.(5)
=> 6
irb(main):004:0> p[5]
=> 6

If a block declares no arguments, the names _1, _2, …, _9 can be used to refer to arguments by number:

irb(main):014:0> [1,2,3,4,5].map { _1 * _1 }
=> [1, 4, 9, 16, 25]

If such a block is turned into a lambda, the resulting procedure will require as many arguments as the highest argument number used:

irb(main):021:0> lambda { _9 }
=> #<Proc:0x00007f9c4b79c518 (irb):21 (lambda)>
irb(main):022:0> lambda { _9 }.call(1)
(irb):22:in `block in <top (required)>': wrong number of arguments (given 1, expected 9) (ArgumentError)

If a block using return in its body is reified into a Proc object using proc, and the Proc object escapes the method it was created in, and is invoked afterwards, the return will cause a LocalJumpError:

def m
  p = proc { return 42 }

  # If we called p.call() here, it would cause `m` to return 42.
  # But instead, we will return `p` to the caller...
  p
end

p = m
# ...and call it here, after `m` has already returned!
p.call()  # error: in `block in m': unexpected return (LocalJumpError)

EOF

That’s all for today, folks. There is still plenty to cover: classes, modules, mixins, the singleton class, eval and metaprogramming shenanigans. I plan to write about these Real Soon Now™.

Comentários / Comments

Some notes on Ruby #2: variables, constants, and scope

2024-06-16 11:12 +0100. Tags: comp, prog, pldesign, ruby, in-english

[This post is part of a series on Ruby semantics.]

I’m still trying to wrap my head around all the intricacies of variable/name scope in Ruby. These notes are part of my attempt to figure it all out, so take it with a grain of salt, and feel free to send corrections and additions in the comments.

As I explained in the previous post, the focus of these notes is not on how to use the language, but rather on how it works. This post in particular will deal with a lot of corner cases, which are helpful to figure out what the interpreter is doing. Let’s go!

Types of variables

Ruby has a bunch of different types of variables and variable-like things, distinguished by their initial characters:

Local variables begin with a lowercase ASCII letter, an underscore, or a non-ASCII character (i.e., a Unicode codepoint above 127). Any non-ASCII character can be used in an identifier, even things like the zero width space (U+200B). Local variables are visible in the scope they were defined in and nested scopes, kinda (more on that later).
Constants begin with an uppercase ASCII character. Constants belong to the class or module they are defined in (which is Object in the top-level). They cannot be defined or redefined from within methods, but they can be redefined outside of methods (with a warning).
Instance variables begin with @, like @foo. They belong to the current object (i.e., self).
Class variables begin with @@, like @@foo. They belong to the class they are defined in and are shared with all of its subclasses (if a subclass mutates the variable, the superclass will reflect the mutation).
- These are not the same as class instance variables, which are not a distinct variable type, but are simply the instance variables of the class object. Remember, classes are objects too (instances of Class), and therefore have their own instance variables as well, which are distinct from the instance variables of the instances of the class. Class instance variables are not shared with subclasses, because each subclass is a distinct object, with its own (class) instance variables.
- Class variables cannot be accessed from the top-level: unlike constants, they don’t implicitly refer to Object’s class variables in that case. I’m not sure why this inconsistency exists, but it might be because class variables are shared with the subclasses, and therefore defining a class variable on Object by accident would affect almost every class in Ruby, whereas a constant with the same name can be defined in a subclass with no issues.
Finally, global variables begin with $, like $foo, and are visible across the whole program.

Unlike Python, there is no per-file global scope. Global variables ($foo) are true program-wide globals. Constants, instance variables and class variables are properties of various objects: when you define one of those, you are effectively mutating the class/module/instance they were defined in, and the effects will be visible in other places where these objects are used. You can define local variables at the top-level, but they won’t be visible inside any class or method defition, nor is there any concept of importing the variables defined in a different file: when you require another file, you will be able to see the effects of running that file (such as defining constants, instance variables and class variables, which, again, are object mutation rather than what you would think of as variable definition in Python or Scheme), but local variables defined at the file top-level won’t be visible outside it.

Variables vs. methods

The allowed names for local variables and constants are also allowed method names. Because Ruby does not require parentheses in a method call, and also allows the receiver to be omitted (self.f() can be written as f(), which can be written as just f), a bare identifier like foo could be either a method name or a variable/constant name. How does Ruby distinguish those?

First, if the parentheses are used (foo()) , or if there are arguments after the identifier, with or without parentheses (foo 42), then foo is unambiguously interpreted as a method name.

If there are neither parentheses nor arguments, and the identifier begins with a lowercase ASCII letter or an underscore, it will be interpreted as a local variable if there has been a variable assignment to that identifier within the lexical scope of the reference. So in foo = 42; foo, the second foo is a local variable. This disambiguation happens at parse time, and is based on the textual appearance of an assignment in the scope of the reference, regardless of whether the assignment is actually executed at runtime. So, for example:

def foo
    "I'm a method"
end

if false
    foo = "I'm a local variable"
end

p foo  # Prints nil!

When Ruby sees the assignment to foo in the code, it creates a local variable for it, even if the assignment does not run. The variable is initialized with nil.

Note that foo() here would still invoke the method, even though there is a local variable with the same name. You might ask: what if I have a local variable whose value is a function (e.g., a lambda)? How do I call it? In this case, you have to invoke foo.call():

def foo
    "I'm a method"
end

foo = lambda { "I'm a lambda" }

p foo()        # "I'm a method"
p foo          # #<Proc:...>
p foo.call()   # "I'm a lambda"

This is similar to how in Common Lisp, there are distinct namespaces for functions and variables, and you need to use (funcall foo) to call a function stored in a variable. However, because the parentheses are not mandatory in Ruby, it has to do some extra work to guess what you want when it sees a bare identifier.

What about constants with the same name as methods? In this case, the rules are different: Ruby treats an uppercase-initial identifier as a constant unless there are parentheses or arguments:

def A
  "I'm a method"
end

A    # error: uninitialized constant A

A()  # "I'm a method"

Local variable scope

Previously, I said that local variables are visible in the scope they were defined in and nested scopes. That’s not quite true, though, because a lot of syntactic constructs start a clean slate on local variables. For example, local variables defined outside a class declaration are not visible inside it:

x = 1

class Foo
  x  # error: undefined local variable or method `x' for Foo:Class (NameError)
end

The same applies to module and def:

class Foo
  x = 1
  
  def m
    x
  end
end

Foo.new.m  # error: in `m': undefined local variable or method `x' for #<Foo:...> (NameError)

Neither will the variable be accessible via Foo.x, Foo::x, or anything else. It will be visible for code that runs within the class declaration, though:

class Foo
  x = 1
  puts x     # this is fine
  A = x      # and so is this: it initializes the constant `A` with 1
end

Even though Ruby allows multiple declarations of the same class, and each subsequent declaration modifies the existing class rather than defining a new one, local variables declared within one class declaration will not be visible to subsequent declarations of the same class:

class Foo
  x = 1
end

class Foo
  puts x  # error: in `<class:Foo>': undefined local variable or method `x' for Foo:Class (NameError)
end

But note that constants work fine in this case:

class Foo
  A = 1
end

class Foo
  puts A  # prints 1
end

This is because constants are a property of the class object, so a constant declaration mutates the class object and therefore its effect is persistent, whereas local variables only exist within the lexical/textual scope where they were declared.

Constant resolution

Speaking of which, constant scope resolution is the one thing I’m having the hardest time figuring out. It does mostly what you would expect in normal situations, but it does so by quite strange means. What seems to be going on is that Ruby uses lexical scope to determine the dynamic resolution order of the constant. Let me show what I mean.

Classes can be nested, and you can use the constants of the outer class in the inner one:

class A
  X = 1
  
  class B
    def m
      X
    end
  end
end

puts A::B.new.m  # prints 1

You can do this even if the constant definition is not textually within the same class declaration as the method definition:

class A
  X = 1
end

class A
  class B
    def m
      X
    end
  end
end

puts A::B.new.m  # still prints 1

But if you define the method directly in A::B without syntactically nesting it within A, then it doesn’t work:

class A
  X = 1
end

class A::B
  def m
    X
  end
end

puts A::B.new.m  # error: in `m': uninitialized constant A::B::X (NameError)

This resolution is dynamic, though. Let’s go back to our previous example:

class A
  X = 1

  class B
    def m
      X
    end
  end
end

puts A::B.new.m  # still prints 1

The method is getting the constant defined in A. Let’s now add a constant X to B:

class A::B
  X = 2
end

And now if we call the method:

A::B.new.m  # prints 2!

Now method m refers to a constant that did not exist at the time it was defined. In other words, it searches for X at runtime in all classes the method was textually nested in. (Remember that if you define m directly in A::B without textually nesting it in both classes, it only looks up in B.)

What about inheritance? Let’s define some classes:

class One
  X = 1
end

class Two
  X = 2
end

class A < One
  X = 10
  
  class B < Two
    X = 20
    
    def m
      X
    end
  end
end

puts A::B.new.X  # prints 20

Now let’s go about removing constants and seeing what happens:

irb(main):022:0> A::B.send(:remove_const, :X)
=> 20
irb(main):023:0> A::B.new.m
=> 10

It prefers the constant of the outer class over the one from the inheritance chain. Let’s remove that one as well:

irb(main):024:0> A.send(:remove_const, :X)
=> 10
irb(main):025:0> A::B.new.m 
=> 2

Ok, after exhausting the outer class chain, it falls back to the inheritance chain. What if we remove it from the superclass as well?

irb(main):026:0> Two.send(:remove_const, :X)
=> 2
irb(main):027:0> A::B.new.m 
(irb):16:in `m': uninitialized constant A::B::X (NameError)

So it doesn’t try the inheritance chain of the outer class.

One last check: what if you redefine a constant in a subclass but do not redefine the method?

class A
  X = 10

  class B
    X = 20

    def m
      X
    end
  end
end


class C < A::B
  X = 30
end

puts C.new.m  # prints 20

So it looks up based on where the method is defined, not the class it’s called from.

In summary, when Ruby sees a reference to a constant, it tries to find it:

First in the classes in which the reference textually appears, from inner to outer class;
Then in the superclasses of the (inner) class it textually appears in.

Uninitialized variable access

Accessing an undefined local variable raises an “undefined local variable or method” error. (Because of the ambiguity between variables and method names mentioned before, the error message mentions both cases here.) Similarly, accessing an undefined constant is an error.

Accessing an uninitialized global variable produces nil. If you run the code with warnings enabled (ruby -w), you will also get a warning about it.

Accessing an uninitialized instance variable produces nil and no warning. There used to be one but it was removed in Ruby 3.0.

Finally, accessing an uninitialized class variable raises an error (just like locals and constants, but unlike instance variables).

EOF

That’s all for today, folks. I did not even get to blocks in this post, but they’ll have to wait for a post of their own. Stay tuned!

Comentários / Comments

Some notes on Ruby #1: method definition and calls

2024-06-14 21:17 +0100. Tags: comp, prog, pldesign, ruby, in-english

[This post is part of a series on Ruby semantics.]

I’ve been studying Ruby recently for a job opportunity. The job did not pan out in the end, and therefore I’ll probably not continue with my Ruby studies, but I want to write down some things I learned before I forget them.

The focus of these notes is not on how to use the language, but rather on how it works, i.e., the language semantics. This may end up making the language seem weirder than it actually is in practice, because a lot of the examples will be dealing with corner cases. I will be writing this from a Python (and sometimes Lisp) perspective.

All calls are method calls

Functions and methods, though superficially similar, work very differently in Python and Ruby. In both languages, x.f(a) mean “call method f of object x with argument a”, but it works quite differently behind the scenes:

In Python, first x.f evaluates to a bound method object, and then that function-like object is called with a as an argument.
In Ruby, this means to send a message f with argument a to object x. There is no intermediate bound method object.

In Ruby, x.f on its own is equivalent to x.f(), i.e., send the message f with no arguments to object x. In general, parentheses can be omitted from method calls if there is no ambiguity.

f() on its own is equivalent to self.f(). Whereas in Python, self is an argument of the function that implements a method and has to be defined explicitly, in Ruby self is a keyword that refers to the current object and is always available.

Likewise, def always defines a method. Whereas in Python def defines a function in the local scope, in Ruby def defines a method in the current class. So, for example:

class Foo
  def g
    def h
      42
    end
  end
end

This defines a class Foo with a method g, which, when called, defines method h in class Foo. So, afterwards:

irb(main):008:0> x = Foo.new
=> #<Foo:0x00007f72a83dd3b8>

irb(main):009:0> x.h  # Method does not exist yet.
(irb):9:in `<main>': undefined method `h' for #<Foo:0x00007f72a83dd3b8> (NoMethodError)
        from /usr/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'
        from /usr/bin/irb:25:in `load'
        from /usr/bin/irb:25:in `<main>'

irb(main):010:0> x.g  # When g is called, h is defined.
=> :h

irb(main):011:0> x.h  # Now h exists.
=> 42

(In this sense, Python’s def is more like Scheme’s define, whereas Ruby’s def is more like Common Lisp’s defun or defmethod.)

If a new Foo object is instantiated now, it will have access to method h already, since the def h defined it in the class Foo, not in the instance x:

irb(main):012:0> y = Foo.new
=> #<Foo:0x00007f72a842a3c0>

irb(main):013:0> y.h
=> 42

What if you use def at the top-level outside of a class? Well, in that case, self refers to the main object, which is an instance of Object (the base class of most Ruby classes). So a method defined at the top-level is a method of Object! For example, let’s define a hello method with no arguments (again, the parentheses around the arguments can be omitted):

def hello
  puts "Hello, world!"
end

And now we can call it:

irb(main):019:0> hello
Hello, world!
=> nil

But since the method was defined as a method of Object, won’t it be available on every object?

irb(main):020:0> 4.hello
(irb):20:in `<main>': private method `hello' called for 4:Integer (NoMethodError)
        from /usr/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'
        from /usr/bin/irb:25:in `load'
        from /usr/bin/irb:25:in `<main>'

Note that the call fails not because the method is not defined, but because the method is private. We can override the access control by using the send method to send the message explicitly to the object:

irb(main):021:0> 4.send(:hello)
Hello, world!
=> nil

And there we go. The code at the top-level effectively runs as if it were inside a:

class Object
  private

  <... your code goes here ...>
end

Note how code that looks superficially like Python and seems to work the same way is actually doing so by very different means. For example, consider a piece of code like:

def multiply(x, y)
  x * y
end

class DeepThought
  def compute_answer()
    multiply(6, 7)
  end
end

puts DeepThought.new().compute_answer()  # prints 42

The method compute_answer uses the multiply method defined at the top-level. In Python, the equivalent code works by searching for multiply in the current environment, finding it at the global scope, and calling the function bound to it. In Ruby, this works by defining multiply as a method of Object, and because DeepThought inherits from Object by default, it has multiply as a method. We could have written self.multiply(6, 7) and we would get the same result.

This means you can easily clobber someone else’s method definitions if you define a method at the top-level. I guess it’s okay to do that if you’re writing a standalone script that won’t be used as part of something bigger, but if you’re writing a library, or a piece of a program consisting of multiple files, you probably want to wrap all your method definitions within a class or module definition. I plan to talk about those in a future blog post. See you next time!

Comentários / Comments

Updates on Fenius and life

2024-03-26 15:19 +0000. Tags: comp, prog, pldesign, fenius, lisp, life, in-english

Fenius

Over the last couple of months (but mainly over the last four weeks or so), I’ve been working on the Fenius interpreter, refactoring it and adding features. The latest significant feature was the ability to import Common Lisp packages, and support for keyword arguments in a Common-Lisp-compatible way, i.e., f(x, y=z) ends up invoking (f x :y z), i.e., f with three arguments, x, the keyword :y, and z. Although this can lead to weird results if keyword arguments are passed where positional arguments are expected or vice-versa (a keyword like :y may end up being interpreted as a regular positional value rather than as the key of the next argument), the semantics is exactly the same as in Common Lisp, which means we can call Common Lisp functions from Fenius (and vice-versa) transparently. Coupled with the ability to import Common Lisp packages, this means that we can write some useful pieces of code even though Fenius still doesn’t have much in its standard library. For example, this little script accepts HTTP requests and responds with a message and the parsed data from the request headers (yes, I know that it’s not even close to fully supporting the HTTP standard, but this is just a demonstration of what can be done):

# Import the Common Lisp standard functions, as well as SBCL's socket library.
let lisp = importLispPackage("COMMON-LISP")
let sockets = importLispPackage("SB-BSD-SOCKETS")

# We need a few Common Lisp keywords (think of it as constants)
# to pass to the socket library.
let STREAM = getLispValue("KEYWORD", "STREAM")
let TCP = getLispValue("KEYWORD", "TCP")

# Import an internal function from the Fenius interpreter.
# This should be exposed in the Fenius standard library, but we don't have much
# of a standard library yet.
let makePort = getLispFunction("FENIUS", "MAKE-PORT")

# Add a `split` method to the builtin `Str` class.
# This syntax is provisional (as is most of the language anyway).
# `@key start=0` defines a keyword argument `start` with default value 0.
method (self: Str).split(separator, @key start=0) = {
    if start > self.charCount() {
        []
    } else {
        let position = lisp.search(separator, self, start2=start)
        let end = (if position == [] then self.charCount() else position)
        lisp.cons(
            lisp.subseq(self, start, end),
            self.split(separator, start=end+separator.charCount()),
        )
    }
}

# Listen to TCP port 8000 and wait for requests.
let main() = {
    let socket = sockets.makeInetSocket(STREAM, TCP)
    sockets.socketBind(socket, (0,0,0,0), 8000)
    sockets.socketListen(socket, 10)

    serveRequests(socket)
}

# Process one request and call itself recursively to loop.
let serveRequests(socket) = {
    print("Accepting connections...")

    let client = sockets.socketAccept(socket)
    print("Client: ", client)
    let clientStream = sockets.socketMakeStream(client, input=true, output=true)
    let clientPort = makePort(stream=clientStream, path="<client>")
    let request = parseRequest(clientPort)

    clientPort.print("HTTP/1.0 200 OK")
    clientPort.print("")
    clientPort.print("Hello from Fenius!")
    clientPort.print(request.repr())

    lisp.close(clientStream)
    sockets.socketClose(client)

    serveRequests(socket)
}

# Remove the "\r" from HTTP headers. We don't have "\r" syntax yet, so we call
# Common Lisp's `(code-char 13)` to get us a \r character (ASCII value 13).
let strip(text) = lisp.remove(lisp.codeChar(13), text)

# Define a structure to contain data about an HTTP request.
# `@key` defines the constructor as taking keyword (rather than positional) arguments.
record HttpRequest(@key method, path, headers)

# Read an HTTP request from the client socket and return an HttpRequest value.
let parseRequest(port) = {
    let firstLine = strip(port.readLine()).split(" ")
    let method = firstLine[0]
    let path = firstLine[1]
    let protocolVersion = firstLine[2]

    let headers = parseHeaders(port)

    HttpRequest(method=method, path=path, headers=headers)
}

# Parse the headers of an HTTP request.
let parseHeaders(port) = {
    let line = strip(port.readLine())
    if line == "" {
        []
    } else {
        let items = line.split(": ") # todo: split only once
        let key = items[0]
        let value = items[1]
        lisp.cons((key, value), parseHeaders(port))
    }
}

main()

Having reached this stage, it’s easier for me to just start trying to use the language to write small programs and get an idea of what is missing, what works well and what doesn’t, and so on.

One open question going forward is how much I should lean on Common Lisp compatibility. In one direction, I might go all-in into compatibility and integration into the Common Lisp ecosystem. This would give Fenius easy access to a whole lot of existing libraries, but on the other hand would limit how much we can deviate from Common Lisp semantics, and the language might end up being not much more than a skin over Common Lisp, albeit with a cleaner standard library. That might actually be a useful thing in itself, considering the success of ReasonML (which is basically a skin over OCaml).

In the opposite direction, I might try to not rely on Common Lisp too much, which means having to write more libraries instead of using existing ones, but also opens up the way for a future standalone Fenius implementation.

Life

I quit my job about 6 months ago. My plan was to relax a bit and work on Fenius (among other things), but I’ve only been able to really start working on it regularly over the last month. I’ve been mostly recovering from burnout, and only recently have started to get back my motivation to sit down and code things. I’ve also been reading stuff on Old Chinese (and watching a lot of great videos from Nathan Hill’s channel), and re-reading some Le Guin books, as well as visiting and hosting friends and family.

I would like to go on with this sabbatical of sorts, but unfortunately money is finite, my apartment rental contract ends by the end of July, and the feudal lord wants to raise the rent by over 40%, which means I will have to (1) get a job in the upcoming months, and (2) probably move out of Lisbon. I’m thinking of trying to find some kind of part-time job, or go freelancing, so I have extra time and braincells to work on my personal projects. We will see how this plays out.

EOF

That’s all for now, folks! See you next time with more thoughts on Fenius and other shenanigans.

Comentários / Comments

Adventures with Fenius and Common Lisp

2023-01-22 00:05 +0000. Tags: comp, prog, pldesign, fenius, lisp, in-english

I started playing with Fenius (my hobby, vaporware programming language) again. As usual when I pick up this project again after a year or two of hiatus, I decided to restart the whole thing from scratch. I currently have a working parser and a very very simple interpreter that is capable of running a factorial program. A great success, if you ask me.

This time, though, instead of doing it in Go, I decided to give Common Lisp a try. It was good to play a bit with Go, as I had wanted to become more familiar with that language for a long time, and I came out of the experience with a better idea of what the language feels like and what are its strong and weak points. But Common Lisp is so much more my type of thing. I like writing individual functions and testing and experimenting with them as I go, rather than writing one whole file and then running it. I like running code even before it’s complete, while some functions may still be missing or incomplete, to see if the parts that are finished work as expected, and to modify the code according to these partial results. Common Lisp is made for this style of development, and it’s honestly the only language I have ever used where this kind of thing is not an afterthought, but really a deeply ingrained part of the language. (I think Smalltalk and Clojure are similar in this respect, but I have not used them.) Go is very much the opposite of this; as I discussed in my previous Go post, the language is definitely not conceived with the idea that running an incomplete program is a useful thing to do.

Common Lisp macros, and the ability to run code at compile time, also opens up some interesting ways to structure code. One thing I’m thinking about is to write a macro to pattern-match on AST nodes, which would make writing the interpreter more convenient than writing lots of field access and conditional logic to parse language constructs. But I still have quite a long way to go before I can report on how that works out.

What kind of language I’m trying to build?

This is a question I’ve been asking myself a lot lately. I’ve come to realize that I want many different, sometimes conflicting things from a new language. For example, I would like to be able to use it to write low-level things such as language runtimes/VMs, where having control of memory allocation would be useful, but I would also like to not care about memory management most of the time. I would also like to have some kind of static type system, but to be able to ignore types when I wish to.

In the long term, this means that I might end up developing multiple programming languages along the way focusing on different features, or maybe even two (or more) distinct but interoperating programming languages. Cross-language interoperability is a long-standing interest of mine, in fact. Or I might end up finding a sweet spot in the programming language design space that satisfies all my goals, but I have no idea what that would be like yet.

In the short term, this means I need to choose which aspects to focus on first, and try to build a basic prototype of that. For now, I plan to focus on the higher-level side of things (dynamically-typed, garbage-collected). It is surprisingly easier to design a useful dynamic programming language than a useful static one, especially if you already have a dynamic runtime to piggy-back on (Common Lisp in my case). Designing a good static type system is pretty hard. For now, the focus should be on getting something with about the same complexity as R7RS-small Scheme, without the continuations.

F-expressions

One big difference between Scheme/Lisp and Fenius, however, is the syntax. Fenius currently uses the syntax I described in The Lispless Lisp. This is a more “C-like” syntax, with curly braces, infix operators, the conventional f(x,y) function call syntax, etc., but like Lisp S-expressions, this syntax can be parsed into an abstract syntax tree without knowing anything about the semantics of specific language constructs. I’ve been calling this syntax “F-expressions” (Fenius expressions) lately, but maybe I’ll come up with a different name in the future.

If you are not familiar with Lisp and S-expressions, think of YAML. YAML allows you to represent elements such as strings, lists and dictionaries in an easy-to-read (sorta) way. Different programs use YAML for representing all kinds of data, such as configuration files, API schemas, actions to run, etc., but the same YAML library can be used to parse or generate those files without having to know anything about the specific purpose of the file. In this way, you can easily write scripts that consume or produce YAML for these programs without having to implement parsing logic specific for each situation. F-expressions are the same, except that they are optimized for representing code: instead of focusing on representing lists and dictionaries, you have syntax for representing things like function calls and code blocks. This means you can manipulate Fenius source code with about the same ease you can manipulate YAML.

(Lisp’s S-expressions work much the same way, except they use lists (delimited by parentheses) as the main data structure for representing nested data.)

Fenius syntax is more complex than Lisp-style atoms and lists, but it still has a very small number of elements (8 to be precise: constants, identifiers, phrases, blocks, lists, tuples, calls and indexes). This constrains the syntax of the language a bit: all language constructs have to fit into these elements. But the syntax is flexible enough to accomodate a lot of conventional language constructs (see the linked post). Let’s see how that will work out.

One limitation of this syntax is that in constructions like if/else, the else has to appear in the same line as the closing brace of the then-block, i.e.:

if x > 0 {
    print("foo")
} else {
    print("bar")
}

Something like:

if x > 0 {
    print("foo")
}
else {
    print("bar")
}

doesn’t work, because the else would be interpreted as the beginning of a new command. This is also one reason why so far I have preferred to use braces instead of indentation for defining blocks: with braces it’s easier to tell where one command like if/else or try/except ends through the placement of the keyword in the same line as the closing brace vs. in the following line. One possibility that occurs to me now is to use a half-indentation for continuation commands, i.e.:

if x > 0:
    print("foo")
  else:
    print("bar")

but this seems a bit ~~cursed~~ error-prone. Another advantage of the braces is that they are more REPL-friendly: it’s easier for the REPL to know when a block is finished and can be executed. By contrast, the Python REPL for example uses blank lines to determine when the input is finished, which can cause problems when copy-pasting code from a file. Copy-pasting from the REPL into a file is also easier, as you can just paste the code anywhere and tell your text editor to reindent the whole code. (Unlike the Python REPL, which uses ... as an indicator that it’s waiting for more input, the Fenius REPL just prints four spaces, which makes it much easier to copy multi-line code typed in the REPL into a file.)

Versioning

Fenius (considered as a successor of Hel) is a project that I have started from scratch and abandoned multiple times in the past. Every time I pick it up again, I generally give it a version number above the previous incarnation: the first incarnation was Hel 0.1, the second one (which was a completely different codebase) was Hel 0.2, then Fenius 0.3, then Fenius 0.4.

This numbering scheme is annoying in a variety of ways. For one, it suggests a continuity/progression that does not really exist. For another, it suggests a progression towards a mythical version 1.0. Given that this is a hobby project, and of a very exploratory nature, it’s not even clear what version 1.0 would be. It’s very easy for even widely used, mature projects to be stuck in 0.x land forever; imagine a hobby project that I work on and off, and sometimes rewrite from scratch in a different language just for the hell of it.

To avoid these problems, I decided to adopt a CalVer-inspired versioning scheme for now: the current version is Fenius 2023.a.0. In this scheme, the three components are year, series, micro.

The year is simply the year of the release. It uses the 4-digit year to make it very clear that it is a year and not just a large major version.

The series is a letter, and essentially indicates the current “incarnation” of Fenius. If I decide to redo the whole thing from scratch, I might label the new version 2023.b.0. I might also bump the version to 2023.b.0 simply to indicate that enough changes have accumulated in the 2023.a series that it deserves to be bumped to a new series; but even if I don’t, it will eventually become 2024.a.0 if I keep working on the same series into the next year, so there is no need to think too much about when to bump the series, as it rolls over automatically every year anyway.

The reason to use a letter instead of a number here is to make it even less suggestive of a sequential progression between series; 2023.b might be a continuation of 2023.a, or it might be a completely separate thing. In fact it’s not unconceivable that I might work on both series at the same time.

The micro is a number that is incremented for each new release in the same series. A micro bump in a given series does imply a sequential continuity, but it does not imply anything in terms of compatibility with previous versions. Anything may break at any time.

Do I recommend this versioning scheme for general use? Definitely not. But for a hobby project that nothing depends on, this scheme makes version numbers both more meaningful and less stressful for me. It’s amazing how much meaning we put in those little numbers and how much we agonize over them; I don’t need any of that in my free time.

(But what if Fenius becomes a widely-used project that people depend on? Well, if and when this happens, I can switch to a more conventional versioning scheme. That time is certainly not anywhere near, though.)

Implementation strategies

My initial plan is to make a rudimentary AST interpreter, and then eventually have a go at a bytecode interpreter. Native code compilation is a long-term goal, but it probably makes more sense to flesh out the language first using an interpreter, which is generally easier to change, and only later on to make an attempt at a serious compiler, possibly written in the language itself (and bootstrapped with the interpreter).

Common Lisp opens up some new implementation strategies as well. Instead of writing a native code compiler directly, one possibility is to emit Lisp code and call SBCL’s own compiler to generate native code. SBCL can generate pretty good native code, especially when given type declarations, and one of Fenius’ goals is to eventually have an ergonomic syntax for type declarations, so this might be interesting to try out, even if I end up eventually writing my own native code compiler.

This also opens up the possibility of using SBCL as a runtime platform (in much the same way as languages like Clojure run on top of the JVM), and thus integrating into the Common Lisp ecosystem (allowing Fenius code to call Common Lisp and vice-versa). On the one hand, this gives us access to lots of existing Common Lisp libraries, and saves some implementation work. On the other hand, this puts some pressure on Fenius to stick to doing things the same way as Common Lisp for the sake of compatibility (e.g., using the same string format, the same object system, etc.). I’m not sure this is what I want, but might be an interesting experiment along the way. I would also like to become more familiar with SBCL’s internals as well.

EOF

That’s it for now, folks! I don’t know if this project is going anywhere, but I’m enjoying the ride. Stay tuned!

1 comentário / comment

Some impressions about Go

2020-12-13 10:06 +0000. Tags: comp, prog, golang, in-english

A couple of months ago, I decided to rewrite the implementation of Fenius from scratch… in Go. I’ve also been working on a web project in Go at work. In this post, I write some of my reflections about the language.

First, a bit of a disclaimer. This post may end up sounding too negative; yet I chose to write the implementation of Fenius in Go for a reason, and I don’t regret this decision. Therefore, despite all the complaints I have about the language, I still think it’s a useful tool to have in my toolbox. With that in mind, here we go.

Also, a bit of context for those who don’t follow this blog regularly: Fenius is a programming language I am designing and playing with in my free time. The goal is to mix elements of functional and object-oriented programming and Lisp-style macros in a non-Lisp syntax, among other things. In its current incarnation, the language is implemented as an interpreter in written Go.

Why did I choose Go for this project?

I’ve been curious about Go for a long time, but had never taken the time to play with it. I don’t have much patience for following tutorials, so for me the most effective way of learning a new programming language is to pick some project and try to code it in the language, and learn things as I go.

I realized Fenius would be a good match for Go for a bunch of reasons:

Compared to higher-level languages (such as Common Lisp):

Go generates small, self-contained executables; the end-user does not have to have Go installed to run the Fenius binary. SBCL can do that too, but the resulting binary is much larger.
It is easy to generate Go binaries for multiple platforms, without having to run the compiler in those platforms as well.
The Go string type is essentially just an array of bytes that is treated as UTF-8 encoded data by the language, but has no qualms dealing with invalid UTF-8 data. This happens to mostly match the semantics of strings in Fenius, and is in contrast with languages such as Common Lisp, Scheme or Python which try to encode strings coming from the outside world as Unicode strings (including file names and program arguments, which have no guarantee of being valid UTF-8, with Python doing some tricks to smuggle arbitrary bytes into a Unicode string).

(The only point where Go and Fenius diverge is that, when handling strings as UTF-8, Go generally replaces invalid bytes in the input with the Unicode replacement character (U+FFFD �), whereas the behavior I expect to eventually have in Fenius is to pass the invalid bytes unchanged into the output. The only language I know of that matches my desired behavior is Elixir.)

Compared to lower-level languages (such as C):

Go is garbage collected, which not only makes my life easier during development, it also means I don’t have to implement a garbage collector for Fenius myself.
Go is memory-safe, type-safe (at least compared to C), and has no undefined behavior. (It does have some unspecified behavior in the memory model with respect to goroutine synchronization, but nothing comparable to C’s concept undefined behavior which basically means “if you make a misstep, the compiler will stab you in the back”.)
Go has first-class functions / closures, which makes a lot of things more convienient than they would be in C.
Go puts some incredible effort into letting you write top level declarations in any order and figuring out any dependencies between them automatically. For instance, you don’t have to care about the order in which mutually dependent structs or functions are declared, you can initialize global variables with non-constant expressions, and everything just works.

In summary, I see Go basically as a garbage-collected, memory-safe language with a small runtime, somewhat above C in abstraction level, but not much above. This can be either good or bad (or sometimes one and sometimes the other), depending on the requirements of your project.

(Another reason for using Go, which is unrelated to any of the features of the language itself, is that Go is used for a bunch of things where I work, so learning it would be useful for me professionally. And indeed, the experience I acquired working on the Fenius interpreter has been hugely useful at work so far.)

With all that said, Go does leave something to be desired in many respects, could be better designed in others, and just plain annoys me in others. Let’s investigate those.

Do repeat yourself

Go bills itself as a simple programming language, and simple it is. However, one thing it made me reflect about is that there is more than one way to go about simplicity. Scheme, for instance, also aims at being a simple programming language; and yet Scheme is far more expressive than Go. Now, “expressiveness” is a vague concept, so let’s try to make this more concrete. What I’m after here is an idea that might be called abstraction power: the ability to abstract repeating patterns in the code into reusable entities. Go leaves a lot to be desired in this department. Whereas Scheme is a simple language that gives you a basic set of building blocks from which you can build higher-level abstractions, Go is a simple language that pretty much forces everything to stay at the simple level. Whereas Scheme is simple but open-ended, “open-ended” is about the last word I would use to describe Go.

The thing is, Go is this way by design: whether or not you like this (and I don’t), it is an intentional rather than accidental part of the design of the language. And it does have some benefits: because there are fewer (no) ways to extend the language, it’s also easier to exclude certain behaviors when analyzing what a piece of code does. For example, recently at work, while trying to figure out how GORM works, someone wondered if GORM closed database connections automatically when the database handler went out of scope, and I was able to say I didn’t think that was possible, simply because there is no mechanism in Go that could be used to achieve that.¹ Likewise, if you have something of the form someStruct.SomeField, you can be sure all this will do is read a memory location, not run arbitrary code. Of course, this has a flip side: anyone accessing someStruct.SomeField really depends on the struct having this field; it cannot be replaced by a property method in the future. You either have to live with that, or write an accessor method and always use that instead of accessing the field directly in the rest of the program, just like in plain ol’ Java.

The `while` problem

Go’s if has a two-clause version which allows you to initialize some variables and use them in the if condition. This works particularly well with the Go stategy of signaling errors and the results of some builtin constructs by returning multiple values. One common example is the “comma ok” idiom: the statement value, ok := someMap[key] sets value to the value of the key in the map (or a zero value if the key is not present), and ok to a boolean indicating whether the key was present in the map. Combined with the two-clause if form, this allows you to write:

if value, ok := someMap[key]; ok {
    fmt.Printf("Key is present and has value %v\n", value)
} else {
    fmt.Printf("Key is not present\n")
}

where ok is set in the first clause and used as a condition in the second. Likewise, switch also has a two-clause form.

Given that, you might expect there would be an analogous construct for loops. In fact, even in C and similar languages, one can write things like:

int ch;
while (ch = getchar(), ch != EOF) {
    putchar(ch);
}

where the while condition assigns a variable and uses it in the loop condition. Surely Go can do the same thing, right?

Alas, it can’t. The problem is that Go painted itself into a corner by merging the traditional functions of while into the for construct. Basically, Go has a three-clause for which is equivalent to C’s for:

for i := 0; i < 10; i++ {
    fmt.Println(i)
}

and a one-clause for which is equivalent to C’s while:

for someExpressionProducingABoolean() {
    fmt.Println("still true")
}

but because of this, the language designers are reluctant to add a two-clause version, since it could easily be confused with the three-clause version – just type an extra ; at the end, and you have an empty third clause, which changes the behavior of the first clause to run just once before the loop rather than before every iteration. This could be easily avoided by having a separate while keyword in the language for the one- and two-clause versions, but I very much doubt this will ever be introduced, even in Go 2.

The issue comes up again every now and then. The solution usually offered is to use the three-clause for and repeat the same code in the first and third clauses, i.e., the equivalent of doing:

 for (ch = getchar(); ch != EOF; ch = getchar()) {
     putchar(ch)
 }

i.e., “do repeat yourself”, or using a zero-clause for (which is equivalent to a while (true)) and an explicit break to get out of the loop. Incidentally, in the thread above, one of the Go team members replies that this kind of thing is not possible in other C-like languages either, but as we saw above, C actually can handle this situation because C has the comma operator and assignment is an expression, not a statement, which allows you to write stuff like while (ch = getchar(), ch != EOF), whereas Go neither has the comma operator, nor does it have assignment as an expression. I’m not arguing that it should have these, but rather that the lack of these elements makes a two-clause while more desirable in Go than it is in C.

Iteration, but not for you

There are many operations that are only available for builtin types, and you cannot implement them for your custom types. Consider, for example, iteration: Go has an iteration construct that looks like this:

for key, value := range iterableThing {
    fmt.Printf("The key is %v, and the value is %v", key, value)
}

but it only works for arrays/slices, maps, strings and channels; you cannot make your own types iterable. Given that Go has interfaces, it would be easy for the language to include a standard Iterator interface which user-defined types could implement; but that’s not the case. (On a second thought, that’s actually not possible because Go does not have generics, and the return type of the iterator depends on the thing being iterated over.) If you want to write any sort of custom iteration, you will have to make do with regular function calls, a problem that is aggravated by the lack of a two-clause while (as seen above), which might allow you to test if there are more elements and get the next element at the same time.

Generics, but not for you

This is one of the most frequent complaints people have about Go. Go has no form of parametric polymorphism (a.k.a. generics): there is no way, for example, to define a function that works on lists of X for any type X, or to define a new type “binary tree” parameterized by the type of the elements you want to store in the tree.

If you are defining a new container data type and you want to be able to store elements of any type inside it, one option is to define the container’s elements as having type interface{}, i.e., the empty interface, which is satisfied by every type. This is roughly equivalent to using Object in Java. By doing this, you give up any static type safety when dealing with the container’s elements, and you have to cast the elements back to their original type when extracting them from the container, so basically you are left with a dynamically-typed language except with more boilerplate. The alternative, of course, is to repeat yourself and just write multiple versions of the functions and data structures you need, specialized for the types you happen to need.

Another option, seriously offered as an alternative by the language designers, is to write a code generator to generate specialized versions of the functions and data structures you need. No, Go does not have macros; what this entails is actually writing a program yourself that spits out a .go file with the content you want. Besides being much more work and being harder to maintain (although there are projects around that can do this for you; you just have to make sure to run the damn program every time you make changes to the original struct), it does not really help distributing libraries containing generic types.

Now, the funny thing is that the builtin types (arrays, slices, maps and channels) are type-parametric, and there is a number of builtin functions in Go, such as append and copy, that are generic as well, so, once again, Go has this feature, because it’s useful, but it’s only available for the builtin types and functions. This special-casing of builtin types is one of the most annoying aspects of Go’s design to me.

Now, unlike some fervorous Go proponents, the language designers themselves do recognize the lack of generics as a problem, and have done so for a long time; they have just been unsure how best to add them to Go’s design and afraid of adding in a bad design and then being stuck with supporting it forever, since Go makes strong guarantees about backwards compatibility, which is all perfectly reasonable. It looks like Go 2 will likely come with support for generics; we just don’t really know when that will happen.

Error handling

This is another classic of Go complaints, and with reason – it’s the other main problem that is serious enough to be recognized by the language designers, and may get better in Go 2. Until that happens, though, we are stuck with the Go 1 style of error handling.

In Go, errors are typically reported by returning multiple values. For example, a function like os.Open returns two values: an open file handler (which may be nil if an error occurred and the file could not be opened), and an error value indicating which error, if any, has happened. Typical use looks like this:

func doSomethingWithFile() int {
    file, err := os.Open("/etc/passwd")
    if err != nil {
        log.Panicf("Error opening file: %v", err)
    }
    // ... do something with file ...
    return 42;
}

or you can make your function return an error value itself, so you can pass it on for the caller to handle:

func doSomethingWithFile() (int, error) {
    file, err := os.Open("/etc/passwd")
    if err != nil {
        return 0, err
    }
    // ... do something with file ...
    return 42, nil
}

There are many problems with this approach. The most obvious one is that this quickly becomes a repetitive pile of if err != nil { return nil, err } after anything that may return an error, which distracts from the actual business logic. There is no way to abstract this repetition away, since you can’t pass the result of a multiple-values function as an argument to another function without assigning it to variables first, and a subfunction would not be able to return from the main function anyway. Macros could help here, but Go does not have them.

The second problem is that you don’t return either a value or an error (as you would do with Rust’s Result type, which is either an Ok(some_result) or an Err(some_error)); you return both a value and an error, which means you still have to return a value even when there is no value to return. For reference types, you can return nil; for other types, you typically return the zero value of that type (e.g., 0 for integers, "" for strings, a struct with zero-valued fields, etc.) The zero value is often a perfectly valid value that can occur in non-error situations as well, so if you make a mistake in handling the error, you may end up silently passing a zero value as a valid value to the rest of the program, rather than blowing up like an exception would.

This is partly mitigated by the fact that in Go it is an error to declare a variable and not use it, so you are forced to do something with the err you just created – unless an err already exists in scope, in which case your value, err := foo() will just reuse the existing err and no error will be generated if you don’t do anything with it. Moreover, functions that only have side-effects but don’t return anything other than an error (or do return some other value but the value is rarely used) are not protected by this. Perhaps the most common example are the fmt.Print* functions, which return the number of bytes written and an error value, but I’ve never seen this error value handled – it would become an utter mess if you were to do the if err != nil { ... } rigmarole after every print, so no one does, and print errors just get silently ignored by the vast majority of programs.

The third problem is that a function returning an error type does not really tell you anything about which errors it can return. This is also a problem with exceptions in most languages, but Go’s approach to error values feels even more unstructured. Consider for example the net package: it has a zillion little functions, most of which can return an error; almost none of them document which errors they can return. At least in POSIX C (which uses an even more primitive error value system, typically returning -1 and setting a global errno variable to the appropriate error code), you have manpages listing every possible error you can get from the function. In Go, I suppose the usual strategy is to find out the errors you care about and handle these, and pass the ones you don’t recognize up to the caller. That’s basically the strategy of exceptions, except done manually by you, with a lot of repetitive clutter in the code.

To be fair, the situation can be somewhat ameliorated through strategic use of the panic/recover mechanism, which is like exceptions except you’re not supposed to use them like exceptions. panic is usually used for fatal situations that mean the program cannot proceed. For situations that are supposed to be recoverable, you’re supposed to use error values. Now, what counts as recoverable or not depends on the circumstances. In general, you don’t want to call panic from a library (unless you hit an assertion violation or some other indicator of a bug), because you want library users to be able to treat the errors produced by your library. But in application code, where you know which situations are going to be handled and which are not, you can often use panic more liberally to abort on situations where execution cannot proceed and reduce the set of possible error values you pass up to the caller to only those the caller is expected to handle. Then you can use recover as a catch-all for long-running programs, to log the error and keep running (the Gin web framework, for instance, will automatically catch panics, log them and return a 500 to the client by default). I don’t know if this is considered idiomatic Go or not, but I do know that it makes code cleaner in my experience.

There is also precedent for using panic for non-local exits in the standard library: the encoding/json package uses panic to jump out of recursive calls when encountering an error, and then recover to turn the panic into a regular error value for users of the library.

No inheritance

Go has no inheritance; instead, it emphasizes the use of interfaces and composition. This is an interesting design choice, but it does cause problems sometimes. So far I have been in two situations where having something akin to an “abstract struct” from which I could inherit would have made my code simpler.

The first situation was in the Fenius interpreter: the abstract syntax tree (AST) generated by the parser has 8 different types of nodes, each of which is a struct type, some of which have subfields that are AST nodes themselves (for example, an AstBlock contains a list of AST nodes representing statements inside the block). To handle this, I define an AST interface which every node type implements. Now, one thing that every AST node has in common is a Location field. But an interface cannot require a satisfying type to have specific fields, only specific methods. Therefore, if I want the location of an AST node to be accessible regardless of its type, the only option I have is to add a Location() method to the interface (which I actually call Loc(), because I cannot have a field and a method with the same name), and implement it for each node type, so I have 8 identical method definitions of the form:

func (ast AstIdentifier) Loc() Location { return ast.Location }

in the code, one for each node type.

The second situation was in the web project at work, where I implemented a simple validation package to validate incoming JSON request bodies. In this package, I define a bunch of types representing different types of fields, such as String, Integer, Boolean, etc. Usage looks like this:

var FormValidator = validation.Map{
    "name": validation.String{Required: true, MaxLength: 50},
    "age": validation.Integer{Required: false, MinValue: 0},
}

All of these types have in common a boolean Required field. But again, since there is no inheritance, given a validator type there is no generic way for me to access the Required field. The only way is to implement a method returning the field for every validator type (or to use reflection and give up type safety).

Now, Go has an interesting feature, which is that you can embed other types in a struct, and you can even access the fields and methods of the embedded struct without naming it explicitly, so in principle I could do something like:

type BaseValidator struct {
    Required bool
}

type String struct {
    BaseValidator
    MaxLength int
}

and now if I instantiate a String struct s, I can even write s.Required without naming the embedded struct! This could solve my problem, except that when initializing the struct, I cannot write just String{Required: true}: I have to write String{BaseValidator: BaseValidator{Required: true}}, which ruins my pretty Map definition.

Another thing that could solve my problem is writing a constructor function for the String type, but since Go does not have keyword arguments, that does not look pretty either. The only solution that looks pretty in the client code is to repeat myself in the package code.

No love for unfinished programs

In Go, it is a compilation error to define a variable and not use it, or to import a module and not use it. I do think it’s worthwhile to ensure that these things are not present in finished code (the one that goes to code review and gets deployed); that’s why we have linters. But requiring it during development is a pain in the ass. Say you are debugging a piece of code. Comment out something to see what happens? Code does not compile because a variable is not in use. Or you add some debug prints, run the code, see what happens, comment out the debug print, run again… code does not compile because you import fmt and don’t use it. These seemingly minor but frequent annoyances break your flow during development.

There are lots of interesting invariants that are useful to ensure are respected in finished programs, but which will be violated at various points during development, between the time you check out the repository and the time you have a new version ready to be deployed. It is my long-standing position that running unfinished programs is a useful thing to be able to do; this is a topic I might revisit in a future blog post. It is okay when a language rejects an incomplete program for technical reasons (e.g., the implementation cannot ensure run-time safety for code that calls non-existent functions, or calls a function with the wrong argument types). What annoys me is when a language goes out of its way to stop you from running code that it would otherwise be perfectly capable of running. Java’s checked exceptions and Go’s unused variable/import checks fall into this category. This could be easily solved by having a compiler switch to disable those checks during development, but alas, no.

At the same time, a struct constructor with missing fields is not an error, not even a warning, so if you forget a field, or add a new field to the struct and forget to update some place that constructs it, you get no help from the language; not even golint will tell you about it. (Yes, there are useful use cases for omitting struct fields, but I would expect at least a linter option to detect this.)

One-letter identifiers are the norm

And this is enshrined in the Go Code Review Comments page from the Go wiki:

Variable names in Go should be short rather than long. This is especially true for local variables with limited scope. Prefer c to lineCount. Prefer i to sliceIndex.

Prefer c to lineCount? Why? It is general wisdom that code is read more often than it’s written, so it pays off to use descriptive variable names. It may be super clear for you, today, that c is a line count, but what about people new to the code base, or your future self 6 months from now? Is there any clarity gained by using c instead of lineCount? Is the code simpler?

As for i instead of sliceIndex… well, sure, since sliceIndex says very little about the slice’s purpose anyway. Depending on the context, there may be a better name than both i and sliceIndex to give to this variable. But I do grant that i may be an okay name for a slice index in a simple loop (although slice indexes don’t really appear that much anyway, since you can iterate over the values directly).

Testing

The only good thing I can say about Go’s testing infrastructure is that it exists; that’s about it. It is afflicted by Go’s obsession with single-letter identifiers (it defines types such as testing.T for tests, testing.B for benchmarks, testing.M for main test context). It provides no assert functions; you’re supposed to write an explicit if and panic to indicate test failures. (There is a popular library called Testify that provides asserts and also shows diffs between expected and found values.)

Despite doing very little, it also does too much. For instance, it caches test results by default (!). You can disable this behavior: “The idiomatic way to disable test caching explicitly”, I quote, “is to use -count=1.” (!!) It also runs tests from different packages in parallel by default, which makes for all sorts of fun if your tests use a database – the main one being spending a day figuring out why your tests don’t work, since this fact is not particularly prominent in documentation, i.e., it is not something you are likely to find out unless you are specifically looking for it. (You can disable parallelism, or use one of various third-party packages with different solutions to tests involving databases.)

The attitude

This one is very subjective, and not related to the language itself, but it just rubs me wrong when I see the Go designers speaking of Go as if it truly stood out from the crowd. Even when recognizing other languages, they seem to want to position Go as a pioneer in a great new wave of programming languages. From the Go FAQ:

We were not alone in our concerns. After many years with a pretty quiet landscape for programming languages, Go was among the first of several new languages—Rust, Elixir, Swift, and more—that have made programming language development an active, almost mainstream field again.

The Go project started by the end of 2007 and went public in 2009. Was the programming language landscape really that silent in the preceding years? Without doing any research other than checking the dates on Wikipedia, I can think of D (2001), Groovy (2003), Scala (2004), F# (2005), Vala (2006), Clojure (2007), and Nim (2008). So no, we were not in any kind of programming language dark ages before Go came along inaugurating a great renaissance of programming languages.

Recently I watched a video in which Rob Pike speaks of the fact that Go numeric constants work like abstract numbers without a specific type, so you can use 1 in a place expecting an int or a byte or a float64 without relying on type conversion rules, as a “relatively novel idea”. Guys, Haskell has had this at least since 1990. These ideas are not new, you have just been oblivious to the rest of the world.

Of course, Go does bring its own share of new ideas, and new ways to combine existing ideas. It just annoys me when they see themselves as doing something truly exceptional and out of the ordinary.

So why do I keep using this language?

And yet, despite all of the above, I still think Go was a good choice for implementing the Fenius interpreter, and I still think it’s a good choice in a variety of situations. So I think it’s appropriate to finish this post with some counterpoints to the above. Why do I keep using Go, despite all of the above problems?

First of all, it gets the job done. It is often the case that practical considerations, often having more to do with a language’s runtime and environment than with the language itself, lead to the choice of a given language for a job. For example, PHP has a terrible language design, but it’s super easy to deploy, so it makes sense to choose PHP for some tasks in some circumstances, even though there are plenty of better languages available. As for Go, regardless of any of the problems mentioned before, it does give me a lightweight memory-safe garbage-collected runtime, native self-contained executables, and does not try to hide the operating system from me. These characteristics make Go a good choice for my particular use case. (I should also note that, despite the above comparison with PHP, a lot of thought has been put into Go’s design, even if I disagree with many of the design choices.)

Second, in almost every respect in which Go is bad, C is even worse. So if you come to Go with a perspective of “I want something like C but less annoying”, Go actually delivers. And I would rather program in Go than in C++, even though C++ does not have many of the problems mentioned above, because the problems it does have are even worse. When I think from this perspective, I’m actually glad Go exists in this world, because it means I have fewer reasons to write C or C++.²

In fact, when you realize that Go came about as a reaction to C++, the relentless obsession with (a certain kind of) simplicity makes a lot more sense. C++ is a behemoth of a language, and it gets bigger with every new standard. Go is not only a very simple language, it makes it hard to build complex abstractions on the top of it, almost like a safeguard against C++-ish complexity creeping in. One can argue the reaction was too exaggerated, but I can understand where they are coming from.

There is a final bit of ambivalent praise I want to give Go, related to the above. I think Go embodies the Unix philosophy in a way no other recently designed language that I know of does. This is not an unambiguously good thing, mind you; it brings to my mind the worse is better concept, an interesting view of Unix and C by someone from outside of that tradition (and an essay with a fascinating story in itself). But Go had key Unix people among its designers – Ken Thompson (the inventor of Unix himself) and Rob Pike (who worked on Plan 9) – and it shows. For good and for bad, Go is exactly the kind of language you would expect Unix people to come up with if they sat down to design a higher-level successor to C. And notwithstanding all my misgivings about the language, I can respect that.

_____

1 Recently I learned it is possible to set a finalizer on an object, but they are not deterministic or related to scoping. I do find it a bit surprising that Go has finalizers, though.

2 If I did not need garbage collection, Rust would be a good option for this project as well. But as I mentioned in the beginning, I do need a garbage collector because Fenius is garbage-collected. If I were to implement it in a non-garbage-collected language, I would have to write a garbage collector for Fenius myself, whereas with Go or other garbage-collected languages, I can get away with relying on the host language’s garbage collector. I think of Rust and Go as complementary rather than in opposition, but that’s maybe a topic for another post.

3 comentários / comments

Type parameters and dynamic types: a crazy idea

2020-07-24 22:19 +0100. Tags: comp, prog, pldesign, fenius, in-english

A while ago I wrote a couple of posts about an idea I had about how to mix static and dynamic typing and the problems with that idea. I've recently thought about a crazy solution to this problem, probably too crazy to implement in practice, but I want to write it down before it flees my mind.

The problem

Just to recapitulate, the original idea was to have reified type variables in the code, so that a generic function like:

let foo[T](x: T) = ...

would actually receive T as a value, though one that would be passed automatically by default if not explicitly specified by the programmer, i.e., when calling foo(5), the compiler would have enough information to actually call foo[Int](5) under the hood without the programmer having to spell it out.

The problem is how to handle heterogeneous data structures, such as lists of arbitrary objects. For example, when deserializing a JSON object like [1, "foo", true] into a List[T], there is no value we can give for T that carries enough information to decode any element of the list.

The solution

The solution I had proposed in the previous post was to have a Dynamic type which encapsulates the type information and the value, so you would use a List[Dynamic] here. The problem is that every value of the list has to be wrapped in a Dynamic container, i.e., the list becomes [Dynamic(1), Dynamic("foo"), Dynamic(true)].

But there is a more unconventional possibility hanging around here. First, the problem here is typing a heterogeneous sequence of elements as a list. But there is another sequence type that lends itself nicely for this purpose: the tuple. So although [1, "foo", true] can't be given a type, (1, "foo", true) can be given the type Tuple[Int, Str, Bool]. The problem is that, even if the Tuple type parameters are variables, the quantity of elements is fixed statically, i.e., it doesn't work for typing an arbitrarily long list deserialized from JSON input, for instance. But what if I give this value the type Tuple[*Ts], where * is the splice operator (turns a list into multiple arguments), and Ts is, well, a list of types? This list can be given an actual type: List[Type]. So now we have these interdependent dynamic types floating around, and to know the type of the_tuple[i], the type stored at Ts[i] has to be consulted.

I'm not sure how this would work in practice, though, especially when constructing this list. Though maybe in a functional setting, it might work. Our deserialization function would look like (in pseudo-code):

let parse_list(input: Str): Tuple[*Ts] = {
    if input == "" {
        ()
        # Returns a Tuple[], and Ts is implicitly [].
    } elif let (value, rest) = parse_integer(input) {
        (value, *parse_list(rest))
        # If parse_list(rest) is of type Tuple[*Ts],
        # (value, *parse_list(rest)) is of type Tuple[Int, *Ts].
    } ...
}

For dictionaries, things might be more complicated; the dictionary type is typically along the lines of Dict[KeyType, ValueType], and we are back to the same problem we had with lists. But just as heterogeneous lists map to tuples, we could perhaps map heterogeneous dictionaries to… anonymous records! So instead of having a dictionary {"a": 1, "b": true} of type Dict[Str, T], we would instead have a record (a=1, b=true) of type Record[a: Int, b: Bool]. And just as a dynamic list maps to Tuple[*Ts], a dynamic dictionary maps to Record[**Ts], where Ts is a dictionary of type Dict[Str, Type], mapping each record key to a type.

Could this work? Probably. Would it be practical or efficient? I'm not so sure. Would it be better than the alternative of just having a dynamic container, or even specialized types for dynamic collections? Probably not. But it sure as hell is an interesting idea.

Comentários / Comments

Type parameters and dynamic types

2020-05-30 17:27 +0100. Tags: comp, prog, pldesign, fenius, in-english

In the previous post, I discussed an idea I had for handling dynamic typing in a primarily statically-typed language. In this post, I intend to first, describe the idea a little better, and second, explain what are the problems with it.

The idea

The basic idea is:

Functions can take type arguments as well as value arguments.
Type arguments are passed as actual concrete values that can be consulted at run-time to determine what methods the type supports.
Any value argument without an explicitly declared type gets an implicit type argument associated with it.

For example, consider a function signature like:

let f[A, B](arg1: Int, arg2: A, arg3: B, arg4): Bool = ...

This declares a function f with two explicit type parameters A and B, and four regular value parameters arg1 to arg4. arg1 is declared with a concrete Int type. arg2 and arg3 are declared as having types passed in as type parameters. arg4 does not have an explicit type, so in effect it behaves as if the function had an extra type parameter C, and arg4 has type C.

When the function is called, the type arguments don't have to be passed explicitly; rather, they will be automatically provided by the types of the expressions used as arguments. So, if I call f(42, "hello", 1.0, True), the compiler will implicitly pass the types Str and Float for A and B, as well as Bool for the implicit type parameter C.

In the body of f, whenever the parameters with generic types are used, the corresponding type parameters can be consulted at run-time to find the approprate methods to call. For example, if arg2.foo() is called, a lookup for the method foo inside A will happen at run-time. This lookup might fail, in which case we would get an exception.

This all looks quite beautiful.

The problem

The problem is when you introduce generic data structures into the picture. Let's consider a generic list type List[T], where T is a type parameter. Now suppose you have a list like [42, "hello", 1.0, True] (which you might have obtained from deserializing a JSON file, for instance). What type can T be? The problem is that, unlike the case for functions, there is one type variable for multiple elements. If all type information must be encoded in the value of the type parameter, there is no way to handle a heterogeneous list like this.

Having a union type here (say, List[Int|Str|Float|Bool]) will not help us, because union types require some way to distinguish which element of the union a given value belongs to, but the premise was for all type information to be carried by the type parameter so you could avoid encoding the type information into the value.

For a different example, consider you want to have a list objects satisfying an interface, e.g., List[JSONSerializable]. Different elements of the list may have different types, and therefore different implementations of the interface, and you would need type information with each individual element to be able to know at run-time where to find the interface implementation for each element.

Could this be worked around? One way would be to have a Dynamic type, whose implementation would be roughly:

record Dynamic(
    T: Type,
    value: T,
)

The Dynamic type contains a value and its type. Note that the type is not declared as a type parameter of Dynamic: it is a member of Dynamic. The implication is that a value like Dynamic(Int, 5) is not of type Dynamic[Int], but simply Dynamic: there is a single Dynamic type container which can hold values of any type and carries all information about the value's type within itself. (I believe this is an existential type, but I honestly don't know enough type theory to be sure.)

Now our heterogeneous list can simply be a List[Dynamic]. The problem is that to use this list, you have to wrap your values into Dynamic records, and unwrap them to use the values. Could it happen implicitly? I'm not really sure. Suppose you have a List[Dynamic] and you want to pass it to a function expecting a List[Int]. We would like this to work, if we want static and dynamic code to run along seamlessly. But this is not really possible, because the elements of a List[Dynamic] and a List[Int] have different representations. You would have to produce a new list of integers from the original one, unwrapping every element of the original list out of its Dynamic container. The same would happen if you wanted to pass a List[Int] to a function expecting a List[Dynamic].

All of this may be workable, but it is a different experience from regular gradual typing where you expect this sort of mixing and matching of static and dynamic code to just work.

[Addendum (2020-05-31): On the other hand, if I had an ahead-of-time statically-typed compiled programming language that allowed me to toss around types like this, including allowing user-defined records like Dynamic, that would be really cool.]

EOF

That's all I have for today, folks. In a future post, I intend to explore how interfaces work in a variety of different languages.

4 comentários / comments

Types and Fenius

2020-05-19 21:35 +0100. Tags: comp, prog, pldesign, fenius, in-english

Hello, fellow readers! In this post, I will try to write down some ideas that have been haunting me about types, methods and namespaces in Fenius.

I should perhaps start with the disclaimer that nothing has really happened in Fenius development since last year. I started rewriting the implementation in Common Lisp recently, but I only got to the parser so far, and the code is still not public. I have no haste in this; life is already complicated enough without one extra thing to feel guilty about finishing, and the world does not have a pressing need for a new programming language either. But I do keep thinking about it, so I expect to keep posting ideas about programming language design here more or less regularly.

So, namespaces

A year ago, I pondered whether to choose noun-centric OO (methods belong to classes, as in most mainstream OO languages) or verb-centric OO (methods are independent entities grouped under generic functions, as in Common Lisp). I ended up choosing noun-centric OO, mostly because classes provide a namespace grouping related methods, so:

You don't have to deal with name conflicts if two classes define independent methods with the same name;
You don't have to import each method separately. In fact you don't even have to import the class: if you have an object, you can call its methods without even knowing which class it belongs to (in a dynamic language context).

This choice has a number of problems, though; it interacts badly with other features I would like to have in Fenius. Consider the following example:

Suppose I have a bunch of classes that I want to be able to serialize to JSON. Some of these classes may be implemented by me, so I can add a to_json() method to them, but others come from third-party code that I cannot change. Even if the language allows me to add new methods to existing classes, I would rather not add a to_json() method to those classes because they might, in the future, decide to implement their own to_json() method, possibly in a different way, and I would be unintentionally overriding the library method which others might depend on.

What I really want is to be able to declare an interface of my own, and implement it in whatever way I want for any class (much like a typeclass in Haskell, or a trait in Rust):

from third_party import Foo

interface JSONSerializable {
    let to_json()
}

implement JSONSerializable for Foo {
    let to_json() = {
         ...
    }
}

In this way, the interface serves as a namespace for to_json(), so that even if Foo implements its own to_json() in the future, it would be distinct from the one I defined in my interface.

The problem is: if I have an object x of type Foo and I call x.to_json(), which to_json() is called?

One way to decide that would be by the declared type of x: if it's declared as Foo, it calls Foo's to_json(), and JSONSerializable's to_json() is not even visible. If it's declared as JSONSerializable, then the interface's method is called. The problem is that Fenius is supposed to be a dynamically-typed language: the declared (static) type of an object should not affect its dynamic behavior. A reference to an object, no matter how it was obtained, should be enough to access all of the object's methods.

Solution 1: Interface wrappers

One way to conciliate things would be to make it so that the interface wraps the implementing object. By this I mean that, if you have an object x of type Foo, you can call JSONSerializable(x) to get another object, of type JSONSerializable, that wraps the original x, and provides the interface's methods.

Moreover, function type declarations can be given the following semantics: if a function f is declared as receiving a parameter x: SomeType, and it's called with an argument v, x will be bound to the result of SomeType.accept(v). For interfaces, the accept method returns an interface wrapper for the given object, if the object belongs to a class implementing the interface. Other classes can define accept in any way they want to implement arbitrary casts. The default implementation for class.accept(v) would be to return v intact if it belongs to class, and raise an exception if it doesn't.

Solution 2: Static typing with dynamic characteristics

Another option is to actually go for static typing, but in a way that still allows dynamic code to co-exist more or less transparently with it.

In this approach, which methods are visible in a given dot expression x.method is determined by the static type of x. One way to see this is that x can have multiple methods, possibly with the same name, and the static type of x acts like a lens filtering a specific subset of those methods.

What happens, then, when you don't declare the type of the variable/parameter? One solution would be implicitly consider those as having the basic Object type, but that would make dynamic code extremely annoying to use. For instance, if x has type Object, you cannot call x+1 because + is not defined for Object.

Another, more interesting solution, is to consider any untyped function parameter as a generic. So, if f(x) is declared without a type for x, this is implicitly equivalent to declaring it as f(x: A), for a type variable A. If this were a purely static solution, this would not solve anything: you still cannot call addition on a generic value. But what if, instead, A is passed as a concrete value, implicitly, to the function? Then our f(x: A) is underlyingly basically f(x: A, A: Type), with A being a type value packaging the known information about A. When I call, for instance, f(5), under the hood the function is called like f(5, Int), where Int packages all there is to know about the Int type, including which methods it supports. Then if f's body calls x+1, this type value can be consulted dynamically to look up for a + method.

Has this been done before? Probably. I still have to do research on this. One potential problem with this is how the underlying interface of generic vs. non-generic functions (in a very different sense of 'generic function' from CLOS!) may differ. This is a problem for functions taking functions as arguments: if your function expects an Int -> Int function as argument and I give it a A -> Int function instead, that should work, but underlyingly an A -> Int takes an extra argument (the A type itself). This is left as an exercise for my future self.

Gradual typing in reverse

One very interesting aspect of this solution is that it's basically the opposite of typical gradual typing implementations: instead of adding static types to a fundamentally dynamic language, this adds dynamic powers to a fundamentally static system. All the gradual typing attempts I have seen so far try to add types to a pre-existing dynamic language, which makes an approach like this one less palatable since one wants to be able to give types to code written in a mostly dynamic style, including standard library functions. But if one is designing a language from scratch, one can design it in a more static-types-friendly way, which would make this approach more feasible.

I wonder if better performance can be achieved in this scenario, since in theory the static parts of the code can happily do their stuff without ever worrying about dynamic code. I also wonder if boxing/unboxing of values when passing them between the dynamic and static parts of the code can be avoided as well, since all the extra typing information can be passed in the type parameter instead. Said research, as always, will require more and abundant funding.

Comentários / Comments

Chez Scheme vs. SBCL: a comparison

2019-11-14 11:06 -0300. Tags: comp, prog, lisp, scheme, in-english

Back at the beginning of the year, when I started working on what would become Fenius (which I haven't worked on for a good while now; sorry about that), I was divided between two languages/platforms for writing the implementation: Chez Scheme (a Scheme implementation) and SBCL (a Common Lisp implementation). I ended up choosing Chez Scheme, since I like Scheme better. After using Chez for a few months now, however, I've been thinking about SBCL again. In this post, I ponder the two options.

Debugging and interactive development

The main reason I've been considering a switch is this: my experience with interactive development with Chez has been less than awesome. The stack traces are uninformative: they don't show the line of code corresponding to each frame (rather, they show the line of code of the entire function, and only after you ask to enter debug mode, inspect the raise continuation, and print the stack frames), and you can't always inspect the values of parameters/local variables. The recommended way to debug seems to be to trace the functions you want to inspect; this will print each call to the function (with arguments) and the return value of each call. But you must do it before executing the function; it won't help you interpret the stack trace of an exception after the fact.

The interaction between Chez and Geiser (an Emacs mode for interactive development with Scheme) often breaks down too: sometimes, trying to tab-complete an identifier will hang Emacs. From my investigation, it seems that what happens is that the Chez process will enter the debugger, but Geiser is unaware of that and keeps waiting for the normal > prompt to appear. Once that happens, it's pretty much stuck forever (you can't tab-complete anymore) until you restart Chez. There is probably a solution to this; I just don't know what it is.

As I have mentioned before, Chez has no concept of running the REPL from inside a module (library in Scheme terminology), which means you can't call the private functions of a module from the REPL. The solution is… not to use modules, or to export everything, or split the code so you can load the module code without the module-wrapping form.

By contrast, SBCL works with SLIME, the Superior Lisp Interaction Mode for Emacs. SLIME lets you navigate the stack trace, see the values of local variables by pressing TAB on the frame, you can press v to jump to the code corresponding to a stack frame (right to the corresponding expression, not just the line), among other features. Common Lisp is more committed to interactive development than Scheme in general, so this point is a clear win for SBCL.

(To be fair, Guile Scheme has pretty decent support for interactive development. However, Guile + Geiser cannot do to stack traces what SBCL + SLIME can.)

Execution speed

In my experience, SBCL and Chez are both pretty fast – not at the "as fast as hand-crafted C" level, but pretty much as fast as I could desire for a dynamically-typed, garbage-collected, safe language. In their default settings, Chez actually often beats SBCL, but SBCL by default generates more debugger-friendly code. With all optimizations enabled, Chez and SBCL seem to be pretty much the same in terms of performance.

One advantage SBCL has is that you can add type annotations to code to make it faster. Be careful with your optimization settings, though: if you compile with (optimize (safety 0)), "[a]ll declarations are believed without assertions", i.e., the compiler will generate code that assumes your types are correct, and will produce undefined behavior (a.k.a. nasal demons) in case it is not.

Startup time and executable size

This one is complicated. In my machine, Chez compiles a "hello world" program to a 837-byte .so file, which takes about 125ms to run – a small but noticeable startup time. A standalone binary compiled with chez-exe weighs in at 2.7MB and takes 83ms – just barely noticeable.

As for SBCL, a "hello world" program compiles to a 228-byte .fasl file, which runs in 15ms, which is really good. The problem is if the file loads libraries. For instance, if I add this to the beginning of the "hello world":

(require 'asdf)        ;; to be able to load ASDF packages
(require 'cl-ppcre)    ;; a popular regex library

…now the script takes 422ms to run, which is very noticeable.

SBCL can also generate standalone executables, which are actually dumps of the whole running SBCL image: you can load all the libraries you want and generate an executable with all of them preloaded. If we do that, we're back to the excellent 15ms startup time – but the executable has 45MB, because it contains a full-fledged SBCL in it (plus libraries). It's a bit of a bummer if you intend to create multiple independent command-line utilities, for example. Also, I guess it's easier to convince people to download a 2.7MB file than a 45MB one when you want them to try out your fancy new application, though that may not be that much of a concern these days. (The binary compresses down to 12MB with gzip, and 7.6MB with xz.)

Another worry I have is memory consumption (which is a concern in cheap VPSes such as the one running this blog, for instance): running a 45MB binary will use at least 45MB of RAM, right? Well, not necessarily. When you run an executable, the system does not really load all of the executable's contents into memory: it maps the code (and other) sections of the executable into memory, but they will actually only be loaded from the disk to RAM as the memory pages are touched by the process. This means that most of those 45MB might never actually take up RAM.

In fact, using GNU time (not the shell time builtin, the one in /usr/bin, package time on Debian) to measure maximum memory usage, the SBCL program uses 19MB of RAM, while the Chez program uses 27MB. So the 45MB SBCL binary is actually more memory-friendly than the 2.7MB Chez one. Who'd guess?

Available libraries

Common Lisp definitely has the edge here, with over a thousand libraries (of varying quality) readily available via Quicklisp. There is no central repository or catalog of Chez (or Scheme) libraries, and there are not many Chez libraries that I'm aware of (although I wish I had learned about Thunderchez earlier).

[Addendum (2019-11-16): @Caonima67521344 on Twitter points out there is the Akku package manager for Chez and other Schemes.]

The language itself

This one is a matter of personal taste, but I just like Scheme better than Common Lisp. I like having a single namespace for functions and variables (which is funny considering I was a big fan of Common Lisp's two namespaces back in the day), and not having to say funcall to call a function stored in a variable. I like false being distinct from the empty list, and for cdr of the empty list to be an error rather than nil. I like Scheme's binding-based modules better than Common Lisp's symbol-based packages (although Chez modules are annoying to use, as I mentioned before; Guile is better in this regard). Common Lisp's case insensitivity by default plus all standard symbols being internally uppercase is a bit annoying too. Scheme has generally more consistent names for things as well. I used to dislike hygienic macros back in the day, but nowadays, having syntax-case available to break hygiene when necessary, I prefer hygienic macros as well.

And yet… Common Lisp and Scheme aren't that different. Most of those things don't have a huge impact in the way I code. (Well, macros are very different, but anyway.) One things that does have an impact is using named let and recursion in Scheme vs. loops in Common Lisp: named let (similar to Clojure's loop/recur) is one of my favorite Scheme features, and I use it all the time. However, it is not difficult to implement named let as a macro in Common Lisp, and if you only care about tail-recursive named let (i.e., Clojure's loop/recur), it's not difficult to implement an efficient version of it in Common Lisp as a macro. Another big difference is call/cc (first class continuations) in Scheme, but I pretty much never use call/cc in my code, except possibly as escape continuations (which are equivalent to Common Lisp's return).

On the flip side, Common Lisp has CLOS (the Common Lisp Object System) in all its glory, with generic functions and class redefinition powers and much more. Guile has GOOPS, which provides many of the same features, but I'm not aware of a good equivalent for Chez.

Conclusion

As is usually the case when comparing programming languages/platforms, none of the options is an absolute winner in all regards. Still, for my use case and for the way I like to program, SBCL looks like a compelling option. I'll have to try it for a while and see how it goes, and tell you all about it in a future post.

Computers, languages, and computer languages. Às vezes em Português, sometimes in English.

Posts com a tag: prog

2024-06-17 15:52 +0100. Tags: comp, prog, pldesign, ruby, in-english

Block scope

Block control flow

proc vs. lambda

Miscellanea

EOF

2024-06-16 11:12 +0100. Tags: comp, prog, pldesign, ruby, in-english

Types of variables

Variables vs. methods

Local variable scope

Constant resolution

Uninitialized variable access

EOF

2024-06-14 21:17 +0100. Tags: comp, prog, pldesign, ruby, in-english

All calls are method calls

2024-03-26 15:19 +0000. Tags: comp, prog, pldesign, fenius, lisp, life, in-english

Fenius

Life

EOF

2023-01-22 00:05 +0000. Tags: comp, prog, pldesign, fenius, lisp, in-english

What kind of language I’m trying to build?

F-expressions

Versioning

Implementation strategies

EOF

2020-12-13 10:06 +0000. Tags: comp, prog, golang, in-english

Why did I choose Go for this project?

Do repeat yourself

The while problem

Iteration, but not for you

Generics, but not for you

Error handling

No inheritance

No love for unfinished programs

One-letter identifiers are the norm

Testing

The attitude

So why do I keep using this language?

2020-07-24 22:19 +0100. Tags: comp, prog, pldesign, fenius, in-english

The problem

The solution

2020-05-30 17:27 +0100. Tags: comp, prog, pldesign, fenius, in-english

The idea

The problem

EOF

2020-05-19 21:35 +0100. Tags: comp, prog, pldesign, fenius, in-english

So, namespaces

Solution 1: Interface wrappers

Solution 2: Static typing with dynamic characteristics

Gradual typing in reverse

2019-11-14 11:06 -0300. Tags: comp, prog, lisp, scheme, in-english

Debugging and interactive development

Execution speed

Startup time and executable size

Available libraries

The language itself

Conclusion

Main menu

Recent posts

Recent comments

Tags

Elsewhere

Quod vide

Posts com a tag: `prog`

The `while` problem