Some notes on Ruby #3: blocks :: Elmord's Magic Valley

Some notes on Ruby #3: blocks

2024-06-17 15:52 +0100. Tags: comp, prog, pldesign, ruby, in-english

[This post is part of a series on Ruby semantics.]

In the third installment of this series, we are going to have a look at one of Ruby’s most prominent features: blocks.

A block is a piece of code that can be invoked with arguments and produce a value. Blocks can be written like this:

{|arg1, arg2, ...| body}

Or like this:

do |arg1, arg2, ...|
  body
end

These forms are equivalent, except for precedence: f g { block } is interpreted as f(g { block }) (the block is passed to g), while f g do block end is interpreted as f(g) { block } (the block is passed to f). The |arguments| can be omitted if the block takes no arguments. My impression is that the do syntax is preferred for multi-line blocks.
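One way to see the precedence difference is with two throwaway methods, f and g (hypothetical names, not library methods), that report whether they received a block using the block_given? predicate. This is a minimal sketch:

```ruby
# Hypothetical helpers: each returns a tag saying whether it got a block.
def f(arg = nil)
  [:f, arg, block_given?]
end

def g
  [:g, block_given?]
end

# Braces bind to the nearest call: this is f(g { 1 }).
with_braces = f g { 1 }
# with_braces => [:f, [:g, true], false]

# do ... end binds to the outermost call: this is f(g) do 1 end.
with_do = f g do 1 end
# with_do => [:f, [:g, false], true]
```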

Blocks are kind of like anonymous functions, but they are not really first-class: a bare block like { puts 42 } on its own is a syntax error, and {} is interpreted as an empty dictionary (hash in Ruby terminology). The only place a block can appear is at the end of a method call, like f(x, y) { puts 42 } or f x, y do puts 42 end. This will make the block available to the method, which can use it in a number of ways.

Within the method, yield(arg1, arg2, ...) will invoke the block with the given arguments; whatever the block returns is the result of the yield call. The number of passed arguments generally does not have to match the number of arguments expected by the block: extra arguments are ignored, and missing arguments are assigned nil. (The only exception seems to be keyword arguments declared by the block without a default value; these will raise an error if not passed.)

def one_two_three
  yield 1
  yield 2
  yield 3
end

one_two_three {|x| puts x*10}  # prints 10, 20, 30


# We can also use the value produced by the block within the method.
def map_one_two_three
  [yield(1), yield(2), yield(3)]
end

map_one_two_three {|x| x*10}   # => [10, 20, 30]
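The argument-matching rules for yield can be checked directly. The sketch below uses three hypothetical helper methods (pass_two, pass_one and pass_none) that yield two, one and zero values respectively:

```ruby
def pass_two
  yield 1, 2
end

def pass_one
  yield 1
end

def pass_none
  yield
end

# Extra arguments are silently dropped...
extra = pass_two {|x| x }              # => 1
# ...and missing ones are filled in with nil.
missing = pass_one {|x, y| [x, y] }    # => [1, nil]

# The exception: a required keyword parameter must be supplied.
keyword_error =
  begin
    pass_none {|k:| k }
    nil
  rescue ArgumentError => e
    e  # an ArgumentError complaining about the missing keyword
  end
```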

The method does not have to declare that it accepts a block: you can pass a block to any method; it’s just that some will do something useful with it and some won’t. Within the method, you can test whether it was called with a block by calling the block_given? predicate. Many methods from the Ruby standard library can be called with or without a block and adapt their behavior accordingly. For example, open("somefile") returns a file object, but open("somefile") {|f| ...} opens the file, passes the file object as an argument to the block, and closes the file when the block finishes (analogous to using with in Python). Another example is the Array constructor:

- Array.new with no arguments returns an empty array;
- Array.new(5) returns a 5-element array initialized to nil;
- Array.new(5) {|i| i*i} returns a 5-element array, calling the block with each array index to initialize the corresponding array position, in this case resulting in [0, 1, 4, 9, 16].
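The open-style pattern described above is easy to reproduce with block_given?. In this sketch, Resource and open_resource are hypothetical stand-ins for a file object and open; the ensure clause is what guarantees the cleanup even if the block raises:

```ruby
# A hypothetical resource standing in for a file handle.
class Resource
  def initialize
    @closed = false
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Mimics the open() pattern: without a block, hand the resource back;
# with a block, yield it and close it afterwards, even on exceptions.
def open_resource
  resource = Resource.new
  return resource unless block_given?

  begin
    yield resource   # the block's value becomes the method's value
  ensure
    resource.close
  end
end

manual = open_resource   # no block: the caller must close it
seen = nil
result = open_resource {|r| seen = r; :done }
# manual.closed? => false, seen.closed? => true, result => :done
```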

Yet another example is the times method of the Integer class. With a block, it calls the block n times (where n is the integer), passing an iteration counter to the block as an argument:

irb(main):024:0> 5.times {|i| puts "Hello number #{i}!" }
Hello number 0!
Hello number 1!
Hello number 2!
Hello number 3!
Hello number 4!
=> 5

If you don’t need the iteration counter, you can just pass a block taking no arguments (and now we can see why Ruby allows a block’s parameters not to match exactly the values it is invoked with):

irb(main):025:0> 5.times { puts "Hello!" }
Hello!
Hello!
Hello!
Hello!
Hello!
=> 5

And finally, if you don’t pass it any block, it returns an Enumerator instance, which supports a bunch of methods, such as map or sum:

irb(main):035:0> 5.times.map {|x| x*x}
=> [0, 1, 4, 9, 16]


irb(main):036:0> 5.times.sum   # 0 + 1 + 2 + 3 + 4
=> 10
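A method can get the same no-block behavior as times by returning an Enumerator over itself. One common idiom, sketched here with a hypothetical OneTwoThree class, is to call to_enum with the method’s own name when no block is given:

```ruby
class OneTwoThree
  # Without a block, return an Enumerator over this same method,
  # like Integer#times does; with a block, yield each value.
  def each_value
    return to_enum(:each_value) unless block_given?

    yield 1
    yield 2
    yield 3
  end
end

numbers = OneTwoThree.new
tens = numbers.each_value.map {|x| x * 10 }  # => [10, 20, 30]
total = numbers.each_value.sum               # => 6
```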

Another way a method can use a block is by declaring an &argument in its argument list: in this case, the block will be reified into a Proc object and will be available as a regular object to the method:

# This is equivalent to the `yield` version.
def one_two_three(&block)
  block.call(1)
  block.call(2)
  block.call(3)
end
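One thing the &block form allows that yield does not is keeping the block around after the method returns. In this sketch (Button, on_click and click are hypothetical names), each block is stored as a Proc and called later; note that the &handler parameter is simply nil when no block is passed:

```ruby
class Button
  def initialize
    @handlers = []
  end

  # &handler is nil when the caller passed no block.
  def on_click(&handler)
    @handlers << handler if handler
    self
  end

  # Invoke every stored handler, long after on_click returned.
  def click
    @handlers.map {|h| h.call(:clicked) }
  end
end

button = Button.new
button.on_click {|event| "first saw #{event}" }
button.on_click {|event| "second saw #{event}" }
button.on_click   # no block: nothing is stored
responses = button.click
# responses => ["first saw clicked", "second saw clicked"]
```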

Conversely, if you have a Proc object and you want to pass it to a method expecting a block, you can use the & syntax in the method call:

# Make a Proc out of a block...
tenfold = proc {|x| puts x*10}

# ...and pass it to a procedure expecting a block.
# This works with either version of one_two_three.
one_two_three(&tenfold)  # prints 10, 20, 30

In the above example, we also see another way we can turn a block into a Proc object: by passing it to the builtin proc method.
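The two uses of & combine naturally: a method can capture its block with &block and hand it on to another method with &block. Here twice is a hypothetical wrapper around the yield version of one_two_three:

```ruby
def one_two_three
  yield 1
  yield 2
  yield 3
end

# Hypothetical wrapper: capture the caller's block with &block,
# then pass it on (twice) with &block at the call site.
def twice(&block)
  one_two_three(&block)
  one_two_three(&block)
end

collected = []
twice {|x| collected << x }
# collected => [1, 2, 3, 1, 2, 3]
```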

Block scope

Blocks can see the local variables that were defined at the time the block was created. Assignment to such a variable modifies the variable outside the block. Assignment to any other variable creates a local variable visible within the block and any nested blocks, but not outside.

x = 1

1.times {
    puts x      # this is the outer x
    x = 2       # this is still the outer x
    y = 3       # this is local to the block
    1.times {
        puts y  # this is the outer y
        y = 4   # this is still the outer y
    }
    puts y      # prints 4
}
puts x          # prints 2
puts y          # error: undefined variable
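Because a block captures the variable itself rather than a copy of its value, two blocks created in the same scope share that variable, even after the method that created them has returned. A minimal sketch (make_counter is a hypothetical name):

```ruby
def make_counter
  count = 0
  increment = proc { count += 1 }  # assigns to the captured outer variable
  read = proc { count }
  [increment, read]
end

increment, read = make_counter
increment.call
increment.call
current = read.call  # => 2: both procs see the same `count`
```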

The exception to this is block parameters: a block parameter is always a fresh variable, even if a local variable with the same name already exists. (Before Ruby 1.9, this was not the case: a block parameter with the same name as a local variable would overwrite the variable when the block was called.)
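A short sketch of the block-parameter rule: a parameter named x does not touch an outer x, even when assigned to, while a plain assignment inside a block does:

```ruby
x = 10

# The block parameter x is a fresh variable; the outer x is untouched.
[1, 2, 3].each {|x| x += 100 }
after_param = x   # => 10

# Without a same-named parameter, assignment reaches the outer x.
[1, 2, 3].each {|y| x = y }
after_assign = x  # => 3
```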

You can explicitly ask for fresh variables to be created by declaring them in the parameter list after a semicolon:

x = 1
y = 2

1.times {|i; x, y| # i is the block argument; x and y are fresh variables
  x = 10
  y = 20
  puts x       # prints 10
  puts y       # prints 20
}

puts x         # prints 1
puts y         # prints 2

Block control flow

Within the block, next can be used to return from the block back to the yield that invoked it. If an argument is passed to next, it will become the value returned by the yield call:

def do_stuff
  result = yield 1
  puts "The result is #{result}"
end

do_stuff {
  puts "I'm here"
  next 42
  puts "This line will never run"
}

# prints:
#   I'm here
#   The result is 42

This construct is analogous to continue in a Python loop. For example:

5.times {|i|
    if i%2 == 0
      next  # skip even numbers
    end

    puts i
}

# prints:
#   1
#   3

It is more idiomatic, though, to use the postfix if in this case:

5.times {|i|
  next if i%2 == 0  # skip even numbers
  puts i
}
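next with a value is also handy with iterators that collect the block’s results, such as map: whatever next passes becomes the block’s result for that element. A small sketch:

```ruby
# With `next value`, the block's result for that element is `value`.
labels = [1, 2, 3, 4].map {|i|
  next :odd if i.odd?
  i * 10
}
# labels => [:odd, 20, :odd, 40]
```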

break within the block can be used to return from the method that called the block. Again, if an argument is passed to break, it becomes the return value of the method. For example:

def do_stuff
  result = yield 1
  puts "The result is #{result}"
end

x = do_stuff {
  puts "I'm here"
  break 42
  puts "This line will never run"
}

In this code, do_stuff invokes the block, which prints I'm here and causes do_stuff to return 42 immediately. Nothing else is printed: the line that would print The result is ... never runs. The return value (42) is assigned to x.
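The same mechanism works with built-in iterators: break inside an each block makes each itself return the break value immediately, which gives a simple early-exit search. A sketch:

```ruby
numbers = [3, 8, 12, 5, 20]

# break makes `each` return the given value right away.
first_big = numbers.each {|n| break n if n > 10 }
# first_big => 12

# If the loop finishes without break, each returns its receiver as usual.
no_big = [1, 2].each {|n| break n if n > 10 }
# no_big => [1, 2]
```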

redo jumps back to the beginning of the block. It accepts no arguments. I’m sure this is useful in some circumstances, though my creativity fails me. (It is not exclusive to blocks, either: redo also works inside while, until and for loops.) But now you know it exists.
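For what it’s worth, here is one contrived use: retrying an element after a simulated failure. The “failure” is hypothetical (a counter check standing in for, say, a flaky network call); redo re-runs the block body with the same argument, without advancing to the next element:

```ruby
attempts = 0

# Pretend the second attempt fails: redo restarts the block body
# with the same x instead of moving on.
results = [1, 2, 3].map {|x|
  attempts += 1
  redo if attempts == 2
  x * 10
}
# results => [10, 20, 30], attempts => 4
```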

return within a block returns from the method in which the block was written (not from the method that yielded to it). For example:

def do_stuff
  result = yield 1
  puts "The result is #{result}"
end

def foo
  do_stuff {
    puts "I'm here"
    return 42  # returns from `foo`
    puts "This line will never run"
  }
  puts "And neither will this"
end

foo  # prints "I'm here" and returns 42
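A common use of this is ending an iteration early from inside a helper method. In this sketch, first_negative is a hypothetical helper; the return inside the block leaves first_negative directly, so the trailing nil is only reached when no element matches:

```ruby
def first_negative(numbers)
  numbers.each {|n| return n if n < 0 }
  nil
end

first_negative([3, -1, -5])  # => -1
first_negative([1, 2])       # => nil
```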

proc vs. lambda

lambda is similar to proc: it takes a block and returns a Proc object. Unlike proc:

- The resulting procedure checks that the number of arguments passed to it matches the number of parameters declared by the block;
- return within the lambda returns from the lambda itself, not from the enclosing method.

The shortcut syntax -> x, y { body } is equivalent to lambda {|x, y| body }.
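Both differences can be seen side by side in a small sketch (loose, strict, proc_return and lambda_return are illustrative names):

```ruby
loose = proc {|x, y| [x, y] }
strict = lambda {|x, y| [x, y] }

loose_result = loose.call(1)  # => [1, nil]
strict_error =
  begin
    strict.call(1)
    nil
  rescue ArgumentError => e
    e  # wrong number of arguments (given 1, expected 2)
  end

def proc_return
  proc { return :from_proc }.call  # leaves proc_return itself
  :after_proc                      # never reached
end

def lambda_return
  -> { return :from_lambda }.call  # only leaves the lambda
  :after_lambda
end

proc_return    # => :from_proc
lambda_return  # => :after_lambda
```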

Miscellanea

There are many equivalent ways of calling a Proc object:

irb(main):001:0> p = proc {|x| x+1}
=> #<Proc:0x00007f9c4b7b1698 (irb):1>
irb(main):002:0> p.call(5)
=> 6
irb(main):003:0> p.(5)
=> 6
irb(main):004:0> p[5]
=> 6
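Proc objects can also be invoked through Proc#yield and Proc#===; the latter is what lets a Proc be used as a when condition in a case expression. A small sketch, with parity as a hypothetical helper:

```ruby
is_even = proc {|n| n.even? }

is_even.yield(4)  # => true
is_even === 4     # => true

# case/when calls `when_value === n`, so the proc is invoked with n.
def parity(n, is_even)
  case n
  when is_even then :even
  else :odd
  end
end

parity(4, is_even)  # => :even
parity(7, is_even)  # => :odd
```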

If a block declares no parameters, the names _1, _2, …, _9 can be used to refer to its arguments by number (this requires Ruby 2.7 or later):

irb(main):014:0> [1,2,3,4,5].map { _1 * _1 }
=> [1, 4, 9, 16, 25]
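The highest numbered parameter used also determines how an array argument is handled, just as with explicit parameters. A sketch (Ruby 2.7 or later):

```ruby
pairs = [[1, 2], [3, 4]]

# Using _2 makes the block take two parameters, so each pair is splatted.
sums = pairs.map { _1 + _2 }   # => [3, 7]

# Using only _1 makes it behave like {|x| ... }: no splatting.
whole = pairs.map { _1 }       # => [[1, 2], [3, 4]]

lambda { _1 + _2 }.arity       # => 2
```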

If such a block is turned into a lambda, the resulting procedure will require as many arguments as the highest argument number used:

irb(main):021:0> lambda { _9 }
=> #<Proc:0x00007f9c4b79c518 (irb):21 (lambda)>
irb(main):022:0> lambda { _9 }.call(1)
(irb):22:in `block in <top (required)>': wrong number of arguments (given 1, expected 9) (ArgumentError)

If a block using return in its body is reified into a Proc object using proc, and the Proc object escapes the method it was created in, and is invoked afterwards, the return will cause a LocalJumpError:

def m
  p = proc { return 42 }

  # If we called p.call() here, it would cause `m` to return 42.
  # But instead, we will return `p` to the caller...
  p
end

p = m
# ...and call it here, after `m` has already returned!
p.call()  # error: in `block in m': unexpected return (LocalJumpError)
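Using lambda instead of proc avoids the problem, since return then only leaves the lambda itself. A sketch contrasting the two, with make_proc and make_lambda as hypothetical helpers:

```ruby
def make_proc
  proc { return 42 }
end

def make_lambda
  lambda { return 42 }
end

escaped_error =
  begin
    make_proc.call   # make_proc has already returned
  rescue LocalJumpError => e
    e                # unexpected return
  end

from_lambda = make_lambda.call  # => 42: return only leaves the lambda
```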

EOF

That’s all for today, folks. There is still plenty to cover: classes, modules, mixins, the singleton class, eval and metaprogramming shenanigans. I plan to write about these Real Soon Now™.
