Elmord's Magic Valley

Computers, languages, and computer languages. Às vezes em Português, sometimes in English.

Posts tagged: ruby

Some notes on Ruby #3: blocks

2024-06-17 15:52 +0100. Tags: comp, prog, pldesign, ruby, in-english

[This post is part of a series on Ruby semantics.]

In the third installment of this series, we are going to take a look at one of Ruby’s most prominent features: blocks.

A block is a piece of code that can be invoked with arguments and produce a value. Blocks can be written like this:

{|arg1, arg2, ...| body}

Or like this:

do |arg1, arg2, ...|
  body
end

These forms are equivalent, except for precedence: f g { block } is interpreted as f(g { block }) (the block is passed to g), while f g do block end is interpreted as f(g) { block } (the block is passed to f). The |arguments| can be omitted if the block takes no arguments. My impression is that the do syntax is preferred for multi-line blocks.
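
To see the precedence difference concretely, here is a small sketch. The methods f and g are hypothetical helpers made up for this illustration; each one just reports whether it was handed a block.

```ruby
# Hypothetical helpers: each one reports whether it received a block.
def f(arg)
  "f(block: #{block_given?}, arg: #{arg})"
end

def g
  "g(block: #{block_given?})"
end

# Braces bind to the nearest call: parsed as f(g { ... }).
with_braces = f g { :unused }

# do...end binds to the outermost call: parsed as f(g) do ... end.
with_do = f g do
  :unused
end

puts with_braces  # f(block: false, arg: g(block: true))
puts with_do      # f(block: true, arg: g(block: false))
```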

Blocks are kind of like anonymous functions, but they are not really first-class: a bare block like { puts 42 } on its own is a syntax error, and {} is interpreted as an empty dictionary (hash in Ruby terminology). The only place a block can appear is at the end of a method call, like f(x, y) { puts 42 } or f x, y do puts 42 end. This will make the block available to the method, which can use it in a number of ways.

Within the method, yield(arg1, arg2, ...) will invoke the block with the given arguments; whatever the block returns is the result of the yield call. The number of passed arguments generally does not have to match the number of arguments expected by the block: extra arguments are ignored, and missing arguments are assigned nil. (The only exception seems to be keyword arguments declared by the block without a default value; these will raise an error if not passed.)
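
A minimal sketch of this lenient argument matching, assuming a hypothetical helper yield_two that always yields exactly two values:

```ruby
# Hypothetical helper: always yields exactly two values.
def yield_two
  yield 1, 2
end

p(yield_two {|a| a })                # prints 1 (the extra 2 is dropped)
p(yield_two {|a, b, c| [a, b, c] })  # prints [1, 2, nil] (c is missing)
p(yield_two { :no_parameters })      # prints :no_parameters

# A required keyword argument is the exception: not passing it raises.
begin
  yield_two {|a, key:| [a, key] }
rescue ArgumentError
  puts "ArgumentError raised for the missing keyword"
end
```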

def one_two_three
  yield 1
  yield 2
  yield 3
end

one_two_three {|x| puts x*10}  # prints 10, 20, 30


# We can also use the value produced by the block within the method.
def map_one_two_three
  [yield(1), yield(2), yield(3)]
end

map_one_two_three {|x| x*10}   # => [10, 20, 30]

The method does not have to declare that it accepts a block: you can pass a block to any method, it’s just that some will do something useful with it and some won’t. Within the method, you can test if it was called with a block by calling the block_given? predicate. Many methods from the Ruby standard library can be called with or without a block and adapt their behavior accordingly. For example, open("somefile") returns a file object, but open("somefile") {|f| ...} opens the file, passes the file object as an argument to the block, and closes the file when the block finishes (analogous to using with in Python). Another example is the Array constructor:
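
As a rough sketch of that behavior: without a block, Array.new(n) fills the array with nil, while with a block each element is the block's result for that index. The greet method further down is a hypothetical example of using block_given? in your own code.

```ruby
# Without a block, Array.new(n) builds an array of n nils.
p Array.new(3)                # prints [nil, nil, nil]

# With a block, each element is the block's result for that index.
p Array.new(3) {|i| i * 10 }  # prints [0, 10, 20]

# Hypothetical method that adapts to whether a block was given.
def greet
  if block_given?
    "Hello, #{yield}!"
  else
    "Hello, whoever you are!"
  end
end

puts greet             # Hello, whoever you are!
puts greet { "Ruby" }  # Hello, Ruby!
```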

Yet another example is the times method of the Integer class. With a block, it calls the block n times (where n is the integer), passing an iteration counter to the block as an argument:

irb(main):024:0> 5.times {|i| puts "Hello number #{i}!" }
Hello number 0!
Hello number 1!
Hello number 2!
Hello number 3!
Hello number 4!
=> 5

If you don’t need the iteration counter, you can just pass a block taking no arguments (and now we can see why Ruby allows block arguments not to match exactly with the values they are invoked with):

irb(main):025:0> 5.times { puts "Hello!" }
Hello!
Hello!
Hello!
Hello!
Hello!
=> 5

And finally, if you don’t pass it any block, it returns an Enumerator instance, which supports a bunch of methods, such as map or sum:

irb(main):035:0> 5.times.map {|x| x*x}
=> [0, 1, 4, 9, 16]


irb(main):036:0> 5.times.sum   # 0 + 1 + 2 + 3 + 4
=> 10

Another way a method can use a block is by declaring an &argument in its argument list: in this case, the block will be reified into a Proc object and will be available as a regular object to the method:

# This is equivalent to the `yield` version.
def one_two_three(&block)
  block.call(1)
  block.call(2)
  block.call(3)
end

Conversely, if you have a Proc object and you want to pass it to a method expecting a block, you can use the & syntax in the method call:

# Make a Proc out of a block...
tenfold = proc {|x| puts x*10}

# ...and pass it to a procedure expecting a block.
# This works with either version of one_two_three.
one_two_three(&tenfold)  # prints 10, 20, 30

In the above example, we also see another way we can turn a block into a Proc object: by passing it to the builtin proc method.

Block scope

Blocks can see the local variables that were defined at the time the block was created. Assignments to such variables modify the variable outside the block. Assignment to any other variable creates a local variable visible within the block and any nested blocks, but not outside.

x = 1

1.times {
    puts x      # this is the outer x
    x = 2       # this is still the outer x
    y = 3       # this is local to the block
    1.times {
        puts y  # this is the outer y
        y = 4   # this is still the outer y
    }
    puts y      # prints 4
}
puts x          # prints 2
puts y          # error: undefined variable

Block parameters are an exception to this: a block parameter is always a fresh variable, even if a local variable with the same name already exists. (Before Ruby 1.9, this was not the case: a block parameter with the same name as a local variable would overwrite the variable when the block was called.)
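
A small sketch of the modern behavior, wrapped in a hypothetical method shadow_demo so the locals are easy to inspect:

```ruby
# Hypothetical demo method: the block parameter x shadows the outer x.
def shadow_demo
  x = 1
  seen = []
  [10, 20].each {|x|
    x += 1     # modifies the block's own x, not the outer one
    seen << x
  }
  [x, seen]
end

p shadow_demo  # prints [1, [11, 21]]
```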

You can explicitly ask for fresh variables to be created by declaring them in the parameter list after a semicolon:

x = 1
y = 2

1.times {|i; x, y| # i is the block argument; x and y are fresh variables
  x = 10
  y = 20
  puts x       # prints 10
  puts y       # prints 20
}

puts x         # prints 1
puts y         # prints 2

Block control flow

Within the block, next can be used to return from the block back to the yield that invoked it. If an argument is passed to next, it will become the value returned by the yield call:

def do_stuff
  result = yield 1
  puts "The result is #{result}"
end

do_stuff {
  puts "I'm here"
  next 42
  puts "This line will never run"
}

# prints:
#   I'm here
#   The result is 42

This construct is analogous to continue in a Python loop. For example:

5.times {|i|
    if i%2 == 0
      next  # skip even numbers
    end

    puts i
}

# prints:
#   1
#   3

Although it is more idiomatic to use the postfixed if in this case:

5.times {|i|
   next if i%2 == 0  # skip even numbers
   puts i
}

break within the block can be used to return from the method that called the block. Again, if an argument is passed to break, it becomes the return value of the method. For example:

def do_stuff
  result = yield 1
  puts "The result is #{result}"
end

x = do_stuff {
  puts "I'm here"
  break 42
  puts "This line will never run"
}

In this code, do_stuff invokes the block, which prints I'm here and causes do_stuff to return 42 immediately. Nothing else is printed; the The result is ... line won’t run. The return value (42) is assigned to x.

redo jumps back to the beginning of the block. It accepts no arguments. I’m sure this is useful in some circumstance, though my creativity partly fails me, and partly does not see why this would be useful only within blocks. But now you know it exists.
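
For what it's worth, here is one contrived sketch of what redo does (the redo_demo method and the attempt counter are made up for illustration, and this is not meant as a compelling use case): the block body is restarted for the same element, without fetching the next one.

```ruby
# Hypothetical demo: re-run the block body once for the same element.
def redo_demo
  attempts = 0
  results = [1, 2, 3].map {|n|
    attempts += 1
    redo if attempts == 2  # restart the block; n is still 2
    n * 10
  }
  [results, attempts]
end

p redo_demo  # prints [[10, 20, 30], 4]
```

The block body ran four times for three elements, since the second element was processed twice.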

return within a block returns from the method the block is in. For example:

def do_stuff
  result = yield 1
  puts "The result is #{result}"
end

def foo
  do_stuff {
    puts "I'm here"
    return 42  # returns from `foo`
    puts "This line will never run"
  }
  puts "And neither will this"
end

foo  # prints "I'm here" and returns 42

proc vs. lambda

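lambda is similar to proc: it takes a block and returns a Proc object. Unlike proc, the resulting Proc object is strict about the number of arguments it is called with (a mismatch raises an ArgumentError), and return within its body returns from the lambda itself rather than from the method the block is in.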

The shortcut syntax -> x, y { body } is equivalent to lambda {|x, y| body }.
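
As a hedged sketch of two well-known differences between the two (strict argument checking, and return leaving only the lambda itself), with all method and variable names below being hypothetical:

```ruby
lenient = proc   {|x, y| [x, y] }
strict  = lambda {|x, y| [x, y] }

p lenient.call(1)  # prints [1, nil]: procs tolerate missing arguments

begin
  strict.call(1)   # lambdas check the argument count
rescue ArgumentError
  puts "lambda raised ArgumentError"
end

# return inside a proc returns from the enclosing method...
def return_in_proc
  proc { return :from_proc }.call
  :after_proc
end

# ...while return inside a lambda only returns from the lambda.
def return_in_lambda
  lambda { return :from_lambda }.call
  :after_lambda
end

p return_in_proc    # prints :from_proc
p return_in_lambda  # prints :after_lambda

shorthand = ->(x, y) { [x, y] }
p shorthand.lambda? # prints true
```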

Miscellanea

There are many equivalent ways of calling a Proc object:

irb(main):001:0> p = proc {|x| x+1}
=> #<Proc:0x00007f9c4b7b1698 (irb):1>
irb(main):002:0> p.call(5)
=> 6
irb(main):003:0> p.(5)
=> 6
irb(main):004:0> p[5]
=> 6

If a block declares no arguments, the names _1, _2, …, _9 can be used to refer to arguments by number:

irb(main):014:0> [1,2,3,4,5].map { _1 * _1 }
=> [1, 4, 9, 16, 25]

If such a block is turned into a lambda, the resulting procedure will require as many arguments as the highest argument number used:

irb(main):021:0> lambda { _9 }
=> #<Proc:0x00007f9c4b79c518 (irb):21 (lambda)>
irb(main):022:0> lambda { _9 }.call(1)
(irb):22:in `block in <top (required)>': wrong number of arguments (given 1, expected 9) (ArgumentError)

If a block using return in its body is reified into a Proc object using proc, and the Proc object escapes the method it was created in, and is invoked afterwards, the return will cause a LocalJumpError:

def m
  p = proc { return 42 }

  # If we called p.call() here, it would cause `m` to return 42.
  # But instead, we will return `p` to the caller...
  p
end

p = m
# ...and call it here, after `m` has already returned!
p.call()  # error: in `block in m': unexpected return (LocalJumpError)

EOF

That’s all for today, folks. There is still plenty to cover: classes, modules, mixins, the singleton class, eval and metaprogramming shenanigans. I plan to write about these Real Soon Now™.


Some notes on Ruby #2: variables, constants, and scope

2024-06-16 11:12 +0100. Tags: comp, prog, pldesign, ruby, in-english

[This post is part of a series on Ruby semantics.]

I’m still trying to wrap my head around all the intricacies of variable/name scope in Ruby. These notes are part of my attempt to figure it all out, so take them with a grain of salt, and feel free to send corrections and additions in the comments.

As I explained in the previous post, the focus of these notes is not on how to use the language, but rather on how it works. This post in particular will deal with a lot of corner cases, which are helpful to figure out what the interpreter is doing. Let’s go!

Types of variables

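Ruby has a bunch of different types of variables and variable-like things, distinguished by their initial characters: local variables begin with a lowercase letter or an underscore (foo) and are visible in the scope they were defined in and nested scopes; constants begin with an uppercase letter (Foo); global variables begin with $ ($foo); instance variables begin with @ (@foo); and class variables begin with @@ (@@foo).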

Unlike Python, there is no per-file global scope. Global variables ($foo) are true program-wide globals. Constants, instance variables and class variables are properties of various objects: when you define one of those, you are effectively mutating the class/module/instance they were defined in, and the effects will be visible in other places where these objects are used. You can define local variables at the top-level, but they won’t be visible inside any class or method definition. Nor is there any concept of importing the variables defined in a different file: when you require another file, you will be able to see the effects of running that file (such as defining constants, instance variables and class variables, which, again, are object mutation rather than what you would think of as variable definition in Python or Scheme), but local variables defined at the file top-level won’t be visible outside it.
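
A minimal sketch of these kinds of names side by side. The names Widget, SHARED, $program_wide and top_level_local are all hypothetical, chosen only for this illustration:

```ruby
top_level_local = "local"     # local variable: not visible inside class/def
$program_wide   = "global"    # global variable: visible everywhere
SHARED          = "constant"  # constant: stored on Object here

class Widget
  @@kind = "class variable"        # belongs to the class

  def initialize
    @name = "instance variable"    # belongs to this instance
  end

  def report
    # The top-level local is not in scope here, so defined? returns nil.
    [$program_wide, SHARED, @name, @@kind, defined?(top_level_local).nil?]
  end
end

p Widget.new.report
# prints ["global", "constant", "instance variable", "class variable", true]
```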

Variables vs. methods

The allowed names for local variables and constants are also allowed method names. Because Ruby does not require parentheses in a method call, and also allows the receiver to be omitted (self.f() can be written as f(), which can be written as just f), a bare identifier like foo could be either a method name or a variable/constant name. How does Ruby distinguish those?

First, if the parentheses are used (foo()), or if there are arguments after the identifier, with or without parentheses (foo 42), then foo is unambiguously interpreted as a method name.

If there are neither parentheses nor arguments, and the identifier begins with a lowercase ASCII letter or an underscore, it will be interpreted as a local variable if there has been a variable assignment to that identifier within the lexical scope of the reference. So in foo = 42; foo, the second foo is a local variable. This disambiguation happens at parse time, and is based on the textual appearance of an assignment in the scope of the reference, regardless of whether the assignment is actually executed at runtime. So, for example:

def foo
    "I'm a method"
end

if false
    foo = "I'm a local variable"
end

p foo  # Prints nil!

When Ruby sees the assignment to foo in the code, it creates a local variable for it, even if the assignment does not run. The variable is initialized with nil.

Note that foo() here would still invoke the method, even though there is a local variable with the same name. You might ask: what if I have a local variable whose value is a function (e.g., a lambda)? How do I call it? In this case, you have to invoke foo.call():

def foo
    "I'm a method"
end

foo = lambda { "I'm a lambda" }

p foo()        # "I'm a method"
p foo          # #<Proc:...>
p foo.call()   # "I'm a lambda"

This is similar to how in Common Lisp, there are distinct namespaces for functions and variables, and you need to use (funcall foo) to call a function stored in a variable. However, because the parentheses are not mandatory in Ruby, it has to do some extra work to guess what you want when it sees a bare identifier.

What about constants with the same name as methods? In this case, the rules are different: Ruby treats an uppercase-initial identifier as a constant unless there are parentheses or arguments:

def A
  "I'm a method"
end

A    # error: uninitialized constant A

A()  # "I'm a method"

Local variable scope

Previously, I said that local variables are visible in the scope they were defined in and nested scopes. That’s not quite true, though, because a lot of syntactic constructs start a clean slate on local variables. For example, local variables defined outside a class declaration are not visible inside it:

x = 1

class Foo
  x  # error: undefined local variable or method `x' for Foo:Class (NameError)
end

The same applies to module and def:

class Foo
  x = 1
  
  def m
    x
  end
end

Foo.new.m  # error: in `m': undefined local variable or method `x' for #<Foo:...> (NameError)

Neither will the variable be accessible via Foo.x, Foo::x, or anything else. It will be visible for code that runs within the class declaration, though:

class Foo
  x = 1
  puts x     # this is fine
  A = x      # and so is this: it initializes the constant `A` with 1
end

Even though Ruby allows multiple declarations of the same class, and each subsequent declaration modifies the existing class rather than defining a new one, local variables declared within one class declaration will not be visible to subsequent declarations of the same class:

class Foo
  x = 1
end

class Foo
  puts x  # error: in `<class:Foo>': undefined local variable or method `x' for Foo:Class (NameError)
end

But note that constants work fine in this case:

class Foo
  A = 1
end

class Foo
  puts A  # prints 1
end

This is because constants are a property of the class object, so a constant declaration mutates the class object and therefore its effect is persistent, whereas local variables only exist within the lexical/textual scope where they were declared.

Constant resolution

Speaking of which, constant scope resolution is the one thing I’m having the hardest time figuring out. It does mostly what you would expect in normal situations, but it does so by quite strange means. What seems to be going on is that Ruby uses lexical scope to determine the dynamic resolution order of the constant. Let me show what I mean.

Classes can be nested, and you can use the constants of the outer class in the inner one:

class A
  X = 1
  
  class B
    def m
      X
    end
  end
end

puts A::B.new.m  # prints 1

You can do this even if the constant definition is not textually within the same class declaration as the method definition:

class A
  X = 1
end

class A
  class B
    def m
      X
    end
  end
end

puts A::B.new.m  # still prints 1

But if you define the method directly in A::B without syntactically nesting it within A, then it doesn’t work:

class A
  X = 1
end

class A::B
  def m
    X
  end
end

puts A::B.new.m  # error: in `m': uninitialized constant A::B::X (NameError)

This resolution is dynamic, though. Let’s go back to our previous example:

class A
  X = 1

  class B
    def m
      X
    end
  end
end

puts A::B.new.m  # still prints 1

The method is getting the constant defined in A. Let’s now add a constant X to B:

class A::B
  X = 2
end

And now if we call the method:

puts A::B.new.m  # prints 2!

Now method m refers to a constant that did not exist at the time it was defined. In other words, it searches for X at runtime in all classes the method was textually nested in. (Remember that if you define m directly in A::B without textually nesting it in both classes, it only looks up in B.)
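
Ruby exposes this textual nesting through Module.nesting, which gives a hedged way to see the difference between the two styles of class declaration. Outer and Inner are placeholder names for this sketch:

```ruby
class Outer
  class Inner
    # Textually nested in both classes.
    NESTED = Module.nesting
  end
end

class Outer::Inner
  # Compact form: textually nested only in Outer::Inner.
  COMPACT = Module.nesting
end

p Outer::Inner::NESTED   # prints [Outer::Inner, Outer]
p Outer::Inner::COMPACT  # prints [Outer::Inner]
```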

What about inheritance? Let’s define some classes:

class One
  X = 1
end

class Two
  X = 2
end

class A < One
  X = 10
  
  class B < Two
    X = 20
    
    def m
      X
    end
  end
end

puts A::B.new.m  # prints 20

Now let’s go about removing constants and seeing what happens:

irb(main):022:0> A::B.send(:remove_const, :X)
=> 20
irb(main):023:0> A::B.new.m
=> 10

It prefers the constant of the outer class over the one from the inheritance chain. Let’s remove that one as well:

irb(main):024:0> A.send(:remove_const, :X)
=> 10
irb(main):025:0> A::B.new.m 
=> 2

Ok, after exhausting the outer class chain, it falls back to the inheritance chain. What if we remove it from the superclass as well?

irb(main):026:0> Two.send(:remove_const, :X)
=> 2
irb(main):027:0> A::B.new.m 
(irb):16:in `m': uninitialized constant A::B::X (NameError)

So it doesn’t try the inheritance chain of the outer class.

One last check: what if you redefine a constant in a subclass but do not redefine the method?

class A
  X = 10

  class B
    X = 20

    def m
      X
    end
  end
end


class C < A::B
  X = 30
end

puts C.new.m  # prints 20

So it looks up based on where the method is defined, not the class it’s called from.

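In summary, judging from the experiments above, when Ruby sees a reference to a constant, it tries to find it first in the class where the reference textually appears, then in each textually enclosing class (from the innermost outwards), and finally in the superclasses of the class where the reference textually appears. The superclasses of the enclosing classes are not searched.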

Uninitialized variable access

Accessing an undefined local variable raises an “undefined local variable or method” error. (Because of the ambiguity between variables and method names mentioned before, the error message mentions both cases here.) Similarly, accessing an undefined constant is an error.

Accessing an uninitialized global variable produces nil. If you run the code with warnings enabled (ruby -w), you will also get a warning about it.

Accessing an uninitialized instance variable produces nil and no warning. There used to be one but it was removed in Ruby 3.0.

Finally, accessing an uninitialized class variable raises an error (just like locals and constants, but unlike instance variables).
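
The four cases can be sketched as follows. Probe and error_class are hypothetical helpers; error_class returns the class of the NameError raised by its block, or nil if nothing was raised:

```ruby
class Probe
  def read_ivar
    @never_assigned   # uninitialized instance variable
  end

  def read_cvar
    @@never_assigned  # uninitialized class variable
  end
end

# Hypothetical helper: runs the block and reports the NameError class, if any.
def error_class
  yield
  nil
rescue NameError => e
  e.class
end

p error_class { never_assigned_local }   # prints NameError
p error_class { NeverAssignedConstant }  # prints NameError
p $never_assigned_global                 # prints nil (plus a warning under ruby -w)
p Probe.new.read_ivar                    # prints nil
p error_class { Probe.new.read_cvar }    # prints NameError
```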

EOF

That’s all for today, folks. I did not even get to blocks in this post, but they’ll have to wait for a post of their own. Stay tuned!


Some notes on Ruby #1: method definition and calls

2024-06-14 21:17 +0100. Tags: comp, prog, pldesign, ruby, in-english

[This post is part of a series on Ruby semantics.]

I’ve been studying Ruby recently for a job opportunity. The job did not pan out in the end, and therefore I’ll probably not continue with my Ruby studies, but I want to write down some things I learned before I forget them.

The focus of these notes is not on how to use the language, but rather on how it works, i.e., the language semantics. This may end up making the language seem weirder than it actually is in practice, because a lot of the examples will be dealing with corner cases. I will be writing this from a Python (and sometimes Lisp) perspective.

All calls are method calls

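Functions and methods, though superficially similar, work very differently in Python and Ruby. In both languages, x.f(a) means “call method f of object x with argument a”, but it works quite differently behind the scenes: in Python, x.f first looks up the attribute f on x, producing a bound method object, which is then called with the argument a; in Ruby, x.f(a) sends the message f with the argument a to the object x in a single step, with no intermediate function value.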

In Ruby, x.f on its own is equivalent to x.f(), i.e., send the message f with no arguments to object x. In general, parentheses can be omitted from method calls if there is no ambiguity.

f() on its own is equivalent to self.f(). Whereas in Python, self is an argument of the function that implements a method and has to be defined explicitly, in Ruby self is a keyword that refers to the current object and is always available.
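
A small sketch of these equivalences, using a hypothetical Greeter class. The last line shows method(:greet), which is roughly the closest Ruby counterpart to Python's bound method object:

```ruby
class Greeter
  def name
    "world"
  end

  def greet
    # `name` here is shorthand for `self.name()`: a message sent to self.
    "Hello, #{name}!"
  end
end

g = Greeter.new
puts g.greet          # Hello, world!
puts g.greet()        # same call, with explicit parentheses
puts g.send(:greet)   # same call, sending the message explicitly
p g.method(:greet).class  # prints Method
```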

Likewise, def always defines a method. Whereas in Python def defines a function in the local scope, in Ruby def defines a method in the current class. So, for example:

class Foo
  def g
    def h
      42
    end
  end
end

This defines a class Foo with a method g, which, when called, defines method h in class Foo. So, afterwards:

irb(main):008:0> x = Foo.new
=> #<Foo:0x00007f72a83dd3b8>

irb(main):009:0> x.h  # Method does not exist yet.
(irb):9:in `<main>': undefined method `h' for #<Foo:0x00007f72a83dd3b8> (NoMethodError)
        from /usr/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'
        from /usr/bin/irb:25:in `load'
        from /usr/bin/irb:25:in `<main>'

irb(main):010:0> x.g  # When g is called, h is defined.
=> :h

irb(main):011:0> x.h  # Now h exists.
=> 42

(In this sense, Python’s def is more like Scheme’s define, whereas Ruby’s def is more like Common Lisp’s defun or defmethod.)

If a new Foo object is instantiated now, it will have access to method h already, since the def h defined it in the class Foo, not in the instance x:

irb(main):012:0> y = Foo.new
=> #<Foo:0x00007f72a842a3c0>

irb(main):013:0> y.h
=> 42

What if you use def at the top-level outside of a class? Well, in that case, self refers to the main object, which is an instance of Object (the base class of most Ruby classes). So a method defined at the top-level is a method of Object! For example, let’s define a hello method with no arguments (again, the parentheses around the arguments can be omitted):

def hello
  puts "Hello, world!"
end

And now we can call it:

irb(main):019:0> hello
Hello, world!
=> nil

But since the method was defined as a method of Object, won’t it be available on every object?

irb(main):020:0> 4.hello
(irb):20:in `<main>': private method `hello' called for 4:Integer (NoMethodError)
        from /usr/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'
        from /usr/bin/irb:25:in `load'
        from /usr/bin/irb:25:in `<main>'

Note that the call fails not because the method is not defined, but because the method is private. We can override the access control by using the send method to send the message explicitly to the object:

irb(main):021:0> 4.send(:hello)
Hello, world!
=> nil

And there we go. The code at the top-level effectively runs as if it were inside a:

class Object
  private

  <... your code goes here ...>
end
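
One way to check this claim, assuming the code is run as a plain script (the method name hello_from_top_level is made up for this sketch):

```ruby
def hello_from_top_level
  "Hello, world!"
end

# The method ended up as a private instance method of Object...
p Object.private_method_defined?(:hello_from_top_level)  # prints true

# ...so every object has it, but only privately.
p 4.respond_to?(:hello_from_top_level)        # prints false
p 4.respond_to?(:hello_from_top_level, true)  # prints true (includes private)
p 4.send(:hello_from_top_level)               # prints "Hello, world!"
```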

Note how code that looks superficially like Python and seems to work the same way is actually doing so by very different means. For example, consider a piece of code like:

def multiply(x, y)
  x * y
end

class DeepThought
  def compute_answer()
    multiply(6, 7)
  end
end

puts DeepThought.new().compute_answer()  # prints 42

The method compute_answer uses the multiply method defined at the top-level. In Python, the equivalent code works by searching for multiply in the current environment, finding it at the global scope, and calling the function bound to it. In Ruby, this works by defining multiply as a method of Object, and because DeepThought inherits from Object by default, it has multiply as a method. We could have written self.multiply(6, 7) and we would get the same result (at least since Ruby 2.7, which allows private methods to be called with a literal self receiver).

This means you can easily clobber someone else’s method definitions if you define a method at the top-level. I guess it’s okay to do that if you’re writing a standalone script that won’t be used as part of something bigger, but if you’re writing a library, or a piece of a program consisting of multiple files, you probably want to wrap all your method definitions within a class or module definition. I plan to talk about those in a future blog post. See you next time!



Copyright © 2010-2024 Vítor De Araújo
The content of this blog, unless otherwise specified, may be used under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

Powered by Blognir.