Some notes on Ruby #2: variables, constants, and scope

2024-06-16 11:12 +0100. Tags: comp, prog, pldesign, ruby, in-english

[This post is part of a series on Ruby semantics.]

I’m still trying to wrap my head around all the intricacies of variable/name scope in Ruby. These notes are part of my attempt to figure it all out, so take them with a grain of salt, and feel free to send corrections and additions in the comments.

As I explained in the previous post, the focus of these notes is not on how to use the language, but rather on how it works. This post in particular will deal with a lot of corner cases, which are helpful to figure out what the interpreter is doing. Let’s go!

Types of variables

Ruby has a bunch of different types of variables and variable-like things, distinguished by their initial characters:

Local variables begin with a lowercase ASCII letter, an underscore, or a non-ASCII character (i.e., a Unicode codepoint above 127) other than an uppercase letter. Any non-ASCII character can be used in an identifier, even things like the zero width space (U+200B). Local variables are visible in the scope they were defined in and nested scopes, kinda (more on that later).
Constants begin with an uppercase letter: an ASCII one or, since Ruby 2.6, a non-ASCII one. Constants belong to the class or module they are defined in (which is Object at the top level). They cannot be assigned with the Name = value syntax inside a method body (that is a syntax error, “dynamic constant assignment”), but they can be redefined outside of methods (with a warning).
Instance variables begin with @, like @foo. They belong to the current object (i.e., self).
Class variables begin with @@, like @@foo. They belong to the class they are defined in and are shared with all of its subclasses (if a subclass mutates the variable, the superclass will reflect the mutation).
- These are not the same as class instance variables, which are not a distinct variable type, but are simply the instance variables of the class object. Remember, classes are objects too (instances of Class), and therefore have their own instance variables as well, which are distinct from the instance variables of the instances of the class. Class instance variables are not shared with subclasses, because each subclass is a distinct object, with its own (class) instance variables.
- Class variables cannot be accessed from the top-level: unlike constants, they don’t implicitly refer to Object’s class variables in that case. I’m not sure why this inconsistency exists, but it might be because class variables are shared with the subclasses, and therefore defining a class variable on Object by accident would affect almost every class in Ruby, whereas a constant with the same name can be defined in a subclass with no issues.
Finally, global variables begin with $, like $foo, and are visible across the whole program.
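
To make the difference between class variables and class instance variables concrete, here is a small sketch. The class names Parent and Child and the accessor methods are made up for illustration:

```ruby
class Parent
  @@shared = :parent   # class variable: one slot shared by the whole hierarchy
  @own     = :parent   # class instance variable: belongs to the Parent object only

  def self.shared
    @@shared
  end

  def self.own
    @own               # looked up on self, i.e. whichever class received the call
  end
end

class Child < Parent
  @@shared = :child    # mutates the variable that Parent sees too
  @own     = :child    # a separate variable, stored on the Child object
end

p [Parent.shared, Child.shared]  # [:child, :child]
p [Parent.own, Child.own]        # [:parent, :child]
```

The assignment in Child changes what Parent.shared returns, whereas Parent.own is unaffected by anything Child does.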

Unlike Python, there is no per-file global scope. Global variables ($foo) are true program-wide globals. Constants, instance variables and class variables are properties of various objects: when you define one of those, you are effectively mutating the class/module/instance they were defined in, and the effects will be visible in other places where these objects are used. You can define local variables at the top-level, but they won’t be visible inside any class or method definition, nor is there any concept of importing the variables defined in a different file: when you require another file, you will be able to see the effects of running that file (such as defining constants, instance variables and class variables, which, again, are object mutation rather than what you would think of as variable definition in Python or Scheme), but local variables defined at the file top-level won’t be visible outside it.

Variables vs. methods

The allowed names for local variables and constants are also allowed method names. Because Ruby does not require parentheses in a method call, and also allows the receiver to be omitted (self.f() can be written as f(), which can be written as just f), a bare identifier like foo could be either a method name or a variable/constant name. How does Ruby distinguish those?

First, if the parentheses are used (foo()), or if there are arguments after the identifier, with or without parentheses (foo 42), then foo is unambiguously interpreted as a method name.

If there are neither parentheses nor arguments, and the identifier begins with a lowercase ASCII letter or an underscore, it will be interpreted as a local variable if there has been a variable assignment to that identifier earlier in the lexical scope of the reference. So in foo = 42; foo, the second foo is a local variable. This disambiguation happens at parse time, and is based on the textual appearance of an assignment in the scope of the reference, regardless of whether the assignment is actually executed at runtime. So, for example:

def foo
    "I'm a method"
end

if false
    foo = "I'm a local variable"
end

p foo  # Prints nil!

When Ruby sees the assignment to foo in the code, it creates a local variable for it, even if the assignment does not run. The variable is initialized with nil.
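
Here is a variation on the same experiment, wrapped in a method so that local_variables only reports that method's locals. The method name probe is made up for illustration; it shows that what matters is whether the assignment appears textually before the reference:

```ruby
def foo
  "I'm a method"
end

def probe
  before = foo               # no assignment seen yet: this is a method call
  if false
    foo = "never assigned"   # never runs, but the parser has now seen `foo =`
  end
  after = foo                # parsed as a local variable, which is still nil
  [before, after, foo(), local_variables.include?(:foo)]
end

p probe  # ["I'm a method", nil, "I'm a method", true]
```

The same bare identifier means two different things within one method body, depending only on its position relative to the assignment.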

Note that foo() here would still invoke the method, even though there is a local variable with the same name. You might ask: what if I have a local variable whose value is a function (e.g., a lambda)? How do I call it? In this case, you have to invoke foo.call():

def foo
    "I'm a method"
end

foo = lambda { "I'm a lambda" }

p foo()        # "I'm a method"
p foo          # #<Proc:...>
p foo.call()   # "I'm a lambda"

This is similar to how in Common Lisp, there are distinct namespaces for functions and variables, and you need to use (funcall foo) to call a function stored in a variable. However, because the parentheses are not mandatory in Ruby, it has to do some extra work to guess what you want when it sees a bare identifier.
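
For reference, a short sketch of the different ways to call in each direction (greet and call_styles are made-up names): Proc objects accept .() and [] as alternatives to .call, and method(:name) turns a method into an object you can store in a variable.

```ruby
def greet
  "I'm a method"
end

def call_styles
  greet = lambda { "I'm a lambda" }
  [
    greet.call,           # explicit call on the Proc
    greet.(),             # shorthand for .call
    greet[],              # Proc#[] also invokes the Proc
    greet(),              # parentheses: the method, not the lambda
    method(:greet).call   # a Method object wrapping the method
  ]
end

p call_styles
# ["I'm a lambda", "I'm a lambda", "I'm a lambda", "I'm a method", "I'm a method"]
```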

What about constants with the same name as methods? In this case, the rules are different: Ruby treats an uppercase-initial identifier as a constant unless there are parentheses or arguments:

def A
  "I'm a method"
end

A    # error: uninitialized constant A

A()  # "I'm a method"
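
This rule is what makes conversion methods like Kernel#Integer and Kernel#Array work alongside the classes of the same name. A small sketch (the wrapper method constant_or_method is just for illustration):

```ruby
def constant_or_method
  [
    Integer,         # bare name: the constant, i.e. the Integer class
    Integer("42"),   # parentheses: the Kernel#Integer conversion method
    Array,           # the Array class
    Array(nil)       # Kernel#Array, which turns nil into []
  ]
end

p constant_or_method  # [Integer, 42, Array, []]
```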

Local variable scope

Previously, I said that local variables are visible in the scope they were defined in and nested scopes. That’s not quite true, though, because a lot of syntactic constructs start a clean slate on local variables. For example, local variables defined outside a class declaration are not visible inside it:

x = 1

class Foo
  x  # error: undefined local variable or method `x' for Foo:Class (NameError)
end

The same applies to module and def:

class Foo
  x = 1
  
  def m
    x
  end
end

Foo.new.m  # error: in `m': undefined local variable or method `x' for #<Foo:...> (NameError)

Neither will the variable be accessible via Foo.x, Foo::x, or anything else. It will be visible for code that runs within the class declaration, though:

class Foo
  x = 1
  puts x     # this is fine
  A = x      # and so is this: it initializes the constant `A` with 1
end
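
One way to see the clean slate directly is to ask Ruby for the local variables it knows about at each point, using Kernel#local_variables. This is only a sketch; ScopeDemo and its members are made-up names:

```ruby
outer = 1  # a top-level local

class ScopeDemo
  inner = 2
  LOCALS_IN_CLASS_BODY = local_variables  # `outer` is not in this list

  def locals_in_method
    local_variables                       # neither `outer` nor `inner` is here
  end
end

p local_variables.include?(:outer)   # true
p ScopeDemo::LOCALS_IN_CLASS_BODY    # [:inner]
p ScopeDemo.new.locals_in_method     # []
```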

Even though Ruby allows multiple declarations of the same class, and each subsequent declaration modifies the existing class rather than defining a new one, local variables declared within one class declaration will not be visible to subsequent declarations of the same class:

class Foo
  x = 1
end

class Foo
  puts x  # error: in `<class:Foo>': undefined local variable or method `x' for Foo:Class (NameError)
end

But note that constants work fine in this case:

class Foo
  A = 1
end

class Foo
  puts A  # prints 1
end

This is because constants are a property of the class object, so a constant declaration mutates the class object and therefore its effect is persistent, whereas local variables only exist within the lexical/textual scope where they were declared.
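
A quick way to check that constants live on the class object is to ask the class for them with Module#constants and Module#const_get. In this sketch, Box is a made-up class:

```ruby
class Box
  x = 1    # local: gone once this `class ... end` body finishes
  A = x    # constant: stored in the Box class object
end

class Box  # reopening the same class
  B = A + 1
end

p Box.constants.sort   # [:A, :B]
p Box.const_get(:A)    # 1
p Box::B               # 2
```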

Constant resolution

Speaking of which, constant scope resolution is the one thing I’m having the hardest time figuring out. It does mostly what you would expect in normal situations, but it does so by quite strange means. What seems to be going on is that Ruby uses lexical scope to determine the dynamic resolution order of the constant. Let me show what I mean.

Classes can be nested, and you can use the constants of the outer class in the inner one:

class A
  X = 1
  
  class B
    def m
      X
    end
  end
end

puts A::B.new.m  # prints 1

You can do this even if the constant definition is not textually within the same class declaration as the method definition:

class A
  X = 1
end

class A
  class B
    def m
      X
    end
  end
end

puts A::B.new.m  # still prints 1

But if you define the method directly in A::B without syntactically nesting it within A, then it doesn’t work:

class A
  X = 1
end

class A::B
  def m
    X
  end
end

puts A::B.new.m  # error: in `m': uninitialized constant A::B::X (NameError)
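
Module.nesting shows the list of lexically enclosing classes at a given point in the code, which makes the difference between the two ways of writing the class visible. In this sketch, Outer and Inner stand in for A and B:

```ruby
class Outer
  X = 1

  class Inner
    NESTED = Module.nesting    # textually inside both classes
  end
end

class Outer::Inner
  COMPACT = Module.nesting     # textually inside Outer::Inner only
end

p Outer::Inner::NESTED    # [Outer::Inner, Outer]
p Outer::Inner::COMPACT   # [Outer::Inner]
```

In the compact form, Outer is simply not in the list, so its constants are not candidates for the lookup.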

This resolution is dynamic, though. Let’s go back to our previous example:

class A
  X = 1

  class B
    def m
      X
    end
  end
end

puts A::B.new.m  # still prints 1

The method is getting the constant defined in A. Let’s now add a constant X to B:

class A::B
  X = 2
end

And now if we call the method:

puts A::B.new.m  # prints 2!

Now method m refers to a constant that did not exist at the time it was defined. In other words, it searches for X at runtime in all classes the method was textually nested in. (Remember that if you define m directly in A::B without textually nesting it in both classes, it only looks in B and, as we will see, B’s superclasses.)

What about inheritance? Let’s define some classes:

class One
  X = 1
end

class Two
  X = 2
end

class A < One
  X = 10
  
  class B < Two
    X = 20
    
    def m
      X
    end
  end
end

puts A::B.new.m  # prints 20

Now let’s go about removing constants and seeing what happens:

irb(main):022:0> A::B.send(:remove_const, :X)
=> 20
irb(main):023:0> A::B.new.m
=> 10

It prefers the constant of the outer class over the one from the inheritance chain. Let’s remove that one as well:

irb(main):024:0> A.send(:remove_const, :X)
=> 10
irb(main):025:0> A::B.new.m 
=> 2

Ok, after exhausting the outer class chain, it falls back to the inheritance chain. What if we remove it from the superclass as well?

irb(main):026:0> Two.send(:remove_const, :X)
=> 2
irb(main):027:0> A::B.new.m 
(irb):16:in `m': uninitialized constant A::B::X (NameError)

So it doesn’t try the inheritance chain of the outer class: One::X is still defined (A inherits from One), but it is never consulted.

One last check: what if you redefine a constant in a subclass but do not redefine the method?

class A
  X = 10

  class B
    X = 20

    def m
      X
    end
  end
end


class C < A::B
  X = 30
end

puts C.new.m  # prints 20

So it looks up based on where the method is defined, not the class it’s called from.

In summary, when Ruby sees a reference to a constant, it tries to find it:

First in the classes in which the reference textually appears, from inner to outer class;
Then in the superclasses of the (inner) class it textually appears in.
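
The two steps above can be sketched in Ruby itself. This is my own approximation based on the experiments in this post, not how the interpreter is actually implemented: resolve_constant is a hypothetical helper, and it ignores things like autoload and const_missing. It takes the list that Module.nesting returns at the point of the reference:

```ruby
# Hypothetical helper: approximates the lookup order observed above.
def resolve_constant(name, nesting)
  innermost = nesting.first || Object
  # Step 1: lexically enclosing classes, inner to outer.
  # Step 2: the ancestors of the innermost class.
  candidates = nesting + innermost.ancestors
  owner = candidates.find { |mod| mod.const_defined?(name, false) }
  raise NameError, "uninitialized constant #{name}" unless owner
  owner.const_get(name, false)
end

class Base
  X = :from_superclass
end

class Outer
  X = :from_outer

  class Inner < Base
    NESTING = Module.nesting   # [Outer::Inner, Outer]

    def m
      X
    end
  end
end

p resolve_constant(:X, Outer::Inner::NESTING)  # :from_outer
p Outer::Inner.new.m                           # :from_outer

Outer.send(:remove_const, :X)

p resolve_constant(:X, Outer::Inner::NESTING)  # :from_superclass
p Outer::Inner.new.m                           # :from_superclass
```

The second argument false to const_defined? and const_get restricts each check to that one class, so the order of the candidates list is what decides the result.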

Uninitialized variable access

Accessing an undefined local variable raises an “undefined local variable or method” error. (Because of the ambiguity between variables and method names mentioned before, the error message mentions both cases here.) Similarly, accessing an undefined constant is an error.

Accessing an uninitialized global variable produces nil. If you run the code with warnings enabled (ruby -w), you will also get a warning about it.

Accessing an uninitialized instance variable produces nil and no warning. There used to be one but it was removed in Ruby 3.0.

Finally, accessing an uninitialized class variable raises an error (just like locals and constants, but unlike instance variables).
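
The five cases can be checked side by side with a small script. Probe and outcome are made-up names, and the script is meant to be run without -w, so the global variable access is silent:

```ruby
class Probe
  def local;    never_assigned;        end
  def constant; NeverAssigned;         end
  def global;   $probe_never_assigned; end
  def ivar;     @never_assigned;       end
  def cvar;     @@never_assigned;      end
end

# Returns the inspected value, or "NameError" if the access raised one.
def outcome(probe, kind)
  probe.public_send(kind).inspect
rescue NameError
  "NameError"
end

%i[local constant global ivar cvar].each do |kind|
  puts "#{kind}: #{outcome(Probe.new, kind)}"
end
# local: NameError
# constant: NameError
# global: nil
# ivar: nil
# cvar: NameError
```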

EOF

That’s all for today, folks. I did not even get to blocks in this post, but they’ll have to wait for a post of their own. Stay tuned!

Elmord's Magic Valley

Computers, languages, and computer languages. Às vezes em Português, sometimes in English.