[This post is part of a series on Ruby semantics.]
I’m still trying to wrap my head around all the intricacies of variable/name scope in Ruby. These notes are part of my attempt to figure it all out, so take them with a grain of salt, and feel free to send corrections and additions in the comments.
As I explained in the previous post, the focus of these notes is not on how to use the language, but rather on how it works. This post in particular will deal with a lot of corner cases, which are helpful to figure out what the interpreter is doing. Let’s go!
Ruby has a bunch of different types of variables and variable-like things, distinguished by their initial characters:
Local variables begin with a lowercase ASCII letter, an underscore, or a non-ASCII character (i.e., a Unicode codepoint above 127). Any non-ASCII character can be used in an identifier, even things like the zero width space (U+200B). Local variables are visible in the scope they were defined in and nested scopes, kinda (more on that later).
Constants begin with an uppercase ASCII letter. Constants belong to the class or module they are defined in (which is Object at the top-level). They cannot be defined or redefined from within methods, but they can be redefined outside of methods (with a warning).
Instance variables begin with @, like @foo. They belong to the current object (i.e., self).
Class variables begin with @@, like @@foo. They belong to the class they are defined in and are shared with all of its subclasses (if a subclass mutates the variable, the superclass will reflect the mutation).
These are not the same as class instance variables, which are not a distinct variable type, but are simply the instance variables of the class object. Remember, classes are objects too (instances of Class), and therefore have their own instance variables as well, which are distinct from the instance variables of the instances of the class. Class instance variables are not shared with subclasses, because each subclass is a distinct object, with its own (class) instance variables.
Class variables cannot be accessed from the top-level: unlike constants, they don’t implicitly refer to Object’s class variables in that case. I’m not sure why this inconsistency exists, but it might be because class variables are shared with the subclasses, and therefore defining a class variable on Object by accident would affect almost every class in Ruby, whereas a constant with the same name can be defined in a subclass with no issues.
Finally, global variables begin with $, like $foo, and are visible across the whole program.
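To make the "who owns what" part of this list concrete, here is a small sketch that pokes at each kind of variable with Ruby's reflection methods (const_defined?, instance_variables, class_variables, global_variables). The names ScopeBase, ScopeDerived, SCOPE_DEMO_CONST and $scope_demo_global are made up for this illustration.

```ruby
SCOPE_DEMO_CONST = 1        # top-level constant: stored on Object
$scope_demo_global = 2      # global variable: program-wide

class ScopeBase
  @@shared = "set in ScopeBase"   # class variable: shared with subclasses
  @own = "ScopeBase's own"        # class instance variable: belongs to the ScopeBase object

  def initialize
    @state = 3                    # instance variable: belongs to each instance
  end

  def self.shared
    @@shared
  end

  def self.own
    @own
  end
end

class ScopeDerived < ScopeBase
  @@shared = "changed in ScopeDerived"  # mutates the variable defined in ScopeBase
end

p Object.const_defined?(:SCOPE_DEMO_CONST, false)  # true
p global_variables.include?(:$scope_demo_global)   # true
p ScopeBase.new.instance_variables                 # [:@state]
p ScopeBase.class_variables                        # [:@@shared]
p ScopeBase.instance_variables                     # [:@own]
p ScopeBase.shared                                 # "changed in ScopeDerived"
p ScopeBase.own                                    # "ScopeBase's own"
p ScopeDerived.own                                 # nil
p ScopeDerived.instance_variables                  # []
```

The last two lines are the class instance variable case: ScopeDerived is a separate object, so it does not see the @own that was set on ScopeBase.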
Unlike Python, there is no per-file global scope. Global variables ($foo) are true program-wide globals. Constants, instance variables and class variables are properties of various objects: when you define one of those, you are effectively mutating the class/module/instance they were defined in, and the effects will be visible in other places where these objects are used. You can define local variables at the top-level, but they won’t be visible inside any class or method definition, nor is there any concept of importing the variables defined in a different file: when you require another file, you will be able to see the effects of running that file (such as defining constants, instance variables and class variables, which, again, are object mutation rather than what you would think of as variable definition in Python or Scheme), but local variables defined at the file top-level won’t be visible outside it.
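Here is a minimal sketch of what survives a require and what does not. To keep it self-contained, it writes a throwaway helper file into a temporary directory first; the file name and the names scope_helper_local, SCOPE_HELPER_CONST, $scope_helper_global and peek_at_top_level_local are all invented for the example.

```ruby
require "tmpdir"

Dir.mktmpdir do |dir|
  path = File.join(dir, "scope_demo_helper.rb")
  File.write(path, <<~RUBY)
    scope_helper_local = "local to the helper file"
    SCOPE_HELPER_CONST = "constant on Object"
    $scope_helper_global = "program-wide global"
  RUBY
  require path   # runs the helper file once
end

# The constant and the global are effects of running the file, so they are visible.
p SCOPE_HELPER_CONST            # "constant on Object"
p $scope_helper_global          # "program-wide global"

# The helper's top-level local variable is not.
p defined?(scope_helper_local)  # nil

# The same holds within a single file: a top-level local is invisible inside def.
top_level_local = "visible only at this file's top level"

def peek_at_top_level_local
  defined?(top_level_local)
end

p peek_at_top_level_local       # nil
```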
The allowed names for local variables and constants are also allowed method names. Because Ruby does not require parentheses in a method call, and also allows the receiver to be omitted (self.f() can be written as f(), which can be written as just f), a bare identifier like foo could be either a method name or a variable/constant name. How does Ruby distinguish those?
First, if the parentheses are used (foo()), or if there are arguments after the identifier, with or without parentheses (foo 42), then foo is unambiguously interpreted as a method name.
If there are neither parentheses nor arguments, and the identifier begins with a lowercase ASCII letter or an underscore, it will be interpreted as a local variable if there has been a variable assignment to that identifier within the lexical scope of the reference. So in foo = 42; foo, the second foo is a local variable. This disambiguation happens at parse time, and is based on the textual appearance of an assignment in the scope of the reference, regardless of whether the assignment is actually executed at runtime. So, for example:
    def foo
      "I'm a method"
    end

    if false
      foo = "I'm a local variable"
    end

    p foo  # Prints nil!
When Ruby sees the assignment to foo in the code, it creates a local variable for it, even if the assignment does not run. The variable is initialized with nil.
Note that foo() here would still invoke the method, even though there is a local variable with the same name. You might ask: what if I have a local variable whose value is a function (e.g., a lambda)? How do I call it? In this case, you have to invoke foo.call():
    def foo
      "I'm a method"
    end

    foo = lambda { "I'm a lambda" }

    p foo()       # "I'm a method"
    p foo         # #<Proc:...>
    p foo.call()  # "I'm a lambda"
This is similar to how in Common Lisp, there are distinct namespaces for functions and variables, and you need to use (funcall foo) to call a function stored in a variable. However, because the parentheses are not mandatory in Ruby, it has to do some extra work to guess what you want when it sees a bare identifier.
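One consequence of the parse-time rule is that textual position matters within a single scope: a bare identifier that appears before the first assignment is still a method call, and the same identifier after it is a local variable. A small sketch (the names scope_demo_greeting and scope_demo_probe are made up):

```ruby
def scope_demo_greeting
  "I'm a method"
end

def scope_demo_probe
  before = scope_demo_greeting              # no assignment seen yet: method call
  scope_demo_greeting = "shadow" if false   # never runs, but the parser has seen it
  after = scope_demo_greeting               # now a local variable, still nil
  [before, after, local_variables.include?(:scope_demo_greeting)]
end

p scope_demo_probe  # ["I'm a method", nil, true]
```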
What about constants with the same name as methods? In this case, the rules are different: Ruby treats an uppercase-initial identifier as a constant unless there are parentheses or arguments:
    def A
      "I'm a method"
    end

    A    # error: uninitialized constant A
    A()  # "I'm a method"
Previously, I said that local variables are visible in the scope they were defined in and nested scopes. That’s not quite true, though, because a lot of syntactic constructs start a clean slate on local variables. For example, local variables defined outside a class declaration are not visible inside it:
    x = 1

    class Foo
      x  # error: undefined local variable or method `x' for Foo:Class (NameError)
    end
The same applies to module and def:
    class Foo
      x = 1

      def m
        x
      end
    end

    Foo.new.m  # error: in `m': undefined local variable or method `x' for #<Foo:...> (NameError)
Neither will the variable be accessible via Foo.x, Foo::x, or anything else. It will be visible for code that runs within the class declaration, though:
    class Foo
      x = 1
      puts x  # this is fine
      A = x   # and so is this: it initializes the constant `A` with 1
    end
Even though Ruby allows multiple declarations of the same class, and each subsequent declaration modifies the existing class rather than defining a new one, local variables declared within one class declaration will not be visible to subsequent declarations of the same class:
    class Foo
      x = 1
    end

    class Foo
      puts x  # error: in `<class:Foo>': undefined local variable or method `x' for Foo:Class (NameError)
    end
But note that constants work fine in this case:
    class Foo
      A = 1
    end

    class Foo
      puts A  # prints 1
    end
This is because constants are a property of the class object, so a constant declaration mutates the class object and therefore its effect is persistent, whereas local variables only exist within the lexical/textual scope where they were declared.
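The difference can also be observed from outside the class, using constants and const_get on the class object. This is only a sketch, and ScopeHolder and the constant names in it are invented for the example:

```ruby
class ScopeHolder
  counter = 1        # local variable of this class body only
  START = counter    # constant: stored on the ScopeHolder class object
end

class ScopeHolder    # reopening the class starts a fresh local scope
  LOCAL_STILL_VISIBLE = defined?(counter)    # nil: the local is gone
  CONSTANT_STILL_VISIBLE = defined?(START)   # "constant": the constant persists
end

p ScopeHolder.constants.include?(:START)  # true
p ScopeHolder.const_get(:START)           # 1
p ScopeHolder::LOCAL_STILL_VISIBLE        # nil
p ScopeHolder::CONSTANT_STILL_VISIBLE     # "constant"
```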
Speaking of which, constant scope resolution is the one thing I’m having the hardest time figuring out. It does mostly what you would expect in normal situations, but it does so by quite strange means. What seems to be going on is that Ruby uses lexical scope to determine the dynamic resolution order of the constant. Let me show what I mean.
Classes can be nested, and you can use the constants of the outer class in the inner one:
    class A
      X = 1

      class B
        def m
          X
        end
      end
    end

    puts A::B.new.m  # prints 1
You can do this even if the constant definition is not textually within the same class declaration as the method definition:
    class A
      X = 1
    end

    class A
      class B
        def m
          X
        end
      end
    end

    puts A::B.new.m  # still prints 1
But if you define the method directly in A::B without syntactically nesting it within A, then it doesn’t work:
    class A
      X = 1
    end

    class A::B
      def m
        X
      end
    end

    puts A::B.new.m  # error: in `m': uninitialized constant A::B::X (NameError)
This resolution is dynamic, though. Let’s go back to our previous example:
    class A
      X = 1

      class B
        def m
          X
        end
      end
    end

    puts A::B.new.m  # still prints 1
The method is getting the constant defined in A. Let’s now add a constant X to B:
    class A::B
      X = 2
    end
And now if we call the method:
    puts A::B.new.m  # prints 2!
Now method m refers to a constant that did not exist at the time it was defined. In other words, it searches for X at runtime in all classes the method was textually nested in. (Remember that if you define m directly in A::B without textually nesting it in both classes, it only looks up in B.)
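The list of classes a piece of code is textually nested in can be inspected with Module.nesting, which makes the difference between the two ways of opening the inner class visible. A short sketch (NestOuter, Inner and the constant and method names are made up for the example):

```ruby
class NestOuter
  X = 1

  class Inner
    NESTED = Module.nesting    # textually inside both classes

    def nested_x
      X                        # found through the enclosing NestOuter
    end
  end
end

class NestOuter::Inner
  COMPACT = Module.nesting     # textually inside NestOuter::Inner only

  def compact_x
    X                          # NestOuter is not searched from here
  rescue NameError => e
    e.class
  end
end

p NestOuter::Inner::NESTED         # [NestOuter::Inner, NestOuter]
p NestOuter::Inner::COMPACT        # [NestOuter::Inner]
p NestOuter::Inner.new.nested_x    # 1
p NestOuter::Inner.new.compact_x   # NameError
```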
What about inheritance? Let’s define some classes:
    class One
      X = 1
    end

    class Two
      X = 2
    end

    class A < One
      X = 10

      class B < Two
        X = 20

        def m
          X
        end
      end
    end

    puts A::B.new.m  # prints 20
Now let’s go about removing constants and seeing what happens:
    irb(main):022:0> A::B.send(:remove_const, :X)
    => 20
    irb(main):023:0> A::B.new.m
    => 10
It prefers the constant of the outer class over the one from the inheritance chain. Let’s remove that one as well:
    irb(main):024:0> A.send(:remove_const, :X)
    => 10
    irb(main):025:0> A::B.new.m
    => 2
Ok, after exhausting the outer class chain, it falls back to the inheritance chain. What if we remove it from the superclass as well?
    irb(main):026:0> Two.send(:remove_const, :X)
    => 2
    irb(main):027:0> A::B.new.m
    (irb):16:in `m': uninitialized constant A::B::X (NameError)
So it doesn’t try the inheritance chain of the outer class.
One last check: what if you redefine a constant in a subclass but do not redefine the method?
    class A
      X = 10

      class B
        X = 20

        def m
          X
        end
      end
    end

    class C < A::B
      X = 30
    end

    puts C.new.m  # prints 20
So it looks up based on where the method is defined, not the class it’s called from.
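In summary, when Ruby sees a reference to a constant, it tries to find it:
First, in the class or module where the reference textually appears.
Then, in each class or module that textually encloses it, from the innermost outwards.
Finally, in the inheritance chain of the innermost class (the inheritance chains of the outer classes are not searched).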
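As a rough model of the behaviour seen in the experiments above, the lookup can be sketched in plain Ruby on top of Module.nesting, const_defined? and const_get. This is only an approximation written for this post, not how the interpreter is implemented: it ignores things like autoload and const_missing, and sketch_constant_lookup and the Lookup* classes are made-up names.

```ruby
# Approximate model of constant lookup, given the lexical nesting
# (as returned by Module.nesting at the point of the reference).
def sketch_constant_lookup(name, nesting)
  # 1. Lexical scopes, innermost first; their ancestors are not consulted here.
  nesting.each do |mod|
    return mod.const_get(name, false) if mod.const_defined?(name, false)
  end

  # 2. The inheritance chain of the innermost scope only.
  (nesting.first || Object).ancestors.each do |mod|
    return mod.const_get(name, false) if mod.const_defined?(name, false)
  end

  raise NameError, "uninitialized constant #{name}"
end

class LookupOne; X = 1; end
class LookupTwo; X = 2; end

class LookupA < LookupOne
  X = 10

  class B < LookupTwo
    X = 20
    NESTING = Module.nesting   # [LookupA::B, LookupA]
  end
end

nesting = LookupA::B::NESTING
LOOKUP_RESULTS = []

LOOKUP_RESULTS << sketch_constant_lookup(:X, nesting)   # innermost class
LookupA::B.send(:remove_const, :X)
LOOKUP_RESULTS << sketch_constant_lookup(:X, nesting)   # outer class
LookupA.send(:remove_const, :X)
LOOKUP_RESULTS << sketch_constant_lookup(:X, nesting)   # superclass of the innermost class
LookupTwo.send(:remove_const, :X)
begin
  sketch_constant_lookup(:X, nesting)
rescue NameError => e
  LOOKUP_RESULTS << e.class    # LookupOne::X still exists but is never tried
end

p LOOKUP_RESULTS  # [20, 10, 2, NameError]
```

The sequence of removals mirrors the irb session: each remove_const peels off one layer, and the result moves to the next place in the search order.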
Accessing an undefined local variable raises an “undefined local variable or method” error. (Because of the ambiguity between variables and method names mentioned before, the error message mentions both cases here.) Similarly, accessing an undefined constant is an error.
Accessing an uninitialized global variable produces nil. If you run the code with warnings enabled (ruby -w), you will also get a warning about it.
Accessing an uninitialized instance variable produces nil and no warning. There used to be one but it was removed in Ruby 3.0.
Finally, accessing an uninitialized class variable raises an error (just like locals and constants, but unlike instance variables).
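The five cases can be compared side by side with a small probe. This is just a sketch, and UninitializedProbe, describe_access and the variable names are invented for the example:

```ruby
class UninitializedProbe
  def read_local;    never_assigned_local;    end
  def read_constant; NeverAssignedConstant;   end
  def read_global;   $never_assigned_global;  end
  def read_ivar;     @never_assigned_ivar;    end
  def read_cvar;     @@never_assigned_cvar;   end
end

# Reports either the value returned by the reader or the class of the error it raised.
def describe_access(probe, reader)
  "returns #{probe.public_send(reader).inspect}"
rescue NameError => e
  "raises #{e.class}"
end

probe = UninitializedProbe.new
%i[read_local read_constant read_global read_ivar read_cvar].each do |reader|
  puts "#{reader}: #{describe_access(probe, reader)}"
end
# read_local: raises NameError
# read_constant: raises NameError
# read_global: returns nil
# read_ivar: returns nil
# read_cvar: raises NameError
```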
That’s all for today, folks. I did not even get to blocks in this post, but they’ll have to wait for a post of their own. Stay tuned!
Copyright © 2010-2024 Vítor De Araújo
The content of this blog, unless otherwise specified, may be used under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.