aboutsummaryrefslogtreecommitdiff

Clone or download

  • Download the latest version as a ZIP or tarball
  • Clone with: git clone https://elmord.org/code/fenius-report

Title: The Revised⁰ Report on the Algorithmic Language Fenius

Introduction

This document describes the 0th version of the Fenius programming language. This is meant as an initial design. The goal of this document is to define a relatively uncluttered, easy to implement and easy to understand language that can be refined later. Features involving design decisions whose answers are currently not clear to me have either been left out entirely (e.g., keyword arguments, static typing), or given a simple design that is easy to implement and allows the rest of the language to go ahead (e.g., the object system, error handling).

Fenius combines features from various existing programming languages. None of these features are unique to Fenius; rather, the goal of Fenius is to combine these different features into a coherent, ergonomic programming language. These features include:

  • Dynamic typing, with the goal of introducing gradual typing in a future version of the language.

  • Extensible syntax: users can extend the language with new syntactic constructs that are indistinguishable from built-in constructs (such as if and while), by means of macros. Analogously, the set of operators is open-ended: any sequence of operator characters is a valid operator name (e.g., ++, =~, @.@), and users can define new operators as easily as defining regular functions and macros (because they are regular functions and macros).

  • A surface syntax that is flexible enough to accomodate a variety of syntactic constructs, while hardcoding as little as possible, and still maintaining a familiar form. For instance, an if construct like:

    if x > 0 {
        print("x is positive")
    } else {
        print("x is not positive")
    }
    

    looks similar to the equivalent construct in many other programming languages, but none of the keywords is hardcoded in the language: if is just a regular identifier that happens to be bound to the conditional syntactic construct by default, with a space-separated list of arguments following it. The syntax is composed of a small set of building blocks, such as identifiers, constants, blocks and calls, that can be easily manipulated by macros. Thus, Fenius achieves a middle ground between the absolute uniformity of Lisp parenthesized expressions (which are very easy to manipulate programmatically, but harder to read) and the syntaxes of conventional programming languages (in which every construct is a special case, making it hard to write code that manipulates code).

Files, modules and packages

File organization

Code is organized in modules grouped under packages.

A package is a directory containing a package.fen file. The name of the package is the name of the directory. The contents of the package.fen file is currently irrelevant. (It will be used in the future for metadata such as author names, version, dependencies, etc.)

The package directory must contain an src directory. A module is any file with a .fen extension inside the src directory (including subdirectories).

A module can be imported by specifying its path. The path of a module consists of the package name, followed by the pathname of the module file inside the src directory (but not including src), excluding the .fen extension. For example, if somehack is a directory containing a package.fen file, then the module whose code is located at file somehack/src/foo/bar.fen can be imported as somehack/foo/bar.

As a special case, importing a bare package name such as somehack without specifying any other path components is equivalent to importing somehack/somehack (i.e., it will load the file at somehack/src/somehack.fen).

A Fenius program is started by calling fenius <filename>, where <filename> the file containing the code to be executed. If this file is contained in (either directly or in a subdirectory of) the src directory of a directory <package> containing a package.fen file, then <package> is the root of the package being executed. Any imports from paths whose first component is <package> will be looked up in this directory. If the file is not contained in such a directory, then it's an independent module not belonging to any package, and imports will not be looked up at the directory the file is contained in.

Imports from packages other than the root package of the initial module are looked up in the package path. For instance, if an attempt is made to import a module somehack/foo/bar, a directory named somehack will be looked up in each directory in the package path sequentially; the first somehack directory found in that way will be used. Then the src/foo/bar.fen inside this directory will be loaded. It is an error if that file is not present; the runtime will not try to look for other somehack directories further on in the package path. It is also an error if the somehack directory exists but does not contain a package.fen file (i.e., is not a valid package).

On Unix platforms, the package path consists of the following items:

  • The :-separated list of directories in the FENIUS_PACKAGE_PATH environment variable, if present;
  • ~/.local/share/fenius/packages/ (where ~ is the current user's home directory);
  • /usr/local/share/fenius/packages/.

On non-Unix platforms, these should be adapted to the platform's conventions.

The package named std is special: it is always present, and some or all of its modules may be built-in into the runtime and may not correspond to any particular file in the filesystem.

Rationale

These rules aim to accomplish a number of goals. First, package.fen is used to uniquely determine the package root directory. This means the package root is the same regardless of the path passed to fenius. For instance, if your current directory is somehack/src/foo and you run fenius bar.fen, the package root is still somehack, not the current directory, and all imports are still relative to somehack (so your imports always work regardless of your current directory).

Another goal is to make installing a package as easy as cloning a git repository inside ~/.local/share/fenius (or whatever is your package path). Since a repository can have more than just the package source files (i.e., documentation, tests, etc.), having all the source files inside a src directory makes it easy to group those files while still leaving room for other things.

Loading modules

Upon loading a module, the contents of the corresponding file will be executed in a fresh environment whose parent environment contains the bindings exported by std/base. Any bindings introduced in this fresh environment during the execution of the code are the module's exported bindings. (In the future, there will be ways to selectively export bindings from a module.)

When a module is imported for the first time, the module is loaded, and a module object containing the module's exported bindings is returned. Subsequents imports of the same module return the same module object; the module is not loaded or executed a second time. The bindings inside the module object can be accessed via the :: operator defined in std/base.

Code structure

Lexical elements

A Fenius source code file is a UTF-8 encoded text file. Non-ASCII characters can only appear inside string literals and comments (for now).

Types of characters

A digit is one of the ASCII digits. An alphabetic character is one of the uppercase or lowercase ASCII letters, or _. An alphanumeric character is either an alphabetic character, or a digit. A whitespace character is either an ASCII space or an ASCII tab character. A newline is an ASCII line feed character, optionally preceded by an ASCII carriage return character.

Tokens

A source code file is interpreted as a series of zero or more tokens. A token consists of the longest sequence of characters that matches one of the token types below, beginning past the end of the previous token, or the beginning of the file in the case of the first token. Some tokens are self-delimiting: they don't require any whitespace or other intervening material to separate them from adjacent tokens. Other tokens are not self-delimiting: their boundaries are delimited either by self-delimiting tokens, or by whitespace. It is an error if there is non-self-delimiting material adjacent to a non-self-delimiting token; for example, 123 is a number, and abc is an identifier, but 123abc is not a valid sequence of a number and an identifier.

Identifiers

An identifier literal is one alphabetic character followed by zero or more alphanumeric characters. Identifier literals are not self-delimiting.

Numbers

A number literal is either a decimal integer literal, a hexadecimal integer literal, an octal integer literal, a binary integer literal, or a float literal. Number literals are not self-delimiting.

A decimal integer literal is one or more digits.

A hexadecimal integer literal is a 0, followed by a (lowercase) x, followed by one or more hexadecimal digits; a hexadecimal digit is either a digit or one of characters of the set abcdefABCDEF.

A binary integer literal is a 0, followed by a (lowercase) b, followed by one or more of the caracters of the set 01.

An octal integer literal is a 0, followed by a (lowercase) o, followed by one or more of the caracters of the set 01234567.

A float literal is one or more digits, followed by a ., followed by one or more digits, optionally followed by: a (lowercase) e, optionally followed by a + or a -, followed by one ore more digits. (For example, 1.2, 1.2e5 and 1.2e+5 are all valid float literals. while 1. is not.)

Note: A sequence of one or more digits followed by a . followed by a non-digit is not treated as the beginning of a float literal; instead, it is treated as the . operator with an integer on its left. For example, 1.e0 is not a float number, it is a call of a (hypothetical) method e0 on the integer 1.

Operators and delimiters

An operator is one or more characters of the set +-*/<>=.:?!@$%^&|~. Operators are self-delimiting.

A delimiter is one character of the set ()[]{},; and newline. Delimiters are self-delimiting.

Comments

A comment is the character #, followed by zero or more characters other than newline. Comments are self-delimiting. (Note that the newline itself, if present, is not part of the comment. Note also that a newline might not be present if the comment happens at the end of the file with no intervening newline between the start of the comment and the end of the file.)

Strings

A string literal is a character ", followed by one or more string elements, followed by a character ". Strings are not self-delimiting.

A string element is either a self-representing string character, or an escape sequence.

A self-representing string character is any character other than \ or ".

An escape sequence is one of:

  • \", representing the " character;
  • \\, representing the \ character;
  • \n, representing the ASCII line feed character;
  • \r, representing the ASCII carriage return character;
  • \t, representing the ASCII tab character;
  • \a, representing the ASCII bell character;
  • \b, representing the ASCII backspace character;
  • \f, representing the ASCII form feed character;
  • \v, representing the ASCII vertical tab character;
  • \e, representing the ASCII escape character;
  • \u{, followed by one or more hexadecimal digits, followed by }, representing the Unicode codepoint identified by the hexadecimal digits.
  • \x, followed by exactly two hexadecimal digits, representing the byte with the specified hexadecimal value. (Note that this escape sequence may introduce sequences of bytes that are not valid UTF-8. This is not an error: strings are allowed to contain any sequences of bytes. See the section on string values for more information.)
  • \ followed by a newline, which generates no character in the resulting string. This can be used to continue a string in the following line without including the newline character in the resulting string.

Line continuations

A \ character followed by a newline is equivalent to whitespace.

Syntactic elements

Fenius syntax is defined as a mapping from source tokens to Abstract Syntax Tree (AST) nodes. Those syntax tree nodes have corresponding classes in the std/ast module. This module will be referred to as ast in the remainder of this section.

This section describes the mapping from tokens to ast node classes. The semantics of evaluating those nodes as code is given in a separate section. Note that the mapping from source tokens to AST nodes is independent from their semantics as programs and may be useful on its own, for example as a data serialization or configuration file format, akin to YAML or JSON. It just so happens to be a data serialization format designed to be convienient for representing programs.

Treatment of whitespace

In most cases, tokens may be separated by whitespace with no difference in the resulting Abstract Syntax Tree. In the few cases where no whitespace must appear between two tokens, the expression "with no intervening whitespace" is used.

Comments are considered whitespace for the purposes of AST building.

Location information

All ast node classes have a location field: they take location as the first constructor parameter have a location method returning the location value of the node. This location field is filled with an object of the ast::Location class, representing information about the location of the node in the source file. The contents of this location object is currently left unspecified.

In the subsequent sections, whenever a class constructor has a location field, it is meant to be a location object as described here.

Bodies, phrases and constituents

A program body consists of zero or more phrases, each phrase separated by one or more phrase separators. A phrase separator is either a ; or a newline. The body itself may be preceded or followed by any number of phrase separators.

A phrase is a sequence of one or more constituents.

A constituent is the longest sequence of tokens that can be interpreted as one of the following:

Constituent type Examples
identifier foo
constant 123, "foo"
parenthesized expression (2), (2, 3)
bracketed expression [2], [2, 3]
block { foo; bar }
parenthesized application f(x, y)
bracketed application v[i, j]
operator expression -2 , 2+3, x?

For example, if x > 0 { print("yea") } else { print("nay") } is a phrase with 5 constituents: if, x > 0, { print("yea") }, else, and { print("nay") }. Each of these constituents have subparts, as will be seen in the following sections.

A single-constituent phrase maps to the AST node of the constituent itself. A multiple-constituent phrase is represented by a node ast::Phrase(location, constituents), where constituents is a list of the AST nodes of each constituent.

Notes

There are no zero-constituent phrases: the body rule accepts any number of phrase separators between phrases, so a sequence like foo; ; bar is exactly equivalent to foo; bar and generates the same AST node.

An ast::Phrase node is only ever generated for phrases with two or more constituents. A single-constituent phrase is always represented directly by the node of the constituent itself.

Identifiers

An identifier is composed of just an identifier literal. It maps to a node ast::Identifier(location, name, context), where name is a string containing the text of the identifier, and context is a context object that affects how the identifier will be resolved. This context value is always nil at parse time, but may be manipulated during macro expansion. See the section on macro expansion for more information.

Constants

A constant is either a number literal or a string literal. It maps to a node ast::Constant(location, value), where value is either an integer, a float or a string value, depending on the type of literal. The value is the interpreted value, not the text of the token: if it is a number literal, the node will contain the actual value as a number; if it is a string literal, the value will contain the actual string, with the escape sequences already interpreted.

Parenthesized expressions

A parenthesized expression is a (, followed by zero or more list items separated by commas, followed by a ). A list item is either a phrase or an operator.

Within a parenthesized expression, newlines are considered equivalent to whitespace. The ; delimiter is not allowed within a parenthesized expression.

A comma may optionally follow the last phrase of a parenthesized expression.

If the parenthesized expression contains a single list item not followed by a comma, it maps to the AST node of that list item. Otherwise, it maps to ast::Tuple(location, items), where items is a list of the AST nodes for each of the list items. If a list item is a phrase, it maps to the AST node of that phrase; if it is an operator, it maps to ast::Identifier(location, operator), where operator is the operator as a string.

Rationale

Allowing bare operators to appear inside parenthesized expressions provides a way to use operators in non-infix contexts, such as when passing them as arguments to functions. Allowing multiple operators in a single list without requiring each individual operator to be escaped makes it easier to import multiple operators from other modules (e.g., import foo/bar (+, *, ?)).

Bracketed expressions

A bracketed expression is a [, followed by zero or more list items (as defined above) separated by commas, followed by a ].

Within a bracketed expression, newlines are considered equivalent to whitespace. The ; delimiter is not allowed within a bracketed expression.

A comma may optionally follow the last phrase of a bracketed expression.

A bracketed expression maps to ast::List(location, items), where items is a list of the AST nodes for each of the list items.

Blocks

A block is a {, followed by a body, followed by a }. It maps to ast::Block(location, phrases), where phrases is a list of the AST nodes for the phrases contained in the block.

Application and operator expressions

[The following is a vague description of expressions involving precedence. They should be made more precise, but that's what I have for now.]

A parenthesized application is a constituent followed by a parenthesized expression with no intervening whitespace. It maps to a node ast::Call(location, head, arguments), where head is the AST node of the first constituent expression, and arguments is a list of the AST nodes of the elements inside the parenthesized expression.

A bracketed application is a constituent followed by a bracketed expression. It maps to a node ast::Subscript(location, head, arguments), where head is the AST node of the first constituent expression, and arguments is a list of the AST nodes of the elements inside the bracketed expression.

An operator expression is either a prefix expression, an infix expression, or a postfix expression.

A prefix expression is a prefix operator followed by a constituent. It maps to a node ast::Call(location, ast::Identifier(location, operator), [operand]), where operator is the operator as a string, and operand is the AST node for the following constituent.

An infix expression is a constituent, followed by an infix operator, followed by a constituent. It maps to a node ast::Call(location, ast::Identifier(location, operator), [left, right]), where operator is the operator as a string, and left and right are the AST node for the left-hand and right-hand constituents, respectively.

A postfix expression is a constituent followed by an postfix operator. It maps to a node ast::Call(location, ast::Identifier(location, operator), [operand]), where operator is the operator as a string, and operand is the AST node for the preceding constituent.

Parsing ambiguities are resolved by the precedence table below. All infix operators are left-associative.

Operator precedence and fixity

The characters that constitute an operator determine its precedence and fixity (whether they are prefix-only, prefix-or-infix, or postfix-only).

Fixity

If the operator begins with $ or @, it is a prefix-only operator.

Otherwise, if the operator ends with ? or !, it is a postfix operator

Otherwise, it is a prefix-or-infix operator.

A prefix operator is either a prefix-only operator, or a prefix-or-infix operator.

An infix operator is the same as a prefix-or-infix operator.

Precedence

Operators are assigned precedences according to what sequence of characters they end on, according to the following table. The longest suffix that matches is used. The higher the precedence, the tighter the operator binds (i.e., * has higher precedence than +).

Suffixes Precedence
->, => 10
.., : 20
&, | 30
=, <, > 40
+, - 50
*, /, % 60
^, ~ 70
?, !, $, @ 80
., :: 100

Parenthesized application (f(x)) and bracketed application (v[i]) have priority 90. Thus, x.y(z) parses as an application of x.y to the arguments (z), not as a . operator with operands x and y(z).

Rationale

Coming up with a set of precedence and fixity rules that is general enough to accomodate an open set of operators while still having the expected precedences for the familiar operators is tricky. The simplest approach would be to assign precedences based only on the first or the last character of an operator, but having -> (Fenius' function/lambda notation) with a lower precedence than both - and > is more convenient in practice. Likewise, => (intended to be used in pattern matching cases) having a lower precedence than the comparison operators imposes fewer restrictions on what kinds of expressions can be used in match conditions and results without requiring parentheses.

Most operators can be used in both prefix and infix position, with a few exceptions. Postfix operators are all postfix-only; otherwise, expressions like x ? ! y could be interpreted as either (x?) ! y or x ? (!y). In addition, $ is prefix-only; this allows it to be given a "variable sigil" usage reminiscent of Unix shell, for instance, in patterns like for $x in $list, without these being interpreted as infix expressions for $ x and in $ list.

Postfix status is granted based on the last character, but prefix status is granted based on the first character. Using the last character for postfix status is useful because the same characters can be incorporated in related infix operators, so that, for instance, Rust-like x? an x?.y can be defined. Using the first character for prefix allows prefix operators to be extended with more characters to the right, so one might have for instance $x, $*x, etc. Of course they could be extended to the left instead, but to me it seems subjectively clearer that all prefix-only operators begin (rather than end) with the same characters.

One way out of these issues would be to allow programs to declare custom precedences for operators, like Haskell does, but this would require interpreting a program's declarations (and module imports!) to be able to parse it, which is very much counter to Fenius' philosophy of treating source code syntax as a serialization format for ASTs that can be parsed independently from its meaning as code, much like Lisp S-expressions.

Semantics

Every piece of Fenius code is evaluated in an environment. An environment consists of a set of bindings and a parent environment (which may be null). Bindings have both a compile-time and a run-time component: at run-time, a binding maps an identifier name to a value; at compile-time, a binding keeps track of whether an identifier name is bound to a regular value or to a syntactic construct (i.e., a macro or a built-in special form).

The module top-level

A module source file, once parsed into a sequence of AST nodes, is evaluated in a fresh environment whose parent environment contains the bindings from std/base (as if imported via import_all). Each node is evaluated in order. If node that create bindings in the current environment is evaluated, those bindings are added to the module environment. The set of bindings added to the module environment after all AST nodes of the module are executed are the module's exported bindings.

Node evaluation

Constant nodes

An ast::Constant(location, value) node evaluates to value.

Identifier nodes

There are two aspects to identifier semantics: resolution and evaluation. Resolution happens at compile-time; evaluation, at run-time.

An ast::Identifier(location, name, context) node is resolved into a into a binding by looking it up in an environment determined by context: if context is nil, the identifier name is looked up in the current environment; otherwise, it is looked up in the environment associated with context. Lookup is done by looking up for a binding with the identifier name in the environment; if one is found, the identifier resolves to that binding; otherwise, lookup proceeds on the environment's parent, and so on until either a binding with the identifier name is found, or the environment has no parent. In the latter case, the identifier is unbound.

An identifier evaluates to the value associated to the binding it resolves to. If the identifier is unbound, an error is raised.

Phrase nodes

An ast::Phrase(location, constituents) node evaluates as follows. If the first constiutient is an identifier and it resolves to a syntactic binding to a built-in special form, evaluation proceeds according to the rules of that special form. If it resolves to a syntactic binding to a macro, evaluation proceeds according to the rules in the Macros section. Otherwise, an error is raised.

Call nodes

An ast::Call(location, head, arguments) node evaluates as follows. If head is an identifier and it resolves to a syntactic binding to a macro, evaluation proceeds according to the rules in the Macros section. Otherwise, head and each of the arguments is evaluated, in order. If head evaluates to a function, it is called with the evaluated arguments. If head evaluates to a Class object, its class constructor is called with the evaluated arguments. Otherwise, an error is raised.

Subscript nodes

An ast::Subscript(location, head, arguments) node evaluates as follows. The head and each of the arguments is evaluated, in order. If head has a method subscript, it is called with the evaluated arguments (as if evaluated_head.subscript(...evaluated_arguments) had been called). Otherwise, an error is raised.

Tuple and List nodes

An ast::Tuple(location, items) node evaluates to a tuple, constructed as follows. For each item:

  • If the item is an AST call node, the call head is an unbound identifier ..., and the call has a single argument, that argument is evaluated; it must evaluate to a streamable sequence. Each element of the stream is added to the tuple.

  • Otherwise, the element is evaluated and added to the tuple.

An ast::List(location, items) node is evaluated the same way, except the evaliated items are added to a list instead of a tuple.

Block nodes

An ast::Block(location, phrases) node evaluates as follows. A new environment is created whose parent environment is the current environment. Each of the phrases is evaluated, in sequence, in the new environment. The result of the evaluation of the last node in phrases is the result of the block as a whole. If phrases is empty, the result is the value of the std/base constant nil.

TODO: Clarify compile-time scope resolution in light of forward references and self-references. In short, block definitions have Scheme letrec* semantics.

Special forms

std/base special form: let

let name = value, where name is an identifier and value is any expression, declares a value binding named name in the current environment, and binds it to the result of evaluating value.

let name(...params) = value, where name is an identifier, is equivalent to let name = (...params) -> value.

std/base special form: ->

The function constructor form, analogous to lambda in Lisps.

(param₁, ..., paramₙ) -> body, where each name is an identifier, produces a function taking n parameter. When the function is called with n arguments, body is evaluated in an environment where param₁, ..., paramₙ are bound to the corresponding argument values, and whose parent environment is the environment in which the function was created.

The last element in the parameter list may be of the form ...rest, where rest is an identifier, making a parameter list of the form (param₁, ..., paramₙ, ...rest). In this case, the resulting function accepts n or more parameters; when called with n or more arguments, the first n parameters are bound to the first n argument values, and the rest parameter is bound to a list of the remaining argument values.

std/base special form: if

The full form of this construct is:

if <condition₁> [then] <consequent₁> [elif <condition₂> [then] <consequent₂> ...] [else <alternate>]

Evaluation proceeds as follows. <condition₁> is evaluated. If result is not a boolean an error is raised. If the result is true, <consequent₁> is evaluated and is the result is used as the result of the entire form. Otherwise, the remaining <condition> and <consequent> pairs are tried in sequence, until one of the <condition>s is true. If none of the <condition>s is true, <alternate> is evaluated. If no else <alternate> clause is provided, nil is returned.

std/base special forms: do, redo, return

The full form of this construct is:

do <name>(<param₁>=<value₁>, ..., <paramₙ>=<valueₙ>) <body>

Execution proceeds as follows. Result of evaluating each value in the current environment is bound to the corresponding parameter in an environment whose parent is the current environment. Then <body> is evaluated in this new environment. Within the body, name is bound to a handler to the block itself. The return value of the form is the return value of <body>, unless the redo or return forms are used to transfer control flow. After executions leaves the form, the block handler is invalidated, even if it escapes the current scope (e.g., is returned by the current form, or is stored in an external data structure).

redo <block>(<value₁>, ..., <valueₙ>) transfers control out of the body corresponding to the block handler <block> (running any pending finally blocks) and restarts execution of <body> in an environment where each <paramᵢ> is bound to the corresponding <valueᵢ> instead of the original values. If <block> has been invalidated, an error is raised.

return <value> from <block> transfers control out of the body corresponding to the block handler <block> (running any pending finally blocks), and <value> becomes the return value of the block as a whole. If <block> has been invalidated, an error is raised.

std/base special form: try

The full form of this construct is:

try <body> [except <selectorᵢ> <nameᵢ> <handlerᵢ> ...] [finally <finalizer>]

Execution proceeds as follows. First, <body> is evaluated. If no exception is raised, the value of <body> becomes the value of the try form as a whole. The <finalizer>, if present, is then evaluated, but its value is discarded.

If an exception e is raised during execution of <body>, execution of <body> is interrupted, and the exception is matched against each <selectorᵢ> in order, until one of them matches, as follows. If <selectorᵢ> is the identifier _, it matches. Otherwise, <selectorᵢ> is evaluated; it must evaluate either to a Class object, or to a tuple of Class objects. If instance_class(e) is equal (in the == sense) to any of the class objects, then it matches. If a selector <selectorᵢ> matches, <handlerᵢ> is evaluated in an environment where <nameᵢ> is bound to the exception e. Otherwise, the remaining selectors are tried in sequence until one of them matches. If none of the selectors match, execution leaves the try form (running <finalizer> if present), and execution continues as if the form as a whole had raised the exception e (i.e., the selectors/handlers in the next dynamically enclosing try form will be tried).

std/base special form: raise value

value is evaluated and raised as an exception.

std/base special form: import

The basic form of this construct is:

import <module_path>

<module_path> is one or more identifiers separated by the / operator. The module with the corresponding path is loaded (according to the procedure in the Loading modules section), and the last identifier in <module_path> is bound to the module object (e.g., import foo/bar/baz imports the module foo/bar/baz and binds it to baz).

An alternative form, import <module_path> as <identifier>, loads <module_path> and binds the module object to <identifier>.

Another form is import <module_path> (<bindingᵢ>, ...), where each <bindingᵢ> can be either an identifier or a phrase <identifier₁> as <identifier₂>. It loads <module_path>, and makes each specified binding from the module available as a binding in the current module (possibly with a different name, if the as <identifier₂> form is used).

std/base special form: import_all

import_all <module_path> loads the module and makes all of the module's bindings available in the current module. This form is useful in the REPL, but is not generally recommended for programs, since it makes it harder to tell which module a given identifier comes from.

std/base special form: macro

The full form of this construct is:

macro <identifier> = <transformer>

<transformer> must evaluate to a function taking three arguments: an AST node, an object representing the syntactic environment of the macro definition, and an object representing the syntactic environment of the macro call. A macro binding for <identifier> will be created, bound to the transformer function. When the macro is invoked, the transformer function is invoked according to the semantics described in the Semantics section.

Common operators

Many operators are shared by multiple data types. This section defines them.

Arithmetic and related operators

std/base functions: +, -, *, /, //, %, ^, ++

These functions call the method of same name on their left operand, passing the right operand (if any) as an argument. For instance, x+y is equivalent to x.(+)(y) (i.e., calling the method named + on x, with y as an argument), and -x is equivalent to x.(-)() (i.e., calling method named - on x, with no arguments).

The definitions of the methods themselves are given in the sections for each data type.

Comparison

std/base functions: x == y, x < y

These functions call the method of same name on their left operand, passing the right operand as an argument, i.e., they are equivalrnt to x.(==)(y) and x.(<)(y), respectively. The definition of those methods are given in the sections for each data type. All built-in data types implement ==; not all implement <. If the corresponding method does not exist, an error is raised.

std/base function: x === y

If x has a === method, invoke x.(===)(y). All built-in classes implement this method.

If x does not have a === method, return the result of instance_class(x) === instance_class(y) && instance_data(x) === instance_data(y).

The goal of x === y is to tell whether x and y are equivalent, i.e., whether they could be interchanged with one another in any context (with the possible exception of utilities capable of inspecting the bit patterns of those values, thus revealing internal implementation details). This is in contrast with ==, which is meant to implement a less strict notion of equality. For instance, 1 == 1.0 is true, since these values are numerically equal, but 1 === 1.0 is false, since these values do not have the same behavior in every context.

std/base function: x != y

Returns the negation of the result of x == y. Equivalent to not(x == y).

std/base function: x <= y

Returns true if either x < y or x == y is true.

std/base function: x > y

Returns the negation of the result of x <= y.

std/base function: x >= y

Returns the negation of the result of x < y.

std/base function: x !== y

Returns the negation of the result of x === y.

Basic data types

Booleans

std/base class: Bool

The class of boolean values. There are exactly two instances of this class, defined in std/base: True and False.

The Bool class has no constructor.

In this report, when a function or method is said to return true, it means returning the True instance of Bool; analogously, returning false means returning the False instance of Bool. No other values (such as 0 or 1) are considered true or false in Fenius; for instance, if the condition of an if statement evaluates to something that is not a boolean, an error is raised. To check whether a number is zero, use a regular == comparison. For collections, the convenience functions no and any (which check whether a collection is empty or non-empty, respectively) can be used.

Bool method: self.(==)(other), self.(===)(other)

Return true if other is a boolean and the same as self. If other is a different boolean or is not a boolean, return false.

Bool method: self.str()

Returns the string "True" if self is true, and "False" if self is false.

std/base function: not(boolean)

Return true if boolean is the false boolean, and false if boolean is the true boolean. If boolean is not a boolean, an error is raised.

std/base macro: x && y

Evaluate x. If the result is true, evaluate y and use it as the result. Otherwise, the result is false and y is not evaluated.

Equivalent to if x { y } else { False }.

std/base macro: x || y

Evaluate x. If the result is false, evaluate y and use it as the result. Otherwise, the result is true and y is not evaluated.

Equivalent to if x { True } else { y }.

Numbers

Introduction

There are two types of numbers in Fenius: integers and floats.

Integers

Integers are signed, and must either have the range of a k-bit two's complement integer for some fixed k ≥ 64 (i.e., it can represent values from -2k-1 to 2k-1 - 1, inclusive), or be implemented as arbitrary-precision integers.

Operations on integers that would result in values exceeding the range of representaion for an integer must be handled by either by:

  • Implementing two's complement wrap-around semantics (i.e., adding 1 to the most positive integer yields the most negative integer, for instance);
  • Trapping, i.e., raising an error whenever overflow happens.

Whatever strategy an implementation picks, it must be used consistently in all integer operations.

Rationale

Wrap-around semantics is the default behavior of integers in most modern CPU architectures. This is not optimal, since it can hide bugs, but is easiest to implement. Note, however, that implementations in C and other languages in which integer overflow is undefined behavior must be careful to ensure wrap-around semantics is always preserved by the compiler.

In an ideal world, raising an error on overflow would be better, but this may complicate implementation too much, especially at the current stage of the language.

In a previous version of this report, it was specified that integer operations whose result would exceed the representation range of an integer would return floats instead. I gave up on this idea because the whole point of having integers and floats as separate types is that we care about the distiction. In most non-mathematical contexts, using a float when an integer is expected would be wrong, and in contexts where this is not the case, one should be using floats to begin with anyway.

Although it would be convenient to allow integers to be less than 64-bits wide (e.g., 63 bits plus a tag bit), it would be too annoying if operating system calls that take or return 64-bit values (such as file offsets) could not be handled using native integers. An implementation is allowed to use a compact tagged representation for smaller integers, but this implementation detail is not exposed at the language level.

Arbitrary-precision integers are allowed as a possibility mainly for the sake of making it easy to target existing Common Lisp and Scheme runtimes as an implementation strategy.

Floats

Floats are IEEE 754 double-precision (64-bit) floats. Infinities and NaN values are supported.

std/base class: Int

The class of integer values.

The Int class constructor accepts one argument and converts it to an integer.

If the argument is an integer, the argument is returned unchanged.

If the argument is a float, it is rounded toward zero. If the resulting number can be represented as an integer, that integer is returned; otherwise, an error is raised. In particular, if the argument is an infinity or a NaN value, an error is raised.

If the argument is a string, it is parsed as an integer using the same rules for parsing integers in source code, with the addition that an optional + or - sign prefix is accepted. (In source code, such a sign is interpreted as the unary + or - operator.) In particular, the prefixes 0x, 0b and 0o are accepted.

If the argument is of any other type, an error is raised.

std/base class: Float

The class of float values.

The Float class constructor accepts one argument and converts it to a float.

If the argument is a float, it is returned unchanged.

If the argument is an integer, it is converted to a float. If the integer is not within the range of values representable by a finite float value, an infinity with the appropriate sign is returned. (This may happen if the implementation uses arbitrary-precision integers.)

If the argument is a string, it is parsed as a float using the same rules for parsing floats in source code, with the addition that an optional + or - sign prefix is accepted. (In source code, such a sign is interpreted as the unary + or - operator.) In addition, Infinity (optionally preceded by a + or - sign) and NaN are recognized.

If the argument is of any other type, an error is raised.

Constants

std/base constant: Infinity

The positive infinity float value. The negative infinity can be obtained by using the - operator (i.e., -Infinity).

std/base constant: NaN

A NaN float value. In IEEE 754, there are multiple distinct NaN values, but all NaN values are equal in Fenius (in both the == and === sense), so in effect NaN can be thought of as the single NaN value in the language.

std/base constants: INT_BITS

The number of bits an integer can hold, e.g., 64 if integers are 64-bits. In implementations with arbitrary-precision integers, this value can be nil.

Converting to a string

Int and Float method: self.str()

Returns a string representation of self, in a format accepted by the Int and Float constructors. For floats, the resulting string may or may not be in scientific notation (e.g., 1.2e-100).

Comparison operators

Int and Float method: self.(==)(other)

Return true if self is numerically equal to other. If other is not an integer or a float, return false.

== is not IEEE 754 compliant with respect to NaN: any NaN value is always equal to any NaN value. (Put another way, == is not the IEEE 754 equality operator. That operator might be defined in a library in the future if the need arises, but the == operator in std/base is not it.)

Int and Float method: self.(===)(other)

Return true if self and other are equivalent. This means they must have the same type, the same numerical value, and the same sign (if applicable, i.e., when not NaN). Note that IEEE 754 floats distinguish between positive and negative zero; 0.0 === -0.0 returns false.

Int and Float method: self.(<)(other).

Return true if self is numerically less than other. If other is not an integer or a float, an error is raised.

If either self or other is a NaN value, an error is raised. (Rationale: NaN values are not ordered relative to other numbers, not even to themselves, so asking whether a number is less than NaN is just as undefined as asking whether a number is less than a string (which is also raises an error).)

[TODO: When one argument is a float and the other is not, is it enough to cast the non-float argument to float before comparison? This clearly does not work in implementations with arbitrary-precision integers, since conversion may turn the integer into infinity, and suddenly 2^100000 < Infinity is false. But I'm not sure it works in all cases even with 64-bit integers and 64-bit floats, since floats are less precise than integers near the end of the integer range.]

Arithmetic operators

For all operators described in this section, unless otherwise stated, the following applies:

  • If either argument is a float, the result is a float.
  • If either argument is neither an integer nor a float, an error is raised.
  • If the result of an integer operation is not within the representation range of an integer (i.e., on overflow), this must be handled either by using two's complement wrap-around semantics or by raising an error.
  • Operations involving infinities and NaN values have the expected IEEE 754 semantics.

Int and Float method: self.(+)(), self.(+)(other)

With no arguments (i.e., when used as a prefix operator), return self. With an argument, return the result of self plus other.

Int and Float method: self.(-)(), self.(-)(other)

With no arguments (i.e., when used as a prefix operator), return self with the negated sign. With an argument, return the result of self minus other

Note that negating the most negative integer causes overflow, since the corresponding positive integer is outside the representation range of integers (e.g., with 64-bit integers, the range of representable values in two's complement representation is from -2⁶³ to 2⁶³-1, and -2⁶³ is not representable).

Int and Float method: self.(*)(other)

Return the result of self times other.

Int and Float method: self.(/)(other)

Return the result of self divided by other. The result is always a float. Division by zero is not an error: it returns an infinity or NaN as appropriate, according to IEEE 754 semantics.

Int and Float method: self.(//)(other) and self.(%)(other)

// returns the result of the division of self by other, rounded towards negative infinity to the nearest integer. If either argument is a float, the result is a float; otherwise, the result is an integer. If the divisor is zero, an error is raised.

Note that dividing the most negative integer by -1 causes overflow.

Note that, like Python and unlike C, integer division rounds towards negative infinity, not zero, i.e., -11 // 2 is -6, not -5. See "Rationale" in the next section for more information.

Int and Float method: self.(%)(other)

% returns the value of self modulo other. The sign of the result is the same as the sign of other. If other is zero, an error is raised.

If either self or other is NaN, the result is NaN. If self is an infinity (of either sign), the result is NaN. If other is positive infinity and self is a finite value, the result is self if it is non-negative, and negative infinity if it is negative. If other is negative infinity and self is a finite value, the result self if it is non-positive, and positive infinity if it is positive. See "Rationale" below for more information.

Rationale (no pun intended)

These two operators have pretty much the same behavior as in Python.

Guido van Rossum gives a justification for rounding towards negative infinity in this post. To sum up, the the following identity should hold: if self // other yields quotient and self % other yields remainder, then quotient * other + remainder == self. The behavior of negative integer division then hinges on the behavior of % with negative operands.

Having % as a modulo operator has the nice property that if other is a positive integer, then self % other for successive integer values of self yields an unbroken cycle of integers from 0 to self - 1, even if it crosses the zero boundary (i.e., changes sign), i.e., it implements a "wrap-around" semantics that works in both directions. For instance, imagine you have a 12-hour clock and you want to compute your final position in the clock given a time difference from the initial time. With a modulo operator, you can compute this as (initial_time + difference) % 12, and this will work even if difference is negative: if your initial time is 2 o'clock and you move back 5 hours, the resulting time is (2 - 5) % 12 = (-3) % 12 = 9 o'clock. This does not work with the C remainder semantics.

Having settled on modulo semantics for %, if we want quotient * other + residue == self, and residue is positive, then quotient * other must come before self, not after. In our example, if (-3) % 12 is 9 and we want quotient * 12 + 9 to be -3, then quotient must be -1, so that (-1) * 12 + 9 == -12 + 9 == -3.

One intuitive way to think about this is that the modulo operator with a positive modulus m imposes a ruler over the real number line with markings m units apart from each other, so that it divides the real line into segments of length m. (In our example with modulo 12, the real line would be split into segments of size 12, with markings at positions {..., -24, -12, 0, 12, 24, ...}.) The modulo operation tells you how distant you are from the nearest marking to your left (i.e., how far you are from the beginning of the segment you are in); that number is always positive (or zero). The integer division operator tells you how many markings that marking is from zero; that number may be either positive or negative (or zero), depending on the direction you have to walk to get to that marking.

A negative modulus m inverts the orientation of the ruler, so now each segment begins on its right boundary and ends on its left one. Now the modulo operator tells you how distant you are from the nearest marking to your right (that number is always negative, or zero). Integer division still tells you how many markings that marking is from zero, but the sign is flipped (i.e., the quotient is positive when you walk to your left and negative when you walk to your right).

Note that this extends to non-integers as well: a modulus of 1.5 defines boundaries at {..., -4.5, 3.0, 1.5, 0, 1.5, 3.0, 4.5, ...}. 5.0 % 1.5 is 0.5 (it is 0.5 units away from the marking to the left at 4.5), and 5.0 // 1.5 is 3 (that's how many markings that marking at 4.5 is from zero). Likewise, -5.0 % 1.5 is 1.0 (it is 1.0 units away from the marking to the left at -6.0), and -5.0 // 1.5 is -4 (that's how many markings that marking at 6.0 is from zero).

This also helps making sense of the results with infinity operands. A modulus of positive infinity splits the real line into segments of length infinity, with markings at {-∞, 0, +∞}. A positive number x is always within the segment that starts at 0 and ends at Infinity, so x % Infinity for positive x is x itself: that's how distant it is from the beginning of that segment, i.e., from zero. The segment to the left of zero begins at -Infinity and ends at 0; any number within it (i.e., a negative number) is infinitely distant from the beginning of that segment, so x % -Infinity for negative x is always Infinity. A modulus of negative infinity is the same but has the opposite orientation: x % -Infinity for negative x is x itself, and x % -Infinity for positive x is -Infinity (since the number is infinitely distant from the right boundary of the segment beginning at positive infinity and ending at zero).

A modulus of zero would split the real line into segments of length 0, which does not make sense (for one thing, there would be no numbers within each of those segments), so a zero modulus is an error. (An alternative would be to return NaN for floats and raise an error for integers, but I'm unsure about this strategy.)

Infinity modulo any value m is also NaN. For one, you cannot even determine which segment the infinity value would be in, let alone how distant it is from the left boundary of said segment. For another, for any segment we picked, the infinity value would be infinitely distant from the left boundary of the segment, but the result must be between 0 and m (exclusive), so this operation is not well-defined for an infinity dividend. (It is not clear to me whether the integer division of infinity by a finite value should yield infinity or NaN, but when in doubt we follow Python here and return NaN.)

Int and Float method: self.(^)(other)

Returns the result of self raised to the power of other.

TODO: Specify corner cases with infinities, zero and NaN, even if just by referring to the appropriate IEEE 754 part.

Float method: self.is_finite()

Returns true if self is a finite number (not an infinity and not a NaN value).

Float method: self.is_infinity()

Returns true if self is a positive or negative infinity.

Float method: self.is_nan()

Returns true if self is a NaN value.

Float method: self.sign()

Returns 1.0 if self is positive, and -1.0 if self is negative. Note that IEEE 754 distinguishes between positive and negative zero; this method can be used to tell them apart (it returns 1.0 for positive zero and -1.0 for negative zero). If self is NaN, the result is NaN.

std/base function: floor(x)

Returns the value of x rounded towards negative infinity to the nearest integer. The result is an integer, unless it cannot be represented as an integer, in which case a float is returned.

std/base function: ceiling(x)

Returns the value of x rounded towards positive infinity to the nearest integer. The result is an integer, unless it cannot be represented as an integer, in which case a float is returned.

std/base function: truncate(x)

Returns the value of x rounded towards zero to the nearest integer. The result is an integer, unless it cannot be represented as an integer, in which case a float is returned.

std/base function: abs(x)

Returns the absolute value of x (i.e., x if it is non-negative, and -x if it is negative).

Rationale

floor, ceiling, truncate, abs are functions instead of methods simply because they are more ergonomic to use this way when the argument is a more elaborate expression than a single variable (e.g., floor(x ^ 0.5) rather than (x ^ 0.5).floor()).

std/base function: max(...numbers)

Returns the argument with the highest numerical value. At least one argument has to be provided. If any of the arguments is NaN, the result is NaN.

std/base function: min(...numbers)

Returns the argument with the lowest numerical value. At least one argument has to be provided. If any of the arguments is NaN, the result is NaN.

Strings

Introduction and preliminary notes

Strings are immutable sequences of bytes. They are intended to represent sequences of Unicode codepoints encoded as UTF-8, but they don't have to: any sequence of bytes is a valid string.

String functions that treat strings as sequences of Unicode codepoints should treat bytes that are not part of a valid UTF-8 sequence as if they were unassigned Unicode codepoints. For instance, operations such as lowercasing a string should leave such non-UTF-8 bytes unchanged in the result.

std/base class: Str

The class of string values.

The string class constructor accepts one argument and calls the str() method on it.

Str method: self.byte(position)

Return the byte at position start (counting from zero) in the string, as an integer. position must be an integer. If position is negative, return the byte position positions from the end of the string (i.e., the last byte can be referenced as -1).

Examples:

  • "abc".byte(0)97 (the ASCII code of a)
  • "abc".byte(-1)99 (the ASCII code of c)
  • "abc".byte(3) → error

Str method: self.bytes(start, length)

Return the substring starting at position start (counting from zero), and having at most length bytes. If the end position would be beyond the end of the string, it is treated as if the end of the string had been specified. If start is negative, count from the end of the string; if the start position would be before the beginning of the string, the beginning of the string is used.

Examples:

  • "abc".bytes(0, 1)"a"
  • "abcde".bytes(-2, 2)"de"
  • "abcde".bytes(-3, 2)"cd"
  • "abcde".bytes(-3, 1000)"cde"

Str method: self.char(position)

Return the substring corresponding to the position-th (counting from zero) Unicode codepoint in the string. If position is negative, count from the end of the string.

Examples:

  • "hello".char(0)"h"
  • "Tír na nÓg".char(1)"í"
  • "Tír na nÓg".char(-2)"Ó"

Note

char is a bit of a misleading name, since this method considers codepoints and not "characters" (a term that is quite ambiguous in the world of Unicode), but codepoint is a bit too wordy for such a basic operation. A possible alternative name would be rune, as in Go usage, but in Go a rune is an integer value representing a Unicode codepoint, whereas this method returns a substring (there is no distinct character type in Fenius), so I'm not sure if this would be a good name to use.

Str method: self.chars(position, length)

Return the substring consisting starting at the position-th (counting from zero) Unicode codepoint in the string, and comprising at most length codepoints. If position is negative, count from the end of the string; if the start position would be before the beginning of the string, the beginning of the string is used.

Examples:

  • "Tír na nÓg".chars(0, 3)"Tír"
  • "Tír na nÓg".chars(4, 2)"na"
  • "Tír na nÓg".char(-3, 1000)"nÓg"
  • "Tír na nÓg".char(-1000, 3)"Tír"

Str method: self.(++)(other)

Return the concatenation of the strings self and other.

Str method: self.(==)(other), self.(===)(other)

Return true if self and other are both strings and contain the same bytes.

Str method: self.(<)(other)

Return true if self is before other in the lexicographical order of their bytes (which for UTF-8 strings is equivalent to the lexicographical order of their codepoints).

Str method: self.byte_len()

Return the number of bytes in the string.

Examples:

  • "Tír".byte_len()4

Str method: self.char_len()

Return the number of Unicode codepoints in the string.

Examples:

  • "Tír".char_len()3

Str method: self.split(delimiter), self.split(delimiter, count)

Split the string at occurrences of the substring delimiter, and return a list of the parts. If count (which must be an integer) is specified, only split at the first count occurrences of delimiter. If count is negative, count occurrences from the end of the string. A count of zero is the same as not specifying a count argument.

Examples

  • "Tír na nÓg".split(" ")["Tír", "na", "nÓg"]
  • "Tír na nÓg".split(" ", 1)["Tír", "na nÓg"]
  • "Tír na nÓg".split(" ", -1)["Tír na", "nÓg"]

Cells

Cells are mutable objects containing a single value. Cells are used to deal with mutability in a controlled way in Fenius.

The most basic usage for cells is to store a single value that can be replaced during execution. Variables are immutable in Fenius, but they can be bound to a cell which can itself change its contents:

let v = &42     # Bind `v` to a cell containing the number 42.
v := 81         # Change the cell contents to 81.
print(@v)       # Extract the cell contents and print it.

In this case, the cell serves simply as mutable container for immutable values. However, values of different types can implement custom behavior when stored within a cell. For instance, a list can make operations for appending or mutating elements in place available only when the list is contained within a cell. In this way, they can provide more efficient versions of those operations while still ensuring that when these values are taken out of the cell, they will remain immutable. In this way, mutability is (ideally) always visible in the language: everything is immutable unless it is inside a cell.

std/base function: &value

Create a cell containing value.

std/base function: cell := value

Change the contents of the cell to value.

std/base function: @cell

If the cell's value has a cell method extract_from_cell, call that method with no arguments. Otherwise, return the cell's value.

std/base macro: cell&.method

Evaluate cell. If the cell's value has a cell method method, invoke that method. Otherwise, this is equivalent to (@cell).method.

std/base macro: cell&[...items]

Evaluate cell and items. If the cell's value has a cell method subscript, call that method with ...items as arguments. Otherwise, this is equivalent to (@cell)[...items].

std/base method: self.(==)(other)

Return true if self and other are both cells and their contents are equal (in the == sense).

std/base method: self.(===)(other)

Return true if self and other are the same cell. Two distinct cells are not ===, even if they have the same contents.

Streams

Streams are sequences of elements generated on demand. They provide functionality analogous to iterators in imperative languages. However, unlike iterators, streams preserve the entire sequence of elements, and they can be traversed any number of times; traversing a stream s will always yield the same elements every time it is traversed. Nevertheless, streams provide cell methods so that, when stored within cells, they can be used in much the same way as iterators, in a more memory-efficient way.

std/base class: Stream

The class of stream values.

The stream constructor takes two arguments: an initial state value, an a generator function. Whenever a new value of the stream must be generated, the generator function will be called with the current state value as an argument, and must either return nil (indicating that there are no more elements in the stream), or a pair (next_value, next_state), where next_value will be used as the next value of the stream, and next_state is the state value that will be passed in the next call to the generator function. The generator function is guaranteed to be called only once per generated element.

Examples:

# Return a stream of all numbers from `start` to `end`.
let number_range(start, end) = {
    Stream(start, current_value -> {
        if current_value > end {
            nil
        } else {
            (current_value, current_value + 1)
        }
    })
}

# Return an infinite stream of the Fibonacci sequence.
let fibs = Stream((0, 1), state -> {
    let (lesser, greater) = state
    let next_state = (greater, greater + lesser)
    (lesser, next_state)
}

Stream method: self.empty()

Return true if there are no elements in the stream.

Stream method: self.first()

Return the first element of the stream (generating it if necessary). If there are no elements in self, an error is raised.

Stream method: self.rest()

Return the stream without its first element. If there are no elements in self, an error is raised.

Stream cell method: self&.next()

Return the next element of the stream stored in self, and set self to point to the rest of the stream. Equivalent to:

{
    let next_element = (@self).first()
    self := (@self).rest()
    next_element
}

Stream method: self.map(func)

Return a new stream whose elements are obtained by calling func on each element of the original stream self.

Stream method: self.each(func)

Call func on each element of the stream self. The values returned by func are discarded.

Example (using the number_range function defined in the previous example):

  • number_range(1, 5).map(x -> x*x)<Stream (1, 4, 9, 16, 25)>

Stream method: self.filter(func)

Return a new stream whose elements are those in the original stream self for which func returns true.

Example (using the number_range function defined in the previous example):

  • number_range(1, 10).filter(x -> x%2 == 0)<Stream (2, 4, 6, 8, 10)>

Stream method: self.reduce(initial_state, func)

func must be a function taking two arguments: a state and an element of the stream. For each element of the stream, in sequence, func will be called with the current state (which is initial_state in the first call) and the element; the result of the call will be used as the next state value. The return value of the method is the last value returned by func.

Example (using the number_range function defined in the previous example):

# Compute the sum of all numbers from 1 to 10.
let sum = number_range(1, 10).reduce(0, +)

# Concatenate a stream of strings.
let strings = ["foo", "bar", "baz"].stream()
let concatenated_strings = strings.reduce("", ++)

Stream method: self.then(func)

Return a stream consisting of the elements of self, followed by the elements of the stream produced by calling func with no arguments. func is called only once, and only after all of self has been traversed.

Stream method: self.consume(func)

Replace the contents of the cell with nil. If the previously stored value has a method mutable_cell_consume, call it with func as an argument. Otherwise, call func on the previously stored value.

TODO: Add a rationale.

Lists

Lists are sequences of elements of any type. Lists are generally immutable, but they provide mutable methods when contained inside cells.

std/base class: List

The class of list values.

The List class constructor accepts a stream as an argument and returns a list of all elements of the stream.

List method: self.stream()

Return a stream whose elements are the elements of the list.

List method: self.empty()

Return true if the list has no elements.

List method: self.subscript(position)

Return the element at position position (counting from zero) in the list. If position is negative, count from the end of the list. If position does not correspond to a valid position in the list, raise an error.

Note that the subscript is called by bracketed application, i.e., list[i] is equivalent to list.subscript(i).

List method: self.get(position, default)

Return the element at position positionin the list (with the same semantics as for subscript). If position does not correspond to a valid position in the list, return default.

List method: self.len()

Return the number of elements in the list.

List method: self.slice(start, length)

Return a list containing the elements of self starting at start and having at most length elements. If start is beyond the end of the list, return an empty list. If self is negative, count from the end of the list; if the position would be before the beginning of the list, use the beginning of the list as the starting position.

List method: self.map(func)

Return a list whose elements are the obtained by calling func on each element of the original list.

List cell method: self&.map(func)

Replace each element in the list by the result of calling func on it.

List method: self.each(func)

Call func on each element of the list. The values returned by func are discarded.

List method: self.filter(func)

Return a list whose elements are those in the original list for which func returns true.

List cell method: self&.filter(func)

Remove from the list those elements for which func returns false.

List method: self.reduce(initial_state, func)

Equivalent to self.stream().reduce(initial_state, func).

Dictionaries

Dictionaries are mappings from keys (which can be any value) to values. Dictionaries are generally immutable, but they provide mutable methods when contained within cells.

std/base class: Dict

The class of dictionary values.

The Dict class constructor takes a stream or streamable sequence of key/value pairs and returns a dictionary containing those key/value pairs.

std/base macro: : (key: val)

Return a tuple (key, val). If key is an identifier, it is not evaluated; the name of the identifier as a string is used instead. If it's not an identifier, it is evaluated. In particular, a block can be used to force evaluation of an identifier (i.e., {key}: val will force the evaluation of key). val is always evaluated.

Rationale

The current version of Fenius does not have keyword arguments, but this macro mimics the expected syntax to be added at some moment in The Future. In the meanwhile, dictionary functions that expect key/value arguments take pairs instead, and this macro makes it more convenient to build such pairs.

std/base function: dict(...pairs)

Construct a dictionary with the given key/value pairs. If the same key appears more than once, the last occurrence takes precedence.

Example:

  • dict(foo: 23, bar: 42)

Dict method: self.(==)(other), self.(===)(other)

Return true if self and other are both dictionaries with the same keys, with each key mapping to the same value.

Dict method: self.subscript(key)

Return the value associated with key in the dictionary. If the key is not present in the dictionary, an error is raised. Note that the subscript method is called by bracketed application, i.e., somedict[key] is equivalent to somedict.subscript(key).

Dict method: self.get(key, default)

Return the value associated with key in the dictionary. If the key is not present in the dictionary, return default.

Dict method: self.with(...pairs)

Return a new dictionary with the same key/value associations as self, plus the ones specified in pairs. If the pairs contain keys that are already present in self, the new values in pairs take precedence. If the same key appears more than once, the last occurrence takes precedence.

Dict method: self.without(...keys)

Return a new dictionary without the listed keys. If a listed key is not present in self, it is ignored.

Dict method: self.has_key(key)

Return true if self has the specified key.

Dict cell method: self&.with(...pairs)

Modify the dictionary with the new key/value associations.

Dict cell method: self&.without(...keys)

Remove the keys from the dictionary.

Dict method: self.pairs()

Return a stream of the key/value pairs of the dictionary. The order of the pairs is unspecified.

Dict method: self.keys()

Return a stream of the keys of the dictionary. The order of the keys is unspecified.

Dict method: self.values()

Return a stream of the values of the dictionary. The order of the values is unspecified.

Nil

nil is a constant used as a placeholder when there is no meaningful value to return. An empty block returns nil. The nil value belongs to the Nil class.

std/base class: Nil

The class of the nil constant.

The Nil class has no constructor.

Functions

Modules

Objects

Fenius has a simplistic object system, providing the ability to create new object types and define methods on them. The goal is to have these basic abilities in the initial version, while leaving more complex design problems such as inheritance, interfaces, etc. for the future.

Fenius provides a number of basic operators to construct new object types, and a class syntactic form that abstracts over them.

Function: make_class(name, constructor, methods, cell_methodds)

Construct a new class.

name is the name of the class as a string. This is used for printing and debugging.

constructor is a function that will be called when the class object is called. It should return a new instance of the object, typically by calling make_instance (see below).

methods is a dictionary mapping method names to functions. These functions must all take one argument (the object instance). If obj is an object, then obj.m will call the function stored at key "m" in the object's method dictionary, passing obj as an argument. Note that if you want to be able to call, say, obj.m(x, y), the method function must itself return a function taking x and y as arguments. In this way, both properties and conventional methods are implemented in a uniform way.

cell_methods is a dictionary mapping cell method names to functions. These functions must all take two arguments: the cell object, and the object instance. If cell is a cell containing an object obj, cell&.method will call the function stored at key "method" in the object's cell method dictionary, passing cell and obj as arguments.

Example:

let Person = make_class(
    "Person",
    (name, age) -> make_instance(Person, &dict(name: name, age: age)),
    dict(
        name: self -> { instance_data(self)&["name"] },
        salute: self -> { () -> "Hello, " ++ self.name ++ "!" },
    ),
    dict(
        set_name: (cell, self) -> { (new_name) -> instance_data(self)&.with(name: new_name); nil }
    )
)

This function returns the new class object. This object is callable: calling it will call the class's constructor function.

Function: make_instance(class, data)

Return an instance of a given class. class must be a class object as returned by make_class. data can be any value.

Function: instance_class(object)

Return the class of an object. For instances created with make_instance, this returns the class object used in the instantiation. For other objects, this returns the corresponding built-in class. Every Fenius object has a class, regardless of whether it was constructed with make_instance or not.

Function: instance_data(object)

Return the data of an instance object. For instances created with make_instance, this returns the data value used in the instantiation. For other objects, this may raise an error, or may return internal values used to implement the object. Programs should not rely on the results of calling instance_data on non-user-defined objects.

Syntax: object.method (the . operator)

Invoke an object's method.

This operator takes the method function named method from object's class' method dictionary and calls it with object as an argument.

Syntax: class

A syntactical form abstracting over the details of class creation.

The syntax of this form is class <name>(<field>, ...) [mutable] { <methods> ... }. name is the identifier to which the class object will be assigned. It will also be used in building the class' name string. The <field>s are zero or more comma-separated identifiers naming the class fields. When the class constructor is called, each argument must be a field: value pair; a dictionary mapping the field names to the given values will be constructed and used as instance data for the new instance. If the mutable option is specified, the dictionary will be wrapped in a cell.

The { <methods> ... } block is optional. If present, it defines the methods of the class. The following forms are accepted inside the block:

  • let <self:identifier>.<name:identifier>(<arg:identifier>, ...) = <body>

Define a method named <name>. The method function takes <self> as an argument and returns a function, which takes (<arg>, ...) as arguments and evaluates <body>.

  • let <self:identifier>.<name:identifier> = <body>

Define a method named <name>. The method function takes <self> as an argument and evaluates <body>.

  • let <cell:identifier>&<self:identifier>.<name:identifier>(<arg:identifier>, ...) = <body>

Define a cell method named <name>. The method function takes <cell> and <self> as arguments and returns a function, which takes (<arg>, ...) as arguments and evaluates <body>.

  • let <cell:identifier>&<self:identifier>.<name:identifier> = <body>

Define a cell method named <name>. The method function takes <cell> and <self> as arguments and evaluates <body>.

In addition to the methods defined in the method block, the following methods will be defined by default unless overridden in the block:

  • One accessor method for each field, with the same name as the field.
  • A self.(==)(other) method, which tests instance_class(self) == instance_class(other) && instance_data(self) == instance_data(other).
  • A self.(===)(other) method, which tests instance_class(self) === instance_class(other) && instance_data(self) === instance_data(other). (Note that for mutable classes, == and === give different results, because the instance data is a cell.)
  • A str() method, returning a conventional printed representation of the object.
  • A self.with(...pairs) method, which returns a new object with new values for the specified fields.
  • If the class is mutable, a cell&.with(...pairs) method, which modifies the values of the specified fields.

For example, a declaration like:

class Person(name, age) mutable {
    let self.say(text) = self.name ++ " says: " ++ text

    let self.age_in_years = self.age * 365

    let cell&self.set_name(new_name) = {

}

is roughly equivalent to:

let Person = make_class(
    "Person",
    (name, age) -> make_instance(Person, &dict(name: name, age: age)),
    dict(
        say: self -> {(name, age) -> self.name ++ " says: " ++ text},
        age_in_years: self -> self.age * 365,

        name: self -> instance_data(self)&["name"],
        age: self -> instance_data(self)&["age"],

        (==): self -> { other -> {
            (instance_class(self) == instance_class(other) &&
             instance_data(self) == instance_data(other))
         } }

        (===): self -> { other -> {
            (instance_class(self) === instance_class(other) &&
             instance_data(self) === instance_data(other))
         } }

        str: self -> { () -> "Person(name: " ++ name.str() ++ ", age: " ++ age.str() ++ ")" }

        with: self -> { (...pairs) -> make_instance(instance_class(self), &(instance_data(self)@.with(...pairs))) }
    ),
    dict(
        with: (cell, self) -> { (...pairs) -> instance_data(self)&.with(...pairs); cell }
    )
)

Class: Class

The class of class objects. Calling instance_class on an object constructed by make_class returns this value.

This class defines three methods:

  • name: return the class' name string.

  • constructor: return the class' constructor. For user-defined classes, this will be the function passed as the constructor argument to make_class. For built-in classes, this may either return an internal function used to construct objects of the class, or return a function that raises a "class has no constructor" error.

  • methods: return the class' method dictionary. This is always a valid dictionary of all of the class' methods, even for built-in classes.

Enumerations

Enumerations are a shorthand for defining sets of constants belonging to a single class.

Syntax: enum <name:identifier> (<variant:identifier>, ...)

Define a new enumeration class identified by <name>, and define one instance of that class for each <variant>.

For example, a declaration of the form:

enum Color (Blue, Yellow)

is roughly equivalent to:

let Color = make_class(
    "Color",
    label -> make_instance(Color, label),
    dict(
        label: self -> instance_data(self),
    )
)

let Blue = Color("Blue")

let Yellow = Color("Yellow")

except clients should not rely on the specific value used as instance_data to distinguish between the variants. (In this example, a plain string is used, i.e., it's not even a dictionary; but this should not be relied on. Instead, if you need to obtain the name of a variant as string, use the label method.)

Classes created by enum have a label method that can be used to retrieve the name of a given enumeration variant as a string. In the example above, Blue.label would return the string "Blue".

System interface

Files and ports

std/base class: Port

The class of port objects.

A port is a handler for an open file, device or network stream. Ports are inherently stateful, not only because they have process-level state such as a cursor into the file, but also because the contents of a file is generally mutable as well, not only by the current process but by external processes. Therefore, ports are always contained in a cell, and a port cannot be extracted from its cell.

The Port class has no constructor.

std/base function: open(pathname, mode)

Open the file with the given pathname. mode is one of "r", "w", "a", "r+", "w+", "a+", with the same semantics as the C fopen function. Return a port cell for the open file. If the file cannot be open in the given mode, raise an error.

No concept of encoding or newline transformation is currently supported.

Port cell method: self&.close()

Close the port.

Port cell method: self&.read_bytes(n)

Read n bytes from the port and return them as a string. If end of file is reached before n bytes are read, the resulting string will have a length less than n. In particular, if no characters have been read before end of file is reached, an empty string is returned.

Port cell method: self&.read_chars(n)

Read n UTF-8 Unicode codepoints from the port and return them as a string. If end of file is reached before n characters are read, the resulting string will have a length less than n.

Port cell method: self&.read_line()

Read a line from the port and return it as a string. A line is terminated by a "\n" character, optionally preceded by a "\r" character. The terminator characters are not part of the returned string. If end of file is reached with no line available, return nil.

Port cell method: self&.read_raw_line()

Read a line from the port and return it as a string. A line is terminated by a "\n" character, optionally preceded by a "\r" character. The terminator characters are part of the returned string. If end of file is reached with no line available, return nil.

Port cell method: self&.write_bytes(str)

Write the bytes in string to the port.

Port cell method: self&.write_chars(str)

Write the characters in string to the port. Currently, this is equivalent to write_chars, but may acquire different semantics in the future if encodings are implemented.

Port cell method: self&.write_line(string)

Write the bytes in string to the port, followed by a "\n" character.

Port cell method: self&.then(func)

Call func with the port cell as an argument, and close the port after func is finished. The port is closed even if the function raises an exception during execution.

std/base constant: stdin, stdout, stderr

Port cell associated with the standard input, standard output, and standard error output, respectively.

Filesystem manipulation

TODO: Quite a big API here! Probably should live in its own module, not std/base.

Command line arguments

std/base function: argv()

Return a list with the arguments passed to the process as strings. The first element of the list is the name the program was invoked with, and the remaining elements are the command-line arguments.