From 225a30537acfeb3efe2a7730e5d1a063cc537047 Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Fri, 16 Apr 2021 20:39:10 -0400
Subject: Add return functions to the spec

---
 docs/spec/evaluate.html |  5 +++--
 docs/spec/grammar.html  | 10 ++++++----
 docs/spec/index.html    |  2 +-
 docs/spec/scope.html    |  3 +++
 4 files changed, 13 insertions(+), 7 deletions(-)

(limited to 'docs')

diff --git a/docs/spec/evaluate.html b/docs/spec/evaluate.html
index 9e221c53..6fdf0a0f 100644
--- a/docs/spec/evaluate.html
+++ b/docs/spec/evaluate.html
@@ -7,12 +7,12 @@

Specification: BQN evaluation

This page describes the semantics of the code constructs whose grammar is given in grammar.md. The formation rules there are not named, and here they are identified by either the name of the term or by copying the rule entirely if there are several alternative productions.

Here we assume that the referent of each identifier, or equivalently the connections between identifiers, have been identified according to the scoping rules.

-

Errors described in this page are "evaluation errors" and can be caught by the Catch (⎊) modifier. If an error is caught, evaluation halts without attempting to complete any in-progress node, and is restarted as part of the execution of Catch.

+

Evaluation is an ordered process, and any actions required to evaluate a node always have a specified order unless performing them in any order would have the same effect. Side effects that are relevant to ordering are setting and getting the value of a variable, causing an error, and returning (with →) from a block. Errors described in this page are "evaluation errors" and can be caught by the Catch (⎊) modifier. For caught errors and returns, evaluation halts without attempting to complete any in-progress node, and is restarted by Catch (for errors) or at the end of the appropriate block evaluation (for returns).

Programs and blocks

The result of parsing a valid BQN program is a PROGRAM, and the program is run by evaluating this term.

A PROGRAM or BODY is a list of STMTs, which are evaluated in program order. A result is always required for BODY nodes, and sometimes for PROGRAM nodes (for example, when loaded with •Import). If any identifiers in the node's scope are exported, or any of its statements is an EXPORT, then the result is the namespace created in order to evaluate the node. If a result is required but the namespace case doesn't apply, then the last STMT node must be an EXPR and its result is used. The statement EXPR evaluates some APL code and possibly assigns the results, while nothing evaluates any subject or Derv terms it contains but discards the results. An EXPORT statement performs no action.

A block consists of several BODY terms, some of which may have an accompanying header describing accepted inputs and how they are processed. An immediate block brImm can only have one BODY, and is evaluated by evaluating the code in it. Other types of blocks do not evaluate any BODY immediately, but instead return a function or modifier that obtains its result by evaluating a particular BODY. The BODY is identified and evaluated once the block has received enough inputs (operands or arguments), which for modifiers can take one or two calls: if two calls are required, then on the first call the operands are simply stored and no code is evaluated yet. Two calls are required if there is more than one BODY term, if the BODY contains the special names 𝕨𝕩𝕤𝕎𝕏𝕊, or if its header specifies arguments (the header-body combination is a _mCase or _cCase_). Otherwise only one is required.

-

To evaluate a block when enough inputs have been received, first the correct case must be identified. To do this, first each special case (FCase, _mCase, or _cCase_), excluding FCase nodes containing UndoHead, is checked in order to see if its arguments are structurally compatible with the given arguments. That is, if headW is a subject, there must be a left argument matching that structure, and if headX is a subject, the right argument must match that structure. This means that 𝕨 not only matches any left argument but also no argument. The test for compatibility is the same as for multiple assignment described below, except that the header may contain constants, which must match the corresponding part of the given argument.If no special case matches, then an appropriate general case (FMain, _mMain, or _cMain_) is used: if there are two, the first is used with no left argument and the second with a left argument; if there is one, it is always used, and if there are none, an error results.

+

To evaluate a block when enough inputs have been received, first the correct case must be identified. To do this, first each special case (FCase, _mCase, or _cCase_), excluding FCase nodes containing UndoHead, is checked in order to see if its arguments are structurally compatible with the given arguments. That is, if headW is a subject, there must be a left argument matching that structure, and if headX is a subject, the right argument must match that structure. This means that 𝕨 not only matches any left argument but also no argument. The test for compatibility is the same as for multiple assignment described below, except that the header may contain constants, which must match the corresponding part of the given argument. If no special case matches, then an appropriate general case (FMain, _mMain, or _cMain_) is used: if there are two, the first is used with no left argument and the second with a left argument; if there is one, it is always used, and if there are none, an error results.
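The case-selection order described above can be sketched in Python. This is only an illustrative model, not part of the specification: the function name `select_case`, the `(matches, body)` pairs, and the `has_w` flag are all hypothetical, the structural-compatibility test is left to a caller-supplied predicate, and FCase nodes containing UndoHead are assumed to have been filtered out beforehand.

```python
def select_case(special_cases, general_cases, x, w=None, has_w=False):
    """Pick the body to evaluate for one call (hypothetical model).

    special_cases: list of (matches, body) pairs in source order, where
        matches(has_w, w, x) stands in for the structural compatibility test.
    general_cases: list of zero, one, or two general-case bodies.
    """
    # Special cases are tried first, in order; the first match wins.
    for matches, body in special_cases:
        if matches(has_w, w, x):
            return body
    # Two general cases: the first is monadic, the second dyadic.
    if len(general_cases) == 2:
        return general_cases[1] if has_w else general_cases[0]
    # A single general case is used for every call.
    if len(general_cases) == 1:
        return general_cases[0]
    raise RuntimeError("no case matches the given arguments")


# Usage: one special case matching a right argument of 0, plus two general cases.
specials = [(lambda has_w, w, x: x == 0, "zero-case")]
generals = ["monadic", "dyadic"]
chosen = select_case(specials, generals, x=5, w=1, has_w=True)
```

Passing `has_w` separately from `w` keeps "no left argument" distinct from any particular argument value, which mirrors the remark that a header with 𝕨 matches both situations.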

The only remaining step before evaluating the BODY is to bind the inputs and other names. Special names are always bound when applicable: 𝕨𝕩𝕤 if arguments are used, 𝕨 if there is a left argument, 𝕗𝕘 if operands are used, and _𝕣 and _𝕣_ for modifiers and combinators, respectively. Any names in the header are also bound, allowing multiple assignment for arguments.

If there is no left argument, but the BODY contains 𝕨 at the top level, then it is conceptually re-parsed with 𝕨 replaced by · to give a monadic version before application; this modifies the syntax tree by replacing some instances of arg with nothing. However, it also causes an error if, in a function that is called with no left argument, 𝕨 is used as an operand or list element, where nothing is not allowed by the grammar. The same effect can also be achieved dynamically by treating · as a value and checking for it during execution. If it is used as a left argument, then the function should instead be called with no left argument (and similarly in trains); if it is used as a right argument, then the function and its left argument are evaluated but rather than calling the function · is "returned" immediately; and if it is used in another context then it causes an error.

Assignment

@@ -21,6 +21,7 @@

Modified assignment is the subject assignment rule lhs Derv "↩" subExpr. In this case, lhs should be evaluated as if it were a subExpr (the syntax is a subset of subExpr), and the result of the function application lhs Derv subExpr should be assigned to lhs, and is also the result of the modified assignment expression.

Expressions

We now give rules for evaluating an atom, Func, _mod1 or _mod2_ expression (the possible options for ANY). A literal or primitive sl, Fl, _ml, or _cl_ has a fixed value defined by the specification (literals and built-ins). An identifier s, F, _m, or _c_, if not preceded by atom ".", must have an associated variable due to the scoping rules, and returns this variable's value, or causes an error if it has not yet been set. If it is preceded by atom ".", then the atom node is evaluated first; its value must be a namespace, and the result is the value of the identifier's name in the namespace, or an error if the name is undefined. A parenthesized expression such as "(" _modExpr ")" simply returns the result of the interior expression. A braced construct such as BraceFunc is defined by the evaluation of the statements it contains after all parameters are accepted. Finally, a list "⟨" ⋄? ( ( EXPR ⋄ )* EXPR ⋄? )? "⟩" or ANY ( "‿" ANY )+ consists grammatically of a list of expressions. To evaluate it, each expression is evaluated in source order and their results are placed as elements of a rank-1 array. The two forms have identical semantics but different punctuation.

+

A Return node creates a return function. As discussed in the scoping rules, its identifier indicates a namespace from a particular block evaluation. When called, the function causes an error if that block has finished execution, or if the call includes a left argument 𝕨. Otherwise, evaluation stops immediately, and resumes at the end of the block where it returns the right argument 𝕩 from that block.
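The behavior of a return function can be modeled in Python with an exception that unwinds to the owning block evaluation. This is a sketch under stated assumptions, not how any BQN implementation is known to work: `BlockEvaluation`, `make_return`, `run`, and `_Unwind` are hypothetical names, and `None` is used to stand for "no left argument".

```python
class _Unwind(Exception):
    """Carries a returned value up to the block evaluation that owns it."""

    def __init__(self, target, value):
        super().__init__("return")
        self.target = target
        self.value = value


class BlockEvaluation:
    """One evaluation of a block body (hypothetical model)."""

    def __init__(self):
        self.finished = False

    def make_return(self):
        """Build the function that a Return node would create for this block."""
        def return_fn(x, w=None):
            if w is not None:
                raise RuntimeError("return function called with a left argument")
            if self.finished:
                raise RuntimeError("block has already finished execution")
            # Stop evaluation immediately; resume at the end of the block.
            raise _Unwind(self, x)
        return return_fn

    def run(self, body):
        """Evaluate body(return_fn); a return resumes here with its value."""
        try:
            return body(self.make_return())
        except _Unwind as unwind:
            if unwind.target is not self:
                raise  # the return belongs to an enclosing block evaluation
            return unwind.value
        finally:
            self.finished = True


def first_negative(xs):
    """Usage: leave the block early with the first negative element."""
    def body(ret):
        for x in xs:
            if x < 0:
                ret(x)
        return None
    return BlockEvaluation().run(body)
```

Because `_Unwind` is not a `RuntimeError`, a handler for ordinary errors in this model does not intercept a return, which matches the distinction the text draws between caught errors and returns.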

Rules in the table below are function and modifier evaluation.

diff --git a/docs/spec/grammar.html b/docs/spec/grammar.html
index 46509c05..9babde63 100644
--- a/docs/spec/grammar.html
+++ b/docs/spec/grammar.html
@@ -32,10 +32,12 @@
          | Operand _mod2_    # Left partial application
          | _m ASGN _m1Expr
-

Functions can be formed by fully applying modifiers or as trains. modifiers are left-associative, so that the left operand (Operand) can include modifier applications but the right operand (subject | Func) cannot. Trains are right-associative, but bind less tightly than modifiers. Assignment is not allowed in the top level of a train: it must be parenthesized.

+

Functions can be formed by fully applying modifiers, as trains, or with the return token →, which behaves syntactically like a 1-modifier whose operand must be an identifier. Modifiers are left-associative, so that the left operand (Operand) can include modifier applications but the right operand (subject | Func) cannot. Trains are right-associative, but bind less tightly than modifiers. Assignment is not allowed in the top level of a train: it must be parenthesized.

Derv     = Func
          | Operand _mod1
          | Operand _mod2_ ( subject | Func )
+         | Return
+Return   = ( NAME | "𝕊" | "𝕣" ) "→"
 Operand  = subject
          | Derv
 Fork     = Derv
@@ -51,11 +53,11 @@
          | ( subject | nothing )? Derv arg
 nothing  = "·"
          | ( subject | nothing )? Derv nothing
-LHS_NAME = s | F | _m | _c_
-LHS_ANY  = LHS_NAME | lhsList
+NAME     = s | F | _m | _c_
+LHS_ANY  = NAME | lhsList
 LHS_ATOM = LHS_ANY | "(" lhsStr ")"
 LHS_ELT  = LHS_ANY | lhsStr
-LHS_ENTRY= LHS_ELT | lhs "⇐" LHS_NAME
+LHS_ENTRY= LHS_ELT | lhs "⇐" NAME
 lhsStr   = LHS_ATOM ( "‿" LHS_ATOM )+
 lhsList  = "⟨" ⋄? ( ( LHS_ENTRY ⋄ )* LHS_ENTRY ⋄? )? "⟩"
 lhs      = s | lhsList | lhsStr
diff --git a/docs/spec/index.html b/docs/spec/index.html
index 3b5539ee..d1673dc6 100644
--- a/docs/spec/index.html
+++ b/docs/spec/index.html
@@ -5,7 +5,7 @@
 
 
 

BQN specification

-

This document, and the others in this directory (linked in the list below) make up the pre-versioning BQN specification. The specification differs from the documentation in that its purpose is only to describe the exact details of BQN's operation in the most quickly accessible way, rather than to explain the central ideas of BQN functionality and how it might be used. The core of BQN, which excludes system-provided values, is now almost completely specified. One planned features, an extension to allow low-rank elements in the argument to Join, has not yet been added, and the spec will continue to be edited further to improve clarity and cover any edge cases that have been missed.

+

This document, and the others in this directory (linked in the list below) make up the pre-versioning BQN specification. The specification differs from the documentation in that its purpose is only to describe the exact details of BQN's operation in the most quickly accessible way, rather than to explain the central ideas of BQN functionality and how it might be used. The core of BQN, which excludes system-provided values, is now almost completely specified. One planned feature, an extension to allow low-rank elements in the argument to Join, has not yet been added, and the spec will continue to be edited further to improve clarity and cover any edge cases that have been missed.

Under this specification, a language implementation is a BQN pre-version implementation if it behaves as specified for all input programs. It is a BQN pre-version implementation with extensions if it behaves as specified in all cases where the specification does not require an error, but behaves differently in at least one case where it requires an error. It is a partial version of either of these if it doesn't conform to the description but differs from a conforming implementation only by rejecting with an error some programs that the conforming implementation accepts. As the specification is not yet versioned, other instances of the specification define these terms in different ways. An implementation can use one of these terms if it conforms to any instance of the pre-versioning BQN specifications that defines them. When versioning is begun, there will be only one specification for each version.

The following documents are included in the BQN specification. A BQN program is a sequence of Unicode code points: to evaluate it, it is converted into a sequence of tokens using the token formation rules, then these tokens are arranged in a syntax tree according to the grammar, and then this tree is evaluated according to the evaluation semantics. The program may be evaluated in the presence of additional context such as a filesystem or command-line arguments; this context is presented to the program and manipulated through the system-provided values.

The definition for an identifier is chosen from the potential definitions based on their containing scopes: it is the one whose containing scope does not contain or match the containing scope of any other potential definition. If for any identifier there is no definition, then the program is not valid and results in an error. This can occur if the identifier has no potential definition, and also if two potential definitions appear in the same scope. In fact, under this scheme it is never valid to make two definitions with the same name at the top level of a single scope, because both definitions would be potential definitions for the one that comes second in program order. Both definitions have the same containing scope, and any potential definition must contain or match this scope, so no potential definition can be selected.

The definition of program order for identifier tokens follows the order of BQN execution. It corresponds to the order of a particular traversal of the abstract syntax tree for a program. To find the relative ordering of two identifiers in a program, we consider the highest-depth node that they both belong to; in this node they must occur in different components, or that component would be a higher-depth node containing both of them. In most nodes, the program order goes from right to left: components further to the right come earlier in program order. The exceptions are PROGRAM, BODY, NS_BODY, list, subject (for stranding), and body case (FCase, _mCase, _cCase_, FMain, _mMain, _cMain_, brSub, BrFunc, _brMod1, and _brMod2_) nodes, in which program order goes in the opposite order, from left to right (some assignment target nodes also contain lists or strands, but their ordering is irrelevant because if two identifiers with the same name appear in such a list, then it can't be a definition).

+

A subject label is the s term in a brSub node. As part of a header, it can serve as the definition for an identifier. However, it's defined to be a syntax error if another instance of this identifier appears, except in a Return node (which cannot access its value).

Special names

Special names such as 𝕩 or 𝕣 refer to variables, but have no definition and do not use scoping. Instead, they always refer to the immediately enclosing scope, and are defined automatically when the block is evaluated.

The six special names are 𝕨𝕩𝕗𝕘𝕤𝕣, and the tokens 𝕎𝕏𝔽𝔾𝕊, _𝕣, and _𝕣_ are alternate spellings of these names as described in the tokenization rules. Special names may be modified with ↩ assignment but cannot appear as the target of other kinds of assignment. Two special names represent the same identifier if they are the same name and appear in the same body. The initial value these names have is defined by the evaluation rules; the grammar for blocks ensures that all special names used in a block will be defined (possibly as the special value · in the case of 𝕨).

@@ -29,3 +30,5 @@

When a body in a block is evaluated, it creates a namespace, which contains a variable for each definition (that is, defined identifier instance) the body contains. Whenever another block (the block itself, not its contents) is evaluated during the execution of the block, it is linked to the currently-evaluating block, so that it will use the variables defined in this instance. By following these links repeatedly, an instance of a block is always linked to exactly one instance of each block that contains it. These links form a tree that is not necessarily related to the call stack of functions and modifiers. Using the links, the variable an identifier refers to is the one corresponding to that variable's definition in the linked instance of the containing scope for the definition.

The first access to a variable must be made by its definition (this also means it sets the variable). If a different instance of its identifier accesses it first, then an error results. This can happen because every scope contained in a particular scope sees all the definitions it uses, and such a scope could be called before the definition is run. Because of conditional execution, this property must be checked at run time in general; however, in cases where it is possible to statically determine that a program will always violate it, a BQN instance can give an error at compile time rather than run time.
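The run-time check described above can be illustrated with a small Python model. It is a sketch only: the `Variable` class and its `define` and `get` methods are hypothetical names for illustration, with `define` standing for the access made by the definition and `get` for an access by any other instance of the identifier.

```python
_UNSET = object()  # sentinel meaning "the definition has not run yet"


class Variable:
    """A variable slot created when a body's namespace is made (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._value = _UNSET

    def define(self, value):
        """Access by the definition itself: sets the variable."""
        self._value = value

    def get(self):
        """Access by another instance of the identifier."""
        if self._value is _UNSET:
            raise RuntimeError(f"{self.name} accessed before its definition ran")
        return self._value


# Usage: an inner scope sees the variable before the definition is evaluated.
a = Variable("a")
read_a = lambda: a.get()   # inner block that refers to a
try:
    read_a()               # called too early, so this is an error
    early_error = False
except RuntimeError:
    early_error = True
a.define(3)                # the definition runs
late_value = read_a()      # now the access succeeds
```

A check of this kind has to happen at run time in general, since whether `read_a` is called before `a.define` can depend on conditional execution.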

A namespace defines a mapping from names to variables: if the given name is shared by an exported identifier in the body used to create that namespace, then that name maps to the variable corresponding to that identifier. The mapping is undefined for other names.

+

Returns

+

The name NAME | "𝕊" | "𝕣" in a Return node is resolved exactly like any other identifier. Following resolution, the block that defines the identifier must not be a namespace block (export variables or contain an EXPORT statement). Furthermore, if it is a NAME, then its definition must be an internal name for a containing block: s in brSub, F in FuncHead or FMain, _m in Mod1H1 or _mMain, or _c_ in Mod2H1 or _cMain_. When reached, the Return node's identifier is not accessed; instead, it is used to indicate the namespace that contains it, and through this the block evaluation that created that namespace.

-- cgit v1.2.3