From 4f618598f2f31bc466343e4d304f35b53a366da6 Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Mon, 15 Mar 2021 16:59:32 -0400
Subject: Finish namespace specification

---
 docs/spec/evaluate.html |  8 ++++----
 docs/spec/grammar.html  | 18 ++++++++----------
 docs/spec/scope.html    |  7 ++++---
 docs/spec/token.html    |  6 +++---
 docs/spec/types.html    |  6 ++++--
 5 files changed, 23 insertions(+), 22 deletions(-)
(limited to 'docs/spec')

diff --git a/docs/spec/evaluate.html b/docs/spec/evaluate.html
index be087db0..d1ab7b80 100644
--- a/docs/spec/evaluate.html
+++ b/docs/spec/evaluate.html
@@ -9,17 +9,17 @@

Here we assume that the referent of each identifier, or equivalently the connections between identifiers, have been identified according to the scoping rules.

Programs and blocks

The result of parsing a valid BQN program is a PROGRAM, and the program is run by evaluating this term.

-

A PROGRAM or BODY is a list of STMTs (for BODY, the last must be an EXPR, a particular kind of STMT), which are evaluated in program order. The statement EXPR evaluates some APL code and possibly assigns the results, while nothing evaluates any subject or Derv terms it contains but discards the results.

+

A PROGRAM or BODY is a list of STMTs, which are evaluated in program order. A result is always required for BODY nodes, and sometimes for PROGRAM nodes (for example, when loaded with •Import). If any identifiers in the node's scope are exported, or any of its statements is an EXPORT, then the result is the namespace created in order to evaluate the node. If a result is required but the namespace case doesn't apply, then the last STMT node must be an EXPR and its result is used. The statement EXPR evaluates some APL code and possibly assigns the results, while nothing evaluates any subject or Derv terms it contains but discards the results. An EXPORT statement performs no action.

A block consists of several BODY terms, some of which may have an accompanying header describing accepted inputs and how they are processed. An immediate block brImm can only have one BODY, and is evaluated by evaluating the code in it. Other types of blocks do not evaluate any BODY immediately, but instead return a function or modifier that obtains its result by evaluating a particular BODY. The BODY is identified and evaluated once the block has received enough inputs (operands or arguments), which for modifiers can take one or two calls: if two calls are required, then on the first call the operands are simply stored and no code is evaluated yet. Two calls are required if there is more than one BODY term, if the BODY contains the special names 𝕨𝕩𝕤𝕎𝕏𝕊, or if its header specifies arguments (the header-body combination is a _mCase or _cCase_). Otherwise only one is required.

To evaluate a block when enough inputs have been received, first the correct case must be identified. To do this, first each special case (FCase, _mCase, or _cCase_) is checked in order to see if its arguments are structurally compatible with the given arguments. That is, if headW is a subject, there must be a left argument matching that structure, and if headX is a subject, the right argument must match that structure. This means that 𝕨 not only matches any left argument but also no argument. The test for compatibility is the same as for multiple assignment described below, except that the header may contain constants, which must match the corresponding part of the given argument. If no special case matches, then an appropriate general case (FMain, _mMain, or _cMain_) is used: if there are two, the first is used with no left argument and the second with a left argument; if there is one, it is always used, and if there are none, an error results.

The only remaining step before evaluating the BODY is to bind the inputs and other names. Special names are always bound when applicable: 𝕨𝕩𝕤 if arguments are used, 𝕨 if there is a left argument, 𝕗𝕘 if operands are used, and _𝕣 and _𝕣_ for modifiers and combinators, respectively. Any names in the header are also bound, allowing multiple assignment for arguments.

If there is no left argument, but the BODY contains 𝕨 at the top level, then it is conceptually re-parsed with 𝕨 replaced by · to give a monadic version before application; this modifies the syntax tree by replacing some instances of arg with nothing. However, it also causes an error if, in a function that is called with no left argument, 𝕨 is used as an operand or list element, where nothing is not allowed by the grammar. The same effect can also be achieved dynamically by treating · as a value and checking for it during execution. If it is used as a left argument, then the function should instead be called with no left argument (and similarly in trains); if it is used as a right argument, then the function and its left argument are evaluated but rather than calling the function · is "returned" immediately; and if it is used in another context then it causes an error.
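The dynamic treatment of nothing described above can be sketched in Python. This is only an illustration under stated assumptions, not part of the specification: the names `NOTHING`, `call`, and `as_element` are hypothetical, functions are modelled as two-parameter Python callables, and a missing left argument is modelled as `None`.

```python
class Nothing:
    """Sentinel standing in for the special value · (hypothetical representation)."""

    def __repr__(self):
        return "·"


NOTHING = Nothing()


def call(func, x, w=NOTHING):
    """Apply func to right argument x and optional left argument w.

    func and w have already been evaluated by the time we get here.
    If x is ·, func is not called and · is passed along as the result.
    If w is ·, func is called with no left argument (modelled as None).
    """
    if x is NOTHING:
        return NOTHING
    if w is NOTHING:
        return func(None, x)
    return func(w, x)


def as_element(value):
    """Use a value as a list element or operand; · is an error in that context."""
    if value is NOTHING:
        raise ValueError("nothing (·) cannot be used as an operand or list element")
    return value
```

With a hypothetical function `neg_or_sub = lambda w, x: -x if w is None else w - x`, the call `call(neg_or_sub, 3)` takes the one-argument path and `call(neg_or_sub, NOTHING, 10)` returns `NOTHING` without calling the function.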

Assignment

-

An assignment is one of the four rules containing ASGN, other than IMPORT. It is evaluated by first evaluating the right-hand-side subExpr, FuncExpr, _m1Expr, or _m2Expr_ expression, and then storing the result in the left-hand-side identifier or identifiers. The result of the assignment expression is the result of its right-hand side. Except for subjects, only a lone identifier is allowed on the left-hand side and storage sets it equal to the result. For subjects, multiple assignment with a list left-hand side is also allowed. Multiple assignment is performed recursively by assigning right-hand-side values to the left-hand-side targets, with single-identifier (s) assignment as the base case. When matching the right-hand side to a list left-hand side, the left-hand side is treated as a list of lhs targets. The evaluated right-hand side must be a list (rank-1 array) of the same length, and is matched to these targets element-wise.

+

An assignment is one of the four rules containing ASGN. It is evaluated by first evaluating the right-hand-side subExpr, FuncExpr, _m1Expr, or _m2Expr_ expression, and then storing the result in the left-hand-side identifier or identifiers. The result of the assignment expression is the result of its right-hand side. Except for subjects, only a lone identifier is allowed on the left-hand side and storage sets it equal to the result. For subjects, destructuring assignment is performed when an lhs is lhsList or lhsStr. Destructuring assignment is performed recursively by assigning right-hand-side values to the left-hand-side targets, with single-identifier assignment as the base case.

+

The right-hand-side value, here called v, in destructuring assignment must be a list (rank 1 array) or namespace. If it's a list, then each LHS_ENTRY node must be an LHS_ELT. The left-hand side is treated as a list of lhs targets, and matched to v element-wise, with an error if the two lists differ in length. If v is a namespace, then the left-hand side must be an lhsStr where every LHS_ATOM is an LHS_NAME, or an lhsList where every LHS_ENTRY is an LHS_NAME or lhs "⇐" LHS_NAME, so that it can be considered a list of LHS_NAME nodes some of which are also associated with lhs nodes. To perform the assignment, the value of each name is obtained from the namespace v, giving an error if v does not define that name. The value is assigned to the lhs node if present (which may be a destructuring assignment or simple subject assignment), and otherwise assigned to the same LHS_NAME node used to get it from v.
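The recursive procedure above can be sketched in Python. This is a minimal model under stated assumptions rather than the specified algorithm: a single-identifier target is a `str`, an lhsList or lhsStr is a Python `list` of entries, `Alias` is a hypothetical stand-in for an `lhs "⇐" LHS_NAME` entry, `Namespace` is a hypothetical dict-based stand-in for a namespace value, and `env` plays the role of the enclosing scope. Structural checks are made as each entry is reached instead of up front.

```python
from dataclasses import dataclass


@dataclass
class Alias:
    """Models an `lhs ⇐ NAME` entry: read NAME from the namespace, assign it to lhs."""
    lhs: object  # a name (str) or a nested target (list)
    name: str


class Namespace(dict):
    """Hypothetical stand-in for a namespace: maps exported names to values."""


def assign(target, value, env):
    """Assign value to target, storing single-identifier results in env."""
    if isinstance(target, str):  # base case: single identifier
        env[target] = value
        return
    if isinstance(value, Namespace):
        for entry in target:
            if isinstance(entry, Alias):
                name, dest = entry.name, entry.lhs
            elif isinstance(entry, str):
                name, dest = entry, entry  # the same name is read and assigned
            else:
                raise TypeError("namespace destructuring needs names or aliases")
            if name not in value:
                raise KeyError(f"namespace does not define {name!r}")
            assign(dest, value[name], env)
    elif isinstance(value, list):
        if any(isinstance(entry, Alias) for entry in target):
            raise TypeError("alias entries require a namespace on the right")
        if len(target) != len(value):
            raise ValueError("list lengths differ")
        for entry, element in zip(target, value):
            assign(entry, element, env)
    else:
        raise TypeError("right-hand side must be a list or a namespace")
```

For example, `assign(["a", ["b", "c"]], [1, [2, 3]], env)` fills `env` with `a`, `b`, and `c`, while `assign(["x", Alias("z", "y")], Namespace(x=1, y=2), env)` stores the namespace's `y` under the outer name `z`.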

Modified assignment is the subject assignment rule lhs Derv "↩" subExpr. In this case, lhs should be evaluated as if it were a subExpr (the syntax is a subset of subExpr), and the result of the function application lhs Derv subExpr should be assigned to lhs, and is also the result of the modified assignment expression.

-

The IMPORT rule resembles a multiple assignment. However, in this case the values passed do not form a list but rather a module or namespace, which in this specification is not a value accessible to the programmer. To evaluate the IMPORT the brNS side is evaluated, then each inner variable mentioned in the nsLHS term is extracted and assigned to the corresponding outer identifier. Typically the two will both share the LHS_NAME, but if ⇐ is used in an NS_VAR then the lhs term refers to the outer identifier and LHS_NAME to the inner one. Since IMPORT is a statement and not an expression, it doesn't have a result value.

Expressions

-

We now give rules for evaluating an atom, Func, _mod1 or _mod2_ expression (the possible options for ANY). A literal or primitive sl, Fl, _ml, or _cl_ has a fixed value defined by the specification (literals and built-ins). An identifier s, F, _m, or _c_ is evaluated by returning its value; because of the scoping rules it must have one when evaluated. A parenthesized expression such as "(" _modExpr ")" simply returns the result of the interior expression. A braced construct such as BraceFunc is defined by the evaluation of the statements it contains after all parameters are accepted. Finally, a list "⟨" ⋄? ( ( EXPR ⋄ )* EXPR ⋄? )? "⟩" or ANY ( "‿" ANY )+ consists grammatically of a list of expressions. To evaluate it, each expression is evaluated in source order and their results are placed as elements of a rank-1 array. The two forms have identical semantics but different punctuation.

+

We now give rules for evaluating an atom, Func, _mod1 or _mod2_ expression (the possible options for ANY). A literal or primitive sl, Fl, _ml, or _cl_ has a fixed value defined by the specification (literals and built-ins). An identifier s, F, _m, or _c_, if not preceded by atom ".", must have an associated variable due to the scoping rules, and returns this variable's value, or causes an error if it has not yet been set. If it is preceded by atom ".", then the atom node is evaluated first; its value must be a namespace, and the result is the value of the identifier's name in the namespace, or an error if the name is undefined. A parenthesized expression such as "(" _modExpr ")" simply returns the result of the interior expression. A braced construct such as BraceFunc is defined by the evaluation of the statements it contains after all parameters are accepted. Finally, a list "⟨" ⋄? ( ( EXPR ⋄ )* EXPR ⋄? )? "⟩" or ANY ( "‿" ANY )+ consists grammatically of a list of expressions. To evaluate it, each expression is evaluated in source order and their results are placed as elements of a rank-1 array. The two forms have identical semantics but different punctuation.

Rules in the table below are function and modifier evaluation.

diff --git a/docs/spec/grammar.html b/docs/spec/grammar.html
index 876bcfa7..dca88643 100644
--- a/docs/spec/grammar.html
+++ b/docs/spec/grammar.html
@@ -16,10 +16,10 @@

Here we define the "atomic" forms of functions and modifiers, which are either single tokens or enclosed in paired symbols. Stranded vectors with ‿, which binds more tightly than any form of execution, are also included.

ANY      = atom | Func | _mod1 | _mod2_
-_mod2_   = _c_ | _cl_ | "(" _m1Expr_ ")" | ( atom "." )? _brMod2_
-_mod1    = _m  | _ml  | "(" _m2Expr  ")" | ( atom "." )? _brMod1
-Func     =  F  |  Fl  | "(" FuncExpr ")" | ( atom "." )?  BrFunc
-atom     =  s  |  sl  | "(" subExpr  ")" | ( atom "." )?  brSub | list
+_mod2_   = ( atom "." )? _c_ | _cl_ | "(" _m1Expr_ ")" | _brMod2_
+_mod1    = ( atom "." )? _m  | _ml  | "(" _m2Expr  ")" | _brMod1
+Func     = ( atom "." )?  F  |  Fl  | "(" FuncExpr ")" |  BrFunc
+atom     = ( atom "." )?  s  |  sl  | "(" subExpr  ")" |  brSub | list
 list     = "⟨" ⋄? ( ( EXPR ⋄ )* EXPR ⋄? )? "⟩"
 subject  = atom | ANY ( "‿" ANY )+
 
@@ -46,7 +46,7 @@
 FuncExpr = Train
          | F ASGN FuncExpr
-

Subject expressions are complicated by the possibility of list assignment. We also define nothing-statements, which have very similar syntax to subject expressions but do not permit assignment.

+

Subject expressions are complicated by the possibility of list and namespace assignment, which share the nodes lhsList and lhsStr and cannot be completely distinguished until execution. We also define nothing-statements, which have very similar syntax to subject expressions but do not permit assignment.

arg      = subExpr
          | ( subject | nothing )? Derv arg
 nothing  = "·"
@@ -55,12 +55,10 @@
 LHS_ANY  = LHS_NAME | lhsList
 LHS_ATOM = LHS_ANY | "(" lhsStr ")"
 LHS_ELT  = LHS_ANY | lhsStr
+LHS_ENTRY= LHS_ELT | lhs "⇐" LHS_NAME
 lhsStr   = LHS_ATOM ( "‿" LHS_ATOM )+
-lhsList  = "⟨" ⋄? ( ( LHS_ELT ⋄ )* LHS_ELT ⋄? )? "⟩"
-NS_VAR   = ( lhs "⇐" )? LHS_NAME
-lhsNs    = LHS_NAME ( "‿" LHS_NAME )+
-         | "⟨" ⋄? ( ( NS_VAR ⋄ )* NS_VAR ⋄? )? "⟩"
-lhs      = s | lhsList | lhsStr | lhsNs
+lhsList  = "⟨" ⋄? ( ( LHS_ENTRY ⋄ )* LHS_ENTRY ⋄? )? "⟩"
+lhs      = s | lhsList | lhsStr
 subExpr  = arg
          | lhs ASGN subExpr
          | lhs Derv "↩" subExpr       # Modified assignment
diff --git a/docs/spec/scope.html b/docs/spec/scope.html
index d5280e31..8ed5bab2 100644
--- a/docs/spec/scope.html
+++ b/docs/spec/scope.html
@@ -22,9 +22,10 @@
 

Special names such as 𝕩 or 𝕣 refer to variables, but have no definition and do not use scoping. Instead, they always refer to the immediately enclosing scope, and are defined automatically when the block is evaluated.

The six special names are 𝕨𝕩𝕗𝕘𝕤𝕣, and the tokens 𝕎𝕏𝔽𝔾𝕊, _𝕣, and _𝕣_ are alternate spellings of these names as described in the tokenization rules. Special names may be modified with ↩ assignment but cannot appear as the target of other kinds of assignment. Two special names represent the same identifier if they are the same name and appear in the same body. The initial value these names have is defined by the evaluation rules; the grammar for blocks ensures that all special names used in a block will be defined (possibly as the special value · in the case of 𝕨).

Imports and exports

-

The names to be used in an IMPORT, that is, LHS_NAME terms in an NS_VAR or nsLHS, are variable references inside that IMPORT's brNS term. If they appear without an accompanying lhs "⇐" term (in NS_VAR), then this is in addition to their role as identifiers within the actual enclosing scope, which works like any other assignment. These references behave as though they are at the end of the brNS term, that is, they "see" all definitions in the block. However, they must refer to identifiers that are exported by that block; references to any other variable cause an error much like references that have no definition.

-

An identifier is exported if the ASGN node in its definition is "⇐", or if it appears anywhere in an EXPORT term. An identifier can only be exported in the scope where it is defined, and not in a containing scope. An EXPORT term that includes an identifier from such a scope causes an error.

+

Names that are preceded by an atom "." term, or that appear as LHS_NAME terms in an NS_VAR or lhsNs, are variable references in a namespace: in the first case, the result of the atom node, and in the second, of the overall assignment's subExpr right-hand side. These names do not follow lexical scoping; in general they must be stored in order to perform a name lookup when the namespace is available. Such a name in lhsNs, or in NS_VAR with no accompanying lhs "⇐" term, additionally serves as an identifier within the actual enclosing scope, which works like any other assignment.

+

An identifier is exported if the ASGN node in its definition is "⇐", or if it appears anywhere in an EXPORT term. An identifier can only be exported in the scope where it is defined, and not in a containing scope. An EXPORT term that includes an identifier from such a scope causes an error.

Variables

A variable is an entity that permits two operations: it can be set to a particular value, and its value can be obtained, resulting in the last value it was set to. When either operation is performed it is referred to as accessing the variable.

-

When a body in a block is evaluated, a variable is created for each definition (that is, defined identifier instance) the body contains. Whenever another block (the block itself, not its contents) is evaluated during the execution of the block, it is linked to the currently-evaluating block, so that it will use the variables defined in this instance. By following these links repeatedly, an instance of a block is always linked to exactly one instance of each block that contains it. These links form a tree that is not necessarily related to the call stack of functions and modifiers. Using the links, the variable an identifier refers to is the one corresponding to that variable's definition in the linked instance of the containing scope for the definition.

+

When a body in a block is evaluated, it creates a namespace, which contains a variable for each definition (that is, defined identifier instance) the body contains. Whenever another block (the block itself, not its contents) is evaluated during the execution of the block, it is linked to the currently-evaluating block, so that it will use the variables defined in this instance. By following these links repeatedly, an instance of a block is always linked to exactly one instance of each block that contains it. These links form a tree that is not necessarily related to the call stack of functions and modifiers. Using the links, the variable an identifier refers to is the one corresponding to that variable's definition in the linked instance of the containing scope for the definition.

The first access to a variable must be made by its definition (this also means it sets the variable). If a different instance of its identifier accesses it first, then an error results. This can happen because every scope contained in a particular scope sees all the definitions it uses, and such a scope could be called before the definition is run. Because of conditional execution, this property must be checked at run time in general; however, in cases where it is possible to statically determine that a program will always violate it, a BQN instance can give an error at compile time rather than run time.

+

A namespace defines a mapping from names to variables: if the given name is shared by an exported identifier in the body used to create that namespace, then that name maps to the variable corresponding to that identifier. The mapping is undefined for other names.
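The relationship between variables and namespaces described above can be modelled with a short Python sketch. The class and method names here (`Variable`, `Namespace`, `local`, `lookup`) are hypothetical and chosen only for illustration; the sketch assumes the set of definitions and the set of exported names are known when the body starts evaluating.

```python
class Variable:
    """A variable supports two operations: set a value, and get the last value set."""

    _UNSET = object()  # marker for "the definition has not run yet"

    def __init__(self):
        self._value = Variable._UNSET

    def set(self, value):
        self._value = value

    def get(self):
        if self._value is Variable._UNSET:
            raise RuntimeError("variable accessed before its definition ran")
        return self._value


class Namespace:
    """Holds one variable per definition in a body; only exported names are visible outside."""

    def __init__(self, names, exported):
        self._vars = {name: Variable() for name in names}
        self._exported = set(exported)

    def local(self, name):
        """Lexically scoped access from code inside the body."""
        return self._vars[name]

    def lookup(self, name):
        """Access by name from outside; the mapping is undefined for unexported names."""
        if name not in self._exported:
            raise KeyError(f"name {name!r} is not defined by this namespace")
        return self._vars[name]
```

Both access paths return the same `Variable` object, so a value set through `lookup` is seen by code inside the body and vice versa. Equality of `Namespace` objects is left as Python's default identity comparison, which is one possible reading of namespaces being comparable for equality.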

diff --git a/docs/spec/token.html b/docs/spec/token.html
index 0374e0b8..7209f982 100644
--- a/docs/spec/token.html
+++ b/docs/spec/token.html
@@ -9,7 +9,7 @@

BQN source code should be considered as a series of unicode code points, which we refer to as "characters". The separator between lines in a file is considered to be a single character, newline, even though some operating systems such as Windows typically represent it with a two-character CRLF sequence. Implementers should note that not all languages treat unicode code points as atomic, as exposing the UTF-8 or UTF-16 representation instead is common. For a language such as JavaScript that uses UTF-16, the double-struck characters 𝕨𝕎𝕩𝕏𝕗𝔽𝕘𝔾 are represented as two 16-bit surrogate characters, but BQN treats them as a single unit.

A BQN character literal consists of a single character between single quotes, such as 'a', and a string literal consists of any number of characters between double quotes, such as "" or "abc". Character and string literals take precedence with comments over other tokenization rules, so that # between quotes does not start a comment and whitespace between quotes is not removed, but a quote within a comment does not start a character literal. Almost any character can be included directly in a character or string literal without escaping. The only exception is the double quote character ", which must be written twice to include it in a string, as otherwise it would end the string instead. Character literals require no escaping at all, as the length is fixed. In particular, literals for the double and single quote characters are written ''' and '"', while length-1 strings containing these characters are "'" and """".

A comment consists of the hash character # and any following text until (not including) the next newline character. The initial # must not be part of a string literal started earlier. Comments are ignored entirely and do not form tokens.

-

Identifiers and numeric literals share the same token formation rule. These tokens are formed from the numeric characters ¯∞π.0123456789 and alphabetic characters _abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ and the oddball 𝕣. Any sequence of these characters adjacent to each other forms a single token, which is a numeric literal if it begins with a numeric character and an identifier if it begins with an alphabetic character. If a token begins with an underscore then its first non-underscore character must be alphabetic: for example, _99 is not a valid token. Numeric literals are also subject to numeric literal rules, which specify which numeric literals are valid and which numbers they represent. If the token contains 𝕣 it must be either 𝕣, _𝕣, or _𝕣_ and is considered a special name (see below). As the value taken by this identifier can only be a modifier, the uppercase character ℝ is not allowed.

+

Identifiers and numeric literals share the same token formation rule. These tokens are formed from the numeric characters ¯∞π0123456789 and alphabetic characters _abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ and the oddball 𝕣. Additionally, . is considered a numeric character if it is followed immediately by a digit (0123456789); otherwise it forms its own token. Any sequence of these characters adjacent to each other forms a single token, which is a numeric literal if it begins with a numeric character and an identifier if it begins with an alphabetic character. If a token begins with an underscore then its first non-underscore character must be alphabetic: for example, _99 is not a valid token. Numeric literals are also subject to numeric literal rules, which specify which numeric literals are valid and which numbers they represent. If the token contains 𝕣 it must be either 𝕣, _𝕣, or _𝕣_ and is considered a special name (see below). As the value taken by this identifier can only be a modifier, the uppercase character ℝ is not allowed.
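The revised treatment of . can be illustrated with a small Python sketch of the word-forming step. It is deliberately incomplete and makes several assumptions: the function names `is_numeric_dot` and `split_words` are hypothetical, string and character literals and comments are assumed to have been handled already, and no validity checks (such as rejecting _99 or malformed numbers) are performed.

```python
DIGITS = "0123456789"
NUMERIC = "¯∞π" + DIGITS
ALPHABETIC = "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
WORD = NUMERIC + ALPHABETIC + "𝕣"


def is_numeric_dot(src, i):
    """A '.' counts as a numeric character only if a digit follows it immediately."""
    return src[i] == "." and i + 1 < len(src) and src[i + 1] in DIGITS


def split_words(src):
    """Split src into word tokens and single-character tokens, skipping spaces and tabs."""
    tokens = []
    i = 0
    while i < len(src):
        if src[i] in " \t":
            i += 1
            continue
        j = i
        # Extend the token while characters belong to the identifier/number class.
        while j < len(src) and (src[j] in WORD or is_numeric_dot(src, j)):
            j += 1
        if j == i:
            j = i + 1  # any other character forms a token on its own
        tokens.append(src[i:j])
        i = j
    return tokens
```

Under these assumptions `split_words("ns.field")` yields `["ns", ".", "field"]`, so the dot is available as punctuation for namespace access, while `split_words("1.5")` keeps the number together as `["1.5"]`.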

Following this step, the whitespace characters space and tab are ignored, and do not form tokens. Only these whitespace characters, and the newline character, which does form a token, are allowed.

Otherwise, a single character forms a token. Only the specified set of characters can be used; others result in an error. The classes of characters are given below.

@@ -26,7 +26,7 @@
-Primitive Function      +-×÷⋆√⌊⌈|¬∧∨<>≠=≤≥≡≢⊣⊢⥊∾≍↑↓↕«»⌽⍉/⍋⍒⊏⊑⊐⊒∊⍷⊔
+Primitive Function      +-×÷⋆√⌊⌈|¬∧∨<>≠=≤≥≡≢⊣⊢⥊∾≍↑↓↕«»⌽⍉/⍋⍒⊏⊑⊐⊒∊⍷⊔!
 Primitive 1-Modifier
@@ -42,7 +42,7 @@
-Punctuation             ←⇐↩→(){}⟨⟩‿⋄, and newline
+Punctuation             ←⇐↩→(){}⟨⟩‿⋄,. and newline
diff --git a/docs/spec/types.html b/docs/spec/types.html
index 3e6f3696..74110a74 100644
--- a/docs/spec/types.html
+++ b/docs/spec/types.html
@@ -5,7 +5,7 @@

Specification: BQN types

-

BQN programs manipulate data of six types:

+

BQN programs manipulate data of seven types:

-

Of these, the first three are considered data types and the remaining three operation types. We first describe the operation types; the remainder of this page will be dedicated to the data types. A member of any operation type accepts some number of inputs and either returns a result or causes an error; inputs and the result are values of any type. When a function is given inputs (called), it may produce side effects before returning, such as manipulating variables and calling other functions within its scope, or performing I/O.

+

Of these, the first three are considered data types and the next three operation types. We first describe the operation types and the namespace; the remainder of this page will be dedicated to the data types. A member of any operation type accepts some number of inputs and either returns a result or causes an error; inputs and the result are values of any type. When a function is given inputs (called), it may produce side effects before returning, such as manipulating variables and calling other functions within its scope, or performing I/O.

+

A namespace holds the variables used to evaluate a block or program, as defined in the scoping rules. The observable aspects of a namespace are that it can be compared for equality with other namespaces and that it exposes variables associated with certain names, whose values can be queried or set.

To begin the data types, a character is a Unicode code point, that is, its value is a non-negative integer within the ranges defined by Unicode (however, it is distinct from this number as a BQN value). Characters are ordered by this numeric value. BQN deals with code points as abstract entities and does not expose encodings such as UTF-8 or UTF-16.

The precise type of a number may vary across BQN implementations or instances. A real number is a member of some supported subset of the extended real numbers, that is, the real numbers and positive or negative infinity. Some system must be defined for rounding an arbitrary real number to a member of this subset, and the basic arithmetic operations add, subtract, multiply, divide, and natural exponent (base e) are defined by performing these operations on exact real values and rounding the result. The Power function (dyadic ⋆) is also used but need not be exactly rounded. A complex number is a value with two real number components, a real part and an imaginary part. A BQN implementation can either support real numbers only, or complex numbers.

An array is a rectangular collection of data. It is defined by a shape, which is a list of non-negative integer lengths, and a ravel, which is a list of elements whose length (the array's bound) is the product of all lengths in the shape. Arrays are defined inductively: any value (of a value or function type) can be used as an element of an array, but it is not possible for an array to contain itself as an element, or an array that contains itself, and so on. Types other than array are called atomic types, and their members are called atoms.

-- cgit v1.2.3