| author | Marshall Lochbaum <mwlochbaum@gmail.com> | 2021-03-15 16:59:32 -0400 |
|---|---|---|
| committer | Marshall Lochbaum <mwlochbaum@gmail.com> | 2021-03-15 16:59:32 -0400 |
| commit | 4f618598f2f31bc466343e4d304f35b53a366da6 (patch) | |
| tree | f645f16425939a45ac82fcd0b3b477f9030bc7d9 /spec | |
| parent | e858f41dffaee272ffcf4b2cb63a49ad25ebf7d7 (diff) | |
Finish namespace specification
Diffstat (limited to 'spec')
| -rw-r--r-- | spec/evaluate.md | 10 |
| -rw-r--r-- | spec/grammar.md | 18 |
| -rw-r--r-- | spec/scope.md | 8 |
| -rw-r--r-- | spec/token.md | 6 |
| -rw-r--r-- | spec/types.md | 7 |
5 files changed, 26 insertions, 23 deletions
diff --git a/spec/evaluate.md b/spec/evaluate.md index c5d99751..625192f4 100644 --- a/spec/evaluate.md +++ b/spec/evaluate.md @@ -10,7 +10,7 @@ Here we assume that the referent of each identifier, or equivalently the connect The result of parsing a valid BQN program is a `PROGRAM`, and the program is run by evaluating this term. -A `PROGRAM` or `BODY` is a list of `STMT`s (for `BODY`, the last must be an `EXPR`, a particular kind of `STMT`), which are evaluated in program order. The statement `EXPR` evaluates some APL code and possibly assigns the results, while `nothing` evaluates any `subject` or `Derv` terms it contains but discards the results. +A `PROGRAM` or `BODY` is a list of `STMT`s, which are evaluated in program order. A result is always required for `BODY` nodes, and sometimes for `PROGRAM` nodes (for example, when loaded with `β’Import`). If any identifiers in the node's scope are exported, or any of its statements is an `EXPORT`, then the result is the namespace created in order to evaluate the node. If a result is required but the namespace case doesn't apply, then the last `STMT` node must be an `EXPR` and its result is used. The statement `EXPR` evaluates some APL code and possibly assigns the results, while `nothing` evaluates any `subject` or `Derv` terms it contains but discards the results. An `EXPORT` statement performs no action. A block consists of several `BODY` terms, some of which may have an accompanying header describing accepted inputs and how they are processed. An immediate block `brImm` can only have one `BODY`, and is evaluated by evaluating the code in it. Other types of blocks do not evaluate any `BODY` immediately, but instead return a function or modifier that obtains its result by evaluating a particular `BODY`. 
The `BODY` is identified and evaluated once the block has received enough inputs (operands or arguments), which for modifiers can take one or two calls: if two calls are required, then on the first call the operands are simply stored and no code is evaluated yet. Two calls are required if there is more than one `BODY` term, if the `BODY` contains the special names `π¨π©π€πππ`, or if its header specifies arguments (the header-body combination is a `_mCase` or `_cCase_`). Otherwise only one is required. @@ -22,15 +22,15 @@ If there is no left argument, but the `BODY` contains `π¨` at the top level, t ### Assignment -An *assignment* is one of the four rules containing `ASGN`, other than `IMPORT`. It is evaluated by first evaluating the right-hand-side `subExpr`, `FuncExpr`, `_m1Expr`, or `_m2Exp_` expression, and then storing the result in the left-hand-side identifier or identifiers. The result of the assignment expression is the result of its right-hand side. Except for subjects, only a lone identifier is allowed on the left-hand side and storage sets it equal to the result. For subjects, *multiple assignment* with a list left-hand side is also allowed. Multiple assignment is performed recursively by assigning right-hand-side values to the left-hand-side targets, with single-identifier (`s`) assignment as the base case. When matching the right-hand side to a list left-hand side, the left hand side is treated as a list of `lhs` targets. The evaluated right-hand side must be a list (rank-1 array) of the same length, and is matched to these targets element-wise. +An *assignment* is one of the four rules containing `ASGN`. It is evaluated by first evaluating the right-hand-side `subExpr`, `FuncExpr`, `_m1Expr`, or `_m2Exp_` expression, and then storing the result in the left-hand-side identifier or identifiers. The result of the assignment expression is the result of its right-hand side. 
Except for subjects, only a lone identifier is allowed on the left-hand side and storage sets it equal to the result. For subjects, *destructuring assignment* is performed when an `lhs` is `lhsList` or `lhsStr`. Destructuring assignment is performed recursively by assigning right-hand-side values to the left-hand-side targets, with single-identifier assignment as the base case. -*Modified assignment* is the subject assignment rule `lhs Derv "β©" subExpr`. In this case, `lhs` should be evaluated as if it were a `subExpr` (the syntax is a subset of `subExpr`), and the result of the function application `lhs Derv subExpr` should be assigned to `lhs`, and is also the result of the modified assignment expression. +The right-hand-side value, here called `v`, in destructuring assignment must be a list (rank 1 array) or namespace. If it's a list, then each `LHS_ENTRY` node must be an `LHS_ELT`. The left-hand side is treated as a list of `lhs` targets, and matched to `v` element-wise, with an error if the two lists differ in length. If `v` is a namespace, then the left-hand side must be an `lhsStr` where every `LHS_ATOM` is an `LHS_NAME`, or an `lhsList` where every `LHS_ENTRY` is an `LHS_NAME` or `lhs "β" LHS_NAME`, so that it can be considered a list of `LHS_NAME` nodes some of which are also associated with `lhs` nodes. To perform the assignment, the value of each name is obtained from the namespace `v`, giving an error if `v` does not define that name. The value is assigned to the `lhs` node if present (which may be a destructuring assignment or simple subject assignment), and otherwise assigned to the same `LHS_NAME` node used to get it from `v`. -The `IMPORT` rule resembles a multiple assignment. However, in this case the values passed do not form a list but rather a module or namespace, which in this specification is not a value accessible to the programmer. 
To evaluate the `IMPORT` the `brNS` side is evaluated, then each inner variable mentioned in the `nsLHS` term is extracted and assigned to the corresponding outer identifier. Typically the two will both share the `LHS_NAME`, but if `β` is used in an `NS_VAR` then the `lhs` term refers to the outer identifier and `LHS_NAME` to the inner one. Since `IMPORT` is a statement and not an expression, it doesn't have a result value. +*Modified assignment* is the subject assignment rule `lhs Derv "β©" subExpr`. In this case, `lhs` should be evaluated as if it were a `subExpr` (the syntax is a subset of `subExpr`), and the result of the function application `lhs Derv subExpr` should be assigned to `lhs`, and is also the result of the modified assignment expression. ### Expressions -We now give rules for evaluating an `atom`, `Func`, `_mod1` or `_mod2_` expression (the possible options for `ANY`). A literal or primitive `sl`, `Fl`, `_ml`, or `_cl_` has a fixed value defined by the specification ([literals](literal.md) and [built-ins](primitive.md)). An identifier `s`, `F`, `_m`, or `_c_` is evaluated by returning its value; because of the scoping rules it must have one when evaluated. A parenthesized expression such as `"(" _modExpr ")"` simply returns the result of the interior expression. A braced construct such as `BraceFunc` is defined by the evaluation of the statements it contains after all parameters are accepted. Finally, a list `"β¨" β? ( ( EXPR β )* EXPR β? )? "β©"` or `ANY ( "βΏ" ANY )+` consists grammatically of a list of expressions. To evaluate it, each expression is evaluated in source order and their results are placed as elements of a rank-1 array. The two forms have identical semantics but different punctuation. +We now give rules for evaluating an `atom`, `Func`, `_mod1` or `_mod2_` expression (the possible options for `ANY`). 
A literal or primitive `sl`, `Fl`, `_ml`, or `_cl_` has a fixed value defined by the specification ([literals](literal.md) and [built-ins](primitive.md)). An identifier `s`, `F`, `_m`, or `_c_`, if not preceded by `atom "."`, must have an associated variable due to the scoping rules, and returns this variable's value, or causes an error if it has not yet been set. If it is preceded by `atom "."`, then the `atom` node is evaluated first; its value must be a namespace, and the result is the value of the identifier's name in the namespace, or an error if the name is undefined. A parenthesized expression such as `"(" _modExpr ")"` simply returns the result of the interior expression. A braced construct such as `BraceFunc` is defined by the evaluation of the statements it contains after all parameters are accepted. Finally, a list `"β¨" β? ( ( EXPR β )* EXPR β? )? "β©"` or `ANY ( "βΏ" ANY )+` consists grammatically of a list of expressions. To evaluate it, each expression is evaluated in source order and their results are placed as elements of a rank-1 array. The two forms have identical semantics but different punctuation. Rules in the table below are function and modifier evaluation. | L | Left | Called | Right | R | Types diff --git a/spec/grammar.md b/spec/grammar.md index e9af5f68..af1bb52d 100644 --- a/spec/grammar.md +++ b/spec/grammar.md @@ -17,10 +17,10 @@ A program is a list of statements. Almost all statements are expressions. Namesp Here we define the "atomic" forms of functions and modifiers, which are either single tokens or enclosed in paired symbols. Stranded vectors with `βΏ`, which binds more tightly than any form of execution, are also included. ANY = atom | Func | _mod1 | _mod2_ - _mod2_ = _c_ | _cl_ | "(" _m1Expr_ ")" | ( atom "." )? _brMod2_ - _mod1 = _m | _ml | "(" _m2Expr ")" | ( atom "." )? _brMod1 - Func = F | Fl | "(" FuncExpr ")" | ( atom "." )? BrFunc - atom = s | sl | "(" subExpr ")" | ( atom "." )? brSub | list + _mod2_ = ( atom "." )? 
_c_ | _cl_ | "(" _m1Expr_ ")" | _brMod2_ + _mod1 = ( atom "." )? _m | _ml | "(" _m2Expr ")" | _brMod1 + Func = ( atom "." )? F | Fl | "(" FuncExpr ")" | BrFunc + atom = ( atom "." )? s | sl | "(" subExpr ")" | brSub | list list = "β¨" β? ( ( EXPR β )* EXPR β? )? "β©" subject = atom | ANY ( "βΏ" ANY )+ @@ -49,7 +49,7 @@ Functions can be formed by fully applying modifiers or as trains. modifiers are FuncExpr = Train | F ASGN FuncExpr -Subject expressions are complicated by the possibility of list assignment. We also define nothing-statements, which have very similar syntax to subject expressions but do not permit assignment. +Subject expressions are complicated by the possibility of list and namespace assignment, which share the nodes `lhsList` and `lhsStr` and cannot be completely distinguished until execution. We also define nothing-statements, which have very similar syntax to subject expressions but do not permit assignment. arg = subExpr | ( subject | nothing )? Derv arg @@ -59,12 +59,10 @@ Subject expressions are complicated by the possibility of list assignment. We al LHS_ANY = LHS_NAME | lhsList LHS_ATOM = LHS_ANY | "(" lhsStr ")" LHS_ELT = LHS_ANY | lhsStr + LHS_ENTRY= LHS_ELT | lhs "β" LHS_NAME lhsStr = LHS_ATOM ( "βΏ" LHS_ATOM )+ - lhsList = "β¨" β? ( ( LHS_ELT β )* LHS_ELT β? )? "β©" - NS_VAR = ( lhs "β" )? LHS_NAME - lhsNs = LHS_NAME ( "βΏ" LHS_NAME )+ - | "β¨" β? ( ( NS_VAR β )* NS_VAR β? )? "β©" - lhs = s | lhsList | lhsStr | lhsNs + lhsList = "β¨" β? ( ( LHS_ENTRY β )* LHS_ENTRY β? )? 
"β©" + lhs = s | lhsList | lhsStr subExpr = arg | lhs ASGN subExpr | lhs Derv "β©" subExpr # Modified assignment diff --git a/spec/scope.md b/spec/scope.md index e1288cdb..b25c0ce0 100644 --- a/spec/scope.md +++ b/spec/scope.md @@ -28,14 +28,16 @@ The six special names are `π¨π©πππ€π£`, and the tokens `πππ ### Imports and exports -The names to be used in an `IMPORT`, that is, `LHS_NAME` terms in an `NS_VAR` or `nsLHS`, are variable references inside that `IMPORT`'s `brNS` term. If they appear without an accompanying `lhs "β"` term (in `NS_VAR`), then this is in addition to their role as identifiers within the actual enclosing scope, which works like any other assignment. These references behave as though they are at the end of the `brNS` term, that is, they "see" all definitions in the block. However, they must refer to identifiers that are *exported* by that block; references to any other variable cause an error much like references that have no definition. +Names that are preceded by an `atom "."` term, or that appear as `LHS_NAME` terms in an `NS_VAR` or `lhsNs`, are variable references in a namespace: in the first case, the result of the `atom` node, and in the second, of the overall assignments `subExpr` right hand side. These names do not follow lexical scoping; in general they must be stored in order to perform a name lookup when the namespace is available. Such a name in `lhsNs`, or in `NS_VAR` with no accompanying `lhs "β"` term, additionally serves as an identifier within the actual enclosing scope, which works like any other assignment. -An identifier is exported if the `ASGN` node in its definition is `"β"`, or if it appears anywhere in an `EXPORT` term. An identifier can only be exported in the scope where it is defined, and not in a containing scope. An `EXPORT` term that includes an identifier from such a scope causes an error. +An identifier is *exported* if the `ASGN` node in its definition is `"β"`, or if it appears anywhere in an `EXPORT` term. 
An identifier can only be exported in the scope where it is defined, and not in a containing scope. An `EXPORT` term that includes an identifier from such a scope causes an error. ## Variables A *variable* is an entity that permits two operations: it can be *set* to a particular value, and its *value* can be obtained, resulting in the last value it was set to. When either operation is performed it is referred to as *accessing* the variable. -When a body in a block is evaluated, a variable is created for each definition (that is, defined identifier instance) the body contains. Whenever another blockβthe block itself, not its contentsβis evaluated during the execution of the block, it is linked to the currently-evaluating block, so that it will use the variables defined in this instance. By following these links repeatedly, an instance of a block is always linked to exactly one instance of each block that contains it. These links form a tree that is not necessarily related to the call stack of functions and modifiers. Using the links, the variable an identifier refers to is the one corresponding to that variable's definition in the linked instance of the containing scope for the definition. +When a body in a block is evaluated, it creates a *namespace*, which contains a variable for each definition (that is, defined identifier instance) the body contains. Whenever another blockβthe block itself, not its contentsβis evaluated during the execution of the block, it is linked to the currently-evaluating block, so that it will use the variables defined in this instance. By following these links repeatedly, an instance of a block is always linked to exactly one instance of each block that contains it. These links form a tree that is not necessarily related to the call stack of functions and modifiers. 
Using the links, the variable an identifier refers to is the one corresponding to that variable's definition in the linked instance of the containing scope for the definition. The first access to a variable must be made by its definition (this also means it sets the variable). If a different instance of its identifier accesses it first, then an error results. This can happen because every scope contained in a particular scope sees all the definitions it uses, and such a scope could be called before the definition is run. Because of conditional execution, this property must be checked at run time in general; however, in cases where it is possible to statically determine that a program will always violate it, a BQN instance can give an error at compile time rather than run time. + +A namespace defines a mapping from names to variables: if the given name is shared by an exported identifier in the body used to create that namespace, then that name maps to the variable corresponding to that identifier. The mapping is undefined for other names. diff --git a/spec/token.md b/spec/token.md index 91608c47..d97a6a73 100644 --- a/spec/token.md +++ b/spec/token.md @@ -10,7 +10,7 @@ A BQN *character literal* consists of a single character between single quotes, A comment consists of the hash character `#` and any following text until (not including) the next newline character. The initial `#` must not be part of a string literal started earlier. Comments are ignored entirely and do not form tokens. -Identifiers and numeric literals share the same token formation rule. These tokens are formed from the *numeric characters* `Β―βΟ.0123456789` and *alphabetic characters* `_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ` and the oddball `π£`. Any sequence of these characters adjacent to each other forms a single token, which is a *numeric literal* if it begins with a numeric character and an *identifier* if it begins with an alphabetic character. 
If a token begins with an underscore then its first non-underscore character must be alphabetic: for example, `_99` is not a valid token. Numeric literals are also subject to [numeric literal rules](literal.md), which specify which numeric literals are valid and which numbers they represent. If the token contains `π£` it must be either `π£`, `_π£`, or `_π£_` and is considered a special name (see below). As the value taken by this identifier can only be a modifier, the uppercase character `β` is not allowed. +Identifiers and numeric literals share the same token formation rule. These tokens are formed from the *numeric characters* `Β―βΟ0123456789` and *alphabetic characters* `_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ` and the oddball `π£`. Additionally, `.` is considered a numeric character if it is followed immediately by a digit (`0123456789`); otherwise it forms its own token. Any sequence of these characters adjacent to each other forms a single token, which is a *numeric literal* if it begins with a numeric character and an *identifier* if it begins with an alphabetic character. If a token begins with an underscore then its first non-underscore character must be alphabetic: for example, `_99` is not a valid token. Numeric literals are also subject to [numeric literal rules](literal.md), which specify which numeric literals are valid and which numbers they represent. If the token contains `π£` it must be either `π£`, `_π£`, or `_π£_` and is considered a special name (see below). As the value taken by this identifier can only be a modifier, the uppercase character `β` is not allowed. Following this step, the whitespace characters space and tab are ignored, and do not form tokens. Only these whitespace characters, and the newline character, which does form a token, are allowed. @@ -19,10 +19,10 @@ Otherwise, a single character forms a token. 
Only the specified set of character | Class | Characters |-----------------------|------------ | Null literal | `@` -| Primitive Function | `+-ΓΓ·ββββ\|Β¬β§β¨<>β =β€β₯β‘β’β£β’β₯βΎββββ«»β½β/ββββββββ·β` +| Primitive Function | `+-ΓΓ·ββββ\|Β¬β§β¨<>β =β€β₯β‘β’β£β’β₯βΎββββ«»β½β/ββββββββ·β!` | Primitive 1-Modifier | `` ΛΛΛΒ¨ββΌΒ΄Λ` `` | Primitive 2-Modifier | `βββΈββΎββΆβββ` | Special name | `π¨π©πππ€πππ½πΎπ` -| Punctuation | `βββ©β(){}β¨β©βΏβ,` and newline +| Punctuation | `βββ©β(){}β¨β©βΏβ,.` and newline In the BQN [grammar specification](grammar.md), the three primitive classes are grouped into terminals `Fl`, `_ml`, and `_cl`, while the punctuation characters are identified separately as keywords such as `"β"`. The special names are handled specially. The uppercase versions `πππ½πΎπ` and lowercase versions `π¨π©πππ€` are two spellings of the five underlying inputs and function. diff --git a/spec/types.md b/spec/types.md index eb9b8a1f..4549fd04 100644 --- a/spec/types.md +++ b/spec/types.md @@ -2,19 +2,22 @@ # Specification: BQN types -BQN programs manipulate data of six types: +BQN programs manipulate data of seven types: - Character - Number - Array - Function - 1-Modifier - 2-Modifier +- Namespace -Of these, the first three are considered *data types* and the remaining three *operation types*. We first describe the operation types; the remainder of this page will be dedicated to the data types. A member of any operation type accepts some number of *inputs* and either returns a *result* or causes an error; inputs and the result are values of any type. When a function is given inputs (*called*), it may produce side effects before returning, such as manipulating variables and calling other functions within its scope, or performing I/O. +Of these, the first three are considered *data types* and the next three *operation types*. We first describe the operation types and the namespace; the remainder of this page will be dedicated to the data types. 
A member of any operation type accepts some number of *inputs* and either returns a *result* or causes an error; inputs and the result are values of any type. When a function is given inputs (*called*), it may produce side effects before returning, such as manipulating variables and calling other functions within its scope, or performing I/O. - A *function* takes one (monadic call) or two (dyadic call) *arguments*. - A *1-modifier* takes one *operand*. - A *2-modifier* takes two *operands*. +A namespace holds the variables used to evaluate a block or program, as defined in the [scoping rules](scope.md). The observable aspects of a namespace are that it can be compared for equality with other namespaces and that it exposes variables associated with certain names, whose values can be queried or set. + To begin the data types, a *character* is a [Unicode](https://en.wikipedia.org/wiki/Unicode) code point, that is, its value is a non-negative integer within the ranges defined by Unicode (however, it is distinct from this number as a BQN value). Characters are ordered by this numeric value. BQN deals with code points as abstract entities and does not expose encodings such as UTF-8 or UTF-16. The precise type of a *number* may vary across BQN implementations or instances. A *real number* is a member of some supported subset of the [extended real numbers](https://en.wikipedia.org/wiki/Extended_real_number_line), that is, the real numbers and positive or negative infinity. Some system must be defined for rounding an arbitrary real number to a member of this subset, and the basic arithmetic operations add, subtract, multiply, divide, and natural exponent (base *e*) are defined by performing these operations on exact real values and rounding the result. The Power function (dyadic `β`) is also used but need not be exactly rounded. A *complex number* is a value with two real number *components*, a *real part* and an *imaginary part*. 
A BQN implementation can either support real numbers only, or complex numbers.
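
Reviewer's sketch (not part of the commit): the destructuring assignment rule added in the spec/evaluate.md hunk can be modelled roughly as below. This is a minimal illustration in Python, not how any BQN implementation works. The names `Namespace`, `assign`, and `env` are hypothetical. Targets are encoded by assumption as follows: a string is a lone identifier or `LHS_NAME`, a Python list is an `lhsList` or `lhsStr`, and a tuple `(lhs, name)` is an aliased entry (an `lhs` paired with an `LHS_NAME`). The sketch also maps names directly to values, whereas the spec maps them to variables, and it checks entries one at a time rather than validating the whole left-hand side first.

```python
class Namespace:
    """Hypothetical stand-in for a BQN namespace: exported names mapped to values."""

    def __init__(self, exported):
        self.exported = dict(exported)

    def get(self, name):
        # The spec gives an error if the namespace does not define the name.
        if name not in self.exported:
            raise KeyError(f"namespace does not define {name!r}")
        return self.exported[name]


def assign(target, value, env):
    """Recursively assign `value` to `target`, storing results in the dict `env`."""
    if isinstance(target, str):
        # Base case: single-identifier assignment.
        env[target] = value
        return
    if isinstance(value, Namespace):
        for entry in target:
            if isinstance(entry, str):
                # Plain name: looked up in the namespace and assigned under the same name.
                outer, inner = entry, entry
            elif isinstance(entry, tuple):
                # Aliased entry: `inner` is looked up, `outer` receives the value.
                outer, inner = entry
            else:
                raise TypeError("namespace targets must be names or aliased names")
            assign(outer, value.get(inner), env)
        return
    if isinstance(value, list):
        if any(isinstance(entry, tuple) for entry in target):
            raise TypeError("aliased entries require a namespace value")
        if len(target) != len(value):
            raise ValueError("target and value lengths differ")
        for sub_target, element in zip(target, value):
            assign(sub_target, element, env)
        return
    raise TypeError("destructuring requires a list or namespace")


# Usage: take `x` as-is and destructure the exported `pair` into `a` and `b`.
env = {}
ns = Namespace({"x": 1, "pair": [2, 3]})
assign(["x", (["a", "b"], "pair")], ns, env)
print(env)
```

Because the aliased entry hands its value back to `assign`, the outer target may itself be a list, which matches the spec's remark that the `lhs` node "may be a destructuring assignment or simple subject assignment".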
