From 2afb23928e1984d475cc460e1672e8f6fa0e4dbe Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Wed, 11 Aug 2021 17:21:31 -0400
Subject: Allow clicking on header to get fragment link

---
 docs/spec/evaluate.html | 8 ++++----
 docs/spec/grammar.html | 2 +-
 docs/spec/index.html | 2 +-
 docs/spec/inferred.html | 30 +++++++++++++++---------------
 docs/spec/literal.html | 2 +-
 docs/spec/primitive.html | 34 +++++++++++++++++-----------------
 docs/spec/scope.html | 12 ++++++------
 docs/spec/system.html | 22 +++++++++++-----------
 docs/spec/token.html | 2 +-
 docs/spec/types.html | 2 +-
 10 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/docs/spec/evaluate.html b/docs/spec/evaluate.html
index 044f534c..2d1e5ce9 100644
--- a/docs/spec/evaluate.html
+++ b/docs/spec/evaluate.html
@@ -4,22 +4,22 @@ Specification: BQN evaluation
-

Specification: BQN evaluation

+

Specification: BQN evaluation

This page describes the semantics of the code constructs whose grammar is given in grammar.md. The formation rules there are not named, and here they are identified by either the name of the term or by copying the rule entirely if there are several alternative productions.

Here we assume that the referent of each identifier (or equivalently, the connections between identifiers) has been identified according to the scoping rules.

Evaluation is an ordered process, and any actions required to evaluate a node always have a specified order unless performing them in any order would have the same effect. Side effects that are relevant to ordering are setting and getting the value of a variable, causing an error, and returning (with →) from a block. Errors described in this page are "evaluation errors" and can be caught by the Catch (⎊) modifier. For caught errors and returns, evaluation halts without attempting to complete any in-progress node, and is restarted by Catch (for errors) or at the end of the appropriate block evaluation (for returns).

-

Programs and blocks

+

Programs and blocks

The result of parsing a valid BQN program is a PROGRAM, and the program is run by evaluating this term.

A PROGRAM or BODY is a list of STMTs, which are evaluated in program order. A result is always required for BODY nodes, and sometimes for PROGRAM nodes (for example, when loaded with •Import). If any identifiers in the node's scope are exported, or any of its statements is an EXPORT, then the result is the namespace created in order to evaluate the node. If a result is required but the namespace case doesn't apply, then the last STMT node must be an EXPR and its result is used. The statement EXPR evaluates some BQN code and possibly assigns the results, while nothing evaluates any subject or Derv terms it contains but discards the results. An EXPORT statement performs no action.

A block consists of several BODY terms, some of which may have an accompanying header describing accepted inputs and how they are processed. An immediate block brImm can only have one BODY, and is evaluated by evaluating the code in it. Other types of blocks do not evaluate any BODY immediately, but instead return a function or modifier that obtains its result by evaluating a particular BODY. The BODY is identified and evaluated once the block has received enough inputs (operands or arguments), which for modifiers can take one or two calls: if two calls are required, then on the first call the operands are simply stored and no code is evaluated yet. Two calls are required if there is more than one BODY term, if the BODY contains the special names 𝕨𝕩𝕤𝕎𝕏𝕊, or if its header specifies arguments (the header-body combination is a _mCase or _cCase_). Otherwise only one is required.

To evaluate a block when enough inputs have been received, the correct case must first be identified. To do this, each special case (FCase, _mCase, or _cCase_), excluding FCase nodes containing UndoHead, is checked in order to see if its arguments are structurally compatible with the given arguments. That is, if headW is a subject, there must be a left argument matching that structure, and if headX is a subject, the right argument must match that structure. This means that 𝕨 not only matches any left argument but also no argument. The test for compatibility is the same as for multiple assignment described below, except that the header may contain constants, which must match the corresponding part of the given argument. If no special case matches, then an appropriate general case (FMain, _mMain, or _cMain_) is used: if there are two, the first is used with no left argument and the second with a left argument; if there is one, it is always used; and if there are none, an error results.
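The case selection described above can be modeled roughly in Python. This is a simplified sketch under stated assumptions, not the reference algorithm: Python lists stand for BQN lists, tuples for list patterns in a header, the hypothetical marker `ANY` for a plain name, the hypothetical marker `W_NAME` for the special name 𝕨 used as headW, and any other pattern is a constant compared with `==`. It also assumes, beyond what this paragraph states, that a case whose header names no left argument only matches a call without one.

```python
ANY = object()     # hypothetical marker: a plain name, matches any value
W_NAME = object()  # hypothetical marker: the special name 𝕨 used as headW


def matches(pattern, value):
    """Simplified structural compatibility: names match anything, tuples
    match lists of the same length element-wise, and any other pattern is
    a constant that must equal the corresponding value."""
    if pattern is ANY:
        return True
    if isinstance(pattern, tuple):
        return (isinstance(value, list) and len(value) == len(pattern)
                and all(matches(p, v) for p, v in zip(pattern, value)))
    return pattern == value


def left_ok(w_pat, has_w, w):
    if w_pat is W_NAME:   # 𝕨 matches any left argument, and also none
        return True
    if w_pat is None:     # assumption: no headW means monadic calls only
        return not has_w
    return has_w and matches(w_pat, w)


def select_case(special, general, x, w=None, has_w=False):
    """special: (w_pattern, x_pattern, body) triples, tried in order.
    general: a list of zero, one, or two general-case bodies."""
    for w_pat, x_pat, body in special:
        if left_ok(w_pat, has_w, w) and matches(x_pat, x):
            return body
    if len(general) == 2:   # first without a left argument, second with one
        return general[1] if has_w else general[0]
    if len(general) == 1:   # a single general case is always used
        return general[0]
    raise ValueError("no case matches the arguments")
```

Special cases are tried strictly in order and the general cases are consulted only afterwards, so a later special case can never shadow an earlier one.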

The only remaining step before evaluating the BODY is to bind the inputs and other names. Special names are always bound when applicable: 𝕨𝕩𝕤 if arguments are used, 𝕨 if there is a left argument, 𝕗𝕘 if operands are used, and _𝕣 and _𝕣_ for modifiers and combinators, respectively. Any names in the header are also bound, allowing multiple assignment for arguments.

If there is no left argument, but the BODY contains 𝕨 or 𝕎 at the top level, then it is conceptually re-parsed with 𝕨 replaced by · to give a monadic version before application; this modifies the syntax tree by replacing some instances of subject, arg, or Operand with nothing. The token 𝕎 is not allowed in this case and causes an error. Re-parsing 𝕨 can also cause an error if it's used as an operand or list element, where nothing is not allowed by the grammar. Note that these errors must not appear if the block is always called with two arguments. True re-parsing is not required, as the same effect can also be achieved dynamically by treating · as a value and checking for it during execution. If it's used as a left argument, then the function should instead be called with no left argument (and similarly in trains); if it's used as a right argument, then the function and its left argument are evaluated but rather than calling the function · is "returned" immediately; and if it's used in another context then it causes an error.
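The dynamic alternative to re-parsing can be sketched in Python with a sentinel object standing in for ·. The names `NOTHING`, `call`, and `minus` are hypothetical, and the error for · in other contexts (operands, list elements) is not modeled.

```python
NOTHING = object()  # hypothetical stand-in for the value · ("nothing")


def call(f, x, w=NOTHING):
    """Call f on right argument x and optional left argument w,
    checking for · at run time instead of re-parsing the body."""
    if x is NOTHING:
        # · as right argument: f and w were already evaluated, but f is
        # not called and · is "returned" immediately.
        return NOTHING
    if w is NOTHING:
        return f(x)          # · as left argument: call monadically
    return f(w, x)


def minus(*args):
    """Example primitive: monadic negation, dyadic subtraction."""
    if len(args) == 1:
        return -args[0]
    w, x = args
    return w - x


# A body 𝕨-𝕩 run with no left argument, where 𝕨 holds NOTHING:
w_value = NOTHING
result = call(minus, 3, w_value)   # behaves like the monadic -3
```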

-

Assignment

+

Assignment

An assignment is one of the four rules containing ASGN. It is evaluated by first evaluating the right-hand-side subExpr, FuncExpr, _m1Expr, or _m2Expr_ expression, and then storing the result in the left-hand-side identifier or identifiers. The result of the assignment expression is the result of its right-hand side. Except for subjects, only a lone identifier is allowed on the left-hand side and storage sets it equal to the result. For subjects, destructuring assignment is performed when an lhs is lhsList or lhsStr. Destructuring assignment is performed recursively by assigning right-hand-side values to the left-hand-side targets, with single-identifier assignment as the base case.

The right-hand-side value, here called v, in destructuring assignment must be a list (rank 1 array) or namespace. If it's a list, then each LHS_ENTRY node must be an LHS_ELT. The left-hand side is treated as a list of lhs targets, and matched to v element-wise, with an error if the two lists differ in length. If v is a namespace, then the left-hand side must be an lhsStr where every LHS_ATOM is an LHS_NAME, or an lhsList where every LHS_ENTRY is an LHS_NAME or lhs "⇐" LHS_NAME, so that it can be considered a list of LHS_NAME nodes some of which are also associated with lhs nodes. To perform the assignment, the value of each name is obtained from the namespace v, giving an error if v does not define that name. The value is assigned to the lhs node if present (which may be a destructuring assignment or simple subject assignment), and otherwise assigned to the same LHS_NAME node used to get it from v.
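A minimal Python sketch of the recursive destructuring rules above, assuming Python lists model BQN lists and dicts model namespaces. Targets are strings (single identifiers) or lists of entries, and a hypothetical `(sub_target, name)` pair stands for the entry `sub_target ⇐ name`.

```python
def assign(target, v, env):
    """Destructuring assignment into the dict env.
    target: a str (one identifier) or a Python list of entries.
    Entries are targets, or (sub_target, name) pairs for `sub_target ⇐ name`.
    Python lists model BQN lists; dicts model namespaces."""
    if isinstance(target, str):           # base case: a single identifier
        env[target] = v
        return
    if isinstance(v, list):               # list value: match element-wise
        if any(isinstance(t, tuple) for t in target):
            raise ValueError("⇐ entries require a namespace value")
        if len(target) != len(v):
            raise ValueError("length mismatch in destructuring")
        for t, elt in zip(target, v):
            assign(t, elt, env)
    elif isinstance(v, dict):             # namespace value: look up names
        for entry in target:
            sub, name = entry if isinstance(entry, tuple) else (entry, entry)
            if not isinstance(name, str):
                raise ValueError("namespace destructuring needs names")
            if name not in v:
                raise KeyError(f"namespace does not define {name}")
            assign(sub, v[name], env)
    else:
        raise ValueError("value must be a list or a namespace")
```

For example, `assign(["a", ["b", "c"]], [1, [2, 3]], env)` sets `a`, `b`, and `c`, while a namespace entry with a sub-target retrieves the value by name and then destructures it further.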

Modified assignment is the subject assignment rule lhs Derv "↩" subExpr. In this case, lhs should be evaluated as if it were a subExpr (the syntax is a subset of subExpr), and the result of the function application lhs Derv subExpr should be assigned to lhs, and is also the result of the modified assignment expression.

-

Expressions

+

Expressions

We now give rules for evaluating an atom, Func, _mod1 or _mod2_ expression (the possible options for ANY). A literal or primitive sl, Fl, _ml, or _cl_ has a fixed value defined by the specification (literals and built-ins). An identifier s, F, _m, or _c_, if not preceded by atom ".", must have an associated variable due to the scoping rules, and returns this variable's value, or causes an error if it has not yet been set. If it is preceded by atom ".", then the atom node is evaluated first; its value must be a namespace, and the result is the value of the identifier's name in the namespace, or an error if the name is undefined. A parenthesized expression such as "(" _modExpr ")" simply returns the result of the interior expression. A braced construct such as BraceFunc is defined by the evaluation of the statements it contains after all parameters are accepted. Finally, a list "⟨" ⋄? ( ( EXPR ⋄ )* EXPR ⋄? )? "⟩" or ANY ( "‿" ANY )+ consists grammatically of a list of expressions. To evaluate it, each expression is evaluated in source order and their results are placed as elements of a rank-1 array. The two forms have identical semantics but different punctuation.

A Return node creates a return function. As discussed in the scoping rules, its identifier indicates a namespace from a particular block evaluation. When called, the function causes an error if that block has finished execution, or if the call includes a left argument 𝕨. Otherwise, evaluation stops immediately, and resumes at the end of the block where it returns the right argument 𝕩 from that block.

Rules in the table below are function and modifier evaluation.

diff --git a/docs/spec/grammar.html b/docs/spec/grammar.html
index 3ad5bd25..da83d87c 100644
--- a/docs/spec/grammar.html
+++ b/docs/spec/grammar.html
@@ -4,7 +4,7 @@ Specification: BQN grammar
-

Specification: BQN grammar

+

Specification: BQN grammar

BQN's grammar is given below. Terms are defined in a BNF variant. However, handling special names properly is possible but difficult in BNF, so they are explained in text along with the braced block grammar.

The symbols s, F, _m, and _c_ are identifier tokens with subject, function, 1-modifier, and 2-modifier classes respectively. Similarly, sl, Fl, _ml, and _cl_ refer to literals and primitives of those classes. While names in the BNF here follow the identifier naming scheme, this is informative only: syntactic roles are no longer used after parsing and cannot be inspected in a running program.

A program is a list of statements. Almost all statements are expressions. Namespace export statements, and valueless results stemming from ·, or 𝕨 in a monadic brace function, can be used as statements but not expressions.

diff --git a/docs/spec/index.html b/docs/spec/index.html
index b8f1d80e..fa8ad8c9 100644
--- a/docs/spec/index.html
+++ b/docs/spec/index.html
@@ -4,7 +4,7 @@ BQN specification
-

BQN specification

+

BQN specification

This document, and the others in this directory (linked in the list below), make up the pre-versioning BQN specification. The specification differs from the documentation in that its purpose is only to describe the exact details of BQN's operation in the most quickly accessible way, rather than to explain the central ideas of BQN functionality and how it might be used. The core of BQN, which excludes system-provided values, is now almost completely specified. One planned feature, an extension to allow low-rank elements in the argument to Join, has not yet been added, and the spec will continue to be edited further to improve clarity and cover any edge cases that have been missed.

Under this specification, a language implementation is a BQN pre-version implementation if it behaves as specified for all input programs. It is a BQN pre-version implementation with extensions if it behaves as specified in all cases where the specification does not require an error, but behaves differently in at least one case where it requires an error. It is a partial version of either of these if it doesn't conform to the description but differs from a conforming implementation only by rejecting with an error some programs that the conforming implementation accepts. As the specification is not yet versioned, other instances of the specification define these terms in different ways. An implementation can use one of these terms if it conforms to any instance of the pre-versioning BQN specifications that defines them. When versioning is begun, there will be only one specification for each version.

The following documents are included in the BQN specification. A BQN program is a sequence of Unicode code points: to evaluate it, it is converted into a sequence of tokens using the token formation rules, then these tokens are arranged in a syntax tree according to the grammar, and then this tree is evaluated according to the evaluation semantics. The program may be evaluated in the presence of additional context such as a filesystem or command-line arguments; this context is presented to the program and manipulated through the system-provided values.

diff --git a/docs/spec/inferred.html b/docs/spec/inferred.html
index c25ad29c..c0deb923 100644
--- a/docs/spec/inferred.html
+++ b/docs/spec/inferred.html
@@ -4,11 +4,11 @@ Specification: BQN inferred properties
-

Specification: BQN inferred properties

+

Specification: BQN inferred properties

BQN includes some simple deductive capabilities: detecting the type of empty array elements, the result of an empty reduction, and the Undo (⁼) and Under (⌾) modifiers. These tasks are a kind of proof-based or constraint programming, and can never be solved completely (some instances will be undecidable) but can be solved in more instances by ever-more sophisticated algorithms. To allow implementers to develop more advanced implementations while offering some stability and portability to programmers, two kinds of specification are given here. First, constraints are given on the behavior of inferred properties. These are not exact and require some judgment on the part of the implementer. Second, behavior for common or useful cases is specified more precisely. Non-normative suggestions are also given as a reference for implementers.

For the specified cases, the given functions and modifiers refer to those particular representations. It is not necessary to detect equivalent representations, for example to reduce (+-×)⁼ to ∨⁼. However, it is necessary to identify computed functions and modifiers: for example F⁼ when the value of F in the expression is ∨, or (1⊑∧‿∨)⁼.

Failing to compute an inferred property for a function or array as it's created cannot cause an error. An error can only be caused when the missing inferred property is needed for a computation.

-

Identities

+

Identities

When monadic Fold (´) or Insert (˝) is called on an array of length 0, BQN attempts to infer a right identity value for the function in order to determine the result. A right identity value for a dyadic function 𝔽 is a value r such that e≡e𝔽r for any element e in the domain. For such a value r, the fold r 𝔽´ l is equivalent to 𝔽´ l for a non-empty list l, because the first application (¯1⊑l) 𝔽 r gives ¯1⊑l, which is the starting point when no initial value is given. It's thus reasonable to define 𝔽´ l to be r 𝔽´ l for an empty list l as well, giving a result r.
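The argument above can be checked with a small Python model of a right fold. This is a sketch only: the `identity` parameter is a hypothetical stand-in for the identity that BQN would have to infer, and `NO_VALUE` is a hypothetical marker for "no value given".

```python
from operator import sub

NO_VALUE = object()  # marks "no initial value" / "no identity known"


def fold(f, items, init=NO_VALUE, identity=NO_VALUE):
    """Right fold in the style of BQN's 𝔽´:
    f(l0, f(l1, ... f(l[-2], l[-1]))), or with an initial value r,
    f(l0, f(l1, ... f(l[-1], r)))."""
    items = list(items)
    if init is NO_VALUE:
        if not items:
            if identity is NO_VALUE:
                raise ValueError("empty fold with no known identity")
            return identity
        acc = items.pop()          # start from the last element
    else:
        acc = init
    for e in reversed(items):
        acc = f(e, acc)
    return acc
```

Since 0 is a right identity for subtraction (e-0 is e), `fold(sub, [1, 2, 3])` and `fold(sub, [1, 2, 3], init=0)` agree (both compute 1-(2-3), which is 2), and an empty fold with identity 0 returns 0.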

For Fold, the result of 𝔽´ on an empty list is defined to be a right identity value for the range of 𝔽, if exactly one such value exists. If an identity can't be proven to uniquely exist, then an error results.

For Insert, 𝔽˝ on an array of length 0 is defined similarly, but also depends on the cell shape 1↓≢𝕩. The required domain is the arrays of that shape that also lie in the range of 𝔽 (over arbitrary arguments, not shape-restricted ones). Furthermore, an identity may be unique among all possible arguments as in the case of Fold, or it may be an array with shape 1↓≢𝕩 and be unique among arrays with that shape. For example, with cell shape 3‿2, all of 0, 3⥊0, and 3‿2⥊0 are identities for +, but 3‿2⥊0 can be used because it is the only identity with shape 3‿2, while the other identities aren't unique and can't be used.

@@ -68,11 +68,11 @@

Additionally, the identity of ∾˝ must be recognized: if 0=≠𝕩 and 1<=𝕩, then ∾˝𝕩 is (0∾2↓≢𝕩)⥊𝕩. If 1==𝕩, then there is no identity element, as the result of ∾ always has rank at least 1, but the cell rank is 0.

-

Fill elements

+

Fill elements

Any BQN array can have a fill element, which is a sort of "default" value for the array. The reference implementations use Fill to access this element, and it is used primarily for Take (↑), First (⊑), and Nudge («, »). One way to extract the fill element of an array a in BQN is ⊑0⥊a.

A fill element can be either 0, ' ', or an array of valid fill elements. If the fill element is an array, then it may also have a fill element (since it is an ordinary BQN array). The fill element is meant to describe the shared structure of the elements of an array: for example, the fill element of an array of numbers should be 0, while the fill element for an array of variable-length lists should probably be ⟨⟩. However, the fill element, unlike other inferred properties, does not satisfy any particular constraints that relate it to its array. The fill element of a primitive's result, including functions derived from primitive modifiers, must depend only on its inputs.

In addition to the requirements below, the fill element for the value of a string literal is ' '.

-

Required functions

+

Required functions

Combinators ⊣⊢!˙˜´˝∘○⊸⟜⊘◶⍟ do not affect fill element computation: if the combinator calls a function that computes a fill element, then that fill element must be retained if the result is passed to other functions or returned. ⍟ constructs arrays if its right operand is or contains arrays, and the fill elements of these arrays are not specified; converting 𝕩 to a fill element is a reasonable choice in some cases but not others.

Arithmetic primitives (all valences of +-×÷⋆√⌊⌈|¬ and dyadic ∧∨<>≠=≤≥) obtain their fill elements by applying the function to the fill elements of the arguments. If this is an error, there is no fill element; otherwise, the fill element is the result, with all numbers in it changed to 0 and all characters changed to ' '.
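A minimal Python sketch of this rule for dyadic +, assuming nested Python lists model arrays (only length is checked, not full shape), 1-character strings model characters, and `None` models "no fill element". All function names are hypothetical.

```python
def bqn_add(a, b):
    """Atom-level + as in BQN: number+number, or character+number
    (characters are modeled as 1-character Python strings)."""
    if isinstance(a, str) and isinstance(b, str):
        raise TypeError("cannot add two characters")
    if isinstance(a, str):
        return chr(ord(a) + b)
    if isinstance(b, str):
        return chr(a + ord(b))
    return a + b


def pervade(f, w, x):
    """Apply f atom-wise over nested lists, extending atoms to lists."""
    if isinstance(w, list) and isinstance(x, list):
        if len(w) != len(x):
            raise ValueError("length error")
        return [pervade(f, a, b) for a, b in zip(w, x)]
    if isinstance(w, list):
        return [pervade(f, a, x) for a in w]
    if isinstance(x, list):
        return [pervade(f, w, b) for b in x]
    return f(w, x)


def to_fill(v):
    """Change every number to 0 and every character to ' '."""
    if isinstance(v, list):
        return [to_fill(e) for e in v]
    return " " if isinstance(v, str) else 0


def arithmetic_fill(f, fill_w, fill_x):
    """Fill of f's result: f applied to the argument fills, normalized.
    If that application fails, there is no fill element (None)."""
    try:
        return to_fill(pervade(f, fill_w, fill_x))
    except (TypeError, ValueError):
        return None
```

Adding a character fill to a number fill gives a character, normalized to `' '`, while adding two character fills is an error, so the result has no fill.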

Fill elements for many primitives are given in the table below. The "Fill" column indicates the strategy used to compute the result's fill. Fields 0, 𝕩, 0↑𝕩, and 0⚇0𝕩 indicate the fill directly, while ⊢ and ∩ indicate that the fill is to be computed from the argument fills (if not all arguments have fills, then the fill element is unspecified). For ⊢, the fill element of the result is the fill element of 𝕩. For ∩, the fill is equal to the fill values for multiple arrays, provided that they are all equal (it's unspecified if they are not all equal). In the two-argument case, these arrays are 𝕨 and 𝕩. In the one-argument case, they are the elements of 𝕩; however, if 𝕩 is empty, then the result's fill is the fill of the fill of 𝕩.

@@ -120,11 +120,11 @@

For Group and Group Indices (⊔), the fill element of the result and its elements are both specified: the fill element of each element of the result is the same as that of 𝕩 for Group, and is 0 for Group Indices. The fill element of the result is (0⚇1𝕨)↑𝕩 for Group, and ⥊⟜<0⚇1𝕩 for Group Indices.

Fill elements of iteration modifiers such as ¨⌜ are not specified. It is reasonable to define the fill element of 𝔽⌜ or 𝔽¨ to be 𝔽 applied to the fill elements of the arguments. Regardless of definition, computing the fill element cannot cause side effects or an error.

-

Undo

+

Undo

The Undo 1-modifier ⁼, given an operand 𝔽 and argument 𝕩, and possibly a left argument 𝕨, finds a value y such that 𝕩≡𝕨𝔽y, that is, an element of the pre-image of 𝕩 under 𝔽 or 𝕨𝔽⊢. Thus it satisfies the constraint 𝕩 ≡ 𝕨𝔽𝕨𝔽⁼𝕩 (𝕨𝔽⁼⊢ is a right inverse of 𝕨𝔽⊢) provided 𝔽⁼ and 𝔽 both complete without error. 𝔽⁼ should of course give an error if no inverse element exists, and can also fail if no inverse can be found. It is also preferred for 𝔽⁼ to give an error if there are many choices of inverse with no clear way to choose one of them: for example, 0‿0⍉m returns the diagonal of matrix m; 0‿0⍉⁼2‿3 requires values to be chosen for the off-diagonal elements in its result. It is better to give an error, encouraging the programmer to use a fully-specified approach like 2‿3⌾(0‿0⊸⍉) applied to a matrix of initial elements, than to return a result that could be very different from other implementations.
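The right-inverse constraint 𝕩 ≡ 𝕨𝔽𝕨𝔽⁼𝕩 can be illustrated with a Python sketch. The tables `FORWARD` and `INVERSE` are hypothetical and cover only three dyadic arithmetic functions; they are not the required inverse list given later in this document.

```python
# Hypothetical forward/inverse pairs for a few dyadic arithmetic functions.
FORWARD = {
    "+": lambda w, y: w + y,
    "-": lambda w, y: w - y,
    "×": lambda w, y: w * y,
}
INVERSE = {
    "+": lambda w, x: x - w,   # w+y = x  gives  y = x-w
    "-": lambda w, x: w - x,   # w-y = x  gives  y = w-x
    "×": lambda w, x: x / w,   # w×y = x  gives  y = x÷w (errors if w = 0)
}


def undo(name, w, x):
    """Return some y with FORWARD[name](w, y) == x, or raise."""
    if name not in INVERSE:
        raise NotImplementedError(f"no inverse known for {name}")
    return INVERSE[name](w, x)


def right_inverse_holds(name, w, x):
    """Check the constraint 𝕩 ≡ 𝕨𝔽𝕨𝔽⁼𝕩 for one pair of inputs."""
    return FORWARD[name](w, undo(name, w, x)) == x
```

The × entry raises `ZeroDivisionError` for a left argument of 0. That is consistent with the preference above: either no inverse exists, or, when 𝕩 is also 0, every y works and there is no clear choice.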

When working with limited-precision numbers, it may be difficult or impossible to exactly invert the operand function. Instead, it is generally acceptable to perform a computation that, if done with unlimited precision, would exactly invert 𝔽 computed with unlimited precision. This principle is the basis for the numeric inverses specified below. It is also acceptable to find an inverse by numeric methods, provided that the error in the inverse value found relative to an unlimited-precision inverse can be kept close to the inherent error in the implementation's number format.

Regardless of which cases for Undo are supported, the result of a call, and whether it is an error, must depend only on the values of the inputs 𝔽, 𝕩, and (if present) 𝕨.

-

Required functions

+

Required functions

Function inverses are given for one or two arguments, with cases where inverse support is not required left blank.

For arithmetic functions the implementations below may in some cases not give the closest inverse (that is, there may be some other y so that F y is closer to x than F F⁼x). Even in these cases the exact functions given below must be used.

@@ -247,7 +247,7 @@
-

Optional functions

+

Optional functions

Several primitives are easily and uniquely undone, but doing so is not important for BQN programming. These primitives are listed below along with suggested algorithms to undo them. Unlike the implementations above, these functions are not valid in all cases, and the inputs must be validated or the results checked in order to use them.

@@ -300,7 +300,7 @@
-

Required modifiers

+

Required modifiers

The following cases of Self/Swap must be supported.

@@ -464,9 +464,9 @@
-

Undo headers

+

Undo headers

An UndoHead header specifies how a block function acts when undone. Like ordinary headers, undo headers are searched for a match when a block function F is undone, or when F˜ is undone with two arguments (including the two modifier cases 𝔽⟜k and 𝔽𝔾k from the previous section). An UndoHead without "˜" matches the F⁼ case while one with "˜" matches the F˜⁼ case. The left and right arguments are matched to headW and headX as with ordinary headers, and the first matching case is evaluated to give the result of the Undo-derived function.

-

Under

+

Under

The Under 2-modifier ⌾ conceptually applies its left operand under the action of its right operand. Setting z←𝕨𝔽⌾𝔾𝕩, it satisfies (𝕨𝔽○𝔾𝕩) ≡ 𝔾z. We might say that 𝔾 transforms values to a new domain, and ⌾𝔾 lifts actions 𝔽 performed in this domain to the original domain of values. For example, addition in the logarithmic domain corresponds to multiplication in the linear domain: +⌾(⋆⁼) is × (but less precise if computed in floating point).
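The logarithm example can be reproduced with a Python sketch of the invertible case only. Here `under` is a hypothetical helper, and the inverse of 𝔾 is passed in by hand rather than obtained through Undo.

```python
import math


def under(f, g, g_inverse):
    """Invertible-case Under: the z with g(z) == f(g(w), g(x)).
    g_inverse is supplied by hand here; BQN would obtain it via Undo."""
    def derived(w, x):
        return g_inverse(f(g(w), g(x)))
    return derived


# +⌾(⋆⁼): add in the log domain (𝔾 is ⋆⁼, natural log), then map back with exp.
log_add = under(lambda a, b: a + b, math.log, math.exp)
```

`log_add(3, 4)` is close to 12 but may differ from it in the last bits, which is the loss of floating-point precision the text mentions.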

Let v←𝕨𝔽○𝔾𝕩, so that v≡𝔾z. v is of course well-defined, so the inference step is to find z based on v and possibly the original inputs. We distinguish three cases for Under:

When implementing, there is no need to implement invertible Under specially: it can be handled as part of the structural and computational cases.

-

Mathematical definition of structural Under

+

Mathematical definition of structural Under

In general, structural Under requires information from the original right argument to be computed. Here we will define the structural inverse of structural function 𝔾 on v into 𝕩, where 𝕩 gives this information. The value 𝕨𝔽⌾𝔾𝕩 is then the structural inverse of 𝔾 on 𝕨𝔽○𝔾𝕩 into 𝕩.

We define a structure to be either the value · or an array of structures (substitute 0 or any other specific value for · if you'd like structures to be a subset of BQN arrays; the value is irrelevant). A given structure s captures a BQN value or structure 𝕩 if it is ·, or if s and 𝕩 are arrays of the same shape, and each element of s captures the corresponding element of 𝕩. Thus a structure shares some or all of the structural information in arrays it captures, but none of the data.
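The definition of capturing translates directly into a recursive check. In this Python sketch the hypothetical class `Arr` models an array as a shape plus its elements in ravel order, and the hypothetical object `DOT` plays the role of ·.

```python
from dataclasses import dataclass

DOT = object()  # stand-in for the placeholder value ·


@dataclass
class Arr:
    """A BQN-like array: a shape and its elements in ravel order."""
    shape: tuple
    items: list


def captures(s, x):
    """True if structure s captures the value (or structure) x."""
    if s is DOT:
        return True
    return (isinstance(s, Arr) and isinstance(x, Arr)
            and s.shape == x.shape
            and all(captures(a, b) for a, b in zip(s.items, x.items)))
```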

A structure transformation consists of an initial structure s and a result structure t, as well as a relation between the two: each instance of · in t is assigned the location of an instance of · in s. If s captures a value 𝕩, we say that the structural transformation captures 𝕩 as well. Given such a value 𝕩, the transformation is applied to 𝕩 by replacing each · in t with the corresponding value from 𝕩, found by taking the same location in 𝕩 as the one in s given by the transformation.
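Applying a structure transformation can be sketched in Python by writing the relation directly into t: each · of t is replaced by a hypothetical `Loc` holding the path (ravel indices) of the · in s that it is assigned to. The sketch assumes that s has already been checked to capture 𝕩.

```python
from dataclasses import dataclass


@dataclass
class Arr:
    shape: tuple
    items: list  # elements in ravel order


@dataclass(frozen=True)
class Loc:
    """Stands for a · in the result structure t, recording the path
    (ravel indices) of the · in the initial structure s assigned to it."""
    path: tuple


def at(x, path):
    """Value of x at a path of ravel indices."""
    for i in path:
        x = x.items[i]
    return x


def apply_transformation(t, x):
    """Apply (s, t) to x, assuming s has already been checked to capture x."""
    if isinstance(t, Loc):
        return at(x, t.path)
    return Arr(t.shape, [apply_transformation(e, x) for e in t.items])


# Reversing a 3-element list: s is three ·, t points to them backwards.
reverse3 = Arr((3,), [Loc((2,)), Loc((1,)), Loc((0,))])
```

Two placeholders in t may share a location in s, which is how transformations that duplicate data are represented.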

@@ -488,10 +488,10 @@

Following this analysis, z can be constructed by replacing each instance of · in s with the component of 𝕩 or v indicated, and it follows that z is well-defined if it exists; it exists if and only if t captures v and values in v that correspond to the same position in s have the same value.
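The construction of z can be sketched in Python using the same hypothetical `Arr`/`Loc` encoding as a transformation whose · placeholders in t carry their locations in s. The sketch checks that t captures v and that all values sent to one location agree; locations that t never uses keep their value from 𝕩. It assumes s has already been checked to capture 𝕩.

```python
from dataclasses import dataclass


@dataclass
class Arr:
    shape: tuple
    items: list  # elements in ravel order


@dataclass(frozen=True)
class Loc:
    path: tuple  # location in s (and x) of the · this · in t refers to


def at(x, path):
    for i in path:
        x = x.items[i]
    return x


def set_at(x, path, value):
    """Copy of x with the element at path replaced by value."""
    if not path:
        return value
    items = list(x.items)
    items[path[0]] = set_at(items[path[0]], path[1:], value)
    return Arr(x.shape, items)


def t_captures(t, v):
    """Result structure t (with Loc in place of ·) captures v."""
    if isinstance(t, Loc):
        return True
    return (isinstance(v, Arr) and v.shape == t.shape
            and all(t_captures(a, b) for a, b in zip(t.items, v.items)))


def placeholders(t, prefix=()):
    """Yield (path in t, Loc) for each placeholder of t."""
    if isinstance(t, Loc):
        yield prefix, t
    else:
        for i, e in enumerate(t.items):
            yield from placeholders(e, prefix + (i,))


def structural_inverse(t, x, v):
    """z built from x by writing in the components of v that t says
    came from each location; assumes s already captures x."""
    if not t_captures(t, v):
        raise ValueError("t does not capture v")
    chosen = {}
    for t_path, loc in placeholders(t):
        value = at(v, t_path)
        if loc.path in chosen and chosen[loc.path] != value:
            raise ValueError("conflicting values for one location")
        chosen[loc.path] = value
    z = x
    for path, value in chosen.items():
        z = set_at(z, path, value)
    return z
```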

A structural function decomposition is a possibly infinite family of structure transformations such that any possible BQN value is captured by at most one of these transformations. It can be applied to any value: if some transformation captures the value, then apply that transformation, and otherwise give an error. A function is a structural function if there is a structural function decomposition that matches it: that is, for any input either both functions give an error or the results match.

For a structural function 𝔾, the structural inverse of 𝔾 on v into 𝕩 is the inverse of G on v into 𝕩, where G is the structure transformation that captures 𝕩 from some structural function decomposition Gd matching 𝔾. If no decomposition has an initial structure capturing 𝕩, then the structural inverse does not exist.

-

Well-definedness

+

Well-definedness

In order to show that the structural inverse of a structural function is well-defined, we must show that it does not depend on the choice of structural function decomposition. That is, for a given 𝕩, if G and H are structure transformations from different decompositions of 𝔾 both capturing 𝕩, then the structural inverse of G on v into 𝕩 matches that of H on v into 𝕩. Call these inverses y and z. Now begin by supposing that H captures y and G captures z; we will show this later. From the definition of a structural inverse, v≡G y, so that v≡𝔾 y, and because H captures y we know that 𝔾 y is H y, so we have v≡H y as well. Let S w indicate the set of all structure transformations F such that w ≡○F 𝕩 (this is not a BQN value, both because it is a set and because it's usually infinite): from the definition of z we know that S z is a strict superset of S w for any w other than z with v≡H w. It follows that either y≡z or S y is a strict subset of S z. By symmetry the same relation holds exchanging y and z, but it's not possible for S y to be a strict subset of S z and vice-versa. The only remaining possibility is that y≡z.

We now need to show that H captures y (the proof that G captures z is the same, as H and G are symmetric). To do this we must show that any array in the initial structure of H corresponds to a matching array in y. For convenience, we will call the initial structures of the two transformations iG and iH, and the final structures fG and fH, and use the notation p⊑a to indicate the value of array a at position p. Choose the position of an array in iH, and assume by induction that each array containing it already has the desired property; this implies that this position exists in y as well, although we know nothing about its contents. G captures y, so iG is · at this position or some parent position; call this position in iG p. There are now two cases: either G makes use of this p (at least one position in fG corresponds to it) or it doesn't. If it doesn't, then the contents of y at p are the same as those of 𝕩. Since H captures 𝕩, iH matches 𝕩 and hence y as well at p. If it does, then let s be a position in fG that corresponds to p (if there are multiple possibilities, choose one). From v≡G y, we know that s⊑v matches p⊑y. We know that fH captures v, so that s⊑fH captures s⊑v, or p⊑y. But we can show that the value of s⊑fH is the same as p⊑iH, which would prove that H captures y at p. To show this, construct an array xp by replacing the value of 𝕩 at p with p⊑iH (to be more careful in our handling of types, we might replace every · with some value that never appears in 𝕩). Both H and G capture xp: clearly they capture it outside p, while at p itself, iG is · and iH is equal to p⊑xp. Now (H xp)≡(G xp) because both functions match 𝔾 on their domains. Therefore s⊑H xp matches s⊑G xp, which by the definition of s matches p⊑xp, which matches p⊑iH. But s⊑H xp comes from replacing each atom in s⊑fH with an atom in xp that's captured by a · in iH.
Because it matches p⊑iH, every atom in s⊑H xp is ·, but the only instances of · in xp come from our inserted copy of p⊑iH and each is immediately captured by the corresponding · in iH. It follows that s⊑H xp, and consequently s⊑fH, is exactly p⊑iH, completing the proof.

-

Required structural inverses

+

Required structural inverses

The following primitive functions must be fully supported by structural Under. Each manipulates its right argument structurally.

@@ -570,7 +570,7 @@
-

A structural Under algorithm

+

A structural Under algorithm

This section offers the outline for a procedure that computes most structural inverses that a programmer would typically use. The concept is to build a special result array whose elements are not BQN values but instead indicate positions within the initial argument. This structural array is applied to the initial argument by replacing its elements with the values at those positions, and inverted by placing elements back in the original array at these indices, checking for any conflicts. If operations like dyadic ∾ are allowed, then a structural array might have some indices that are prefixes or parents of others, making it slightly different from a structural transformation as defined above (although it could be represented as a structural transformation by expanding some of these). This requires additional checking to ensure that elements of previously inserted elements can't be modified.

Structural functions can be applied to structural arrays directly, after ensuring that they have the necessary depth as given below. An array's depth can be increased by expanding each position in it into an array of child positions, or, if that position contains an atom and the structural function in question would tolerate an atom, enclosing it.
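A heavily simplified Python version of the index-array idea is sketched below, restricted to flat Python lists. The structural function `g` is applied to the positions 0 to n-1 to record where each element of its result came from. The result of `f` is then scattered back into a copy of `x`, with a conflict check. Depth handling, atoms, and the prefix or parent indices introduced by dyadic ∾ are not modeled, and `structural_under` is a hypothetical name.

```python
def structural_under(f, g, x):
    """Sketch of f⌾g for a structural g on a flat list x. g must only
    rearrange, select, or duplicate elements without inspecting them."""
    positions = g(list(range(len(x))))   # structural array of indices
    new_values = f(g(x))
    if len(new_values) != len(positions):
        raise ValueError("f changed the length of g's result")
    z = list(x)
    written = {}
    for p, val in zip(positions, new_values):
        if p in written and written[p] != val:
            raise ValueError(f"conflicting values for position {p}")
        written[p] = val
        z[p] = val                       # place the element back at its index
    return z
```

For instance, multiplying the first two elements by 10 under a take of length 2 leaves the remaining elements of `x` untouched, while changing only one copy of a duplicated element is rejected as a conflict.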

@@ -610,7 +610,7 @@

Not all primitives in the table above are required. Of note are =≠≢, which accept a structural array but return an ordinary value; this might be used as a left argument later. If the final result is not structural, then the function in question can't be structural, and the attempt to find a structural inverse can be aborted.

-

Non-structural case

+

Non-structural case

The behavior of invertible and computational Under is fully dependent on that of Undo, and does not need to be repeated here. However, it is important to discuss when this definition can be applied: specifically, either

This means that block instance equality indicates identity in the context of mutability: two block instances are equal if any change of state in one would be reflected in the other as well. The concept of identity holds even if the blocks in question have no way of changing or accessing state. For example, =○{𝕩⋄{𝕩}}˜@ is 0 while =˜○{𝕩⋄{𝕩}}@ is 1.

-

Array functionality

+

Array functionality

Several subsets of primitives, or dedicated operations, are used to manipulate arrays in the reference implementation.

-

Inferred functionality

+

Inferred functionality

Inferred properties are specified in their own document, not in the reference implementation.

-

Other provided functionality

+

Other provided functionality

-

Commentary on other primitives

+

Commentary on other primitives

As noted above, see reference.bqn for the authoritative definitions. Commentary here gives an overall description and highlights implementation subtleties and edge cases.

-

Combinators

+

Combinators

There's little to say about BQN's true combinators, since each is simply a pattern of function application. All primitive combinators use their operands as functions, and thus treat a data operand as a constant function.

The somewhat complicated definition of Valences could be replaced with {𝔽𝕩;𝕨𝔾𝕩} using headers. However, reference.bqn uses a simple subset of BQN's syntax that doesn't include headers. Instead, the definition relies on the fact that 𝕨 works like · if no left argument is given: (1˙𝕨)-0 is 1-0, or 1, if 𝕨 is present, and (1˙·)-0 otherwise, which reduces to ·-0, or 0.

-

Array properties

+

Array properties

The reference implementations extend Shape (≢) to atoms as well as arrays, in addition to implementing other properties. In all cases, an atom behaves as if it has shape ⟨⟩. The functions in this section never cause an error.

-

Arithmetic

+

Arithmetic

Arithmetic functions not already provided are defined in layer 1. These definitions, like the provided functions, apply to atoms only; they should be extended to arrays using the _perv modifier from layer 2.

-

Iteration modifiers

+

Iteration modifiers

Modifiers for iteration are defined in layers 1, 2, and 4. Two 2-modifiers, ⚇ and ⎉, use a list of numbers obtained by applying the right operand to the arguments in order to control application. This list has one to three elements: if all three are given then they correspond to the monadic, left, and right arguments; if one is given then it controls all three; and if two are given then they control the left argument, and the right and monadic arguments.
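The expansion rule for this list can be written as a small Python function. `expand_rank_list` is a hypothetical name, and the (monadic, left, right) order of the returned tuple simply mirrors the sentence above.

```python
def expand_rank_list(r):
    """Expand a 1- to 3-element rank list into (monadic, left, right) values."""
    r = list(r)
    if len(r) == 1:
        # One number controls all three cases.
        return (r[0], r[0], r[0])
    if len(r) == 2:
        # The first controls the left argument; the second controls
        # both the right argument and the monadic case.
        return (r[1], r[0], r[1])
    if len(r) == 3:
        return (r[0], r[1], r[2])
    raise ValueError("the list must have one to three elements")
```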

The iteration modifiers ⌜¨⚇˘⎉ process elements or cells in index order, that is, according to lexicographic ordering of indices or according to simple numeric ordering of the indices in the Deshaped (⥊) arguments. When both arguments are mapped over independently, the left argument is mapped over "first", or as an outer loop: one part of the left argument is paired with each part of the right in turn, then the next part of the left argument, and so on.

Table (⌜) and Each (¨) map over the elements of arrays to produce result elements. They convert atom arguments to unit arrays. With one argument, the two modifiers are the same; with two, they differ in how they pair elements. Table pairs every element of the left argument with every element of the right, giving a result shape 𝕨∾○≢𝕩. Each uses leading axis agreement: it requires one argument's shape to be a prefix of the other's (if the arguments have the same rank, then the shapes must match and therefore be mutual prefixes). This causes each element of the lower-rank argument to correspond to a cell of the higher-rank one; it's repeated to pair it with each element of that cell. The result shape is the shape of the higher-rank argument.
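The two shape rules can be compared in a short Python sketch in which shapes are tuples, so an atom treated as a unit array has shape `()`. The function names are hypothetical.

```python
def table_shape(w_shape, x_shape):
    """Result shape of 𝕨F⌜𝕩: every element of 𝕨 meets every element of 𝕩."""
    return tuple(w_shape) + tuple(x_shape)


def each_shape(w_shape, x_shape):
    """Result shape of 𝕨F¨𝕩 under leading axis agreement."""
    short, long_ = sorted((tuple(w_shape), tuple(x_shape)), key=len)
    # The lower-rank shape must be a prefix of the higher-rank one.
    if long_[: len(short)] != short:
        raise ValueError("shapes do not agree")
    return long_
```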

@@ -118,13 +118,13 @@

Fold (´), Insert (˝), and Scan (`) repeatedly apply a function between parts of an array. Fold requires the argument to have rank 1 and applies the operand between its elements, while Insert requires it to have rank 1 or more and applies it between the cells. For each of these two functions, the operand is applied beginning at the end of the array, and an identity value is returned if the array is empty. While these functions reduce multiple values to a single result, Scan returns many results and preserves the shape of its argument. It requires the argument to have rank at least 1, and applies the function between elements along columns, that is, from one element in a major cell to the one in the same position of the next major cell. This application begins at the first major cell of the array. Scan never uses the identity element of its operand because if the argument is empty then the result, which has the same shape, will be empty as well.

A left argument for any of the three reduction-based modifiers indicates an initial value to be used, so that the first application of the operand function applies not to two values from 𝕩 but instead to a value from 𝕨 and a value from 𝕩. In Fold and Insert, the entire value 𝕨 is the initial value, while in Scan, 𝕨 is an array of initial values, which must have shape 1↓≢𝕩.
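For the list case, Fold and Scan, with or without an initial value, can be modeled in Python. This is a sketch under simplifying assumptions: 𝕩 is a Python list, the operand is a two-argument Python function called as `f(left, right)`, and the identity value is passed in explicitly rather than derived from the operand. The names `fold`, `scan`, and `_MISSING` are hypothetical.

```python
_MISSING = object()  # marks "no initial value given"


def fold(f, xs, init=_MISSING, identity=_MISSING):
    """Model 𝕨 F´ 𝕩 on a list: apply f starting from the end of xs."""
    items = list(xs)
    if init is _MISSING:
        if not items:
            if identity is _MISSING:
                raise ValueError("empty argument and no identity value")
            return identity
        acc = items.pop()  # the last element starts the fold
    else:
        acc = init  # 𝕨 is combined with the last element of 𝕩 first
    for e in reversed(items):
        acc = f(e, acc)
    return acc


def scan(f, xs, init=_MISSING):
    """Model 𝕨 F` 𝕩 on a list: apply f from the start, keeping every result."""
    out = []
    acc = init
    for e in xs:
        acc = e if acc is _MISSING else f(acc, e)
        out.append(acc)
    return out
```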

Repeat (⍟) applies the operand function, or its inverse, several times in sequence. The right operand must consist only of integer atoms (arranged in arrays of any depth), and each number there is replaced with the application of the left operand that many times to the arguments. If a left argument is present, then it's reused each time, as if it were bound to the operand function. For a negative number -n, the function is "applied" -n times by undoing it n times. In both directions, the total number of times the function is applied is the maximum of all numbers present: results must be saved if intermediate values are needed.
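The note about saving intermediate results can be illustrated with a Python sketch for a flat list of counts. It assumes the function and its inverse are given as two separate Python callables (`f` and `f_inv`, hypothetical names), and applies each only as many times as the largest count in its direction requires.

```python
def repeat_counts(f, f_inv, counts, x):
    """Model F⍟counts 𝕩 for a flat list of integer counts."""
    results = {0: x}
    # Forward: apply f up to the largest positive count, saving each step.
    v = x
    for k in range(1, max([c for c in counts if c > 0], default=0) + 1):
        v = f(v)
        results[k] = v
    # Backward: undo up to the largest magnitude among negative counts.
    v = x
    for k in range(1, max([-c for c in counts if c < 0], default=0) + 1):
        v = f_inv(v)
        results[-k] = v
    return [results[c] for c in counts]
```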

-

Restructuring

+

Restructuring

Enclose (<) forms a unit array that contains its argument.

Merge (>) combines the outer axes of an array of arrays with inner axes: it requires that all elements of its argument have the same shape, and creates an array such that (i∾j)⊑>𝕩 is j⊑i⊑𝕩 when i has length =𝕩. It also accepts atom elements of 𝕩, converting them to unit arrays, or an atom argument, which is returned unchanged. Solo and Couple (≍) turn one or two arguments into major cells of the result and can be defined easily in terms of Merge.

Join To (∾) combines its two arguments along an existing initial axis, unless both arguments are units, in which case it creates an axis and is identical to Couple (≍). The arguments must differ in rank by at most 1, and the result rank is equal to the maximum of 1 and the higher argument rank. Each argument with rank less than the result, and each major cell of an argument with rank equal to it, becomes a major cell of the result, with cells from the left argument placed before those from the right. Join (∾) generalizes the equal-rank subset of this behavior to an array of values instead of just two. The argument must be an array (unlike Merge), and its elements must all have the same rank, which is at least the argument rank. Atom elements are treated as unit arrays. Then "outer" argument axes are matched up with leading "inner" element axes, and elements are joined along these axes. In order to allow this, the length of an element along a particular axis must depend only on the position along the corresponding axis in the argument. An empty argument to Join is returned unchanged, as though the element rank is equal to the argument rank.
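The rank and shape rules for Join To can be checked with a Python sketch that works on shapes alone (tuples, with `()` for a unit). It models the description above under a hypothetical function name and is not the reference definition.

```python
def join_to_shape(w_shape, x_shape):
    """Result shape of 𝕨∾𝕩, or ValueError if the arguments can't be joined."""
    w_shape, x_shape = tuple(w_shape), tuple(x_shape)
    if abs(len(w_shape) - len(x_shape)) > 1:
        raise ValueError("ranks differ by more than 1")
    rank = max(1, len(w_shape), len(x_shape))

    def as_cells(s):
        # An argument of lower rank than the result is a single major cell.
        return (1,) + s if len(s) < rank else s

    w_cells, x_cells = as_cells(w_shape), as_cells(x_shape)
    if w_cells[1:] != x_cells[1:]:
        raise ValueError("major cell shapes differ")
    return (w_cells[0] + x_cells[0],) + w_cells[1:]
```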

Deshape (⥊) differs from the provided function (which returns the element list of an array) only in that it accepts an atom, returning a one-element list containing it. Reshape (⥊) is extended in numerous ways. It accepts any list of natural numbers (including as a unit array or atom) for the left argument and any right argument; 𝕩 is deshaped first so that it is treated as a list of elements. These elements are repeated cyclically to fill the result array in ravel order. If 𝕩 is empty then a non-empty requested result shape causes an error. Furthermore, at most one element of 𝕨 can be a "length code": one of the primitives ∘⌊⌽↑. In this case, a target length is computed from the number of elements in 𝕩 divided by the product of the other elements of 𝕨 (which must not be zero). If the target length is an integer then it is used directly for the length code. Otherwise, an error is given if the length code is ∘, and the target length is rounded down if the code is ⌊ and up if it's ⌽ or ↑. With code ⌽, elements are repeated cyclically as usual, but with code ↑, the extra elements needed after every argument element has been used are fill values for 𝕩.
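Resolving a length code is a small integer computation. The Python sketch below assumes the left argument is given as a Python list in which the code, if present, appears as one of the strings `"∘"`, `"⌊"`, `"⌽"`, `"↑"`; the function name `resolve_shape` is hypothetical.

```python
import math

LENGTH_CODES = ("∘", "⌊", "⌽", "↑")


def resolve_shape(w, n):
    """Replace the length code in shape list w, given n = number of elements in 𝕩."""
    codes = [i for i, s in enumerate(w) if s in LENGTH_CODES]
    if len(codes) > 1:
        raise ValueError("at most one length code is allowed")
    if not codes:
        return list(w)
    i = codes[0]
    rest = math.prod(s for j, s in enumerate(w) if j != i)
    if rest == 0:
        raise ValueError("the other lengths must have a nonzero product")
    q, r = divmod(n, rest)
    if r == 0:
        length = q  # exact division: every code uses the target directly
    elif w[i] == "∘":
        raise ValueError("∘ requires the target length to be an integer")
    elif w[i] == "⌊":
        length = q  # round down
    else:
        length = q + 1  # ⌽ or ↑: round up
    return list(w[:i]) + [length] + list(w[i + 1:])
```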

Transpose (⍉) reorders axes of its argument to place the first axis last; if the argument has one or fewer axes then it's enclosed if it's an atom and otherwise returned unchanged. Reorder Axes (⍉) requires the left argument to be a list or unit of natural numbers, with length at most the rank of the right argument. This list is extended to match the right argument rank exactly by repeatedly appending the least unused natural number (for example, given 1‿3‿0‿0, 2 is appended). After extension, it specifies a result axis for each axis of the right argument. There must be no gaps in the list: that is, with the result rank equal to one plus the greatest value present, every result axis must appear at least once. Now each argument axis is "sent to" the specified result axis: in terms of indices, i⊑𝕨⍉𝕩 is (𝕨⊏i)⊑𝕩 if 𝕨 is complete. If multiple argument axes correspond to the same result axis, then a diagonal is taken, and it's as long as the shortest of those argument axes. Like Transpose, Reorder Axes encloses 𝕩 if it's an atom, so that its result is always an array.
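The extension step and the index correspondence for Reorder Axes can be modeled in Python. `extend_axes` and `source_index` are hypothetical names; `source_index` computes the index of 𝕩 that supplies result index `i` once 𝕨 is complete, selecting entries of `i` by the axis numbers in 𝕨.

```python
def extend_axes(w, rank):
    """Extend left argument w of ⍉ to length rank; return it and the result rank."""
    w = list(w)
    if len(w) > rank:
        raise ValueError("𝕨 is longer than the rank of 𝕩")
    while len(w) < rank:
        # Append the least natural number not yet used.
        w.append(min(set(range(rank + 1)) - set(w)))
    result_rank = max(w, default=-1) + 1
    if set(w) != set(range(result_rank)):
        raise ValueError("every result axis must appear at least once")
    return w, result_rank


def source_index(w, i):
    """Index into 𝕩 that supplies element i of the result, for a complete w."""
    return tuple(i[a] for a in w)
```

When two entries of `w` are equal, the same component of `i` is used for both argument axes, which is how the diagonal arises.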

-

Indices and selection

+

Indices and selection

Each element in an array s⥊e is associated with an index, which is a list of natural numbers i such that ∧´i<s. The list of all indices, which corresponds to the element list e, contains all such lists i in lexicographic order. That is, index i comes before j exactly when the two indices are not the same, and i has the smaller value at the first position where they are unequal. The index of an element along a particular axis a is the value a⊑i.

Range (↕) is extended to apply to a list of natural numbers, in addition to the provided case of a single natural number (an enclosed natural number 𝕩 should still result in an error). For a list 𝕩, the result is an array of shape 𝕩 in which the value at a given index is that index, as a list of natural numbers. That is, i≡i⊑↕𝕩 for any list of natural numbers i with ∧´i<𝕩.
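For a list argument, the element list of ↕𝕩 is the Cartesian product of the ranges of the individual lengths, which Python's `itertools.product` yields in lexicographic order. The sketch below (hypothetical name `range_indices`) returns that element list flat and leaves out the reshaping to shape 𝕩.

```python
from itertools import product


def range_indices(shape):
    """Element list of ↕𝕩 for a list 𝕩 of natural numbers, in index order."""
    return [list(i) for i in product(*(range(n) for n in shape))]


# The indices come out in lexicographic order, matching the ordering above.
idx = range_indices([2, 3])
assert idx == sorted(idx)
```

An empty shape gives a single empty index, corresponding to the one element of a unit array.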

Pick (⊑) is extended to array left arguments. In this case, it requires every depth-1 array in the nested structure of 𝕨 to be a valid index list for 𝕩, and every atom to be contained in one of these lists. The result is 𝕨 with each index list replaced by the element of 𝕩 at that index. In the simple case where 𝕨 itself is an index list, the result is the element of 𝕩 at index 𝕨.

@@ -133,14 +133,14 @@

First Cell (⊏) selects the initial major cell of 𝕩, giving an error if 𝕩 has rank 0 or length 0.

Group (⊔) performs an opposite operation to Select, so that 𝕨 specifies not the argument index that result values come from, but the result index that argument values go to. The general case is that 𝕨 is a list of arrays of numbers; if it has depth less than 2 it's converted to this form by first enclosing it if it's an atom, then placing it in a length-1 list. After this transformation, the result rank is ≠𝕨, and each result element has rank (≠𝕨)+(=𝕩)-+´=¨𝕨, with the initial ≠𝕨 axes corresponding to elements of 𝕨 and the remainder to trailing axes of 𝕩. Each atom in 𝕨 can be either a natural number or ¯1 (which indicates the corresponding position in 𝕩 will be omitted). If ¯1 doesn't appear, the result has the property that each cell of 𝕩 appears in the corresponding element of 𝕨⊏𝕨⊔𝕩. More concretely, the length of the result along axis a is the maximum value in a⊑𝕨 plus one, or zero if a⊑𝕨 is empty. Axis a corresponds to =a⊑𝕨 axes in 𝕩, and an element of the result at position i along this axis contains all positions in 𝕩 where i=a⊑𝕨. There may be multiple such positions, and they're arranged along axis a of that result element according to their index order in 𝕩. The shapes of components of 𝕨 must match the corresponding axes of 𝕩, except for rank-1 components of 𝕨, which can match or have an extra element. This element, which like the others is either a natural number or ¯1, gives the minimum length of the result axis corresponding to the component of 𝕨 in question, but otherwise does not affect the result. Group Indices treats its argument 𝕩 as a left argument for Group and uses a right argument made up of indices, which is ↕≠𝕩 if 𝕩 has depth 1 and ↕∾≢¨𝕩 if it has depth 2. Because the depth-1 case uses atomic indices, 𝕩 is required to be a list (and it can't be an atom). Much like Range, the result has depth one higher than the argument.
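The one-axis case of Group, where 𝕨 is a list of numbers whose length matches 𝕩 or exceeds it by one, can be sketched in Python. Groups are Python lists, ¯1 is written `-1`, and `group` and `group_indices` are hypothetical names; multi-axis 𝕨 is not covered.

```python
def group(w, x):
    """Model 𝕨⊔𝕩 for a list w of naturals or -1 (standing for ¯1) and a list x."""
    w = list(w)
    if any(not isinstance(g, int) or g < -1 for g in w):
        raise ValueError("𝕨 must contain natural numbers or ¯1")
    if len(w) == len(x) + 1:
        *w, min_len = w  # the extra element gives a minimum result length
    elif len(w) == len(x):
        min_len = -1  # ¯1: no minimum
    else:
        raise ValueError("length of 𝕨 must match 𝕩, or exceed it by one")
    n = max(max(w, default=-1) + 1, min_len)
    out = [[] for _ in range(n)]
    for g, v in zip(w, x):
        if g >= 0:  # ¯1 drops the corresponding element
            out[g].append(v)  # elements keep their order from x
    return out


def group_indices(x):
    """Model ⊔𝕩 for a list x: group the indices of x instead of its values."""
    return group(x, list(range(len(x))))
```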

Indices (/) applies to a list of natural numbers, and returns a list of natural numbers. The result contains i⊑𝕩 copies of each natural number index i for 𝕩, in increasing order.

-

Structural manipulation

+

Structural manipulation

Monadic structural functions work on the first axis of the argument, so they require it to have rank at least 1. Reverse (⌽) reverses the ordering of the major cells of 𝕩. Nudge (») shifts them forward, removing the last and placing a major cell made up of fill elements at the beginning, while Nudge Back («) does the same in the reverse direction, so it removes the first cell and places fills at the end. Prefixes (↑) and Suffixes (↓) each return lists with length one higher than 𝕩, whose elements are arrays with the same rank as 𝕩. For Prefixes, the element of the result at index i contains the first i major cells of 𝕩 in order, and for Suffixes, it contains all but these major cells.
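On a list, these four functions reduce to simple slicing. In the Python sketch below the fill element is passed in explicitly (defaulting to 0), which is an assumption: BQN derives fills from the argument itself. All names are hypothetical.

```python
def nudge(x, fill=0):
    """Model »𝕩: shift right, dropping the last element and filling the front."""
    return [fill] + x[:-1] if x else []


def nudge_back(x, fill=0):
    """Model «𝕩: shift left, dropping the first element and filling the end."""
    return x[1:] + [fill] if x else []


def prefixes(x):
    """Model ↑𝕩: element i holds the first i elements of x."""
    return [x[:i] for i in range(len(x) + 1)]


def suffixes(x):
    """Model ↓𝕩: element i holds all but the first i elements of x."""
    return [x[i:] for i in range(len(x) + 1)]
```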

The remainder of the functions discussed in this section are dyadic. For all of these, an atom value for 𝕩 is treated as an array by enclosing it before acting, so that the result is never an atom. There are four functions for which 𝕨 is a list of whole numbers (an atomic or enclosed number is also permitted, and treated as a 1-element list), and its elements are matched with leading axes of 𝕩. These functions independently manipulate each axis: one way to define such a process is to consider lists running along the axis, where every element of the index is fixed except one. A change to this axis retains the fixed indices, but can move elements from one location to another along the variable index, add fill elements, or split the axis into two axes. A change to a different axis can rearrange these lists along the original axis, but can't affect the placement of elements within them. In the reference implementations, working on leading axes is accomplished using the Cells (˘) modifier recursively, so that action on the first axis doesn't use Cells, action on the next is affected by Cells once, then twice, and so on.

Rotate (⌽) is the simplest of these four functions: each element of 𝕨 gives an amount to rotate the corresponding axis, where a rotation of r moves the element at index i+r to i when all indices are taken modulo the length of the axis. Windows (↕) splits each axis of 𝕩 that corresponds to an element of 𝕨 in two, so that the result has one set of axes corresponding to elements of 𝕨, then another, then the unchanged trailing axes. The second set of axes has lengths given by 𝕨 (which must consist of natural numbers), while the first has lengths s¬𝕨, where s contains the lengths of leading axes of 𝕩. Position i in the first set of axes and j in the second corresponds to i+j in the argument, so that fixing one of these positions and varying the other gives a slice of the argument. In both Rotate and Windows, the length of 𝕨 is at most the rank of 𝕩.
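For a single list axis, Rotate and Windows can be modeled as follows; `rotate` and `windows` are hypothetical names, and only one axis is handled.

```python
def rotate(r, x):
    """Model r⌽𝕩 on a list: the element at index (i+r) mod n moves to index i."""
    n = len(x)
    return [x[(i + r) % n] for i in range(n)]


def windows(l, x):
    """Model l↕𝕩 on a list: all contiguous slices of length l, in order."""
    count = len(x) + 1 - l  # the first result axis has length s¬𝕨, or 1+s-l
    if l < 0 or count < 0:
        raise ValueError("window length must be between 0 and 1+≠𝕩")
    return [x[i:i + l] for i in range(count)]
```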

Take (↑) offers several possibilities. The absolute value of 𝕨 gives the final lengths of the axes in the result. It may be positive to indicate that the axis aligns with 𝕩 at the beginning, or negative to indicate it aligns at the end. A zero value gives no result elements, so there is no need to consider alignment. If the absolute value of an element of 𝕨 is smaller than or equal to the corresponding length in 𝕩, then the first or last few elements are taken along that axis. If it is larger, then instead fill elements are added to the end (if positive) or beginning (if negative) to make up the difference in length. Drop (↓) gives 𝕨 a similar meaning, but excludes all elements that Take includes (maintaining the order of the retained ones). The result of Drop never uses fill elements. In a case where Take would use fill elements, it would include all positions from 𝕩, so Drop should include none of them, and the result will have length 0 for that axis. Take and Drop are extended to allow 𝕨 to have length greater than the rank of 𝕩. In this case leading length-1 axes are added to 𝕩 so that its rank matches the length of 𝕨 before taking or dropping.
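The single-axis behavior of Take and Drop, including the fill case, can be sketched in Python. The fill value is a parameter here (defaulting to 0), an assumption standing in for BQN's fill elements; `take` and `drop` are hypothetical names.

```python
def take(n, x, fill=0):
    """Model n↑𝕩 on a list: positive n aligns at the start, negative at the end."""
    m = len(x)
    if n >= 0:
        return x[:n] + [fill] * max(0, n - m)
    k = -n
    return [fill] * max(0, k - m) + x[max(0, m - k):]


def drop(n, x):
    """Model n↓𝕩 on a list: exactly the elements that n↑𝕩 leaves out."""
    m = len(x)
    if n >= 0:
        return x[n:]
    return x[:max(0, m + n)]
```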

Replicate (/) is similar to the four dyadic structural functions above, but 𝕨 is a list whose elements are lists of natural numbers, or plain or enclosed natural numbers, instead of a simple list. If 𝕨 has depth less than 2, it's considered to be a single value corresponding to one axis of 𝕩, while if it has depth 2 then it's a list of values. If 𝕨 is the empty list ⟨⟩ then it is defined to be in the second case despite having a depth of 1. On a single axis of 𝕩 the corresponding value r from 𝕨 is either a list or a unit: if it's a unit then it is repeated to match the length of that axis of 𝕩, and if it's a list it must already have the same length as that axis. Each number in r now specifies the number of times to repeat the corresponding position in 𝕩. This is equivalent to calling Indices on r and using the result for selection.
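Because Replicate on one axis is Indices followed by selection, the two can be modeled together. In this Python sketch (hypothetical names), an `int` for `r` stands for a unit that is repeated to the axis length.

```python
def indices(r):
    """Model /𝕩: r[i] copies of each index i, in increasing order."""
    if any(not isinstance(n, int) or n < 0 for n in r):
        raise ValueError("/ requires natural numbers")
    return [i for i, n in enumerate(r) for _ in range(n)]


def replicate(r, x):
    """Model r/𝕩 along the first axis of a list x."""
    if isinstance(r, int):
        r = [r] * len(x)  # a unit is repeated to match the axis length
    if len(r) != len(x):
        raise ValueError("𝕨 must match the length of the axis")
    return [x[i] for i in indices(r)]
```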

Shift Before (») and Shift After («) are derived from Join To and share most of its behavior. The difference is that only a portion of the result of Join To is returned, matching the length of 𝕩. This portion comes from the beginning for Shift Before and the end for Shift After. The only difference in conditions between the shift functions and Join To is that Join To allows the result to have higher rank than 𝕩. Shifts do not, so the rank of 𝕩 must be at least 1 and at least as high as the rank of 𝕨.
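Since each shift is a Join To followed by keeping ≠𝕩 elements, the list case can be sketched in Python directly from that description. The names are hypothetical, and 𝕨 is taken to be a list here rather than an atom.

```python
def shift_before(w, x):
    """Model 𝕨»𝕩 for lists: w∾x, keeping the first ≠x elements."""
    return (list(w) + list(x))[:len(x)]


def shift_after(w, x):
    """Model 𝕨«𝕩 for lists: x∾w, keeping the last ≠x elements."""
    joined = list(x) + list(w)
    return joined[len(joined) - len(x):]
```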

-

Searching

+

Searching

Match (≡) indicates whether two values are considered equivalent. It always returns 0 or 1, and never causes an error. If both arguments are atoms then it is identical to =, and if one is an atom and the other an array then it returns 0. If both arguments are arrays then it returns 1 only if they have the same shape and all pairs of corresponding elements match. Fill elements aren't taken into account, so that arrays that match might still differ in behavior. Not Match simply returns the complement of Match, ¬≡.
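Match can be modeled recursively in Python under a simplifying assumption: arrays are restricted to Python lists (so shape is just length), and every non-list value counts as an atom compared with `==`. `match` is a hypothetical name.

```python
def match(a, b):
    """Model 𝕨≡𝕩 for nested Python lists; always returns 0 or 1."""
    a_is_array, b_is_array = isinstance(a, list), isinstance(b, list)
    if a_is_array != b_is_array:
        return 0  # an atom never matches an array
    if not a_is_array:
        return int(a == b)  # two atoms: same as =
    return int(len(a) == len(b) and all(match(p, q) for p, q in zip(a, b)))
```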

Monadic search functions compare the major cells of 𝕩 to each other. 𝕩 must have rank at least 1. Except for Deduplicate (⍷), the result is a list of numbers with the same length as 𝕩.

Find (⍷) indicates positions where 𝕨 appears as a contiguous subarray of a =𝕨-cell of 𝕩. It has one result element for each such subarray of 𝕩, whose value is 1 if that subarray matches 𝕨 and 0 otherwise. Find cannot result in an error unless the rank of 𝕨 is higher than that of 𝕩. If 𝕨 is longer along one axis than the corresponding trailing axis of 𝕩, then the result has length 0 along that axis. Any atom argument to Find is automatically enclosed.
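For two lists, Find slides 𝕨 along 𝕩 and compares each window with it. This Python sketch (hypothetical name `find`) uses list equality as a stand-in for Match, which is adequate for flat lists.

```python
def find(w, x):
    """Model 𝕨⍷𝕩 for lists: 1 at each start position where w occurs in x."""
    count = len(x) + 1 - len(w)
    # If w is longer than x there are no positions, giving an empty result.
    return [int(x[i:i + len(w)] == w) for i in range(max(0, count))]
```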

-

Sorting

+

Sorting

Sorting functions are those that depend on BQN's array ordering. There are three kinds of sorting function, with two functions of each kind: one with an upward-pointing glyph that uses an ascending ordering (these function names are suffixed with "Up"), and one with a downward-pointing glyph and the reverse, descending, ordering ("Down"). Below, these three kinds of function are described, then the ordering rules. Except for the right argument of Bins, all arguments must have rank at least 1.

Sort (∧∨) reorders the major cells of its argument so that a major cell with a lower index comes earlier in the ordering than a major cell with a higher index, or matches it. If it's possible for matching arrays to differ in behavior because of different (including undefined versus defined) fill elements, then these arrays must maintain their ordering (a stable sort is required).

Grade (⍋⍒) returns a permutation describing the way the argument array would be sorted. For this reason the reference implementations simply define Sort to be selection by the grade. One way to define Grade is as a sorted version of the index list ↕≠𝕩. An index i is ordered according to the corresponding major cell i⊏𝕩. However, ties in the ordering are broken by ordering the index values themselves, so that no two indices are ever considered equal, and the result of sorting is well-defined (for Sort this is not an issue: matching cells are truly interchangeable). This property means that a stable sorting algorithm must be used to implement Grade functions. While cells might be ordered ascending or descending, indices are always ordered ascending, so that for example index i is placed before index j if either i⊏𝕩 comes earlier in the ordering than j⊏𝕩, or if they match and i<j.
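The tie-breaking rule makes both grades stable sorts of the index list. In the Python sketch below, Python's own ordering of the cells stands in for BQN's array ordering (an assumption that holds for plain numbers); the names are hypothetical.

```python
def grade_up(x):
    """Model ⍋𝕩: indices ordered by ascending cell, ties by ascending index."""
    return sorted(range(len(x)), key=lambda i: (x[i], i))


def grade_down(x):
    """Model ⍒𝕩: indices ordered by descending cell, ties by ascending index."""
    # Python's sort stays stable with reverse=True, so equal cells keep
    # their original (ascending) index order.
    return sorted(range(len(x)), key=lambda i: x[i], reverse=True)


def sort_up(x):
    """Model ∧𝕩 as selection by the grade."""
    return [x[i] for i in grade_up(x)]
```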

diff --git a/docs/spec/scope.html b/docs/spec/scope.html
index be592157..5576f6c8 100644
--- a/docs/spec/scope.html
+++ b/docs/spec/scope.html
@@ -4,10 +4,10 @@
Specification: BQN variable scoping
-

Specification: BQN variable scoping

+

Specification: BQN variable scoping

BQN uses lexical scoping for variables, where scopes correspond roughly to blocks, or pairs of curly braces separated by semicolons. At the top level in a scope, new variables are visible only after they are defined, but in the scopes it contains, all variables defined in that scope are visible. This system is specified more precisely below.

A running BQN program manipulates variables during its execution, but it is important to distinguish these variables from the identifiers that refer to them. As defined in the tokenization rules, an identifier is a particular kind of token found in a program's source code. The lexical scoping rules in this page define which identifiers are considered the same; these identifiers will refer to the same variables when the program is run. While each variable has only one identifier, an identifier can refer to any number of variables because a new variable is created for that identifier each time its containing scope is instantiated (that is, each time the contents of the block are evaluated).

-

Identifier equivalence with lexical scoping

+

Identifier equivalence with lexical scoping

In this section the concept of an identifier's definition, a possibly different instance of that identifier, is specified. The definition determines when identifiers refer to the "same thing". In concrete terms, identifiers with the same definition all manipulate the same variable in a particular instance of the definition's containing scope.

A scope is a PROGRAM, brSub, FCase, FMain, _mCase, _mMain, _cCase_, _cMain_, or brNS node as defined by the BQN grammar. An identifier instance is an s, F, _m, or _c_ node; its containing scope is the "smallest" scope that contains it: the scope that contains the identifier but not any other scopes containing the identifier. An identifier instance is defined when it is contained in the left hand side of an ← assignment expression, that is, the leftmost component of one of the five grammatical rules with ASGN, provided that the ASGN node is "←" or "⇐", or in a scope header, that is, a component immediately preceding ":". Each identifier instance in a valid BQN program corresponds to exactly one such defined identifier, called its definition, and two instances are considered to refer to the same identifier if they have the same definition.

Two identifier instances have the same name if their tokens, as strings, match after removing all underscores _ and ignoring case (so that the letters a to z are equal to their uppercase equivalents A to Z for this comparison). However, instances with the same name are not necessarily the same identifier, as they must also have the same definition. A defined identifier is a potential definition of another identifier instance if the two have the same name, and either:

@@ -19,16 +19,16 @@

The definition for an identifier is chosen from the potential definitions based on their containing scopes: it is the one whose containing scope does not contain or match the containing scope of any other potential definition. If for any identifier there is no definition, then the program is not valid and results in an error. This can occur if the identifier has no potential definition, and also if two potential definitions appear in the same scope. In fact, under this scheme it is never valid to make two definitions with the same name at the top level of a single scope, because both definitions would be potential definitions for the one that comes second in program order. Both definitions have the same containing scope, and any potential definition must contain or match this scope, so no potential definition can be selected.

The definition of program order for identifier tokens follows the order of BQN execution. It corresponds to the order of a particular traversal of the abstract syntax tree for a program. To find the relative ordering of two identifiers in a program, we consider the highest-depth node that they both belong to; in this node they must occur in different components, or that component would be a higher-depth node containing both of them. In most nodes, the program order goes from right to left: components further to the right come earlier in program order. The exceptions are PROGRAM, BODY, NS_BODY, list, subject (for stranding), and body case (FCase, _mCase, _cCase_, FMain, _mMain, _cMain_, brSub, BrFunc, _brMod1, and _brMod2_) nodes, in which program order goes in the opposite order, from left to right (some assignment target nodes also contain lists or strands, but their ordering is irrelevant because if two identifiers with the same name appear in such a list, then it can't be a definition).

A subject label is the s term in a brSub node. As part of a header, it can serve as the definition for an identifier. However, it's defined to be a syntax error if another instance of this identifier appears, except in a Return node (which cannot access its value).

-

Special names

+

Special names

Special names such as 𝕩 or 𝕣 refer to variables, but have no definition and do not use scoping. Instead, they always refer to the immediately enclosing scope, and are defined automatically when the block is evaluated.

The six special names are 𝕨𝕩𝕗𝕘𝕤𝕣, and the tokens 𝕎𝕏𝔽𝔾𝕊, _𝕣, and _𝕣_ are alternate spellings of these names as described in the tokenization rules. Special names may be modified with ↩ assignment but cannot appear as the target of other kinds of assignment. Two special names represent the same identifier if they are the same name and appear in the same body. The initial value these names have is defined by the evaluation rules; the grammar for blocks ensures that all special names used in a block will be defined (possibly as the special value · in the case of 𝕨).

-

Imports and exports

+

Imports and exports

Names that are preceded by an atom "." term, or that appear as LHS_NAME terms in an NS_VAR or lhsNs, are variable references in a namespace: in the first case, the result of the atom node, and in the second, of the overall assignment's subExpr right-hand side. These names do not follow lexical scoping; in general they must be stored in order to perform a name lookup when the namespace is available. Such a name in lhsNs, or in NS_VAR with no accompanying lhs "⇐" term, additionally serves as an identifier within the actual enclosing scope, which works like any other assignment.

An identifier is exported if the ASGN node in its definition is "⇐", or if it appears anywhere in an EXPORT term. An identifier can only be exported in the scope where it is defined, and not in a containing scope. An EXPORT term that includes an identifier from such a scope causes an error.

-

Variables

+

Variables

A variable is an entity that permits two operations: it can be set to a particular value, and its value can be obtained, resulting in the last value it was set to. When either operation is performed it is referred to as accessing the variable.

When a body in a block is evaluated, it creates a namespace, which contains a variable for each definition (that is, defined identifier instance) the body contains. Whenever another block (the block itself, not its contents) is evaluated during the execution of the block, it is linked to the currently-evaluating block, so that it will use the variables defined in this instance. By following these links repeatedly, an instance of a block is always linked to exactly one instance of each block that contains it. These links form a tree that is not necessarily related to the call stack of functions and modifiers. Using the links, the variable an identifier refers to is the one corresponding to that variable's definition in the linked instance of the containing scope for the definition.

The first access to a variable must be made by its definition (this also means it sets the variable). If a different instance of its identifier accesses it first, then an error results. This can happen because every scope contained in a particular scope sees all the definitions it uses, and such a scope could be called before the definition is run. Because of conditional execution, this property must be checked at run time in general; however, in cases where it is possible to statically determine that a program will always violate it, a BQN instance can give an error at compile time rather than run time.

A namespace defines a mapping from names to variables: if the given name is shared by an exported identifier in the body used to create that namespace, then that name maps to the variable corresponding to that identifier. The mapping is undefined for other names.

-

Returns

+

Returns

The name NAME | "𝕊" | "𝕣" in a Return node is resolved exactly like any other identifier. Following resolution, the block that defines the identifier must not be a namespace block (one that exports variables or contains an EXPORT statement). Furthermore, if it is a NAME, then its definition must be an internal name for a containing block: s in brSub, F in FuncHead or FMain, _m in Mod1H1 or _mMain, or _c_ in Mod2H1 or _cMain_. When reached, the Return node's identifier is not accessed; instead, it is used to indicate the namespace that contains it, and through this the block evaluation that created that namespace.

diff --git a/docs/spec/system.html b/docs/spec/system.html
index 53cde76c..76d36a2f 100644
--- a/docs/spec/system.html
+++ b/docs/spec/system.html
@@ -4,11 +4,11 @@
Specification: BQN system-provided values
-

Specification: BQN system-provided values

+

Specification: BQN system-provided values

This portion of the spec is still potentially subject to major changes.

The • symbol is used to access values other than primitives provided by BQN.

All system values described in the BQN specification are optional: an implementation does not have to include any of them. However, if a system value with one of the names given below is included, then it must have the specified behavior. For namespaces this rule applies to individual fields as well: a namespace may be provided with only some of the fields, but a field with one of the given names must behave as specified.

-

Execution and scope manipulation

+

Execution and scope manipulation

@@ -37,7 +37,7 @@

The effect of •Eval should be the same as if its argument were written as source code in the scope where •Eval appears. It can define variables, and modify those in the current scope or a parent.

•ScopedEval creates a new scope for evaluation as it is loaded. Other than its syntactic role, it is effectively equivalent to {•Eval}. Parent scopes are visible from the created scope; to make a scope without this property use •BQN"•Eval" or •BQN"•ScopedEval".

-

Scripts

+

Scripts

@@ -74,11 +74,11 @@

•path simply gives the path of the file in which it appears. It includes a trailing slash but not the name of the file itself.

•name gives the name, including the extension, of the file in which it appears. It doesn't include the path.

•Exit immediately terminates the running BQN process. If the argument is a valid return code (on Unix, an integer), it is returned; otherwise, the default return code (the one returned when the end of the program is reached) is used.

-

Files

+

Files

The system namespace value •file deals with file operations. For the purposes of •file, paths in the filesystem are always strings. As with •Import, file paths may be relative or absolute, and relative paths are relative to •path, except in •file.At which allows 𝕨 to specify an alternate base directory. The value •path used for a particular instance of •file is determined by the file that contains that instance.

When a •file function returns a file path or portion of a path, the path is always absolute and canonical, with . and .. components removed.
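The following minimal sketch resolves a relative path against a base directory, which stands in for •path or for 𝕨 in •file.At. It uses Python's `posixpath` so the result does not depend on the host OS. `resolve_at` is a hypothetical name. The sketch assumes POSIX-style paths and treats "canonical" purely lexically. An implementation could also resolve symbolic links, as `os.path.realpath` does; the text above does not settle that.

```python
import posixpath


def resolve_at(w, x):
    """Hypothetical helper: resolve path x against absolute base directory w.

    An absolute x ignores w (posixpath.join discards earlier parts).
    "." and ".." are removed lexically; symbolic links are NOT resolved.
    """
    if not posixpath.isabs(w):
        raise ValueError("base directory must be absolute")
    return posixpath.normpath(posixpath.join(w, x))


print(resolve_at("/home/user/src/", "../lib/./util.bqn"))  # /home/user/lib/util.bqn
print(resolve_at("/home/user/src/", "/etc/config"))        # /etc/config
```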

Possible fields of •file are given in the subsections below.

-

File paths

+

File paths

The following functions manipulate paths and don't access files. Each takes a relative or absolute path 𝕩, and At may also take a base directory 𝕨.

@@ -118,7 +118,7 @@
-

File metadata

+

File metadata

Metadata functions may query information about a file or directory but do not read from or write to it. Each takes a path 𝕩, and some functions also allow new data in 𝕨. The returned data in any case is the specified property.

@@ -172,7 +172,7 @@
  • 'b': Block device
  • 'c': Character device
-

File access

+

File access

File access functions read or write files, either by manipulating files as a whole or interacting with the contents. Whole-file functions cannot overwrite target files: that is, Rename and Copy must give an error if a file exists at 𝕨, and CreateDir if a file exists at 𝕩, while Chars, Lines, and Bytes can overwrite the contents of an existing file 𝕨. However, these three functions must give an error if 𝕨 exists and is a directory.
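The no-overwrite rule for Rename can be sketched in Python as a check before the move. The sketch assumes, as the error condition above implies, that 𝕨 is the destination and 𝕩 the existing file. `rename_no_overwrite` is hypothetical. It is needed because plain `os.rename` silently replaces an existing destination file on Unix, which is exactly what the rule forbids.

```python
import os


def rename_no_overwrite(w, x):
    """Hypothetical helper: rename existing file x to new path w,
    giving an error instead of overwriting anything already at w."""
    if os.path.lexists(w):
        raise FileExistsError(f"target exists: {w}")
    # Another process could create w between the check and the rename
    # (a race); this sketch does not guard against that.
    os.rename(x, w)
```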

@@ -255,8 +255,8 @@
-

Open file object

-

Input and output

+

Open file object

+

Input and output

@@ -285,7 +285,7 @@

•Out prints a string to stdout, with a trailing newline. •Show displays a BQN value to the programmer (the representation is not specified, and does not need to be plain text). •Fmt returns a string (not a character table: lines are separated by linefeeds) indicating how 𝕩 would be printed by the interactive environment. Both •Show and •Fmt may take a left argument configuring how the value should be formatted.

•Repr attempts to return a string so that •BQN •Repr 𝕩 matches 𝕩. If 𝕩 contains any mutable values (operations or namespaces), this is not possible. However, if such a value is stateless, in the sense that it doesn't access variables outside of its own scope, it is permissible for •Repr to return source code that would create a value with identical behavior.

-

Operation properties

+

Operation properties

@@ -411,7 +411,7 @@
-

Time

+

Time

diff --git a/docs/spec/token.html b/docs/spec/token.html index e2b851a3..11e661f9 100644 --- a/docs/spec/token.html +++ b/docs/spec/token.html @@ -4,7 +4,7 @@ Specification: BQN token formation -

Specification: BQN token formation

+

Specification: BQN token formation

This page describes BQN's token formation rules (token formation is also called scanning). Most tokens in BQN are a single character long, but quoted characters and strings, identifiers, and numbers can consist of multiple characters, and comments, spaces, and tabs are discarded during token formation.

BQN source code should be considered as a series of unicode code points, which we refer to as "characters". The separator between lines in a file is considered to be a single character, newline, even though some operating systems such as Windows typically represent it with a two-character CRLF sequence. Implementers should note that not all languages treat unicode code points as atomic, as exposing the UTF-8 or UTF-16 representation instead is common. For a language such as JavaScript that uses UTF-16, the double-struck characters 𝕨𝕎𝕩𝕏𝕗𝔽𝕘𝔾 are represented as two 16-bit surrogate characters, but BQN treats them as a single unit.
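Python strings are sequences of code points, which makes the difference easy to see. This short sketch compares the code-point count BQN uses with the number of UTF-16 code units, which is what a UTF-16 language such as JavaScript reports as string length. `utf16_units` is a hypothetical helper.

```python
def utf16_units(s):
    """Count the 16-bit code units needed to store s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2  # -le variant: no byte-order mark


chars = "𝕨𝕎𝕩𝕏𝕗𝔽𝕘𝔾"
print(len(chars))          # 8: Python counts code points, as BQN does
print(utf16_units(chars))  # 16: each character needs a surrogate pair in UTF-16
```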

A BQN character literal consists of a single character between single quotes, such as 'a', and a string literal consists of any number of characters between double quotes, such as "" or "abc". Character and string literals take precedence with comments over other tokenization rules, so that # between quotes does not start a comment and whitespace between quotes is not removed, but a quote within a comment does not start a character literal. Almost any character can be included directly in a character or string literal without escaping. The only exception is the double quote character ", which must be written twice to include it in a string, as otherwise it would end the string instead. Character literals require no escaping at all, as the length is fixed. In particular, literals for the single and double quote characters are written ''' and '"', while length-1 strings containing these characters are "'" and """".
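The quoting rules can be illustrated with a minimal Python scanner. It assumes the caller has already positioned it at an opening quote and handles comments separately. `scan_string` and `scan_char` are hypothetical names, not taken from any BQN implementation.

```python
def scan_string(src, i):
    """Scan a string literal whose opening " is at src[i].

    Returns (value, index just past the closing quote). A doubled
    quote "" inside the literal stands for a single " character;
    every other character, including # and whitespace, is kept as-is.
    """
    if src[i] != '"':
        raise SyntaxError("string literal must start with a double quote")
    out = []
    j = i + 1
    while j < len(src):
        if src[j] == '"':
            if j + 1 < len(src) and src[j + 1] == '"':
                out.append('"')  # "" encodes one double quote
                j += 2
                continue
            return "".join(out), j + 1  # a lone " closes the literal
        out.append(src[j])
        j += 1
    raise SyntaxError("unterminated string literal")


def scan_char(src, i):
    """Scan a character literal 'c' starting at src[i].

    The literal is always exactly three characters long, so the
    enclosed character needs no escaping, even if it is ' itself.
    """
    if src[i] != "'" or i + 2 >= len(src) or src[i + 2] != "'":
        raise SyntaxError("malformed character literal")
    return src[i + 1], i + 3


print(scan_string('"a # b" rest', 0))  # ('a # b', 7)
print(scan_string('""""', 0))          # ('"', 4)
print(scan_char("'''", 0))             # ("'", 3)
```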

diff --git a/docs/spec/types.html b/docs/spec/types.html index ea44bb76..ce432dbd 100644 --- a/docs/spec/types.html +++ b/docs/spec/types.html @@ -4,7 +4,7 @@ Specification: BQN types -

Specification: BQN types

+

Specification: BQN types

BQN programs manipulate data of seven types: