From 0c716e4c6b7c2c44bbfd02b6503cae66af7b7480 Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Fri, 28 Jan 2022 16:34:41 -0500
Subject: Separate syntax highlighting category for header/body characters ;:?

---
 docs/spec/complex.html   |  2 +-
 docs/spec/evaluate.html  |  8 ++++----
 docs/spec/grammar.html   | 44 ++++++++++++++++++++++----------------------
 docs/spec/inferred.html  |  2 +-
 docs/spec/literal.html   |  6 +++---
 docs/spec/primitive.html |  2 +-
 6 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/docs/spec/complex.html b/docs/spec/complex.html
index a5d4826e..3ef70e13 100644
--- a/docs/spec/complex.html
+++ b/docs/spec/complex.html
@@ -8,7 +8,7 @@

Complex numbers are an optional extension to BQN's numeric system. If they are supported, the following functionality must also be supported. This extension is a draft and is versioned separately from the rest of the BQN specification.

A complex number is a value with two components, a real part and an imaginary part. The type of each component is a real number, as described in the type specification. However, this type replaces the number type given there.

The numeric literal notation is extended with the character i, which separates two real-valued components (in effect, it has lower "precedence" than other characters like e and ¯). If a second component is present (using i or I), that component's value is multiplied by the imaginary unit i and added to the first component; otherwise the value is the first component's value without modification. As with real numbers, the exact complex number given is rounded to fit the number system in use.

-
complexnumber = number ( ( "i" | "I" ) number )?
+
complexnumber = number ( ( "i" | "I" ) number )?
 

Basic arithmetic functions +-×÷ are extended to complex numbers. A monadic case for the function + is added, which returns the conjugate argument: a number with real part equal to the real part of 𝕩 and imaginary part negated relative to 𝕩.

The primitive function ⍳ is added: the character ⍳ forms a primitive function token, and its value is the function {𝕨⊢⊘+0i1×𝕩}. This function multiplies 𝕩 by i, then adds 𝕨 if given.
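
The complex extension described in this hunk can be sketched in Python. This is an illustration only, not part of the specification: the names `parse_real`, `parse_complex`, `conjugate`, and `times_i_plus` are hypothetical, and `parse_real` is a simplified stand-in for the real-number grammar (it omits ∞, π, and underscores).

```python
import re

# Simplified stand-in for the real-number grammar (an assumption of this
# sketch): digits, optional fraction, optional exponent, high minus "¯".
_REAL = re.compile(r"¯?[0-9]+(?:\.[0-9]+)?(?:[eE]¯?[0-9]+)?")


def parse_real(text):
    """Parse one real-valued component, or raise ValueError."""
    if not _REAL.fullmatch(text):
        raise ValueError(f"invalid real component: {text!r}")
    return float(text.replace("¯", "-"))


def parse_complex(literal):
    """complexnumber = number ( ( "i" | "I" ) number )?"""
    # "i" separates the components, so it binds more loosely than "e" or "¯".
    parts = re.split(r"[iI]", literal)
    if len(parts) > 2:
        raise ValueError(f"more than two components: {literal!r}")
    real = parse_real(parts[0])
    if len(parts) == 1:
        return real  # no second component: value is unmodified
    return complex(real, parse_real(parts[1]))


def conjugate(x):
    """Monadic +: same real part, negated imaginary part."""
    x = complex(x)
    return complex(x.real, -x.imag)


def times_i_plus(x, w=None):
    """Multiply x by the imaginary unit, then add w if it is given."""
    rotated = 1j * x
    return rotated if w is None else w + rotated
```

For example, `parse_complex("1e2i3")` splits at `i` before the exponent is interpreted, so the real part is 100 and the imaginary part is 3.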

diff --git a/docs/spec/evaluate.html b/docs/spec/evaluate.html
index ef0bd0ca..9bb02864 100644
--- a/docs/spec/evaluate.html
+++ b/docs/spec/evaluate.html
@@ -20,9 +20,9 @@

Assignment

An assignment is one of the four rules containing ASGN. It is evaluated by first evaluating the right-hand-side subExpr, FuncExpr, _m1Expr, or _m2Expr_ expression, and then storing the result in the left-hand-side identifier or identifiers. The result of the assignment expression is the result of its right-hand side. Except for subjects, only a lone identifier is allowed on the left-hand side and storage sets it equal to the result. For subjects, destructuring assignment is performed when an lhs is lhsList or lhsStr. Destructuring assignment is performed recursively by assigning right-hand-side values to the left-hand-side targets, with single-identifier assignment as the base case. The target "·" is also possible in place of a NAME, and performs no assignment.

The right-hand-side value, here called v, in destructuring assignment must be a list (rank 1 array) or namespace. If it's a list, then each LHS_ENTRY node must be an LHS_ELT. The left-hand side is treated as a list of lhs targets, and matched to v element-wise, with an error if the two lists differ in length. If v is a namespace, then the left-hand side must be an lhsStr where every LHS_ATOM is a NAME, or an lhsList where every LHS_ENTRY is a NAME or lhs "⇐" NAME, so that it can be considered a list of NAME nodes some of which are also associated with lhs nodes. To perform the assignment, the value of each name is obtained from the namespace v, giving an error if v does not define that name. The value is assigned to the lhs node if present (which may be a destructuring assignment or simple subject assignment), and otherwise assigned to the same NAME node used to get it from v.
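
The recursive matching described above can be sketched as follows. The encoding is an assumption made for illustration: a target is a name string, the string "·", or a Python list of entries; an `lhs ⇐ NAME` entry is a `(lhs, name)` tuple; a BQN list is a Python list and a namespace is a dict. The function name `assign` is hypothetical.

```python
NOTHING = "·"  # target that performs no assignment


def assign(target, value, env):
    """Store value into env according to target (hypothetical encoding)."""
    if isinstance(target, str):
        # Base case: a single identifier, or "·" which stores nothing.
        if target != NOTHING:
            env[target] = value
        return
    if isinstance(value, dict):
        # Namespace: every entry must be a NAME or an (lhs, NAME) alias.
        for entry in target:
            if isinstance(entry, tuple):
                sub_target, name = entry
            elif isinstance(entry, str) and entry != NOTHING:
                sub_target, name = entry, entry
            else:
                raise TypeError("namespace destructuring requires names")
            if name not in value:
                raise KeyError(f"namespace does not define {name!r}")
            assign(sub_target, value[name], env)
    elif isinstance(value, list):
        # List: aliases are not allowed, and lengths must agree.
        if any(isinstance(entry, tuple) for entry in target):
            raise TypeError("alias entries require a namespace value")
        if len(target) != len(value):
            raise ValueError("target and value lists differ in length")
        for sub_target, element in zip(target, value):
            assign(sub_target, element, env)
    else:
        raise TypeError("destructured value must be a list or namespace")
```

With this encoding, `assign(["a", ["b", "·"]], [1, [2, 3]], env)` sets `a` and `b` and discards the 3, while `assign(["x", ("y2", "y")], ns, env)` reads `x` and `y` from the namespace and stores the latter under `y2`.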

-

Modified assignment is the subject assignment rule lhs Derv "↩" subExpr?. In this case, lhs is evaluated as if it were a subExpr (the syntax is a subset of subExpr), and passed as an argument to Derv. The full application is lhs Derv subExpr, if subExpr is given, and Derv lhs otherwise. Its value is assigned to lhs, and is also the result of the modified assignment expression.

+

Modified assignment is the subject assignment rule lhs Derv "↩" subExpr?. In this case, lhs is evaluated as if it were a subExpr (the syntax is a subset of subExpr), and passed as an argument to Derv. The full application is lhs Derv subExpr, if subExpr is given, and Derv lhs otherwise. Its value is assigned to lhs, and is also the result of the modified assignment expression.
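
A minimal sketch of modified assignment, assuming the left-hand side is a single name and that `Derv` is modelled as a Python callable taking `x` and an optional `w` (both are simplifications introduced here; `modified_assign` is a hypothetical name):

```python
_ABSENT = object()  # marks a missing right-hand subExpr


def modified_assign(env, name, derv, right=_ABSENT):
    """Evaluate `name Derv↩ right` (or `name Derv↩`) against env."""
    left = env[name]  # lhs is evaluated as if it were a subExpr
    if right is _ABSENT:
        result = derv(x=left)           # application is: Derv lhs
    else:
        result = derv(x=right, w=left)  # application is: lhs Derv subExpr
    env[name] = result                  # stored back into lhs
    return result                       # also the expression's result
```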

Expressions

-

We now give rules for evaluating an atom, Func, _mod1 or _mod2_ expression (the possible options for ANY). A literal or primitive sl, Fl, _ml, or _cl_ has a fixed value defined by the specification (literals and built-ins). An identifier s, F, _m, or _c_, if not preceded by atom ".", must have an associated variable due to the scoping rules, and returns this variable's value, or causes an error if it has not yet been set. If it is preceded by atom ".", then the atom node is evaluated first; its value must be a namespace, and the result is the value of the identifier's name in the namespace, or an error if the name is undefined. A parenthesized expression such as "(" _modExpr ")" simply returns the result of the interior expression. A braced construct such as BraceFunc is defined by the evaluation of the statements it contains after all parameters are accepted. Finally, a list "⟨" ⋄? ( ( EXPR ⋄ )* EXPR ⋄? )? "⟩" or ANY ( "‿" ANY )+ consists grammatically of a list of expressions. To evaluate it, each expression is evaluated in source order and their results are placed as elements of a rank-1 array. The two forms have identical semantics but different punctuation.

+

We now give rules for evaluating an atom, Func, _mod1 or _mod2_ expression (the possible options for ANY). A literal or primitive sl, Fl, _ml, or _cl_ has a fixed value defined by the specification (literals and built-ins). An identifier s, F, _m, or _c_, if not preceded by atom ".", must have an associated variable due to the scoping rules, and returns this variable's value, or causes an error if it has not yet been set. If it is preceded by atom ".", then the atom node is evaluated first; its value must be a namespace, and the result is the value of the identifier's name in the namespace, or an error if the name is undefined. A parenthesized expression such as "(" _modExpr ")" simply returns the result of the interior expression. A braced construct such as BraceFunc is defined by the evaluation of the statements it contains after all parameters are accepted. Finally, a list "⟨" ⋄? ( ( EXPR ⋄ )* EXPR ⋄? )? "⟩" or ANY ( "‿" ANY )+ consists grammatically of a list of expressions. To evaluate it, each expression is evaluated in source order and their results are placed as elements of a rank-1 array. The two forms have identical semantics but different punctuation.

Rules in the table below are function and modifier evaluation.

@@ -38,7 +38,7 @@
-
+
@@ -82,7 +82,7 @@
-
+
diff --git a/docs/spec/grammar.html b/docs/spec/grammar.html
index 3b18a102..fc8e2542 100644
--- a/docs/spec/grammar.html
+++ b/docs/spec/grammar.html
@@ -8,19 +8,19 @@

BQN's grammar is given below. Terms are defined in a BNF variant. However, handling special names properly is possible but difficult in BNF, so they are explained in text along with the braced block grammar.

The symbols s, F, _m, and _c_ are identifier tokens with subject, function, 1-modifier, and 2-modifier classes respectively. Similarly, sl, Fl, _ml, and _cl_ refer to literals and primitives of those classes. While names in the BNF here follow the identifier naming scheme, this is informative only: syntactic roles are no longer used after parsing and cannot be inspected in a running program.

A program is a list of statements. Almost all statements are expressions. Namespace export statements, and valueless results stemming from ·, or 𝕨 in a monadic brace function, can be used as statements but not expressions.

-
PROGRAM  = ⋄? ( STMT ⋄ )* STMT ⋄?
+
PROGRAM  = ⋄? ( STMT ⋄ )* STMT ⋄?
 STMT     = EXPR | nothing | EXPORT
 ⋄        = ( "⋄" | "," | \n )+
 EXPR     = subExpr | FuncExpr | _m1Expr | _m2Expr_
-EXPORT   = LHS_ELT? "⇐"
+EXPORT   = LHS_ELT? "⇐"
 

Here we define the "atomic" forms of functions and modifiers, which are either single tokens or enclosed in paired symbols. Stranded lists with ‿, which binds more tightly than any form of execution, are also included.

ANY      = atom | Func | _mod1 | _mod2_
-_mod2_   = ( atom "." )? _c_ | _cl_ | "(" _m2Expr_ ")" | _blMod2_
-_mod1    = ( atom "." )? _m  | _ml  | "(" _m1Expr  ")" | _blMod1
-Func     = ( atom "." )?  F  |  Fl  | "(" FuncExpr ")" |  BlFunc
-atom     = ( atom "." )?  s  |  sl  | "(" subExpr  ")" |  blSub | list
-list     = "⟨" ⋄? ( ( EXPR ⋄ )* EXPR ⋄? )? "⟩"
+_mod2_   = ( atom "." )? _c_ | _cl_ | "(" _m2Expr_ ")" | _blMod2_
+_mod1    = ( atom "." )? _m  | _ml  | "(" _m1Expr  ")" | _blMod1
+Func     = ( atom "." )?  F  |  Fl  | "(" FuncExpr ")" |  BlFunc
+atom     = ( atom "." )?  s  |  sl  | "(" subExpr  ")" |  blSub | list
+list     = "⟨" ⋄? ( ( EXPR ⋄ )* EXPR ⋄? )? "⟩"
 subject  = atom | ANY ( "‿" ANY )+
 

Starting at the highest-order objects, modifiers have simple syntax. In most cases the syntax for ← and ↩ is the same, but only ↩ can be used for modified assignment. The export arrow ⇐ can be used in the same ways as ←, but it can also be used at the beginning of a header to force a namespace result, or with no expression on the right in an EXPORT statement.

@@ -46,12 +46,12 @@

Subject expressions consist mainly of function application. We also define nothing-statements, which have very similar syntax to subject expressions but do not permit assignment. They can be used as an STMT or in place of a left argument.

arg      = subExpr
-         | ( subject | nothing )? Derv arg
+         | ( subject | nothing )? Derv arg
 nothing  = "·"
-         | ( subject | nothing )? Derv nothing
+         | ( subject | nothing )? Derv nothing
 subExpr  = arg
          | lhs ASGN subExpr
-         | lhs Derv "↩" subExpr?      # Modified assignment
+         | lhs Derv "↩" subExpr?      # Modified assignment
 

The target of subject assignment can be compound to allow for destructuring. List and namespace assignment share the nodes lhsList and lhsStr and cannot be completely distinguished until execution. The term sl in LHS_SUB is used for header inputs below: as an additional rule, it cannot be used in the lhs term of a subExpr node.

NAME     = s | F | _m | _c_
@@ -61,7 +61,7 @@
 LHS_ELT  = LHS_ANY | lhsStr
 LHS_ENTRY= LHS_ELT | lhs "⇐" NAME
 lhsStr   = LHS_ATOM ( "‿" LHS_ATOM )+
-lhsList  = "⟨" ⋄? ( ( LHS_ENTRY ⋄ )* LHS_ENTRY ⋄? )? "⟩"
+lhsList  = "⟨" ⋄? ( ( LHS_ENTRY ⋄ )* LHS_ENTRY ⋄? )? "⟩"
 lhsComp  = LHS_SUB | lhsStr | "(" lhs ")"
 lhs      = s | lhsComp
 
@@ -81,19 +81,19 @@

There are some extra possibilities for a header that specifies arguments. As a special rule, a monadic function header specifically can omit the function when the argument is not just a name (as this would conflict with a subject label). Additionally, an inference header doesn't affect evaluation of the function, but describes how an inferred property (Undo) should be computed. Here "˜" and "⁼" are both specific instances of the _ml token.

ARG_HEAD = LABEL
-         | headW? IMM_HEAD      "⁼"? headX
+         | headW? IMM_HEAD      "⁼"? headX
          | headW  IMM_HEAD "˜"  "⁼"  headX
-         |        FuncName "˜"? "⁼"
+         |        FuncName "˜"? "⁼"
          | lhsComp
 
-

A braced block contains bodies, which are lists of statements, separated by semicolons and possibly preceded by headers, which are separated from the body with a colon. A non-final expression can be made into a predicate by following it with the separator-like ?. Multiple bodies allow different handling for various cases, which are pattern-matched by headers. A block can have any number of bodies with headers. After these there can be bodies without headers: up to one for an immediate block and up to two for a block with arguments. If a block with arguments has one such body, it's ambivalent, but two of them refer to the monadic and dyadic cases.

-
BODY     = ⋄? ( STMT ⋄ | EXPR ⋄? "?" ⋄? )* STMT ⋄?
+

A braced block contains bodies, which are lists of statements, separated by semicolons and possibly preceded by headers, which are separated from the body with a colon. A non-final expression can be made into a predicate by following it with the separator-like ?. Multiple bodies allow different handling for various cases, which are pattern-matched by headers. A block can have any number of bodies with headers. After these there can be bodies without headers: up to one for an immediate block and up to two for a block with arguments. If a block with arguments has one such body, it's ambivalent, but two of them refer to the monadic and dyadic cases.

+
BODY     = ⋄? ( STMT ⋄ | EXPR ⋄? "?" ⋄? )* STMT ⋄?
 CASE     = BODY
-I_CASE   = ⋄? IMM_HEAD ⋄? ":" BODY
-A_CASE   = ⋄? ARG_HEAD ⋄? ":" BODY
+I_CASE   = ⋄? IMM_HEAD ⋄? ":" BODY
+A_CASE   = ⋄? ARG_HEAD ⋄? ":" BODY
 IMM_BLK  = "{" ( I_CASE ";" )* ( I_CASE | CASE ) "}"
-ARG_BLK  = "{" ( A_CASE ";" )* ( A_CASE | CASE ( ";" CASE )? ) "}"
-blSub    = "{" ( ⋄? s ⋄? ":" )? BODY "}"
+ARG_BLK  = "{" ( A_CASE ";" )* ( A_CASE | CASE ( ";" CASE )? ) "}"
+blSub    = "{" ( ⋄? s ⋄? ":" )? BODY "}"
 BlFunc   =           ARG_BLK
 _blMod1  = IMM_BLK | ARG_BLK
 _blMod2_ = IMM_BLK | ARG_BLK
@@ -145,10 +145,10 @@
 
𝕨( subject | nothing )?( subject | nothing )? Derv arg 𝕩{(𝕨L𝕩)C(𝕨R𝕩)}
nothing?nothing? Derv Fork { C(𝕨R𝕩)}
-

The rules for special names can be expressed in BNF by making many copies of all expression rules above. For each "level", or row in the table, a new version of every rule should be made that allows that level but not higher ones, and another version should be made that requires exactly that level. The values themselves should be included in s, F, _m, and _c_ for these copies. Then the "allowed" rules are made simply by replacing the terms they contain (excluding blSub and so on) with the same "allowed" versions, and "required" rules are constructed using both "allowed" and "required" rules. For every part of a production rule, an alternative should be created that requires the relevant name in that part while allowing it in the others. For example, ( subject | nothing )? Derv arg would be transformed to

+

The rules for special names can be expressed in BNF by making many copies of all expression rules above. For each "level", or row in the table, a new version of every rule should be made that allows that level but not higher ones, and another version should be made that requires exactly that level. The values themselves should be included in s, F, _m, and _c_ for these copies. Then the "allowed" rules are made simply by replacing the terms they contain (excluding blSub and so on) with the same "allowed" versions, and "required" rules are constructed using both "allowed" and "required" rules. For every part of a production rule, an alternative should be created that requires the relevant name in that part while allowing it in the others. For example, ( subject | nothing )? Derv arg would be transformed to

arg_req1 = subExpr_req1
          | ( subject_req1 | nothing_req1 ) Derv_allow1 arg_allow1
-         | ( subject_allow1 | nothing_allow1 )? Derv_req1 arg_allow1
-         | ( subject_allow1 | nothing_allow1 )? Derv_allow1 arg_req1
+         | ( subject_allow1 | nothing_allow1 )? Derv_req1 arg_allow1
+         | ( subject_allow1 | nothing_allow1 )? Derv_allow1 arg_req1
 

Quite tedious. The explosion of rules is partly due to the fact that the brace-typing rule falls into a weaker class of grammars than the other rules. Most of BQN is deterministic context-free but brace-typing is not, only context-free. Fortunately brace typing does not introduce the parsing difficulties that can be present in a general context-free grammar, and it can easily be performed in linear time: after scanning but before parsing, move through the source code maintaining a stack of the current top-level set of braces. Whenever a colon or special name is encountered, annotate that set of braces to indicate that it is present. When a closing brace is encountered and the top brace is popped off the stack, the type is needed if there was no colon, and can be found based on which names were present. One way to present this information to the parser is to replace the brace tokens with new tokens that indicate the type.
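
The linear-time brace-typing pass described above can be sketched as follows. This is an illustration under stated assumptions, not the reference algorithm: the input is assumed to be an already-scanned list of token strings, the mapping from special names to block kinds in `infer_kind` is an assumption (the table of levels is not part of this hunk), and the sketch does not distinguish immediate modifiers from modifiers with arguments.

```python
# Assumed grouping of special names by the kind of block they imply.
ARG_NAMES = {"𝕨", "𝕩", "𝕤", "𝕎", "𝕏", "𝕊"}
MOD1_NAMES = {"𝔽", "𝕗", "_𝕣"}
MOD2_NAMES = {"𝔾", "𝕘", "_𝕣_"}
SPECIAL = ARG_NAMES | MOD1_NAMES | MOD2_NAMES


def infer_kind(names):
    """Pick the highest level among the special names that were seen."""
    if names & MOD2_NAMES:
        return "2-modifier"
    if names & MOD1_NAMES:
        return "1-modifier"
    if names & ARG_NAMES:
        return "function"
    return "subject"


def type_braces(tokens):
    """Replace each brace token with a (brace, kind) pair in one pass."""
    typed = list(tokens)
    stack = []  # one record per currently open brace
    for index, token in enumerate(tokens):
        if token == "{":
            stack.append({"open": index, "colon": False, "names": set()})
        elif token == "}":
            if not stack:
                raise ValueError("unmatched closing brace")
            top = stack.pop()
            # With a colon, the header determines the type instead.
            kind = "header" if top["colon"] else infer_kind(top["names"])
            typed[top["open"]] = ("{", kind)
            typed[index] = ("}", kind)
        elif stack:
            # Annotate only the innermost open brace.
            if token == ":":
                stack[-1]["colon"] = True
            elif token in SPECIAL:
                stack[-1]["names"].add(token)
    if stack:
        raise ValueError("unclosed opening brace")
    return typed
```

Each token is visited once and each brace is pushed and popped once, so the pass is linear in the number of tokens. Special names inside a nested block annotate only that block, which is why the outer block in `{ 𝕩 + { 𝔽 𝕩 } }` is typed as a function here.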

diff --git a/docs/spec/inferred.html b/docs/spec/inferred.html
index fd247c65..11c27867 100644
--- a/docs/spec/inferred.html
+++ b/docs/spec/inferred.html
@@ -381,7 +381,7 @@
 ⌜
-{!0<≡𝕩⋄ 𝔽⁼⌜𝕩;}
+{!0<≡𝕩⋄ 𝔽⁼⌜𝕩;}
 Monadic case only
diff --git a/docs/spec/literal.html b/docs/spec/literal.html
index c171dbf5..0dd938cd 100644
--- a/docs/spec/literal.html
+++ b/docs/spec/literal.html
@@ -8,9 +8,9 @@

A literal is a single token that indicates a fixed character, number, or array. While literals indicate values of a data type, primitives indicate values of an operation type: function, 1-modifier, or 2-modifier.

Two types of literal deal with text. As the source code is considered to be a sequence of unicode code points ("characters"), and these code points are also used for BQN's character data type, the representation of a text literal is very similar to its value. In a text literal, the newline character is always represented using the ASCII line feed character, code point 10. A character literal is enclosed with single quotes ' and its value is identical to the single character between them. A string literal is enclosed in double quotes ", and any double quotes between them must come in pairs, as a lone double quote marks the end of the literal. The value of a string literal is a rank-1 array whose elements are the characters in between the enclosing quotes, after replacing each pair of double quotes with only one such quote. The null literal is the token @ and represents the null character, code point 0.
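
As an illustration of the string-literal rule, the following sketch computes the value of a string token as a list of characters. The function name `string_literal_value` is hypothetical, and a Python list of one-character strings stands in for a rank-1 array of characters.

```python
def string_literal_value(token):
    """Return the characters denoted by a double-quoted string token."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("string literal must be enclosed in double quotes")
    inner = token[1:-1]
    chars = []
    i = 0
    while i < len(inner):
        if inner[i] == '"':
            # Interior double quotes must come in pairs; a lone quote
            # would have ended the literal.
            if i + 1 >= len(inner) or inner[i + 1] != '"':
                raise ValueError("lone double quote inside string literal")
            chars.append('"')
            i += 2
        else:
            chars.append(inner[i])
            i += 1
    return chars
```

Python strings are sequences of code points, so each element of the result corresponds to one character in the sense used above.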

The format of a numeric literal is more complicated. From the tokenization rules, a numeric literal consists of a numeric character (one of ¯∞π.0123456789) followed by any number of numeric or alphabetic characters. Some numeric literals are valid and indicate a number, while others are invalid and cause an error. The grammar for valid numbers is given below in a BNF variant. The alphabetic character allowed is "e" or "E", which functions as in scientific notation. Not included in this grammar are underscores: they can be placed anywhere in a number, including after the last non-underscore character, and are ignored entirely.

-
number    = "¯"? ( "∞" | mantissa ( ( "e" | "E" ) exponent )? )
-exponent  = "¯"? digit+
-mantissa  = "π" | digit+ ( "." digit+ )?
+
number    = "¯"? ( "∞" | mantissa ( ( "e" | "E" ) exponent )? )
+exponent  = "¯"? digit+
+mantissa  = "π" | digit+ ( "." digit+ )?
 digit     = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
 

The digits or arabic numerals correspond to the numbers from 0 to 9 in the conventional way (also, each corresponds to its code point value minus 48). A sequence of digits gives a natural number by evaluating it in base 10: the number is 0 for an empty sequence, and otherwise the last digit's numerical value plus ten times the number obtained from the remaining digits. The symbol ∞ indicates infinity and π indicates the ratio pi of a perfect circle's circumference to its diameter. The high minus symbol ¯ indicates that the number containing it is to be negated. When an exponent is provided (with e or E), the corresponding mantissa is multiplied by ten to that power, giving the value mantissa×10⋆exponent.
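
A sketch of how the real-number grammar and its evaluation rules might be implemented, assuming IEEE double precision as the number system. The name `parse_number` is hypothetical. Decimal mantissas are handed to Python's `float`, which rounds the exact decimal value; the π case multiplies two rounded values, so it may differ from exact rounding in the last bit.

```python
import math
import re

# number = "¯"? ( "∞" | mantissa ( ( "e" | "E" ) exponent )? )
NUMBER = re.compile(
    r"(¯)?(?:(∞)|(π|[0-9]+(?:\.[0-9]+)?)(?:[eE](¯)?([0-9]+))?)"
)


def parse_number(literal):
    """Return the value of a real numeric literal, or raise ValueError."""
    text = literal.replace("_", "")  # underscores are ignored entirely
    match = NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid numeric literal: {literal!r}")
    negative, infinity, mantissa, exp_negative, exp_digits = match.groups()
    if infinity:
        value = math.inf
    else:
        exponent = int(exp_digits) if exp_digits is not None else 0
        if exp_negative:
            exponent = -exponent
        if mantissa == "π":
            value = math.pi * float(f"1e{exponent}")
        else:
            # float() rounds the exact decimal value mantissa×10⋆exponent.
            value = float(f"{mantissa}e{exponent}")
    return -value if negative else value
```

Literals such as `1.` or `.5` are rejected because the grammar requires at least one digit on each side of the decimal point.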

diff --git a/docs/spec/primitive.html b/docs/spec/primitive.html
index b1c115f2..05ed2dcc 100644
--- a/docs/spec/primitive.html
+++ b/docs/spec/primitive.html
@@ -87,7 +87,7 @@
  • Before/Bind (⊸)
  • After/Bind (⟜)
  • -

    The somewhat complicated definition of Valences could be replaced with {𝔽𝕩;𝕨𝔾𝕩} using headers. However, reference.bqn uses a simple subset of BQN's syntax that doesn't include headers. Instead, the definition relies on the fact that 𝕨 works like · if no left argument is given: (1˙𝕨)-0 is 1-0 or 1 if 𝕨 is present and (1˙·)-0 otherwise: this reduces to ·-0 or 0.

    +

    The somewhat complicated definition of Valences could be replaced with {𝔽𝕩;𝕨𝔾𝕩} using headers. However, reference.bqn uses a simple subset of BQN's syntax that doesn't include headers. Instead, the definition relies on the fact that 𝕨 works like · if no left argument is given: (1˙𝕨)-0 is 1-0 or 1 if 𝕨 is present and (1˙·)-0 otherwise: this reduces to ·-0 or 0.

    Array properties

    The reference implementations extend Shape (≢) to atoms as well as arrays, in addition to implementing other properties. In all cases, an atom behaves as if it has shape ⟨⟩. The functions in this section never cause an error.

      -- cgit v1.2.3