From 229e2cd2f5c78b13c483a8559dead2c8f31d8e42 Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Sat, 18 Jul 2020 18:26:52 -0400
Subject: Terminology changes: subject, 1/2-modifier, Box/Unbox to Enclose/Merge, blocks

---
 docs/spec/grammar.html | 122 ++++++++++++++++++++++++-------------------------
 1 file changed, 61 insertions(+), 61 deletions(-)

diff --git a/docs/spec/grammar.html b/docs/spec/grammar.html
index 56803787..09d0bafa 100644
--- a/docs/spec/grammar.html
+++ b/docs/spec/grammar.html
@@ -1,35 +1,35 @@

BQN's grammar is given below. Terms are defined in a BNF variant. However, handling special names properly is possible but difficult in BNF, so they are explained in text along with the braced block grammar.

-

The symbols v, F, _m, and _c_ are identifier tokens with value, function, modifier, and composition classes respectively. Similarly, vl, Fl, _ml, and _cl_ refer to value literals (numeric and character literals, or primitives) of those classes. While names in the BNF here follow the identifier naming scheme, this is informative only: syntactic classes are no longer used after parsing and cannot be inspected in a running program.

+

The symbols s, F, _m, and _c_ are identifier tokens with subject, function, 1-modifier, and 2-modifier classes respectively. Similarly, sl, Fl, _ml, and _cl_ refer to literals and primitives of those classes. While names in the BNF here follow the identifier naming scheme, this is informative only: syntactic classes are no longer used after parsing and cannot be inspected in a running program.

A program is a list of statements. Almost all statements are expressions. However, explicit definitions and valueless results stemming from ·, or 𝕨 in a monadic brace function, can be used as statements but not expressions.

PROGRAM  = ⋄? ( ( STMT ⋄ )* STMT ⋄? )?
 STMT     = EXPR | DEF | nothing
 ⋄        = ( "⋄" | "," | \n )+
-EXPR     = valExpr | FuncExpr | _modExpr | _cmpExp_
+EXPR     = subExpr | FuncExpr | _m1Expr | _m2Expr_
 
-

Here we define the "atomic" forms of functions and operators, which are either single tokens or enclosed in paired symbols. Stranded vectors with ‿, which binds more tightly than any form of execution, are also included.

-
ANY      = atom    | Func     | _mod     | _comp_
-_comp_   = _c_ | _cl_ | "(" _cmpExp_ ")" | _brComp_
-_mod     = _m  | _ml  | "(" _modExpr ")" | _brMod  
-Func     =  F  |  Fl  | "(" FuncExpr ")" |  BrFunc 
-atom     =  v  |  vl  | "(" valExpr  ")" |  brVal | list
+

Here we define the "atomic" forms of functions and modifiers, which are either single tokens or enclosed in paired symbols. Stranded vectors with ‿, which binds more tightly than any form of execution, are also included.

+
ANY      = atom    | Func     | _mod1    | _mod2_
+_mod2_   = _c_ | _cl_ | "(" _m2Expr_ ")" | _brMod2_
+_mod1    = _m  | _ml  | "(" _m1Expr  ")" | _brMod1
+Func     =  F  |  Fl  | "(" FuncExpr ")" |  BrFunc
+atom     =  s  |  sl  | "(" subExpr  ")" |  brSub | list
 list     = "⟨" ⋄? ( ( EXPR ⋄ )* EXPR ⋄? )? "⟩"
-value    = atom | ANY ( "‿" ANY )+
+subject  = atom | ANY ( "‿" ANY )+
 
-

Starting at the highest-order objects, modifiers and compositions have fairly simple syntax. In most cases the syntax for ← and ↩ is the same, but only ↩ can be used for modified assignment.

+

Starting at the highest-order objects, modifiers have fairly simple syntax. In most cases the syntax for ← and ↩ is the same, but only ↩ can be used for modified assignment.

ASGN     = "←" | "↩"
-_cmpExp_ = _comp_
-         | _c_ ASGN _cmpExp_
-_modExpr = _mod
-         | _comp_ ( value | Func )    # Right partial application
-         | Operand _comp_             # Left partial application
-         | _m  ASGN _modExpr
+_m2Expr_ = _mod2_
+         | _c_ ASGN _m2Expr_
+_m1Expr  = _mod1
+         | _mod2_ ( subject | Func )  # Right partial application
+         | Operand _mod2_             # Left partial application
+         | _m  ASGN _m1Expr
 
-

Functions can be formed by fully applying operators or as trains. Operators are left-associative, so that the left operand (Operand) can include operators but the right operand (value | Func) cannot. Trains are right-associative, but bind less tightly than operators. Assignment is not allowed in the top level of a train: it must be parenthesized.

+

Functions can be formed by fully applying modifiers or as trains. Modifiers are left-associative, so that the left operand (Operand) can include modifier applications but the right operand (subject | Func) cannot. Trains are right-associative, but bind less tightly than modifiers. Assignment is not allowed in the top level of a train: it must be parenthesized.

Derv     = Func
-         | Operand _mod
-         | Operand _comp_ ( value | Func )
-Operand  = value
+         | Operand _mod1
+         | Operand _mod2_ ( subject | Func )
+Operand  = subject
          | Derv
 Fork     = Derv
          | Operand Derv Fork          # 3-train
@@ -39,63 +39,63 @@
 FuncExpr = Train
          | F ASGN FuncExpr
 
-

Value expressions are complicated by the possibility of list assignment. We also define nothing-statements, which have very similar syntax to value expressions but do not permit assignment.

-
arg      = valExpr
-         | ( value | nothing )? Derv arg
+

Subject expressions are complicated by the possibility of list assignment. We also define nothing-statements, which have very similar syntax to subject expressions but do not permit assignment.

+
arg      = subExpr
+         | ( subject | nothing )? Derv arg
 nothing  = "·"
-         | ( value | nothing )? Derv nothing
-LHS_ANY  = lhsValue | F | _m | _c_
+         | ( subject | nothing )? Derv nothing
+LHS_ANY  = lhsSub | F | _m | _c_
 LHS_ATOM = LHS_ANY | "(" lhsStr ")"
 LHS_ELT  = LHS_ANY | lhsStr
-lhsValue = v
+lhsSub   = s
          | "⟨" ⋄? ( ( LHS_ELT ⋄ )* LHS_ELT ⋄? )? "⟩"
 lhsStr   = LHS_ATOM ( "‿" LHS_ATOM )+
-lhs      = lhsValue | lhsStr
-valExpr  = arg
-         | lhs ASGN valExpr
-         | lhs Derv "↩" valExpr       # Modified assignment
+lhs      = lhsSub | lhsStr
+subExpr  = arg
+         | lhs ASGN subExpr
+         | lhs Derv "↩" subExpr       # Modified assignment
 
-

A header looks like a name for the thing being headed, or its application to inputs (possibly twice in the case of modifiers and compositions). As with assignment, it is restricted to a simple form with no extra parentheses. The full list syntax is allowed for arguments. As a special rule, a monadic function header specifically can omit the function when the argument is not just a name (as this would conflict with a value label). The following cases define only headers with arguments, which are assumed to be special cases; there can be any number of these. Headers without arguments can only refer to the general case (note that operands are not pattern matched), so there can be at most two of these kinds of headers, indicating the monadic and dyadic cases.

-
headW    = value | "𝕨"
-headX    = value | "𝕩"
+

A header looks like a name for the thing being headed, or its application to inputs (possibly twice in the case of modifiers). As with assignment, it is restricted to a simple form with no extra parentheses. The full list syntax is allowed for arguments. As a special rule, a monadic function header specifically can omit the function when the argument is not just a name (as this would conflict with a subject label). The following cases define only headers with arguments, which are assumed to be special cases; there can be any number of these. Headers without arguments can only refer to the general case (note that operands are not pattern matched), so there can be at most two of these kinds of headers, indicating the monadic and dyadic cases.

+
headW    = subject | "𝕨"
+headX    = subject | "𝕩"
 HeadF    = F | "𝕗" | "𝔽"
 HeadG    = F | "𝕘" | "𝔾"
-ModH1    = HeadF ( _m  | "_𝕣"  )
-CmpH1    = HeadF ( _c_ | "_𝕣_" ) HeadG
+Mod1H1   = HeadF ( _m  | "_𝕣"  )
+Mod2H1   = HeadF ( _c_ | "_𝕣_" ) HeadG
 FuncHead = headW? ( F | "𝕊" ) headX
-         | vl | "(" valExpr ")" | brVal | list   # value,
-         | ANY ( "‿" ANY )+                      # but not v
-_modHead = headW? ModH1 headX
-_cmpHed_ = headW? CmpH1 headX
+         | sl | "(" subExpr ")" | brSub | list   # subject,
+         | ANY ( "‿" ANY )+                      # but not s
+_m1Head  = headW? Mod1H1 headX
+_m2Head_ = headW? Mod2H1 headX
 
-

A braced block contains bodies, which are lists of statements, separated by semicolons and possibly preceded by headers, which are separated from the body with a colon. Multiple bodies allow different handling for various cases, which are pattern-matched by headers. For a value block there are no inputs, so there can only be one possible case and one body. Functions and operators allow any number of "matched" bodies, with headers that have arguments, followed by at most two "main" bodies with either no headers or headers without arguments. If there is one main body, it is ambivalent, but two main bodies refer to the monadic and dyadic cases.

+

A braced block contains bodies, which are lists of statements, separated by semicolons and possibly preceded by headers, which are separated from the body with a colon. Multiple bodies allow different handling for various cases, which are pattern-matched by headers. For an immediate block there are no inputs, so there can only be one possible case and one body. Functions and modifiers allow any number of "matched" bodies, with headers that have arguments, followed by at most two "main" bodies with either no headers or headers without arguments. If there is one main body, it is ambivalent, but two main bodies refer to the monadic and dyadic cases.

BODY     = ⋄? ( STMT ⋄ )* EXPR ⋄?
 FCase    = ⋄? FuncHead ":" BODY
-_mCase   = ⋄? _modHead ":" BODY
-_cCase_  = ⋄? _cmpHed_ ":" BODY
-FMain    = ( ⋄?    F            ":" )? BODY
-_mMain   = ( ⋄? ( _m  | ModH1 ) ":" )? BODY
-_cMain_  = ( ⋄? ( _c_ | CmpH1 ) ":" )? BODY
-brVal    = "{" ( ⋄? v ":" )? BODY "}"
+_mCase   = ⋄? _m1Head  ":" BODY
+_cCase_  = ⋄? _m2Head_ ":" BODY
+FMain    = ( ⋄?    F             ":" )? BODY
+_mMain   = ( ⋄? ( _m  | Mod1H1 ) ":" )? BODY
+_cMain_  = ( ⋄? ( _c_ | Mod2H1 ) ":" )? BODY
+brSub    = "{" ( ⋄? s ":" )? BODY "}"
 BrFunc   = "{" (  FCase  ";" )* (  FCase  |  FMain ( ";"  FMain )? ) "}"
-_brMod   = "{" ( _mCase  ";" )* ( _mCase  | _mMain ( ";" _mMain )? ) "}"
-_brComp_ = "{" ( _cCase_ ";" )* ( _cCase_ | _cMan_ ( ";" _cMan_ )? ) "}"
+_brMod1  = "{" ( _mCase  ";" )* ( _mCase  | _mMain ( ";" _mMain )? ) "}"
+_brMod2_ = "{" ( _cCase_ ";" )* ( _cCase_ | _cMain_ ( ";" _cMain_ )? ) "}"
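The body-ordering rule encoded by BrFunc, _brMod1, and _brMod2_ can be checked with a few lines of code. The Python below is a minimal sketch, not part of the specification: it assumes each body of a function or modifier block has already been tagged "case" (a header with arguments) or "main" (no header, or a header without arguments), and the function name classify_bodies is hypothetical. Immediate blocks, which allow only one body, are not covered.

```python
from typing import List

# Hypothetical input: one tag per body of a function or modifier block, in order.
#   "case" = body with a header that has arguments (a matched body)
#   "main" = body with no header, or a header without arguments
# Mirrors: "{" ( Case ";" )* ( Case | Main ( ";" Main )? ) "}"
def classify_bodies(kinds: List[str]) -> str:
    """Check body order and report how the main bodies are used."""
    if not kinds:
        raise ValueError("a block needs at least one body")
    # Skip over the leading run of matched bodies.
    n_cases = 0
    while n_cases < len(kinds) and kinds[n_cases] == "case":
        n_cases += 1
    mains = kinds[n_cases:]
    # Whatever follows the matched bodies must be main bodies.
    if any(k != "main" for k in mains):
        raise ValueError("matched bodies must come before main bodies")
    if len(mains) > 2:
        raise ValueError("at most two main bodies are allowed")
    if not mains:
        return "cases only"
    # One main body is ambivalent; two split into monadic then dyadic.
    return "ambivalent" if len(mains) == 1 else "monadic and dyadic"
```

For instance, ["case", "main"] is accepted with a single ambivalent main body, while ["main", "case"] is rejected because a matched body follows a main one.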
 
-

Two additional rules apply to blocks, based on the special name associations in the table below. First, each block allows the special names in its column to be used as the given token types within BODY terms (not headers). Except for the spaces labelled "None", each column is cumulative and a given entry also includes all the entries above it. Second, for BrFunc, _brMod, and _brComp_ terms, if no header is given, then at least one BODY term in it must contain one of the names on, and not above, the corresponding row. Otherwise the syntax would be ambiguous, since for example a simple "{" BODY "}" sequence could have any type.

+

Two additional rules apply to blocks, based on the special name associations in the table below. First, each block allows the special names in its column to be used as the given token types within BODY terms (not headers). Except for the spaces labelled "None", each column is cumulative and a given entry also includes all the entries above it. Second, for BrFunc, _brMod1, and _brMod2_ terms, if no header is given, then at least one BODY term in it must contain one of the names on, and not above, the corresponding row. Otherwise the syntax would be ambiguous, since for example a simple "{" BODY "}" sequence could have any type.

@@ -111,28 +111,28 @@
-Term  v  F  _m  _c_  other
+Term  s  F  _m  _c_  other
-brVal, PROGRAM  None  None  None  ";"
+brSub, PROGRAM  None  None  None  ";"
-_brMod  𝕗𝕣  𝔽  _𝕣
+_brMod1  𝕗𝕣  𝔽  _𝕣
-_brComp_  𝕘  𝔾  None  _𝕣_
+_brMod2_  𝕘  𝔾  None  _𝕣_
-

The rules for special name can be expressed in BNF by making many copies of all expression rules above. For each "level", or row in the table, a new version of every rule should be made that allows that level but not higher ones, and another version should be made that requires exactly that level. The values themselves should be included in v, F, _m, and _c_ for these copies. Then the "allowed" rules are made simply by replacing the terms they contain (excluding brVal and so on) with the same "allowed" versions, and "required" rules are constructed using both "allowed" and "required" rules. For every part of a production rule, an alternative should be created that requires the relevant name in that part while allowing it in the others. For example, ( value | nothing )? Derv arg would be transformed to

-
arg_req1 = valExpr_req1
-         | ( value_req1 | nothing_req1 ) Derv_allow1 arg_allow1
-         | ( value_allow1 | nothing_allow1 )? Derv_req1 arg_allow1
-         | ( value_allow1 | nothing_allow1 )? Derv_allow1 arg_req1
+

The rules for special names can be expressed in BNF by making many copies of all expression rules above. For each "level", or row in the table, a new version of every rule should be made that allows that level but not higher ones, and another version should be made that requires exactly that level. The special names themselves should be included in s, F, _m, and _c_ for these copies. Then the "allowed" rules are made simply by replacing the terms they contain (excluding brSub and so on) with the same "allowed" versions, and "required" rules are constructed using both "allowed" and "required" rules. For every part of a production rule, an alternative should be created that requires the relevant name in that part while allowing it in the others. For example, ( subject | nothing )? Derv arg would be transformed to

+
arg_req1 = subExpr_req1
+         | ( subject_req1 | nothing_req1 ) Derv_allow1 arg_allow1
+         | ( subject_allow1 | nothing_allow1 )? Derv_req1 arg_allow1
+         | ( subject_allow1 | nothing_allow1 )? Derv_allow1 arg_req1
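Because the per-part construction is mechanical, a short generator can produce it. The Python below is a hypothetical helper, assuming a production is encoded as a list of parts, each a list of alternative term names plus a flag saying whether the part is optional. The names render and required_alternatives are illustrative only. For every part it emits one alternative in which that part carries the _reqN suffix (and loses its "?", since the requirement must be present) while the others carry _allowN.

```python
from typing import List, Tuple

# Hypothetical encoding of a production: a list of parts, each a list of
# alternative term names plus a flag saying whether the part is optional.
Part = Tuple[List[str], bool]

def render(names: List[str], suffix: str) -> str:
    """Attach the level suffix to every name, parenthesizing alternatives."""
    inner = " | ".join(name + suffix for name in names)
    return f"( {inner} )" if len(names) > 1 else inner

def required_alternatives(parts: List[Part], level: int) -> List[str]:
    """One alternative per part: that part requires the level, the rest allow it."""
    req, allow = f"_req{level}", f"_allow{level}"
    alternatives = []
    for i in range(len(parts)):
        pieces = []
        for j, (names, optional) in enumerate(parts):
            if j == i:
                # The part carrying the requirement cannot be omitted.
                pieces.append(render(names, req))
            else:
                pieces.append(render(names, allow) + ("?" if optional else ""))
        alternatives.append(" ".join(pieces))
    return alternatives
```

Calling required_alternatives([(["subject", "nothing"], True), (["Derv"], False), (["arg"], False)], 1) yields the three multi-part alternatives of arg_req1, and the one-part production [(["subExpr"], False)] yields subExpr_req1.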
 

Quite tedious. The explosion of rules is partly due to the fact that the brace-typing rule falls into a weaker class of grammars than the other rules. Most of BQN is deterministic context-free but brace-typing is not, only context-free. Fortunately brace typing does not introduce the parsing difficulties that can be present in a general context-free grammar, and it can easily be performed in linear time: after scanning but before parsing, move through the source code maintaining a stack of the current top-level set of braces. Whenever a colon or special name is encountered, annotate that set of braces to indicate that it is present. When a closing brace is encountered and the top brace is popped off the stack, the type is needed if there was no colon, and can be found based on which names were present. One way to present this information to the parser is to replace the brace tokens with new tokens that indicate the type.
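A minimal Python sketch of this linear-time pass follows. It assumes the scanner has already produced a list of token strings. The function-level name set (𝕨 𝕩 𝕊 𝕎 𝕏) is an assumption, since that row of the table is not shown in this excerpt; the 1- and 2-modifier sets follow the _brMod1 and _brMod2_ rows. A block containing a colon is reported as "headed", meaning its header decides the type. The names LEVELS and brace_types are hypothetical.

```python
from typing import Dict, List

# Special names per level, lowest first. The function-level set is an
# assumption (that table row is not shown here); the 1- and 2-modifier sets
# follow the _brMod1 and _brMod2_ rows of the table.
LEVELS = [
    ("function", {"𝕨", "𝕩", "𝕊", "𝕎", "𝕏"}),
    ("1-modifier", {"𝕗", "𝕣", "𝔽", "_𝕣"}),
    ("2-modifier", {"𝕘", "𝔾", "_𝕣_"}),
]
SPECIAL = set().union(*(names for _, names in LEVELS))

def brace_types(tokens: List[str]) -> Dict[int, str]:
    """Map the index of each "{" token to the type of the block it opens."""
    stack = []  # one [open_index, saw_colon, names_seen] entry per open brace
    types: Dict[int, str] = {}
    for i, tok in enumerate(tokens):
        if tok == "{":
            stack.append([i, False, set()])
        elif tok == "}":
            if not stack:
                raise ValueError(f"unmatched closing brace at token {i}")
            start, saw_colon, names = stack.pop()
            if saw_colon:
                types[start] = "headed"  # the header determines the type
                continue
            # Highest level whose names appear; no names means immediate.
            kind = "immediate"
            for level, level_names in LEVELS:
                if names & level_names:
                    kind = level
            types[start] = kind
        elif stack and tok == ":":
            stack[-1][1] = True  # annotate only the innermost open brace
        elif stack and tok in SPECIAL:
            stack[-1][2].add(tok)
    if stack:
        raise ValueError("unclosed opening brace")
    return types
```

For example, brace_types(["{", "𝔽", "{", "𝕩", "}", "}"]) types the inner block (index 2) as a function and the outer one (index 0) as a 1-modifier, because each special name annotates only the innermost enclosing brace. Every token is visited once and each stack operation is constant time, so the pass is linear.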
