diff options
| author | Marshall Lochbaum <mwlochbaum@gmail.com> | 2020-06-19 13:06:48 -0400 |
|---|---|---|
| committer | Marshall Lochbaum <mwlochbaum@gmail.com> | 2020-06-19 13:06:48 -0400 |
| commit | 2d6e5a6ad2fcc77289fb0a8fd34b0bf4cdbd035d (patch) | |
| tree | c0cb1396aea105b54f990e80aaadfcc0278fd2bc /spec/grammar.md | |
| parent | 6adf8fca28b3c7ebdad7153b019d7013cc3ceff7 (diff) | |
Specify grammar
Diffstat (limited to 'spec/grammar.md')
| -rw-r--r-- | spec/grammar.md | 61 |
1 files changed, 61 insertions, 0 deletions
diff --git a/spec/grammar.md b/spec/grammar.md new file mode 100644 index 00000000..8135f70a --- /dev/null +++ b/spec/grammar.md @@ -0,0 +1,61 @@ +BQN's grammar is given below. All terms except `BraceFunc` `_braceMod` `_braceComp_` are defined in a [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) variant; distinguishing these three terms is not strictly context-free and they are explained near the end. + +The symbols `v`, `F`, `_m`, and `_c_` are identifier tokens with value, function, modifier, and composition classes respectively. Similarly, `vl`, `Fl`, `_ml`, and `_cl_` refer to value literals (numeric and character literals, or primitives) of those classes. While names in the BNF here follow the identifier naming scheme, this is informative only: syntactic classes are no longer used after parsing and cannot be inspected in a running program. + +A program is a list of statements. Almost all statements are expressions. However, valueless results stemming from "ยท", or "๐จ" in a monadic brace function, can be used as statements but not expressions. + + PROGRAM = โ? ( ( STMT โ )* STMT โ? )? + STMT = EXPR | nothing + โ = ( "โ" | "," | \n )+ + EXPR = valExpr | FuncExpr | _modExpr | _cmpExp_ + +Here we define the "atomic" forms of functions and operators, which are either single tokens or enclosed in paired symbols. Stranded vectors with `โฟ`, which binds more tightly than any form of execution, are also included. + + ANY = atom | Func | _mod | _comp_ + _comp_ = _c_ | _cl_ | _braceComp_ | "(" _cmpExp_ ")" + _mod = _m | _ml | _braceMod | "(" _modExpr ")" + Func = F | Fl | BraceFunc | "(" FuncExpr ")" + atom = v | vl | list | "(" valExpr ")" + list = "โจ" โ? ( ( EXPR โ )* EXPR โ? )? "โฉ" + value = atom | ANY ( "โฟ" ANY )+ + +Starting at the highest-order objects, modifiers and compositions have fairly simple syntax. In most cases the syntax for `โ` and `โฉ` is the same, but some special forms are defined only for one of them. + + ASGN = "โ" | "โฉ" + _cmpExp_ = _comp_ + | _c_ ASGN _cmpExp_ + _modExpr = _mod + | _comp_ ( value | Func ) โ Right partial application + | Operand _comp_ โ Left partial application + | _m ASGN _modExpr + +Functions can be formed by fully applying operators or as trains. Operators are left-associative, so that the left operand (`Operand`) can include operators but the right operand (`value | Func`) cannot. Trains are right-associative, but bind less tightly than operators. Assignment is not allowed in the top level of a train: it must be parenthesized. + + Derv = Func + | Operand _mod + | Operand _comp_ ( value | Func ) + Operand = value + | Derv + Fork = Func + | Operand Func Fork โ 3-train + FuncExpr = Fork + | Func Fork โ 2-train + | F ASGN FuncExpr + +Value expressions are complicated by the possibility of list assignment. We also define nothing-statements, which have very similar syntax to value expressions but do not permit assignment. + + arg = valExpr + | ( value | nothing )? Derv arg + nothing = "ยท" + | ( value | nothing )? Derv nothing + LHS_ATOM = v | F | _m | _c_ | "(" lhs ")" + lhs = v + | "โจ" โ? ( ( lhs โ )* lhs โ? )? "โฉ" + | LHS_ATOM ( "โฟ" LHS_ATOM )+ + valExpr = arg + | v ASGN valExpr + | v Derv "โฉ" valExpr โ Modified assignment + +One aspect of BQN grammar is not context-free: determining the syntactic class of a brace definition. The terms `BraceFunc` `_braceMod` `_braceComp_` all obey the syntax for `BRACED` given below. Then the class is determined by the presence of `๐` and `๐` (including alternate class spellings) at the top level, that is, not contained within further pairs of braces. If `๐` is present, it is a `_braceCmp_`; otherwise, if `๐` is present it it a `_braceMod` and otherwise a `BraceFunc`. The presence of `๐จ` or `๐ฉ` has an effect on the evaluation of modifiers and combinators but not their syntactic class. + + BRACED = "{" โ? ( STMT โ )* EXPR โ? "}" |
