From 2d6e5a6ad2fcc77289fb0a8fd34b0bf4cdbd035d Mon Sep 17 00:00:00 2001 From: Marshall Lochbaum Date: Fri, 19 Jun 2020 13:06:48 -0400 Subject: Specify grammar --- spec/README.md | 2 +- spec/grammar.md | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 62 insertions(+), 1 deletion(-) create mode 100644 spec/grammar.md diff --git a/spec/README.md b/spec/README.md index eea0b868..93891915 100644 --- a/spec/README.md +++ b/spec/README.md @@ -5,7 +5,7 @@ This directory gives a (currently incomplete) specification for BQN. The specifi The following aspects define BQN and are or will be specified: - Token formation - Numeric and character literals -- Syntactic class and grammar +- [Grammar](grammar.md) - Array model and notation - Evaluation semantics - Built-in operations ([reference implementations](reference.bqn)) diff --git a/spec/grammar.md b/spec/grammar.md new file mode 100644 index 00000000..8135f70a --- /dev/null +++ b/spec/grammar.md @@ -0,0 +1,61 @@ +BQN's grammar is given below. All terms except `BraceFunc` `_braceMod` `_braceComp_` are defined in a [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) variant; distinguishing these three terms is not strictly context-free and they are explained near the end. + +The symbols `v`, `F`, `_m`, and `_c_` are identifier tokens with value, function, modifier, and composition classes respectively. Similarly, `vl`, `Fl`, `_ml`, and `_cl_` refer to value literals (numeric and character literals, or primitives) of those classes. While names in the BNF here follow the identifier naming scheme, this is informative only: syntactic classes are no longer used after parsing and cannot be inspected in a running program. + +A program is a list of statements. Almost all statements are expressions. However, valueless results stemming from "ยท", or "๐•จ" in a monadic brace function, can be used as statements but not expressions. + + PROGRAM = โ‹„? ( ( STMT โ‹„ )* STMT โ‹„? )? + STMT = EXPR | nothing + โ‹„ = ( "โ‹„" | "," | \n )+ + EXPR = valExpr | FuncExpr | _modExpr | _cmpExp_ + +Here we define the "atomic" forms of functions and operators, which are either single tokens or enclosed in paired symbols. Stranded vectors with `โ€ฟ`, which binds more tightly than any form of execution, are also included. + + ANY = atom | Func | _mod | _comp_ + _comp_ = _c_ | _cl_ | _braceComp_ | "(" _cmpExp_ ")" + _mod = _m | _ml | _braceMod | "(" _modExpr ")" + Func = F | Fl | BraceFunc | "(" FuncExpr ")" + atom = v | vl | list | "(" valExpr ")" + list = "โŸจ" โ‹„? ( ( EXPR โ‹„ )* EXPR โ‹„? )? "โŸฉ" + value = atom | ANY ( "โ€ฟ" ANY )+ + +Starting at the highest-order objects, modifiers and compositions have fairly simple syntax. In most cases the syntax for `โ†` and `โ†ฉ` is the same, but some special forms are defined only for one of them. + + ASGN = "โ†" | "โ†ฉ" + _cmpExp_ = _comp_ + | _c_ ASGN _cmpExp_ + _modExpr = _mod + | _comp_ ( value | Func ) โ Right partial application + | Operand _comp_ โ Left partial application + | _m ASGN _modExpr + +Functions can be formed by fully applying operators or as trains. Operators are left-associative, so that the left operand (`Operand`) can include operators but the right operand (`value | Func`) cannot. Trains are right-associative, but bind less tightly than operators. Assignment is not allowed in the top level of a train: it must be parenthesized. + + Derv = Func + | Operand _mod + | Operand _comp_ ( value | Func ) + Operand = value + | Derv + Fork = Func + | Operand Func Fork โ 3-train + FuncExpr = Fork + | Func Fork โ 2-train + | F ASGN FuncExpr + +Value expressions are complicated by the possibility of list assignment. We also define nothing-statements, which have very similar syntax to value expressions but do not permit assignment. + + arg = valExpr + | ( value | nothing )? Derv arg + nothing = "ยท" + | ( value | nothing )? Derv nothing + LHS_ATOM = v | F | _m | _c_ | "(" lhs ")" + lhs = v + | "โŸจ" โ‹„? ( ( lhs โ‹„ )* lhs โ‹„? )? "โŸฉ" + | LHS_ATOM ( "โ€ฟ" LHS_ATOM )+ + valExpr = arg + | v ASGN valExpr + | v Derv "โ†ฉ" valExpr โ Modified assignment + +One aspect of BQN grammar is not context-free: determining the syntactic class of a brace definition. The terms `BraceFunc` `_braceMod` `_braceComp_` all obey the syntax for `BRACED` given below. Then the class is determined by the presence of `๐•—` and `๐•˜` (including alternate class spellings) at the top level, that is, not contained within further pairs of braces. If `๐•˜` is present, it is a `_braceCmp_`; otherwise, if `๐•—` is present it it a `_braceMod` and otherwise a `BraceFunc`. The presence of `๐•จ` or `๐•ฉ` has an effect on the evaluation of modifiers and combinators but not their syntactic class. + + BRACED = "{" โ‹„? ( STMT โ‹„ )* EXPR โ‹„? "}" -- cgit v1.2.3