aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--spec/README.md2
-rw-r--r--spec/grammar.md61
2 files changed, 62 insertions, 1 deletions
diff --git a/spec/README.md b/spec/README.md
index eea0b868..93891915 100644
--- a/spec/README.md
+++ b/spec/README.md
@@ -5,7 +5,7 @@ This directory gives a (currently incomplete) specification for BQN. The specifi
The following aspects define BQN and are or will be specified:
- Token formation
- Numeric and character literals
-- Syntactic class and grammar
+- [Grammar](grammar.md)
- Array model and notation
- Evaluation semantics
- Built-in operations ([reference implementations](reference.bqn))
diff --git a/spec/grammar.md b/spec/grammar.md
new file mode 100644
index 00000000..8135f70a
--- /dev/null
+++ b/spec/grammar.md
@@ -0,0 +1,61 @@
+BQN's grammar is given below. All terms except `BraceFunc` `_braceMod` `_braceComp_` are defined in a [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) variant; distinguishing these three terms is not strictly context-free and they are explained near the end.
+
+The symbols `v`, `F`, `_m`, and `_c_` are identifier tokens with value, function, modifier, and composition classes respectively. Similarly, `vl`, `Fl`, `_ml`, and `_cl_` refer to value literals (numeric and character literals, or primitives) of those classes. While names in the BNF here follow the identifier naming scheme, this is informative only: syntactic classes are no longer used after parsing and cannot be inspected in a running program.
+
+A program is a list of statements. Almost all statements are expressions. However, valueless results stemming from "ยท", or "๐•จ" in a monadic brace function, can be used as statements but not expressions.
+
+ PROGRAM = โ‹„? ( ( STMT โ‹„ )* STMT โ‹„? )?
+ STMT = EXPR | nothing
+ โ‹„ = ( "โ‹„" | "," | \n )+
+ EXPR = valExpr | FuncExpr | _modExpr | _cmpExp_
+
+Here we define the "atomic" forms of functions and operators, which are either single tokens or enclosed in paired symbols. Stranded vectors with `โ€ฟ`, which binds more tightly than any form of execution, are also included.
+
+ ANY = atom | Func | _mod | _comp_
+ _comp_ = _c_ | _cl_ | _braceComp_ | "(" _cmpExp_ ")"
+ _mod = _m | _ml | _braceMod | "(" _modExpr ")"
+ Func = F | Fl | BraceFunc | "(" FuncExpr ")"
+ atom = v | vl | list | "(" valExpr ")"
+ list = "โŸจ" โ‹„? ( ( EXPR โ‹„ )* EXPR โ‹„? )? "โŸฉ"
+ value = atom | ANY ( "โ€ฟ" ANY )+
+
+Starting at the highest-order objects, modifiers and compositions have fairly simple syntax. In most cases the syntax for `โ†` and `โ†ฉ` is the same, but some special forms are defined only for one of them.
+
+ ASGN = "โ†" | "โ†ฉ"
+ _cmpExp_ = _comp_
+ | _c_ ASGN _cmpExp_
+ _modExpr = _mod
+ | _comp_ ( value | Func ) โ Right partial application
+ | Operand _comp_ โ Left partial application
+ | _m ASGN _modExpr
+
+Functions can be formed by fully applying operators or as trains. Operators are left-associative, so that the left operand (`Operand`) can include operators but the right operand (`value | Func`) cannot. Trains are right-associative, but bind less tightly than operators. Assignment is not allowed in the top level of a train: it must be parenthesized.
+
+ Derv = Func
+ | Operand _mod
+ | Operand _comp_ ( value | Func )
+ Operand = value
+ | Derv
+ Fork = Func
+ | Operand Func Fork โ 3-train
+ FuncExpr = Fork
+ | Func Fork โ 2-train
+ | F ASGN FuncExpr
+
+Value expressions are complicated by the possibility of list assignment. We also define nothing-statements, which have very similar syntax to value expressions but do not permit assignment.
+
+ arg = valExpr
+ | ( value | nothing )? Derv arg
+ nothing = "ยท"
+ | ( value | nothing )? Derv nothing
+ LHS_ATOM = v | F | _m | _c_ | "(" lhs ")"
+ lhs = v
+ | "โŸจ" โ‹„? ( ( lhs โ‹„ )* lhs โ‹„? )? "โŸฉ"
+ | LHS_ATOM ( "โ€ฟ" LHS_ATOM )+
+ valExpr = arg
+ | v ASGN valExpr
+ | v Derv "โ†ฉ" valExpr โ Modified assignment
+
+One aspect of BQN grammar is not context-free: determining the syntactic class of a brace definition. The terms `BraceFunc` `_braceMod` `_braceComp_` all obey the syntax for `BRACED` given below. Then the class is determined by the presence of `๐•—` and `๐•˜` (including alternate class spellings) at the top level, that is, not contained within further pairs of braces. If `๐•˜` is present, it is a `_braceCmp_`; otherwise, if `๐•—` is present it it a `_braceMod` and otherwise a `BraceFunc`. The presence of `๐•จ` or `๐•ฉ` has an effect on the evaluation of modifiers and combinators but not their syntactic class.
+
+ BRACED = "{" โ‹„? ( STMT โ‹„ )* EXPR โ‹„? "}"