diff options
| -rw-r--r-- | doc/context.md | 4 | ||||
| -rw-r--r-- | problems.md | 2 | ||||
| -rw-r--r-- | spec/grammar.md | 13 |
3 files changed, 14 insertions, 5 deletions
diff --git a/doc/context.md b/doc/context.md index 7a9e218d..0aab2b93 100644 --- a/doc/context.md +++ b/doc/context.md @@ -42,12 +42,12 @@ BQN's variables use another system. Unlike primitives, variables can be spelled The associations between spelling and syntactic role are considered part of BQN's [token formation rules](../spec/token.md). +One rule for typing is also best considered to be a pre-parsing rule like the spelling system: the role of a brace construct `{}` is determined by which special arguments it uses: it's a value if there are none, but a `π¨` or `π©` makes it at least a function, an `π½` makes it a modifier or composition, and a `πΎ` always makes it a composition. + ## BQN's grammar A formal treatment is included in [the spec](../spec/grammar.md). BQN's grammarβthe ways syntactic roles interactβfollows the original APL model (plus trains) closely, with allowances for new features like list notation. In order to keep BQN's syntax context-free, the syntactic role of any expression must be known, just like tokens. -BQN fails to be context-free in one fairly mild way: the role of a brace construct `{}` is determined by which special arguments it uses. This means the grammar is not context-free in the technical sense, but since the "context" in this case is carried between the braces and cannot be left out it's not harmful in the same way as variable values. Informally it still makes sense to call BQN "context-free". - Here is a table of the APL-derived operator and function application rules: | left | main | right | output | name diff --git a/problems.md b/problems.md index 91ef48da..a646b200 100644 --- a/problems.md +++ b/problems.md @@ -97,7 +97,7 @@ It's an awkward inconsistency. Prefixes and Suffixes have to have a nested resul A positive operand to Rank indicates the cell rank, so positive zero means to act on 0-cells. A negative operand indicates the frame length, so negative zero should act on the entire array. But it can't because it's equal to positive zero. Similar issue with Depth. Positive/negative is not really the right way to encode the frame/cell distinction, but it's convenient. Fortunately β can be used in place of negative zero, but there can still be problems if the rank is computed. ### Must read the body to find explicit definition's type -You have to scan for `ππ½ππΎ` (and so does a compiler). A little inelegant, and requires a fair amount of preprocessing to be able to say the parser is still context-free. +You have to scan for `π¨ππ©πππ½ππΎ` (and so does a compiler). A little inelegant, and requires preprocessing to be able to use a deterministic context-free parser. ### Can't take Prefixes or Suffixes on multiple axes This is a natural array operation to do, and results in an array with a joinable structure, but as Prefixes and Suffixes are monadic there's no way to specify the number of axes to use. diff --git a/spec/grammar.md b/spec/grammar.md index 7154e706..cac1db02 100644 --- a/spec/grammar.md +++ b/spec/grammar.md @@ -1,4 +1,4 @@ -BQN's grammar is given below. All terms except `BraceFunc` `_braceMod` `_braceComp_` are defined in a [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) variant; distinguishing these three terms is not strictly context-free and they are explained near the end. +BQN's grammar is given below. All terms except `braceVal`, `BraceFunc`, `_braceMod`, and `_braceComp_` are defined in a [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) variant; distinguishing these four terms is possible but difficult in BNF and they are explained near the end. The symbols `v`, `F`, `_m`, and `_c_` are identifier tokens with value, function, modifier, and composition classes respectively. Similarly, `vl`, `Fl`, `_ml`, and `_cl_` refer to value literals (numeric and character literals, or primitives) of those classes. While names in the BNF here follow the identifier naming scheme, this is informative only: syntactic classes are no longer used after parsing and cannot be inspected in a running program. @@ -69,6 +69,15 @@ In an explicit definition, the left hand side looks like application of a functi | F _c_ F valLHS = lhs? ( F | FuncLHS ) lhs -One aspect of BQN grammar is not context-free: determining the syntactic class of a brace definition. The terms `braceVal`, `BraceFunc` `_braceMod` `_braceComp_` all obey the syntax for `BRACED` given below. Then the class is determined by the presence of `π¨`, `π©`, `π`, and `π` (including alternate class spellings) at the top level, that is, not contained within further pairs of braces. If `π` is present, it is a `_braceCmp_`; otherwise, if `π` is present it is a `_braceMod`; otherwise is is a `BraceFunc` if `π¨` or `π©` are present and a `braceVal` if no special names appear. +The terms `braceVal`, `BraceFunc`, `_braceMod`, and `_braceComp_` all obey the syntax for `BRACED` given below. Then the class is determined by the presence of `π¨`, `π©`, `π`, and `π` (including alternate class spellings) at the top level, that is, not contained within further pairs of braces. If `π` is present, it is a `_braceCmp_`; otherwise, if `π` is present it is a `_braceMod`; otherwise is is a `BraceFunc` if `π¨` or `π©` are present and a `braceVal` if no special names appear. BRACED = "{" β? ( STMT β )* EXPR β? "}" + +This rule can be expressed in BNF by making many copies of all the rules above. For each "level" (no special names; arguments; `π`; `π`), a new version of every rule should be made that allows that level but not higher ones, and another version should be made that requires exactly that level. The values themselves should be included in `v`, `F`, `_m`, and `_c_` for these copies. Then the "allowed" rules are made simply by replacing the terms they contain with the same "allowed" versions, and "required" rules are constructed using both "allowed" and "required" rules. For every part of a production rule, an alternative should be created that requires the relevant name in that part while allowing it in the others. For example, `( value | nothing )? Derv arg` would be transformed to + + arg_req1 = valExpr_req1 + | ( value_req1 | nothing_req1 ) Derv_allow1 arg_allow1 + | ( value_allow1 | nothing_allow1 )? Derv_req1 arg_allow1 + | ( value_allow1 | nothing_allow1 )? Derv_allow1 arg_req1 + +Quite tedious. The explosion of rules is partly due to the fact that the brace-typing rule falls into a weaker class of grammars than the other rules. The rest of BQN is [deterministic context-free](https://en.wikipedia.org/wiki/Deterministic_context-free_grammar) while brace-typing is not, only context-free. Fortunately brace typing does not introduce the parsing difficulties that can be present in a general context-free grammar, and it can easily be performed in linear time: after [scanning](token.md) but before parsing, move through the source code maintaining a stack of the current top-level set of braces. Whenever a special name is encountered, annotate that set of braces to indicate that the name is present. When a closing brace is encountered and the top brace is popped off the stack, the type can be found based on which names were present. One way to present this information to the parser is to replace the brace tokens with new tokens that indicate the type. |
