From 6553132505093fce4b7a3b2c95ad7d945d97e168 Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Sun, 17 Apr 2022 17:14:50 -0400
Subject: Style fixes, and remove last uses of brace to mean block

---
 docs/spec/grammar.html | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/spec/grammar.html b/docs/spec/grammar.html
index 3f9bce5f..19425e06 100644
--- a/docs/spec/grammar.html
+++ b/docs/spec/grammar.html
@@ -5,9 +5,9 @@

Specification: BQN grammar

-BQN's grammar is given below. Terms are defined in a BNF variant. However, handling special names properly is possible but difficult in BNF, so they are explained in text along with the braced block grammar.
+BQN's grammar is given below. Terms are defined in a BNF variant. However, handling special names properly is possible but difficult in BNF, so they are explained in text along with the block grammar.

The symbols s, F, _m, and _c_ are identifier tokens with subject, function, 1-modifier, and 2-modifier classes respectively. Similarly, sl, Fl, _ml, and _cl_ refer to literals and primitives of those classes. While names in the BNF here follow the identifier naming scheme, this is informative only: syntactic roles are no longer used after parsing and cannot be inspected in a running program.

-A program is a list of statements. Almost all statements are expressions. Namespace export statements, and valueless results stemming from ·, or 𝕨 in a monadic brace function, can be used as statements but not expressions.
+A program is a list of statements. Almost all statements are expressions. Namespace export statements, and valueless results stemming from ·, or 𝕨 in a monadic block function, can be used as statements but not expressions.

PROGRAM  = ⋄? ( STMT ⋄ )* STMT ⋄?
 STMT     = EXPR | nothing | EXPORT
 ⋄        = ( "⋄" | "," | \n )+
@@ -86,7 +86,7 @@
          |        FuncName "˜"? "⁼"
          | lhsComp
 
-A braced block contains bodies, which are lists of statements, separated by semicolons and possibly preceded by headers, which are separated from the body with a colon. A non-final expression can be made into a predicate by following it with the separator-like ?. Multiple bodies allow different handling for various cases, which are pattern-matched by headers. A block can have any number of bodies with headers. After these there can be bodies without headers—up to one for an immediate block and up to two for a block with arguments. If a block with arguments has one such body, it's ambivalent, but two of them refer to the monadic and dyadic cases.
+A block is written with braces. It contains bodies, which are lists of statements, separated by semicolons. Multiple bodies can handle different cases, as determined by headers and predicates. A header is written before its body with a separating colon, and an expression other than the last in a body can be made into a predicate by following it with the separator-like ?. A block can have any number of bodies with headers. After these there can be bodies without headers—up to one for an immediate block and up to two for a block with arguments. If a block with arguments has one such body, it's ambivalent, but two of them refer to the monadic and dyadic cases.

BODY     = ⋄? ( STMT ⋄ | EXPR ⋄? "?" ⋄? )* STMT ⋄?
 CASE     = BODY
 I_CASE   = ⋄? IMM_HEAD ⋄? ":" BODY
@@ -98,7 +98,7 @@
 _blMod1  = IMM_BLK | ARG_BLK
 _blMod2_ = IMM_BLK | ARG_BLK
 
-Three additional rules apply to blocks, allowing the ambiguous grammar above to be disambiguated. They are shown in the table below. First, each block type allows the special names in its row to be used as the given token types within BODY terms (not headers). Except for the spaces labelled "None", each of these four columns is cumulative, so that a given entry also includes all the entries above it. Second, a block can't contain one of the tokens from the "label" column of a different row. Third, each BrFunc, _brMod1, and _brMod2_ term must contain one of the names on, and not above, the corresponding row (including the "label" column).
+Three additional rules apply to blocks, allowing the ambiguous grammar above to be disambiguated. They are shown in the table below. First, each block type allows the special names in its row to be used as the given token types within BODY terms (not headers). Except for the spaces labelled "None", each of these four columns is cumulative, so that a given entry also includes all the entries above it. Second, a block can't contain one of the tokens from the "label" column of a different row. Third, each BlFunc, _blMod1, and _blMod2_ term must contain one of the names on, and not above, the corresponding row (including the "label" column).

@@ -151,4 +151,4 @@ |(subject_allow1|nothing_allow1)?Derv_req1arg_allow1|(subject_allow1|nothing_allow1)?Derv_allow1arg_req1
-Quite tedious. The explosion of rules is partly due to the fact that the brace-typing rule falls into a weaker class of grammars than the other rules. Most of BQN is deterministic context-free but brace-typing is not, only context-free. Fortunately brace typing does not introduce the parsing difficulties that can be present in a general context-free grammar, and it can easily be performed in linear time: after scanning but before parsing, move through the source code maintaining a stack of the current top-level set of braces. Whenever a colon or special name is encountered, annotate that set of braces to indicate that it is present. When a closing brace is encountered and the top brace is popped off the stack, the type is needed if there was no colon, and can be found based on which names were present. One way to present this information to the parser is to replace the brace tokens with new tokens that indicate the type.
+Quite tedious. The explosion of rules is partly due to the fact that the block-typing rule falls into a weaker class of grammars than the other rules. Most of BQN is deterministic context-free but block-typing is not, only context-free. Fortunately block typing does not introduce the parsing difficulties that can be present in a general context-free grammar, and it can easily be performed in linear time: after scanning but before parsing, move through the source code maintaining a stack of the current top-level set of braces. Whenever a colon or special name is encountered, annotate that set of braces to indicate that it is present. When a closing brace is encountered and the top brace is popped off the stack, the type is needed if there was no colon, and can be found based on which names were present. One way to present this information to the parser is to replace the brace tokens with new tokens that indicate the type.

-- cgit v1.2.3
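
The last paragraph of the patch describes a linear-time block-typing pass that runs after scanning and before parsing. What follows is a minimal Python sketch of that pass. It assumes the source has already been split into a list of string tokens, and that each special name (such as `𝕨` or `_𝕣_`) arrives as a single token. The per-row name sets in `ROWS` are illustrative assumptions that stand in for the spec's table, which this patch does not reproduce. The names `type_blocks` and `OpenBlock` are hypothetical, and so are the output tags such as `"{func"`. The sketch does not enforce the rule that a block can't contain another row's label.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# ASSUMPTION: illustrative special-name sets for each block row, ordered from
# the lowest row to the highest. Check them against the table in the spec.
ROWS: List[Tuple[str, Set[str]]] = [
    ("func", {"𝕨", "𝕩", "𝕤", "𝕎", "𝕏", "𝕊"}),
    ("mod1", {"𝕗", "𝔽", "𝕣", "_𝕣"}),
    ("mod2", {"𝕘", "𝔾", "_𝕣_"}),
]
SPECIAL: Set[str] = set().union(*(names for _, names in ROWS))


@dataclass
class OpenBlock:
    start: int                                    # index of the "{" token
    names: Set[str] = field(default_factory=set)  # special names seen directly inside
    has_colon: bool = False                       # whether a header colon was seen


def type_blocks(tokens: List[str]) -> List[str]:
    """Hypothetical pass: replace each brace token with a typed brace token.

    Each token is visited once with O(1) work, so the pass is linear time.
    """
    out = list(tokens)
    stack: List[OpenBlock] = []  # currently open braces, innermost last
    for i, tok in enumerate(tokens):
        if tok == "{":
            stack.append(OpenBlock(i))
        elif tok == "}":
            if not stack:
                raise ValueError(f"unmatched '}}' at token {i}")
            block = stack.pop()
            if block.has_colon:
                # A header is present, so the parser can read the type from it.
                kind = "header"
            else:
                # The highest row whose names appear decides the type;
                # a block with no special names is immediate.
                kind = "imm"
                for row_kind, row_names in ROWS:
                    if block.names & row_names:
                        kind = row_kind
            out[block.start] = "{" + kind
            out[i] = kind + "}"
        elif stack:
            # Only the innermost open block is annotated, so names inside a
            # nested block never affect the blocks that enclose it.
            if tok == ":":
                stack[-1].has_colon = True
            elif tok in SPECIAL:
                stack[-1].names.add(tok)
    if stack:
        raise ValueError(f"unclosed '{{' at token {stack[-1].start}")
    return out


print(type_blocks(["{", "𝕗", "{", "𝕩", "}", "}"]))
# prints ['{mod1', '𝕗', '{func', '𝕩', 'func}', 'mod1}']
```

The sketch takes the highest matching row because the table's columns are cumulative. Under that rule, a block that uses a name found only in a higher row cannot have a lower row's type. Keeping open blocks on a stack charges each special name or colon to the innermost enclosing block, which matches the "top-level set of braces" described in the patch.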