| author | Marshall Lochbaum <mwlochbaum@gmail.com> | 2020-07-17 21:25:22 -0400 |
|---|---|---|
| committer | Marshall Lochbaum <mwlochbaum@gmail.com> | 2020-07-17 21:25:22 -0400 |
| commit | 15908ba604c2a27b84a30d7ce91ceb7a8c1064aa (patch) | |
| tree | d7a456e917af68385310043041fdb303259476d7 /docs/spec | |
| parent | 7d3af92a20237eed83b943fa74aa2a086e657658 (diff) | |
Mirror repository tree in docs/ and add html spec documents
Diffstat (limited to 'docs/spec')
| -rw-r--r-- | docs/spec/README.html | 14 |
| -rw-r--r-- | docs/spec/evaluate.html | 94 |
| -rw-r--r-- | docs/spec/grammar.html | 138 |
| -rw-r--r-- | docs/spec/literal.html | 14 |
| -rw-r--r-- | docs/spec/token.html | 40 |
| -rw-r--r-- | docs/spec/types.html | 20 |
6 files changed, 320 insertions, 0 deletions
diff --git a/docs/spec/README.html b/docs/spec/README.html new file mode 100644 index 00000000..6a4ba53f --- /dev/null +++ b/docs/spec/README.html @@ -0,0 +1,14 @@ +<head><link href="../style.css" rel="stylesheet"/></head> +<h1 id="bqn-specification">BQN specification</h1> +<p>This directory gives a (currently incomplete) specification for BQN. The specification differs from the documentation in <code><span class='Value'>doc</span><span class='Function'>/</span></code> in that its purpose is only to describe the exact details of BQN's operation in the most quickly accessible way, rather than to explain the core ideas of BQN functionality and how it might be used. Since it is easier to specify than to document, the specification is currently more complete than the documentation; for example, it includes nearly all primitives.</p> +<p>The following aspects define BQN and are or will be specified:</p> +<ul> +<li><a href="types.html">Types</a></li> +<li><a href="token.html">Token formation</a></li> +<li><a href="literal.html">Literals</a></li> +<li><a href="grammar.html">Grammar</a></li> +<li>Scoping</li> +<li><a href="evaluate.html">Evaluation semantics</a></li> +<li>Primitives (<a href="reference.bqn">reference implementations</a>)</li> +</ul> + diff --git a/docs/spec/evaluate.html b/docs/spec/evaluate.html new file mode 100644 index 00000000..6552f80e --- /dev/null +++ b/docs/spec/evaluate.html @@ -0,0 +1,94 @@ +<head><link href="../style.css" rel="stylesheet"/></head> +<p>This page describes the semantics of the code constructs whose grammar is given in <a href="grammar.html">grammar.md</a>. 
The formation rules there are not named, and here they are identified by either the name of the term or by copying the rule entirely if there are several alternative productions.</p> +<p>Here we assume that the referent of each identifier, or equivalently the connections between identifiers, have been identified according to the <a href="scope.html">scoping rules</a>.</p> +<h3 id="programs-and-blocks">Programs and blocks</h3> +<p>The result of parsing a valid BQN program is a <code><span class='Function'>PROGRAM</span></code>, and the program is run by evaluating this term.</p> +<p>A <code><span class='Function'>PROGRAM</span></code> or <code><span class='Function'>BODY</span></code> is a list of <code><span class='Function'>STMT</span></code>s (for <code><span class='Function'>BODY</span></code>, the last must be an <code><span class='Function'>EXPR</span></code>, a particular kind of <code><span class='Function'>STMT</span></code>), which are evaluated in program order. The statement <code><span class='Value'>nothing</span></code> does nothing when evaluated, while <code><span class='Function'>EXPR</span></code> evaluates some APL code and possibly assigns the results, as described below.</p> +<p>A block consists of several <code><span class='Function'>BODY</span></code> terms, some of which may have an accompanying header describing accepted inputs and how they are processed. A value block <code><span class='Value'>brVal</span></code> can only have one <code><span class='Function'>BODY</span></code>, and is evaluated by evaluating the code in it. Other types of blocks do not evaluate any <code><span class='Function'>BODY</span></code> immediately, but instead return a function, modifier, or operator that obtains its result by evaluating a particular <code><span class='Function'>BODY</span></code>. 
The <code><span class='Function'>BODY</span></code> is identified and evaluated once the block has received enough inputs (operands or arguments), which for modifiers and compositions can take one or two calls: if two calls are required, then on the first call the operands are simply stored and no code is evaluated yet. Two calls are required if there is more than one <code><span class='Function'>BODY</span></code> term, if the <code><span class='Function'>BODY</span></code> contains the special names <code><span class='Value'>𝕨𝕩𝕤</span><span class='Function'>𝕎𝕏𝕊</span></code>, or if its header specifies arguments (the header-body is a <code><span class='Modifier'>_mCase</span></code> or <code><span class='Composition'>_cCase_</span></code>). Otherwise only one is required.</p> +<p>To evaluate a block when enough inputs have been received, first the correct case must be identified. To do this, first each special case (<code><span class='Function'>FCase</span></code>, <code><span class='Modifier'>_mCase</span></code>, or <code><span class='Composition'>_cCase_</span></code>) is checked in order to see if its arguments are structurally compatible with the given arguments. That is, if <code><span class='Value'>headW</span></code> is a <code><span class='Value'>value</span></code>, there must be a left argument matching that structure, and if <code><span class='Value'>headX</span></code> is a <code><span class='Value'>value</span></code>, the right argument must match that structure. This means that <code><span class='Value'>𝕨</span></code> not only matches any left argument but also no argument. 
The test for compatibility is the same as for multiple assignment described below, except that the header may contain constants, which must match the corresponding part of the given argument. If no special case matches, then an appropriate general case (<code><span class='Function'>FMain</span></code>, <code><span class='Modifier'>_mMain</span></code>, or <code><span class='Composition'>_cMain_</span></code>) is used: if there are two, the first is used with no left argument and the second with a left argument; if there is one, it is always used, and if there are none, an error results.</p> +<p>The only remaining step before evaluating the <code><span class='Function'>BODY</span></code> is to bind the inputs and other names. Special names are always bound when applicable: <code><span class='Value'>𝕨𝕩𝕤</span></code> if arguments are used, <code><span class='Value'>𝕨</span></code> if there is a left argument, <code><span class='Value'>𝕗𝕘</span></code> if operands are used, and <code><span class='Composition'>_</span><span class='Value'>𝕣</span></code> and <code><span class='Composition'>_</span><span class='Value'>𝕣</span><span class='Composition'>_</span></code> for modifiers and combinators, respectively. Any names in the header are also bound, allowing multiple assignment for arguments.</p> +<p>If there is no left argument, but the <code><span class='Function'>BODY</span></code> contains <code><span class='Value'>𝕨</span></code> at the top level, then it is conceptually re-parsed with <code><span class='Value'>𝕨</span></code> replaced by <code><span class='Nothing'>·</span></code> to give a monadic version before application. As the only effect when this re-parsed form is valid is to change some instances of <code><span class='Value'>arg</span></code> to <code><span class='Value'>nothing</span></code>, this can be achieved efficiently by annotating parts of the AST that depend on <code><span class='Value'>𝕨</span></code> as conditionally-nothing. 
However, it also causes an error if <code><span class='Value'>𝕨</span></code> is used as an operand or list element, where <code><span class='Value'>nothing</span></code> is not allowed by the grammar.</p> +<h3 id="assignment">Assignment</h3> +<p>An <em>assignment</em> is one of the four rules containing <code><span class='Function'>ASGN</span></code>. It is evaluated by first evaluating the right-hand-side <code><span class='Value'>valExpr</span></code>, <code><span class='Function'>FuncExpr</span></code>, <code><span class='Modifier'>_modExpr</span></code>, or <code><span class='Composition'>_cmpExp_</span></code> expression, and then storing the result in the left-hand-side identifier or identifiers. The result of the assignment expression is the result of its right-hand side. Except for values, only a lone identifier is allowed on the left-hand side and storage is obvious. For values, <em>multiple assignment</em> with a list left-hand side is also allowed. Multiple assignment is performed recursively by assigning right-hand-side values to the left-hand-side targets, with single-identifier (<code><span class='Value'>v</span></code>) assignment as the base case. When matching the right-hand side to a list left-hand side, the left hand side is treated as a list of <code><span class='Value'>lhs</span></code> targets. The evaluated right-hand side must be a list (rank-1 array) of the same length, and is matched to these targets element-wise.</p> +<p><em>Modified assignment</em> is the value assignment rule <code><span class='Value'>lhs</span> <span class='Function'>Derv</span> <span class='String'>"↩"</span> <span class='Value'>valExpr</span></code>. 
In this case, <code><span class='Value'>lhs</span></code> should be evaluated as if it were a <code><span class='Value'>valExpr</span></code> (the syntax is a subset of <code><span class='Value'>valExpr</span></code>), and the result of the function application <code><span class='Value'>lhs</span> <span class='Function'>Derv</span> <span class='Value'>valExpr</span></code> should be assigned to <code><span class='Value'>lhs</span></code>, and is also the result of the modified assignment expression.</p> +<h3 id="expressions">Expressions</h3> +<p>We now give rules for evaluating an <code><span class='Value'>atom</span></code>, <code><span class='Function'>Func</span></code>, <code><span class='Modifier'>_mod</span></code> or <code><span class='Composition'>_comp_</span></code> expression (the possible options for <code><span class='Function'>ANY</span></code>). A literal <code><span class='Value'>vl</span></code>, <code><span class='Function'>Fl</span></code>, <code><span class='Modifier'>_ml</span></code>, or <code><span class='Composition'>_cl_</span></code> has a fixed value defined by the specification (<a href="literal.html">value literals</a> and <a href="primitive.html">built-ins</a>). An identifier <code><span class='Value'>v</span></code>, <code><span class='Function'>F</span></code>, <code><span class='Modifier'>_m</span></code>, or <code><span class='Composition'>_c_</span></code> is evaluated by returning its value; because of the scoping rules it must have one when evaluated. A parenthesized expression such as <code><span class='String'>"("</span> <span class='Modifier'>_modExpr</span> <span class='String'>")"</span></code> simply returns the result of the interior expression. A braced construct such as <code><span class='Function'>BraceFunc</span></code> is defined by the evaluation of the statements it contains after all parameters are accepted. 
Finally, a list <code><span class='String'>"β¨"</span> <span class='Separator'>β</span><span class='Value'>?</span> <span class='Paren'>(</span> <span class='Paren'>(</span> <span class='Function'>EXPR</span> <span class='Separator'>β</span> <span class='Paren'>)</span><span class='Value'>*</span> <span class='Function'>EXPR</span> <span class='Separator'>β</span><span class='Value'>?</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='String'>"β©"</span></code> or <code><span class='Function'>ANY</span> <span class='Paren'>(</span> <span class='String'>"βΏ"</span> <span class='Function'>ANY</span> <span class='Paren'>)</span><span class='Function'>+</span></code> consists grammatically of a list of expressions. To evaluate it, each expression is evaluated in source order and their results are placed as elements of a rank-1 array. The two forms have identical semantics but different punctuation.</p> +<p>Rules in the table below are function and operator evaluation.</p> +<table> +<thead> +<tr> +<th>L</th> +<th>Left</th> +<th>Called</th> +<th>Right</th> +<th>R</th> +<th>Types</th> +</tr> +</thead> +<tbody> +<tr> +<td><code><span class='Value'>π¨</span></code></td> +<td><code><span class='Paren'>(</span> <span class='Value'>value</span> <span class='Function'>|</span> <span class='Value'>nothing</span> <span class='Paren'>)</span><span class='Value'>?</span></code></td> +<td><code><span class='Function'>Derv</span></code></td> +<td><code><span class='Value'>arg</span></code></td> +<td><code><span class='Value'>π©</span></code></td> +<td>Function, value</td> +</tr> +<tr> +<td><code><span class='Value'>π</span></code></td> +<td><code><span class='Function'>Operand</span></code></td> +<td><code><span class='Modifier'>_mod</span></code></td> +<td></td> +<td></td> +<td>Modifier</td> +</tr> +<tr> +<td><code><span class='Value'>π</span></code></td> +<td><code><span class='Function'>Operand</span></code></td> +<td><code><span 
class='Composition'>_comp_</span></code></td> +<td><code><span class='Paren'>(</span> <span class='Value'>value</span> <span class='Function'>|</span> <span class='Function'>Func</span> <span class='Paren'>)</span></code></td> +<td><code><span class='Value'>π</span></code></td> +<td>Composition</td> +</tr> +</tbody> +</table> +<p>In each case the constituent expressions are evaluated in reverse source order: Right, then Called, then Left. Then the expression's result is obtained by calling the Called value on its parameters. A left argument of <code><span class='Value'>nothing</span></code> is not used as a parameter, leaving only a right argument in that case. The data type of the Called value must be appropriate to the expression type, as indicated in the "Types" column. For function application, a value type (number, character, or array) is allowed. It is called simply by returning itself. Although the arguments are ignored in this case, they are still evaluated. A braced construct is evaluated by binding the parameter names given in columns L and R to the corresponding values. 
Then if all parameter levels present have been bound, its body is evaluated to give the result of application.</p> +<p>The following rules derive new functions or operators from existing ones.</p> +<table> +<thead> +<tr> +<th>Left</th> +<th>Center</th> +<th>Right</th> +<th>Result</th> +</tr> +</thead> +<tbody> +<tr> +<td></td> +<td><code><span class='Composition'>_comp_</span></code></td> +<td><code><span class='Paren'>(</span> <span class='Value'>value</span> <span class='Function'>|</span> <span class='Function'>Func</span> <span class='Paren'>)</span></code></td> +<td><code><span class='Brace'>{</span><span class='Function'>π½</span> <span class='Composition'>_C_</span> <span class='Function'>R</span><span class='Brace'>}</span></code></td> +</tr> +<tr> +<td><code><span class='Function'>Operand</span></code></td> +<td><code><span class='Composition'>_comp_</span></code></td> +<td></td> +<td><code><span class='Brace'>{</span><span class='Function'>L</span> <span class='Composition'>_C_</span> <span class='Function'>π½</span><span class='Brace'>}</span></code></td> +</tr> +<tr> +<td><code><span class='Function'>Operand</span></code></td> +<td><code><span class='Function'>Derv</span></code></td> +<td><code><span class='Function'>Fork</span></code></td> +<td><code><span class='Brace'>{</span><span class='Paren'>(</span><span class='Value'>π¨</span><span class='Function'>L</span><span class='Value'>π©</span><span class='Paren'>)</span><span class='Function'>C</span><span class='Paren'>(</span><span class='Value'>π¨</span><span class='Function'>R</span><span class='Value'>π©</span><span class='Paren'>)</span><span class='Brace'>}</span></code></td> +</tr> +<tr> +<td><code><span class='Value'>nothing?</span></code></td> +<td><code><span class='Function'>Derv</span></code></td> +<td><code><span class='Function'>Fork</span></code></td> +<td><code><span class='Brace'>{</span> <span class='Function'>C</span><span class='Paren'>(</span><span class='Value'>π¨</span><span 
class='Function'>R</span><span class='Value'>π©</span><span class='Paren'>)</span><span class='Brace'>}</span></code></td> +</tr> +</tbody> +</table> +<p>As with applications, all expressions are evaluated in reverse source order before doing anything else. Then a result is formed without calling the center value. Its value in BQN is given in the rightmost column, using <code><span class='Function'>L</span></code>, <code><span class='Function'>C</span></code>, and <code><span class='Function'>R</span></code> for the results of the expressions in the left, center, and right columns, respectively. For the first two rules (<em>partial application</em>), the given operand is bound to the composition: the result is a modifier that, when called, calls the center composition with the bound operand on the same side it appeared on and the new operand on the remaining side. A <em>train</em> is a function that, when called, calls the right-hand function on all arguments, then the left-hand function, and calls the center function with these results as arguments. In a composition partial application, the result will fail when applied if the center value does not have the composition type, and in a fork, it will fail if any component has a modifier or composition type (that is, cannot be applied as a function). BQN implementations are not required to check for these types when forming the result of these expressions, but may give an error on formation even if the result will never be applied.</p> + diff --git a/docs/spec/grammar.html b/docs/spec/grammar.html new file mode 100644 index 00000000..56803787 --- /dev/null +++ b/docs/spec/grammar.html @@ -0,0 +1,138 @@ +<head><link href="../style.css" rel="stylesheet"/></head> +<p>BQN's grammar is given below. Terms are defined in a <a href="https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form">BNF</a> variant. 
However, handling special names properly is possible but difficult in BNF, so they are explained in text along with the braced block grammar.</p> +<p>The symbols <code><span class='Value'>v</span></code>, <code><span class='Function'>F</span></code>, <code><span class='Modifier'>_m</span></code>, and <code><span class='Composition'>_c_</span></code> are identifier tokens with value, function, modifier, and composition classes respectively. Similarly, <code><span class='Value'>vl</span></code>, <code><span class='Function'>Fl</span></code>, <code><span class='Modifier'>_ml</span></code>, and <code><span class='Composition'>_cl_</span></code> refer to value literals (numeric and character literals, or primitives) of those classes. While names in the BNF here follow the identifier naming scheme, this is informative only: syntactic classes are no longer used after parsing and cannot be inspected in a running program.</p> +<p>A program is a list of statements. Almost all statements are expressions. 
However, explicit definitions and valueless results stemming from <code><span class='Nothing'>Β·</span></code>, or <code><span class='Value'>π¨</span></code> in a monadic brace function, can be used as statements but not expressions.</p> +<pre><span class='Function'>PROGRAM</span> <span class='Function'>=</span> <span class='Separator'>β</span><span class='Value'>?</span> <span class='Paren'>(</span> <span class='Paren'>(</span> <span class='Function'>STMT</span> <span class='Separator'>β</span> <span class='Paren'>)</span><span class='Value'>*</span> <span class='Function'>STMT</span> <span class='Separator'>β</span><span class='Value'>?</span> <span class='Paren'>)</span><span class='Value'>?</span> +<span class='Function'>STMT</span> <span class='Function'>=</span> <span class='Function'>EXPR</span> <span class='Function'>|</span> <span class='Function'>DEF</span> <span class='Function'>|</span> <span class='Value'>nothing</span> +<span class='Separator'>β</span> <span class='Function'>=</span> <span class='Paren'>(</span> <span class='String'>"β"</span> <span class='Function'>|</span> <span class='String'>","</span> <span class='Function'>|</span> <span class='Value'>\n</span> <span class='Paren'>)</span><span class='Function'>+</span> +<span class='Function'>EXPR</span> <span class='Function'>=</span> <span class='Value'>valExpr</span> <span class='Function'>|</span> <span class='Function'>FuncExpr</span> <span class='Function'>|</span> <span class='Modifier'>_modExpr</span> <span class='Function'>|</span> <span class='Composition'>_cmpExp_</span> +</pre> +<p>Here we define the "atomic" forms of functions and operators, which are either single tokens or enclosed in paired symbols. 
Stranded vectors with <code><span class='Ligature'>βΏ</span></code>, which binds more tightly than any form of execution, are also included.</p> +<pre><span class='Function'>ANY</span> <span class='Function'>=</span> <span class='Value'>atom</span> <span class='Function'>|</span> <span class='Function'>Func</span> <span class='Function'>|</span> <span class='Modifier'>_mod</span> <span class='Function'>|</span> <span class='Composition'>_comp_</span> +<span class='Composition'>_comp_</span> <span class='Function'>=</span> <span class='Composition'>_c_</span> <span class='Function'>|</span> <span class='Composition'>_cl_</span> <span class='Function'>|</span> <span class='String'>"("</span> <span class='Composition'>_cmpExp_</span> <span class='String'>")"</span> <span class='Function'>|</span> <span class='Composition'>_brComp_</span> +<span class='Modifier'>_mod</span> <span class='Function'>=</span> <span class='Modifier'>_m</span> <span class='Function'>|</span> <span class='Modifier'>_ml</span> <span class='Function'>|</span> <span class='String'>"("</span> <span class='Modifier'>_modExpr</span> <span class='String'>")"</span> <span class='Function'>|</span> <span class='Modifier'>_brMod</span> +<span class='Function'>Func</span> <span class='Function'>=</span> <span class='Function'>F</span> <span class='Function'>|</span> <span class='Function'>Fl</span> <span class='Function'>|</span> <span class='String'>"("</span> <span class='Function'>FuncExpr</span> <span class='String'>")"</span> <span class='Function'>|</span> <span class='Function'>BrFunc</span> +<span class='Value'>atom</span> <span class='Function'>=</span> <span class='Value'>v</span> <span class='Function'>|</span> <span class='Value'>vl</span> <span class='Function'>|</span> <span class='String'>"("</span> <span class='Value'>valExpr</span> <span class='String'>")"</span> <span class='Function'>|</span> <span class='Value'>brVal</span> <span class='Function'>|</span> <span 
class='Value'>list</span> +<span class='Value'>list</span> <span class='Function'>=</span> <span class='String'>"β¨"</span> <span class='Separator'>β</span><span class='Value'>?</span> <span class='Paren'>(</span> <span class='Paren'>(</span> <span class='Function'>EXPR</span> <span class='Separator'>β</span> <span class='Paren'>)</span><span class='Value'>*</span> <span class='Function'>EXPR</span> <span class='Separator'>β</span><span class='Value'>?</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='String'>"β©"</span> +<span class='Value'>value</span> <span class='Function'>=</span> <span class='Value'>atom</span> <span class='Function'>|</span> <span class='Function'>ANY</span> <span class='Paren'>(</span> <span class='String'>"βΏ"</span> <span class='Function'>ANY</span> <span class='Paren'>)</span><span class='Function'>+</span> +</pre> +<p>Starting at the highest-order objects, modifiers and compositions have fairly simple syntax. In most cases the syntax for <code><span class='Gets'>β</span></code> and <code><span class='Gets'>β©</span></code> is the same, but only <code><span class='Gets'>β©</span></code> can be used for modified assignment.</p> +<pre><span class='Function'>ASGN</span> <span class='Function'>=</span> <span class='String'>"β"</span> <span class='Function'>|</span> <span class='String'>"β©"</span> +<span class='Composition'>_cmpExp_</span> <span class='Function'>=</span> <span class='Composition'>_comp_</span> + <span class='Function'>|</span> <span class='Composition'>_c_</span> <span class='Function'>ASGN</span> <span class='Composition'>_cmpExp_</span> +<span class='Modifier'>_modExpr</span> <span class='Function'>=</span> <span class='Modifier'>_mod</span> + <span class='Function'>|</span> <span class='Composition'>_comp_</span> <span class='Paren'>(</span> <span class='Value'>value</span> <span class='Function'>|</span> <span class='Function'>Func</span> <span class='Paren'>)</span> <span class='Comment'># 
Right partial application +</span> <span class='Function'>|</span> <span class='Function'>Operand</span> <span class='Composition'>_comp_</span> <span class='Comment'># Left partial application +</span> <span class='Function'>|</span> <span class='Modifier'>_m</span> <span class='Function'>ASGN</span> <span class='Modifier'>_modExpr</span> +</pre> +<p>Functions can be formed by fully applying operators or as trains. Operators are left-associative, so that the left operand (<code><span class='Function'>Operand</span></code>) can include operators but the right operand (<code><span class='Value'>value</span> <span class='Function'>|</span> <span class='Function'>Func</span></code>) cannot. Trains are right-associative, but bind less tightly than operators. Assignment is not allowed in the top level of a train: it must be parenthesized.</p> +<pre><span class='Function'>Derv</span> <span class='Function'>=</span> <span class='Function'>Func</span> + <span class='Function'>|</span> <span class='Function'>Operand</span> <span class='Modifier'>_mod</span> + <span class='Function'>|</span> <span class='Function'>Operand</span> <span class='Composition'>_comp_</span> <span class='Paren'>(</span> <span class='Value'>value</span> <span class='Function'>|</span> <span class='Function'>Func</span> <span class='Paren'>)</span> +<span class='Function'>Operand</span> <span class='Function'>=</span> <span class='Value'>value</span> + <span class='Function'>|</span> <span class='Function'>Derv</span> +<span class='Function'>Fork</span> <span class='Function'>=</span> <span class='Function'>Derv</span> + <span class='Function'>|</span> <span class='Function'>Operand</span> <span class='Function'>Derv</span> <span class='Function'>Fork</span> <span class='Comment'># 3-train +</span> <span class='Function'>|</span> <span class='Value'>nothing</span> <span class='Function'>Derv</span> <span class='Function'>Fork</span> <span class='Comment'># 2-train +</span><span 
class='Function'>Train</span> <span class='Function'>=</span> <span class='Function'>Fork</span> + <span class='Function'>|</span> <span class='Function'>Derv</span> <span class='Function'>Fork</span> <span class='Comment'># 2-train +</span><span class='Function'>FuncExpr</span> <span class='Function'>=</span> <span class='Function'>Train</span> + <span class='Function'>|</span> <span class='Function'>F</span> <span class='Function'>ASGN</span> <span class='Function'>FuncExpr</span> +</pre> +<p>Value expressions are complicated by the possibility of list assignment. We also define nothing-statements, which have very similar syntax to value expressions but do not permit assignment.</p> +<pre><span class='Value'>arg</span> <span class='Function'>=</span> <span class='Value'>valExpr</span> + <span class='Function'>|</span> <span class='Paren'>(</span> <span class='Value'>value</span> <span class='Function'>|</span> <span class='Value'>nothing</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Function'>Derv</span> <span class='Value'>arg</span> +<span class='Value'>nothing</span> <span class='Function'>=</span> <span class='String'>"Β·"</span> + <span class='Function'>|</span> <span class='Paren'>(</span> <span class='Value'>value</span> <span class='Function'>|</span> <span class='Value'>nothing</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Function'>Derv</span> <span class='Value'>nothing</span> +<span class='Function'>LHS_ANY</span> <span class='Function'>=</span> <span class='Value'>lhsValue</span> <span class='Function'>|</span> <span class='Function'>F</span> <span class='Function'>|</span> <span class='Modifier'>_m</span> <span class='Function'>|</span> <span class='Composition'>_c_</span> +<span class='Function'>LHS_ATOM</span> <span class='Function'>=</span> <span class='Function'>LHS_ANY</span> <span class='Function'>|</span> <span class='String'>"("</span> <span class='Value'>lhsStr</span> <span 
class='String'>")"</span> +<span class='Function'>LHS_ELT</span> <span class='Function'>=</span> <span class='Function'>LHS_ANY</span> <span class='Function'>|</span> <span class='Value'>lhsStr</span> +<span class='Value'>lhsValue</span> <span class='Function'>=</span> <span class='Value'>v</span> + <span class='Function'>|</span> <span class='String'>"β¨"</span> <span class='Separator'>β</span><span class='Value'>?</span> <span class='Paren'>(</span> <span class='Paren'>(</span> <span class='Function'>LHS_ELT</span> <span class='Separator'>β</span> <span class='Paren'>)</span><span class='Value'>*</span> <span class='Function'>LHS_ELT</span> <span class='Separator'>β</span><span class='Value'>?</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='String'>"β©"</span> +<span class='Value'>lhsStr</span> <span class='Function'>=</span> <span class='Function'>LHS_ATOM</span> <span class='Paren'>(</span> <span class='String'>"βΏ"</span> <span class='Function'>LHS_ATOM</span> <span class='Paren'>)</span><span class='Function'>+</span> +<span class='Value'>lhs</span> <span class='Function'>=</span> <span class='Value'>lhsValue</span> <span class='Function'>|</span> <span class='Value'>lhsStr</span> +<span class='Value'>valExpr</span> <span class='Function'>=</span> <span class='Value'>arg</span> + <span class='Function'>|</span> <span class='Value'>lhs</span> <span class='Function'>ASGN</span> <span class='Value'>valExpr</span> + <span class='Function'>|</span> <span class='Value'>lhs</span> <span class='Function'>Derv</span> <span class='String'>"β©"</span> <span class='Value'>valExpr</span> <span class='Comment'># Modified assignment +</span></pre> +<p>A header looks like a name for the thing being headed, or its application to inputs (possibly twice in the case of modifiers and compositions). As with assignment, it is restricted to a simple form with no extra parentheses. The full list syntax is allowed for arguments. 
As a special rule, a monadic function header specifically can omit the function when the argument is not just a name (as this would conflict with a value label). The following cases define only headers with arguments, which are assumed to be special cases; there can be any number of these. Headers without arguments can only refer to the general caseβnote that operands are not pattern matchedβso there can be at most two of these kinds of headers, indicating the monadic and dyadic cases.</p> +<pre><span class='Value'>headW</span> <span class='Function'>=</span> <span class='Value'>value</span> <span class='Function'>|</span> <span class='String'>"π¨"</span> +<span class='Value'>headX</span> <span class='Function'>=</span> <span class='Value'>value</span> <span class='Function'>|</span> <span class='String'>"π©"</span> +<span class='Function'>HeadF</span> <span class='Function'>=</span> <span class='Function'>F</span> <span class='Function'>|</span> <span class='String'>"π"</span> <span class='Function'>|</span> <span class='String'>"π½"</span> +<span class='Function'>HeadG</span> <span class='Function'>=</span> <span class='Function'>F</span> <span class='Function'>|</span> <span class='String'>"π"</span> <span class='Function'>|</span> <span class='String'>"πΎ"</span> +<span class='Function'>ModH1</span> <span class='Function'>=</span> <span class='Function'>HeadF</span> <span class='Paren'>(</span> <span class='Modifier'>_m</span> <span class='Function'>|</span> <span class='String'>"_π£"</span> <span class='Paren'>)</span> +<span class='Function'>CmpH1</span> <span class='Function'>=</span> <span class='Function'>HeadF</span> <span class='Paren'>(</span> <span class='Composition'>_c_</span> <span class='Function'>|</span> <span class='String'>"_π£_"</span> <span class='Paren'>)</span> <span class='Function'>HeadG</span> +<span class='Function'>FuncHead</span> <span class='Function'>=</span> <span class='Value'>headW?</span> <span class='Paren'>(</span> <span 
class='Function'>F</span> <span class='Function'>|</span> <span class='String'>"𝕊"</span> <span class='Paren'>)</span> <span class='Value'>headX</span> + <span class='Function'>|</span> <span class='Value'>vl</span> <span class='Function'>|</span> <span class='String'>"("</span> <span class='Value'>valExpr</span> <span class='String'>")"</span> <span class='Function'>|</span> <span class='Value'>brVal</span> <span class='Function'>|</span> <span class='Value'>list</span> <span class='Comment'># value, +</span> <span class='Function'>|</span> <span class='Function'>ANY</span> <span class='Paren'>(</span> <span class='String'>"‿"</span> <span class='Function'>ANY</span> <span class='Paren'>)</span><span class='Function'>+</span> <span class='Comment'># but not v +</span><span class='Modifier'>_modHead</span> <span class='Function'>=</span> <span class='Value'>headW?</span> <span class='Function'>ModH1</span> <span class='Value'>headX</span> +<span class='Composition'>_cmpHed_</span> <span class='Function'>=</span> <span class='Value'>headW?</span> <span class='Function'>CmpH1</span> <span class='Value'>headX</span> +</pre> +<p>A braced block contains bodies, which are lists of statements, separated by semicolons and possibly preceded by headers, which are separated from the body with a colon. Multiple bodies allow different handling for various cases, which are pattern-matched by headers. For a value block there are no inputs, so there can only be one possible case and one body. Functions and operators allow any number of "matched" bodies, with headers that have arguments, followed by at most two "main" bodies with either no headers or headers without arguments. 
If there is one main body, it is ambivalent, but two main bodies refer to the monadic and dyadic cases.</p> +<pre><span class='Function'>BODY</span> <span class='Function'>=</span> <span class='Separator'>⋄</span><span class='Value'>?</span> <span class='Paren'>(</span> <span class='Function'>STMT</span> <span class='Separator'>⋄</span> <span class='Paren'>)</span><span class='Value'>*</span> <span class='Function'>EXPR</span> <span class='Separator'>⋄</span><span class='Value'>?</span> +<span class='Function'>FCase</span> <span class='Function'>=</span> <span class='Separator'>⋄</span><span class='Value'>?</span> <span class='Function'>FuncHead</span> <span class='String'>":"</span> <span class='Function'>BODY</span> +<span class='Modifier'>_mCase</span> <span class='Function'>=</span> <span class='Separator'>⋄</span><span class='Value'>?</span> <span class='Modifier'>_modHead</span> <span class='String'>":"</span> <span class='Function'>BODY</span> +<span class='Composition'>_cCase_</span> <span class='Function'>=</span> <span class='Separator'>⋄</span><span class='Value'>?</span> <span class='Composition'>_cmpHed_</span> <span class='String'>":"</span> <span class='Function'>BODY</span> +<span class='Function'>FMain</span> <span class='Function'>=</span> <span class='Paren'>(</span> <span class='Separator'>⋄</span><span class='Value'>?</span> <span class='Function'>F</span> <span class='String'>":"</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Function'>BODY</span> +<span class='Modifier'>_mMain</span> <span class='Function'>=</span> <span class='Paren'>(</span> <span class='Separator'>⋄</span><span class='Value'>?</span> <span class='Paren'>(</span> <span class='Modifier'>_m</span> <span class='Function'>|</span> <span class='Function'>ModH1</span> <span class='Paren'>)</span> <span class='String'>":"</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Function'>BODY</span> +<span 
class='Composition'>_cMain_</span> <span class='Function'>=</span> <span class='Paren'>(</span> <span class='Separator'>⋄</span><span class='Value'>?</span> <span class='Paren'>(</span> <span class='Composition'>_c_</span> <span class='Function'>|</span> <span class='Function'>CmpH1</span> <span class='Paren'>)</span> <span class='String'>":"</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Function'>BODY</span> +<span class='Value'>brVal</span> <span class='Function'>=</span> <span class='String'>"{"</span> <span class='Paren'>(</span> <span class='Separator'>⋄</span><span class='Value'>?</span> <span class='Value'>v</span> <span class='String'>":"</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Function'>BODY</span> <span class='String'>"}"</span> +<span class='Function'>BrFunc</span> <span class='Function'>=</span> <span class='String'>"{"</span> <span class='Paren'>(</span> <span class='Function'>FCase</span> <span class='String'>";"</span> <span class='Paren'>)</span><span class='Value'>*</span> <span class='Paren'>(</span> <span class='Function'>FCase</span> <span class='Function'>|</span> <span class='Function'>FMain</span> <span class='Paren'>(</span> <span class='String'>";"</span> <span class='Function'>FMain</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Paren'>)</span> <span class='String'>"}"</span> +<span class='Modifier'>_brMod</span> <span class='Function'>=</span> <span class='String'>"{"</span> <span class='Paren'>(</span> <span class='Modifier'>_mCase</span> <span class='String'>";"</span> <span class='Paren'>)</span><span class='Value'>*</span> <span class='Paren'>(</span> <span class='Modifier'>_mCase</span> <span class='Function'>|</span> <span class='Modifier'>_mMain</span> <span class='Paren'>(</span> <span class='String'>";"</span> <span class='Modifier'>_mMain</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Paren'>)</span> 
<span class='String'>"}"</span> +<span class='Composition'>_brComp_</span> <span class='Function'>=</span> <span class='String'>"{"</span> <span class='Paren'>(</span> <span class='Composition'>_cCase_</span> <span class='String'>";"</span> <span class='Paren'>)</span><span class='Value'>*</span> <span class='Paren'>(</span> <span class='Composition'>_cCase_</span> <span class='Function'>|</span> <span class='Composition'>_cMain_</span> <span class='Paren'>(</span> <span class='String'>";"</span> <span class='Composition'>_cMain_</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Paren'>)</span> <span class='String'>"}"</span> +</pre> +<p>Two additional rules apply to blocks, based on the special name associations in the table below. First, each block allows the special names in its column to be used as the given token types within <code><span class='Function'>BODY</span></code> terms (not headers). Except for the spaces labelled "None", each column is cumulative and a given entry also includes all the entries above it. Second, for <code><span class='Function'>BrFunc</span></code>, <code><span class='Modifier'>_brMod</span></code>, and <code><span class='Composition'>_brComp_</span></code> terms, if no header is given, then at least one <code><span class='Function'>BODY</span></code> term in it <em>must</em> contain one of the names on, and not above, the corresponding row. 
Otherwise the syntax would be ambiguous, since for example a simple <code><span class='String'>"{"</span> <span class='Function'>BODY</span> <span class='String'>"}"</span></code> sequence could have any type.</p> +<table> +<thead> +<tr> +<th>Term</th> +<th><code><span class='Value'>v</span></code></th> +<th><code><span class='Function'>F</span></code></th> +<th><code><span class='Modifier'>_m</span></code></th> +<th><code><span class='Composition'>_c_</span></code></th> +<th>other</th> +</tr> +</thead> +<tbody> +<tr> +<td><code><span class='Value'>brVal</span></code>, <code><span class='Function'>PROGRAM</span></code></td> +<td>None</td> +<td>None</td> +<td>None</td> +<td>None</td> +<td></td> +</tr> +<tr> +<td><code><span class='Function'>BrFunc</span></code></td> +<td><code><span class='Value'>𝕨𝕩𝕤</span></code></td> +<td><code><span class='Function'>𝕎𝕏𝕊</span></code></td> +<td></td> +<td></td> +<td><code><span class='String'>";"</span></code></td> +</tr> +<tr> +<td><code><span class='Modifier'>_brMod</span></code></td> +<td><code><span class='Value'>𝕗𝕣</span></code></td> +<td><code><span class='Function'>𝔽</span></code></td> +<td><code><span class='Composition'>_</span><span class='Value'>𝕣</span></code></td> +<td></td> +<td></td> +</tr> +<tr> +<td><code><span class='Composition'>_brComp_</span></code></td> +<td><code><span class='Value'>𝕘</span></code></td> +<td><code><span class='Function'>𝔾</span></code></td> +<td>None</td> +<td><code><span class='Composition'>_</span><span class='Value'>𝕣</span><span class='Composition'>_</span></code></td> +<td></td> +</tr> +</tbody> +</table> +<p>The rules for special names can be expressed in BNF by making many copies of all expression rules above. For each "level", or row in the table, a new version of every rule should be made that allows that level but not higher ones, and another version should be made that requires exactly that level. 
The values themselves should be included in <code><span class='Value'>v</span></code>, <code><span class='Function'>F</span></code>, <code><span class='Modifier'>_m</span></code>, and <code><span class='Composition'>_c_</span></code> for these copies. Then the "allowed" rules are made simply by replacing the terms they contain (excluding <code><span class='Value'>brVal</span></code> and so on) with the same "allowed" versions, and "required" rules are constructed using both "allowed" and "required" rules. For every part of a production rule, an alternative should be created that requires the relevant name in that part while allowing it in the others. For example, <code><span class='Paren'>(</span> <span class='Value'>value</span> <span class='Function'>|</span> <span class='Value'>nothing</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Function'>Derv</span> <span class='Value'>arg</span></code> would be transformed to</p> +<pre><span class='Value'>arg_req1</span> <span class='Function'>=</span> <span class='Value'>valExpr_req1</span> + <span class='Function'>|</span> <span class='Paren'>(</span> <span class='Value'>value_req1</span> <span class='Function'>|</span> <span class='Value'>nothing_req1</span> <span class='Paren'>)</span> <span class='Function'>Derv_allow1</span> <span class='Value'>arg_allow1</span> + <span class='Function'>|</span> <span class='Paren'>(</span> <span class='Value'>value_allow1</span> <span class='Function'>|</span> <span class='Value'>nothing_allow1</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Function'>Derv_req1</span> <span class='Value'>arg_allow1</span> + <span class='Function'>|</span> <span class='Paren'>(</span> <span class='Value'>value_allow1</span> <span class='Function'>|</span> <span class='Value'>nothing_allow1</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Function'>Derv_allow1</span> <span class='Value'>arg_req1</span> +</pre> 
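<p>The "required" expansion shown above can be generated mechanically. The following is a minimal sketch in Python (not part of the specification), assuming each part of a production is modelled as a list of alternative term names plus an optional flag; the names <code>render</code> and <code>required_alternatives</code> are hypothetical. A part that must contain the special name loses its <code>?</code>, since an absent part cannot contain anything.</p>

```python
from typing import List, Tuple

# One part of a production: (alternative term names, is the part optional?)
# This representation is an assumption made for illustration only.
Part = Tuple[List[str], bool]


def render(part: Part, suffix: str, keep_optional: bool) -> str:
    """Write one part with every term name tagged by the given suffix."""
    names, optional = part
    text = " | ".join(f"{name}_{suffix}" for name in names)
    if len(names) > 1:
        text = f"( {text} )"
    if optional and keep_optional:
        text += "?"
    return text


def required_alternatives(parts: List[Part], level: int) -> List[str]:
    """For each part in turn, require the special name there and allow it elsewhere."""
    alternatives = []
    for required_index in range(len(parts)):
        pieces = []
        for index, part in enumerate(parts):
            if index == required_index:
                # A required part must be present, so it is never optional.
                pieces.append(render(part, f"req{level}", keep_optional=False))
            else:
                pieces.append(render(part, f"allow{level}", keep_optional=True))
        alternatives.append(" ".join(pieces))
    return alternatives


# The production ( value | nothing )? Derv arg from the text above.
rule = [(["value", "nothing"], True), (["Derv"], False), (["arg"], False)]
for alternative in required_alternatives(rule, 1):
    print("|", alternative)
```

<p>Applied to the production above, this yields the last three alternatives of <code>arg_req1</code>; the first alternative, <code>valExpr_req1</code>, comes from the other branch of the original rule.</p>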
+<p>Quite tedious. The explosion of rules is partly due to the fact that the brace-typing rule falls into a weaker class of grammars than the other rules. Most of BQN is <a href="https://en.wikipedia.org/wiki/Deterministic_context-free_grammar">deterministic context-free</a> but brace-typing is not, only context-free. Fortunately brace typing does not introduce the parsing difficulties that can be present in a general context-free grammar, and it can easily be performed in linear time: after <a href="token.html">scanning</a> but before parsing, move through the source code maintaining a stack of the current top-level set of braces. Whenever a colon or special name is encountered, annotate that set of braces to indicate that it is present. When a closing brace is encountered and the top brace is popped off the stack, the type is needed if there was no colon, and can be found based on which names were present. One way to present this information to the parser is to replace the brace tokens with new tokens that indicate the type.</p> + diff --git a/docs/spec/literal.html b/docs/spec/literal.html new file mode 100644 index 00000000..c4914753 --- /dev/null +++ b/docs/spec/literal.html @@ -0,0 +1,14 @@ +<head><link href="../style.css" rel="stylesheet"/></head> +<p>A <em>literal</em> is a single <a href="token.html">token</a> that indicates a fixed character, number, or array. While literals indicate data of a value type, <a href="primitive.html">primitives</a> indicate data of a function type: function, modifier, or composition.</p> +<p>Two types of literal deal with text. As the source code is considered to be a sequence of unicode code points ("characters"), and these code points are also used for BQN's character <a href="types.html">data type</a>, the representation of a text literal is very similar to its value. In a text literal, the newline character is always represented using the ASCII line feed character, code point 10. 
A <em>character literal</em> is enclosed with single quotes <code>'</code> and its value is identical to the single character between them. A <em>string literal</em> is enclosed in double quotes <code>"</code>, and any double quotes between them must come in pairs, as a lone double quote marks the end of the literal. The value of a string literal is a rank-1 array whose elements are the characters in between the enclosing quotes, after replacing each pair of double quotes with only one such quote.</p> +<p>The format of a <em>numeric literal</em> is more complicated. From the <a href="token.html">tokenization rules</a>, a numeric literal consists of a numeric character (one of <code><span class='Number'>¯∞π.0123456789</span></code>) followed by any number of numeric or alphabetic characters. Some numeric literals are <em>valid</em> and indicate a number, while others are invalid and cause an error. The grammar for valid numbers is given below in a <a href="https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form">BNF</a> variant. 
Only four alphabetic characters are allowed: "i", which separates the real and imaginary components of a complex number, "e", which functions as in scientific notation, and the uppercase versions of these letters.</p> +<pre><span class='Value'>number</span> <span class='Function'>=</span> <span class='Value'>component</span> <span class='Paren'>(</span> <span class='Paren'>(</span> <span class='String'>"i"</span> <span class='Function'>|</span> <span class='String'>"I"</span> <span class='Paren'>)</span> <span class='Value'>component</span> <span class='Paren'>)</span><span class='Value'>?</span> +<span class='Value'>component</span> <span class='Function'>=</span> <span class='Value'>mantissa</span> <span class='Paren'>(</span> <span class='Paren'>(</span> <span class='String'>"e"</span> <span class='Function'>|</span> <span class='String'>"E"</span> <span class='Paren'>)</span> <span class='Value'>exponent</span> <span class='Paren'>)</span><span class='Value'>?</span> +<span class='Value'>exponent</span> <span class='Function'>=</span> <span class='String'>"¯"</span><span class='Value'>?</span> <span class='Value'>digit</span><span class='Function'>+</span> +<span class='Value'>mantissa</span> <span class='Function'>=</span> <span class='String'>"¯"</span><span class='Value'>?</span> <span class='Paren'>(</span> <span class='String'>"∞"</span> <span class='Function'>|</span> <span class='String'>"π"</span> <span class='Function'>|</span> <span class='Value'>digit</span><span class='Function'>+</span> <span class='Paren'>(</span> <span class='String'>"."</span> <span class='Value'>digit</span><span class='Function'>+</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='Paren'>)</span> +<span class='Value'>digit</span> <span class='Function'>=</span> <span class='String'>"0"</span> <span class='Function'>|</span> <span class='String'>"1"</span> <span class='Function'>|</span> <span class='String'>"2"</span> <span class='Function'>|</span> 
<span class='String'>"3"</span> <span class='Function'>|</span> <span class='String'>"4"</span> <span class='Function'>|</span> <span class='String'>"5"</span> <span class='Function'>|</span> <span class='String'>"6"</span> <span class='Function'>|</span> <span class='String'>"7"</span> <span class='Function'>|</span> <span class='String'>"8"</span> <span class='Function'>|</span> <span class='String'>"9"</span> +</pre> +<p>The digits or Arabic numerals correspond to the numbers from 0 to 9 in the conventional way (also, each corresponds to its code point value minus 48). A sequence of digits gives a natural number by evaluating it in base 10: the number is 0 for an empty sequence, and otherwise the last digit's numerical value plus ten times the number obtained from the remaining digits. The symbol <code><span class='Number'>∞</span></code> indicates infinity and <code><span class='Number'>π</span></code> indicates the ratio <a href="https://en.wikipedia.org/wiki/Pi_(mathematics)">pi</a> of a circle's circumference to its diameter (or, for modern mathematicians, the smallest positive real number at which the function <code><span class='Brace'>{</span><span class='Function'>⋆</span><span class='Number'>0i1</span><span class='Function'>×</span><span class='Value'>𝕩</span><span class='Brace'>}</span></code> attains an imaginary part of 0). The <a href="https://aplwiki.com/wiki/High_minus">high minus</a> symbol <code><span class='Number'>¯</span></code> indicates that the number containing it is to be negated.</p> +<p>When an exponent is provided (with <code><span class='Value'>e</span></code> or <code><span class='Function'>E</span></code>), the corresponding mantissa is multiplied by ten to that power, giving the value <code><span class='Value'>mantissa</span><span class='Function'>×</span><span class='Number'>10</span><span class='Function'>⋆</span><span class='Value'>exponent</span></code>. 
If a second component is present (using <code><span class='Value'>i</span></code> or <code><span class='Function'>I</span></code>), that component's value is multiplied by the <a href="https://en.wikipedia.org/wiki/Imaginary_unit">imaginary unit</a> <em>i</em> and added to the first component; otherwise the value is the first component's value without modification. If complex numbers are not supported, then <code><span class='Value'>i</span></code> should not be allowed in numeric literals, even when followed by 0.</p> +<p>The above specification describes exactly a complex number with extended real components. To obtain a BQN number, each component is rounded to its nearest representative by the rules of the number system used: for IEEE 754, smallest distance, with ties rounding to the option with even mantissa.</p> + diff --git a/docs/spec/token.html b/docs/spec/token.html new file mode 100644 index 00000000..531cfd6e --- /dev/null +++ b/docs/spec/token.html @@ -0,0 +1,40 @@ +<head><link href="../style.css" rel="stylesheet"/></head> +<p>This page describes BQN's token formation rules (token formation is also called scanning). Most tokens in BQN are a single character long, but quoted characters and strings, identifiers, and numbers can consist of multiple characters, and comments, spaces, and tabs are discarded during token formation.</p> +<p>BQN source code should be considered as a series of unicode code points, which we refer to as "characters". The separator between lines in a file is considered to be a single character, newline, even though some operating systems such as Windows typically represent it with a two-character CRLF sequence. Implementers should note that not all languages treat unicode code points as atomic, as exposing the UTF-8 or UTF-16 representation instead is common. 
For a language such as JavaScript that uses UTF-16, the double-struck characters <code><span class='Value'>𝕨</span><span class='Function'>𝕎</span><span class='Value'>𝕩</span><span class='Function'>𝕏</span><span class='Value'>𝕗</span><span class='Function'>𝔽</span><span class='Value'>𝕘</span><span class='Function'>𝔾</span></code> are represented as two 16-bit surrogate characters, but BQN treats them as a single unit.</p> +<p>A BQN <em>character literal</em> consists of a single character between single quotes, such as <code><span class='String'>'a'</span></code>, and a <em>string literal</em> consists of any number of characters between double quotes, such as <code><span class='String'>""</span></code> or <code><span class='String'>"abc"</span></code>. Character and string literals, together with comments, take precedence over other tokenization rules, so that <code><span class='Comment'>#</span></code> between quotes does not start a comment and whitespace between quotes is not removed, but a quote within a comment does not start a character literal. Almost any character can be included directly in a character or string literal without escaping. The only exception is the double quote character <code>"</code>, which must be written twice to include it in a string, as otherwise it would end the string instead. Character literals require no escaping at all, as the length is fixed. In particular, literals for the single and double quote characters are written <code><span class='String'>'''</span></code> and <code><span class='String'>'"'</span></code>, while length-1 strings containing these characters are <code><span class='String'>"'"</span></code> and <code><span class='String'>""""</span></code>.</p> +<p>A comment consists of the hash character <code><span class='Comment'>#</span></code> and any following text until (not including) the next newline character. The initial <code><span class='Comment'>#</span></code> must not be part of a string literal started earlier. 
Comments are ignored entirely and do not form tokens.</p> +<p>Identifiers and numeric literals share the same token formation rule. These tokens are formed from the <em>numeric characters</em> <code><span class='Number'>¯∞π.0123456789</span></code> and <em>alphabetic characters</em> <code><span class='Modifier'>_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ</span></code> and the oddball <code><span class='Value'>𝕣</span></code>. Any sequence of these characters adjacent to each other forms a single token, which is a <em>numeric literal</em> if it begins with a numeric character and an <em>identifier</em> if it begins with an alphabetic character. Numeric literals are also subject to <a href="literal.html">numeric literal rules</a>, which specify which numeric literals are valid and which numbers they represent. If the token contains <code><span class='Value'>𝕣</span></code> it must be either <code><span class='Value'>𝕣</span></code>, <code><span class='Composition'>_</span><span class='Value'>𝕣</span></code>, or <code><span class='Composition'>_</span><span class='Value'>𝕣</span><span class='Composition'>_</span></code> and is considered a special name (see below). As the value taken by this identifier can only be a modifier or composition, the uppercase character <code><span class='Value'>ℝ</span></code> is not allowed.</p> +<p>Following this step, the whitespace characters space and tab are ignored, and do not form tokens. Only these whitespace characters, and the newline character, which does form a token, are allowed.</p> +<p>Otherwise, a single character forms a token. Only the specified set of characters can be used; others result in an error. 
The classes of characters are given below.</p> +<table> +<thead> +<tr> +<th>Class</th> +<th>Characters</th> +</tr> +</thead> +<tbody> +<tr> +<td>Primitive Function</td> +<td><code><span class='Function'>+-×÷⋆√⌊⌈|¬∧∨<>≠=≤≥≡≢⊣⊢⥊∾≍↑↓↕⌽⍉/⍋⍒⊏⊑⊐⊒∊⍷⊔</span></code></td> +</tr> +<tr> +<td>Primitive Modifier</td> +<td><code><span class='Modifier'>˜˘¨⌜⁼´`</span></code></td> +</tr> +<tr> +<td>Primitive Composition</td> +<td><code><span class='Composition'>∘○⊸⟜⌾⊘◶⎉⚇⍟</span></code></td> +</tr> +<tr> +<td>Special name</td> +<td><code><span class='Value'>𝕨𝕩𝕗𝕘𝕤</span><span class='Function'>𝕎𝕏𝔽𝔾𝕊</span></code></td> +</tr> +<tr> +<td>Punctuation</td> +<td><code><span class='Gets'>←↩→</span><span class='Paren'>()</span><span class='Brace'>{}</span><span class='Bracket'>⟨⟩</span><span class='Ligature'>‿</span><span class='Separator'>⋄,</span></code> and newline</td> +</tr> +</tbody> +</table> +<p>In the BQN <a href="grammar.html">grammar specification</a>, the three primitive classes are grouped into terminals <code><span class='Function'>Fl</span></code>, <code><span class='Modifier'>_ml</span></code>, and <code><span class='Modifier'>_cl</span></code>, while the punctuation characters are identified separately as keywords such as <code><span class='String'>"←"</span></code>. The special names are handled specially. 
The uppercase versions <code><span class='Function'>𝕎𝕏𝔽𝔾𝕊</span></code> and lowercase versions <code><span class='Value'>𝕨𝕩𝕗𝕘𝕤</span></code> are two spellings of the five underlying inputs and function.</p> + diff --git a/docs/spec/types.html b/docs/spec/types.html new file mode 100644 index 00000000..81ed35ee --- /dev/null +++ b/docs/spec/types.html @@ -0,0 +1,20 @@ +<head><link href="../style.css" rel="stylesheet"/></head> +<p>BQN programs manipulate data of six types:</p> +<ul> +<li>Character</li> +<li>Number</li> +<li>Array</li> +<li>Function</li> +<li>Modifier</li> +<li>Composition</li> +</ul> +<p>Of these, the first three are considered <em>value types</em> and the remaining three <em>function types</em>. We first describe the much simpler function types; the remainder of this page will be dedicated to the value types. A member of any function type accepts some number of <em>inputs</em> and either returns a <em>result</em> or causes an error; inputs and the result are data of any type. When a function is given inputs (<em>called</em>), it may produce side effects before returning, such as manipulating variables and calling other functions within its scope, or performing I/O.</p> +<ul> +<li>A <em>function</em> takes one (monadic call) or two (dyadic call) <em>arguments</em>.</li> +<li>A <em>modifier</em> takes one <em>operand</em>.</li> +<li>A <em>composition</em> takes two <em>operands</em>.</li> +</ul> +<p>To begin the value types, a <em>character</em> is a <a href="https://en.wikipedia.org/wiki/Unicode">Unicode</a> code point, that is, its value is a non-negative integer within the ranges defined by Unicode (however, it is distinct from this number as a BQN value). Characters are ordered by this numeric value. BQN deals with code points as abstract entities and does not use encodings such as UTF-8 or UTF-16.</p> +<p>The precise type of a <em>number</em> may vary across BQN implementations or instances. 
A <em>real number</em> is a member of some supported subset of the <a href="https://en.wikipedia.org/wiki/Extended_real_number_line">extended real numbers</a>, that is, the real numbers and positive or negative infinity. Some system must be defined for rounding an arbitrary real number to a member of this subset, and the basic arithmetic operations add, subtract, multiply, divide, and natural exponent (base <em>e</em>) are defined by performing these operations on exact real values and rounding the result. The Power function (dyadic <code><span class='Function'>⋆</span></code>) is also used but need not be exactly rounded. A <em>complex number</em> is a value with two real number <em>components</em>, a <em>real part</em> and an <em>imaginary part</em>. A BQN implementation can either support real numbers only, or complex numbers.</p> +<p>An <em>array</em> is a rectangular collection of data. It is defined by a <em>shape</em>, which is a list of non-negative integer lengths, and a <em>ravel</em>, which is a list of <em>elements</em> whose length (the array's <em>bound</em>) is the product of all lengths in the shape. Arrays are defined inductively: any value (of a value or function type) can be used as an element of an array, but it is not possible for an array to contain itself as an element, or an array that contains itself, and so on.</p> + |
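<p>The relation between shape, ravel, and bound described above can be sketched in a few lines of Python. This is an illustration only, not part of the specification: the names <code>bound</code> and <code>make_array</code>, and the dictionary representation of an array, are assumptions made for the example.</p>

```python
from math import prod


def bound(shape):
    """Product of all lengths in the shape; the empty shape gives 1."""
    if any(not isinstance(length, int) or length < 0 for length in shape):
        raise ValueError("shape must be a list of non-negative integers")
    return prod(shape)


def make_array(shape, ravel):
    """Pair a shape with a ravel, checking that the ravel length equals the bound."""
    shape, ravel = list(shape), list(ravel)
    if len(ravel) != bound(shape):
        raise ValueError("ravel length must equal the product of the shape")
    # Hypothetical representation chosen for illustration only.
    return {"shape": shape, "ravel": ravel}


table = make_array([2, 3], "abcdef")  # 2 rows of 3 characters
unit = make_array([], [5])            # empty shape: exactly one element
empty = make_array([0, 4], [])        # a zero length forces an empty ravel
print(bound(table["shape"]), bound(unit["shape"]), bound(empty["shape"]))
```

<p>Because the product of an empty list is 1, an array with empty shape always holds exactly one element, while any zero length in the shape makes the ravel empty.</p>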
