-rw-r--r-- docs/spec/evaluate.html 2
-rw-r--r-- docs/spec/grammar.html 4
-rw-r--r-- docs/spec/token.html 2
-rw-r--r-- md.bqn 8
4 files changed, 8 insertions, 8 deletions
diff --git a/docs/spec/evaluate.html b/docs/spec/evaluate.html
index 0f4e07c2..1df68613 100644
--- a/docs/spec/evaluate.html
+++ b/docs/spec/evaluate.html
@@ -12,7 +12,7 @@
<p>A <code><span class='Function'>PROGRAM</span></code> or <code><span class='Function'>BODY</span></code> is a list of <code><span class='Function'>STMT</span></code>s (for <code><span class='Function'>BODY</span></code>, the last must be an <code><span class='Function'>EXPR</span></code>, a particular kind of <code><span class='Function'>STMT</span></code>), which are evaluated in program order. The statement <code><span class='Value'>nothing</span></code> does nothing when evaluated, while <code><span class='Function'>EXPR</span></code> evaluates some APL code and possibly assigns the results, as described below.</p>
<p>A block consists of several <code><span class='Function'>BODY</span></code> terms, some of which may have an accompanying header describing accepted inputs and how they are processed. An immediate block <code><span class='Value'>brImm</span></code> can only have one <code><span class='Function'>BODY</span></code>, and is evaluated by evaluating the code in it. Other types of blocks do not evaluate any <code><span class='Function'>BODY</span></code> immediately, but instead return a function or modifier that obtains its result by evaluating a particular <code><span class='Function'>BODY</span></code>. The <code><span class='Function'>BODY</span></code> is identified and evaluated once the block has received enough inputs (operands or arguments), which for modifiers can take one or two calls: if two calls are required, then on the first call the operands are simply stored and no code is evaluated yet. Two calls are required if there is more than one <code><span class='Function'>BODY</span></code> term, if the <code><span class='Function'>BODY</span></code> contains the special names <code><span class='Value'>𝕨𝕩𝕤</span><span class='Function'>𝕎𝕏𝕊</span></code>, or if its header specifies arguments (the header-body combination is a <code><span class='Modifier'>_mCase</span></code> or <code><span class='Modifier2'>_cCase_</span></code>). Otherwise only one is required.</p>
<p>To evaluate a block when enough inputs have been received, first the correct case must be identified. To do this, first each special case (<code><span class='Function'>FCase</span></code>, <code><span class='Modifier'>_mCase</span></code>, or <code><span class='Modifier2'>_cCase_</span></code>) is checked in order to see if its arguments are strucurally compatible with the given arguments. That is, is <code><span class='Value'>headW</span></code> is a <code><span class='Value'>subject</span></code>, there must be a left argument matching that structure, and if <code><span class='Value'>headX</span></code> is a <code><span class='Value'>subject</span></code>, the right argument must match that structure. This means that <code><span class='Value'>𝕨</span></code> not only matches any left argument but also no argument. The test for compatibility is the same as for multiple assignment described below, except that the header may contain constants, which must match the corresponding part of the given argument.If no special case matches, then an appropriate general case (<code><span class='Function'>FMain</span></code>, <code><span class='Modifier'>_mMain</span></code>, or <code><span class='Modifier2'>_cMain_</span></code>) is used: if there are two, the first is used with no left argument and the second with a left argument; if there are one, it is always used, and if there are none, an error results.</p>
-<p>The only remaining step before evaluating the <code><span class='Function'>BODY</span></code> is to bind the inputs and other names. Special names are always bound when applicable: <code><span class='Value'>𝕨𝕩𝕤</span></code> if arguments are used, <code><span class='Value'>𝕨</span></code> if there is a left argument, <code><span class='Value'>𝕗𝕘</span></code> if operands are used, and <code><span class='Modifier2'>_</span><span class='Value'>𝕣</span></code> and <code><span class='Modifier2'>_</span><span class='Value'>𝕣</span><span class='Modifier2'>_</span></code> for modifiers and combinators, respectively. Any names in the header are also bound, allowing multiple assignment for arguments.</p>
+<p>The only remaining step before evaluating the <code><span class='Function'>BODY</span></code> is to bind the inputs and other names. Special names are always bound when applicable: <code><span class='Value'>𝕨𝕩𝕤</span></code> if arguments are used, <code><span class='Value'>𝕨</span></code> if there is a left argument, <code><span class='Value'>𝕗𝕘</span></code> if operands are used, and <code><span class='Modifier'>_𝕣</span></code> and <code><span class='Modifier2'>_𝕣_</span></code> for modifiers and combinators, respectively. Any names in the header are also bound, allowing multiple assignment for arguments.</p>
<p>If there is no left argument, but the <code><span class='Function'>BODY</span></code> contains <code><span class='Value'>𝕨</span></code> at the top level, then it is conceptually re-parsed with <code><span class='Value'>𝕨</span></code> replaced by <code><span class='Nothing'>·</span></code> to give a monadic version before application. As the only effect when this re-parsed form is valid is to change some instances of <code><span class='Value'>arg</span></code> to <code><span class='Value'>nothing</span></code>, this can be achieved efficiently by annotating parts of the AST that depend on <code><span class='Value'>𝕨</span></code> as conditionally-nothing. However, it also causes an error if <code><span class='Value'>𝕨</span></code> is used as an operand or list element, where <code><span class='Value'>nothing</span></code> is not allowed by the grammar.</p>
<h3 id="assignment">Assignment</h3>
<p>An <em>assignment</em> is one of the four rules containing <code><span class='Function'>ASGN</span></code>. It is evaluated by first evaluating the right-hand-side <code><span class='Value'>subExpr</span></code>, <code><span class='Function'>FuncExpr</span></code>, <code><span class='Modifier'>_m1Expr</span></code>, or <code><span class='Modifier2'>_m2Exp_</span></code> expression, and then storing the result in the left-hand-side identifier or identifiers. The result of the assignment expression is the result of its right-hand side. Except for subjects, only a lone identifier is allowed on the left-hand side and storage sets it equal to the result. For subjects, <em>multiple assignment</em> with a list left-hand side is also allowed. Multiple assignment is performed recursively by assigning right-hand-side values to the left-hand-side targets, with single-identifier (<code><span class='Value'>s</span></code>) assignment as the base case. When matching the right-hand side to a list left-hand side, the left hand side is treated as a list of <code><span class='Value'>lhs</span></code> targets. The evaluated right-hand side must be a list (rank-1 array) of the same length, and is matched to these targets element-wise.</p>
diff --git a/docs/spec/grammar.html b/docs/spec/grammar.html
index 82964547..77678a95 100644
--- a/docs/spec/grammar.html
+++ b/docs/spec/grammar.html
@@ -120,7 +120,7 @@
<td><code><span class='Modifier'>_brMod1</span></code></td>
<td><code><span class='Value'>𝕗𝕣</span></code></td>
<td><code><span class='Function'>𝔽</span></code></td>
-<td><code><span class='Modifier2'>_</span><span class='Value'>𝕣</span></code></td>
+<td><code><span class='Modifier'>_𝕣</span></code></td>
<td></td>
<td></td>
</tr>
@@ -129,7 +129,7 @@
<td><code><span class='Value'>𝕘</span></code></td>
<td><code><span class='Function'>𝔾</span></code></td>
<td>None</td>
-<td><code><span class='Modifier2'>_</span><span class='Value'>𝕣</span><span class='Modifier2'>_</span></code></td>
+<td><code><span class='Modifier2'>_𝕣_</span></code></td>
<td></td>
</tr>
</tbody>
diff --git a/docs/spec/token.html b/docs/spec/token.html
index a4b708cd..3d04a407 100644
--- a/docs/spec/token.html
+++ b/docs/spec/token.html
@@ -9,7 +9,7 @@
<p>BQN source code should be considered as a series of unicode code points, which we refer to as &quot;characters&quot;. The separator between lines in a file is considered to be a single character, newline, even though some operating systems such as Windows typically represent it with a two-character CRLF sequence. Implementers should note that not all languages treat unicode code points as atomic, as exposing the UTF-8 or UTF-16 representation instead is common. For a language such as JavaScript that uses UTF-16, the double-struck characters <code><span class='Value'>𝕨</span><span class='Function'>𝕎</span><span class='Value'>𝕩</span><span class='Function'>𝕏</span><span class='Value'>𝕗</span><span class='Function'>𝔽</span><span class='Value'>𝕘</span><span class='Function'>𝔾</span></code> are represented as two 16-bit surrogate characters, but BQN treats them as a single unit.</p>
<p>A BQN <em>character literal</em> consists of a single character between single quotes, such as <code><span class='String'>'a'</span></code>, and a <em>string literal</em> consists of any number of characters between double quotes, such as <code><span class='String'>&quot;&quot;</span></code> or <code><span class='String'>&quot;abc&quot;</span></code>. Character and string literals take precedence with comments over other tokenization rules, so that <code><span class='Comment'>#</span></code> between quotes does not start a comment and whitespace between quotes is not removed, but a quote within a comment does not start a character literal. Almost any character can be included directly in a character or string literal without escaping. The only exception is the double quote character <code>&quot;</code>, which must be written twice to include it in a string, as otherwise it would end the string instead. Character literals require no escaping at all, as the length is fixed. In particular, literals for the double and single quote characters are written <code><span class='String'>'''</span></code> and <code><span class='String'>'&quot;'</span></code>, while length-1 strings containing these characters are <code><span class='String'>&quot;'&quot;</span></code> and <code><span class='String'>&quot;&quot;&quot;&quot;</span></code>.</p>
<p>A comment consists of the hash character <code><span class='Comment'>#</span></code> and any following text until (not including) the next newline character. The initial <code><span class='Comment'>#</span></code> must not be part of a string literal started earlier. Comments are ignored entirely and do not form tokens.</p>
-<p>Identifiers and numeric literals share the same token formation rule. These tokens are formed from the <em>numeric characters</em> <code><span class='Number'>¯∞π.0123456789</span></code> and <em>alphabetic characters</em> <code><span class='Modifier'>_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ</span></code> and the oddball <code><span class='Value'>𝕣</span></code>. Any sequence of these characters adjacent to each other forms a single token, which is a <em>numeric literal</em> if it begins with a numeric character and an <em>identifier</em> if it begins with an alphabetic character. Numeric literals are also subject to <a href="literal.html">numeric literal rules</a>, which specify which numeric literals are valid and which numbers they represent. If the token contains <code><span class='Value'>𝕣</span></code> it must be either <code><span class='Value'>𝕣</span></code>, <code><span class='Modifier2'>_</span><span class='Value'>𝕣</span></code>, or <code><span class='Modifier2'>_</span><span class='Value'>𝕣</span><span class='Modifier2'>_</span></code> and is considered a special name (see below). As the value taken by this identifier can only be a modifier, the uppercase character <code><span class='Value'>ℝ</span></code> is not allowed.</p>
+<p>Identifiers and numeric literals share the same token formation rule. These tokens are formed from the <em>numeric characters</em> <code><span class='Number'>¯∞π.0123456789</span></code> and <em>alphabetic characters</em> <code><span class='Modifier'>_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ</span></code> and the oddball <code><span class='Value'>𝕣</span></code>. Any sequence of these characters adjacent to each other forms a single token, which is a <em>numeric literal</em> if it begins with a numeric character and an <em>identifier</em> if it begins with an alphabetic character. Numeric literals are also subject to <a href="literal.html">numeric literal rules</a>, which specify which numeric literals are valid and which numbers they represent. If the token contains <code><span class='Value'>𝕣</span></code> it must be either <code><span class='Value'>𝕣</span></code>, <code><span class='Modifier'>_𝕣</span></code>, or <code><span class='Modifier2'>_𝕣_</span></code> and is considered a special name (see below). As the value taken by this identifier can only be a modifier, the uppercase character <code><span class='Value'>ℝ</span></code> is not allowed.</p>
<p>Following this step, the whitespace characters space and tab are ignored, and do not form tokens. Only these whitespace characters, and the newline character, which does form a token, are allowed.</p>
<p>Otherwise, a single character forms a token. Only the specified set of characters can be used; others result in an error. The classes of characters are given below.</p>
<table>
diff --git a/md.bqn b/md.bqn
index e2b65992..ef85f253 100644
--- a/md.bqn
+++ b/md.bqn
@@ -459,7 +459,7 @@ TestSections ← {
# if a statement is an assignment.
idChars ← ⟨
•d∾"¯.π∞"
- ' '+⌾•UCS•a
+ "𝕣"∾˜' '+⌾•UCS•a
•a
"_"
⟩
@@ -511,6 +511,9 @@ GetHighlights ← {
# Color with "String" and "Comment"
col ⌈↩ +´ (1‿2-˜≠classes) × ToMask¨ tc
+ # UTF-16 hack: first half of a special name needs to match the second
+ col↩ (1⌽col) ⊣⌾((𝕩=⊑"𝕩")⊸/) col
+
# Color numeric literals and identifiers
id ← col=5 # ←→ 𝕩∊idChars
w ← 0⊸Shl⊸< id # Word (identifier or number) beginning mask
@@ -520,9 +523,6 @@ GetHighlights ← {
wi ← 1-˜+`id/w # Index of word containing each of /id
col↩(wi⊏wt)⌾(id⊸/) col
- # UTF-16 hack: first half of a special name needs to match the second
- col↩ (1⌽col) ⊣⌾((𝕩=⊑"𝕩")⊸/) col
-
# Tags are placed at boundaries between different colors
boundary ← ¯1⊸Shl⊸≠ col
bcol ← boundary / col