Highlight quotes as strings even if unpaired

author: Marshall Lochbaum <mwlochbaum@gmail.com> 2020-09-03 22:15:24 -0400
committer: Marshall Lochbaum <mwlochbaum@gmail.com> 2020-09-03 22:15:24 -0400
commit: ceaa82c6d1564b2ca7965c4f29b51f45ad1c2933 (patch)
tree: c49c0e9c48abf57a82b0bc5d197a2f7421a8167a /docs
parent: af948faa3d79ae682d971c71704f9334cf8e847f (diff)
4 files changed, 5 insertions, 5 deletions
diff --git a/docs/doc/based.html b/docs/doc/based.html
index 5a6513db..031c9572 100644
--- a/docs/doc/based.html
+++ b/docs/doc/based.html
@@ -42,7 +42,7 @@
 <p>Arrays in BQN, like nearly all data structures in modern programming languages, are an <a href="https://en.wikipedia.org/wiki/Inductive_type">inductive type</a>. That means that an array can be constructed from existing values, but can't contain itself (including recursively: an array always has finite depth). To construct the type of all BQN values inductively, we would say that atoms form the base case, and arrays are an inductive case: an array is a shaped collection of existing BQN values. For an array programmer, this is of course the easy part.</p>
 <h2 id="versus-the-nested-array-model">Versus the nested array model</h2>
 <p>The <a href="https://aplwiki.com/wiki/Array_model#Nested_array_theory">nested array model</a> of NARS, APL2, Dyalog, and GNU APL can be constructed from the based model by adding a rule: a scalar array containing an atom is equivalent to that atom. The equivalents of atoms in nested array theory are thus called &quot;simple scalars&quot;, and they are considered arrays but share the same characteristics of BQN atoms. Nested arrays don't form an inductive type, because simple scalars contain themselves.</p>
-<p>Nested array theory can seem simpler to use, because the programmer never has to worry about simple scalars being enclosed the wrong number of times: all these encloses have been identified with each other. For example, <code>'<span class='Value'>abcd</span>'<span class='Value'>[</span><span class='Number'>2</span><span class='Value'>]</span></code> returns a character while BQN's <code><span class='Number'>2</span><span class='Function'>⊏</span><span class='String'>&quot;abcd&quot;</span></code> returns a scalar containing a character. However, these issues usually still appear with more complex arrays: <code>'<span class='Value'>ab</span>' <span class='Number'>1</span> '<span class='Value'>ef</span>'<span class='Value'>[</span><span class='Number'>2</span><span class='Value'>]</span></code> (here spaces are used for stranding) is not a string but an enclosed string!</p>
+<p>Nested array theory can seem simpler to use, because the programmer never has to worry about simple scalars being enclosed the wrong number of times: all these encloses have been identified with each other. For example, <code><span class='String'>'</span><span class='Value'>abcd</span><span class='String'>'</span><span class='Value'>[</span><span class='Number'>2</span><span class='Value'>]</span></code> returns a character while BQN's <code><span class='Number'>2</span><span class='Function'>⊏</span><span class='String'>&quot;abcd&quot;</span></code> returns a scalar containing a character. However, these issues usually still appear with more complex arrays: <code><span class='String'>'</span><span class='Value'>ab</span><span class='String'>'</span> <span class='Number'>1</span> <span class='String'>'</span><span class='Value'>ef</span><span class='String'>'</span><span class='Value'>[</span><span class='Number'>2</span><span class='Value'>]</span></code> (here spaces are used for stranding) is not a string but an enclosed string!</p>
 <p>A property that might warn about dangerous issues like this is that nested array theory tends to create <em>inversions</em> where the depth of a particular array depends on its rank (reversing the normal hierarchy of depth→rank→shape). A 1-character string has depth 1, but when its rank is reduced to 0, its depth is reduced as well.</p>
 <p>In some cases nested array theory can remove a depth issue entirely, and not just partially. Most notable is the <a href="../problems.html#search-function-depth">search function result depth</a> issue, in which it's impossible for a search function in BQN to return an atomic number because it always returns an array. Nested array theory doesn't have this issue since a scalar number is &quot;just a number&quot;, and more complicated arrays can't cause problems because a search function's result is always a numeric array. The other half of the problem, about the non-principal argument depth, is only partly hidden, and causes problems for example when searching for a single string out of a list of strings.</p>
 <h2 id="versus-the-boxed-array-model">Versus the boxed array model</h2>
diff --git a/docs/doc/syntax.html b/docs/doc/syntax.html
index 42fe714a..aafb86d9 100644
--- a/docs/doc/syntax.html
+++ b/docs/doc/syntax.html
@@ -21,7 +21,7 @@
 <td><a href="#comments">Comment</a></td>
 </tr>
 <tr>
-<td><code>'&quot;</code></td>
+<td><code><span class='String'>'&quot;</span></code></td>
 <td><a href="#constants">Character or string literal</a></td>
 </tr>
 <tr>
@@ -102,7 +102,7 @@
 <a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=4p+oIMKvz4Ag4ouEIDAuNSDii4QgNWXCrzEg4ouEIDEuNUUzIOKLhCDiiJ4g4p+pICAgIyBBIGxpc3Qgb2YgbnVtYmVycw==&run">↗️</a><pre>    <span class='Bracket'>⟨</span> <span class='Number'>¯π</span> <span class='Separator'>⋄</span> <span class='Number'>0.5</span> <span class='Separator'>⋄</span> <span class='Number'>5e¯1</span> <span class='Separator'>⋄</span> <span class='Number'>1.5E3</span> <span class='Separator'>⋄</span> <span class='Number'>∞</span> <span class='Bracket'>⟩</span>   <span class='Comment'># A list of numbers
 </span>⟨ ¯3.14159265358979 0.5 0.5 1500 ∞ ⟩
 </pre>
-<p>Strings are written with double quotes <code><span class='String'>&quot;&quot;</span></code>, and characters with single quotes <code>''</code> with a single character in between. A double quote within a string can be escaped by writing it twice; if two string literals are next to each other, they must be separated by a space. In contrast, character literals do not use escapes, as the length is already known.</p>
+<p>Strings are written with double quotes <code><span class='String'>&quot;&quot;</span></code>, and characters with single quotes <code><span class='String'>''</span></code> with a single character in between. A double quote within a string can be escaped by writing it twice; if two string literals are next to each other, they must be separated by a space. In contrast, character literals do not use escapes, as the length is already known.</p>
 <a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=4omgwqgg4p+oICJzdHIiIOKLhCAicyd0IiJyIiDii4QgJ2MnIOKLhCAnJycg4ouEICciJyDin6kgICAjICIiIGlzIGFuIGVzY2FwZQoK4omhwqgg4p+oICJhIiDii4QgJ2EnIOKfqSAgICMgQSBzdHJpbmcgaXMgYW4gYXJyYXkgYnV0IGEgY2hhcmFjdGVyIGlzbid0&run">↗️</a><pre>    <span class='Function'>≠</span><span class='Modifier'>¨</span> <span class='Bracket'>⟨</span> <span class='String'>&quot;str&quot;</span> <span class='Separator'>⋄</span> <span class='String'>&quot;s't&quot;&quot;r&quot;</span> <span class='Separator'>⋄</span> <span class='String'>'c'</span> <span class='Separator'>⋄</span> <span class='String'>'''</span> <span class='Separator'>⋄</span> <span class='String'>'&quot;'</span> <span class='Bracket'>⟩</span>   <span class='Comment'># &quot;&quot; is an escape
 </span>⟨ 3 5 1 1 1 ⟩
 
diff --git a/docs/spec/literal.html b/docs/spec/literal.html
index d51bf50f..04491352 100644
--- a/docs/spec/literal.html
+++ b/docs/spec/literal.html
@@ -6,7 +6,7 @@
 <div class="nav"><a href="https://github.com/mlochbaum/BQN">BQN</a></div>
 <h1 id="specification-bqn-literal-notation">Specification: BQN literal notation</h1>
 <p>A <em>literal</em> is a single <a href="token.html">token</a> that indicates a fixed character, number, or array. While literals indicate values of a data type, <a href="primitive.html">primitives</a> indicate values of an operation type: function, 1-modifier, or 2-modifier.</p>
-<p>Two types of literal deal with text. As the source code is considered to be a sequence of unicode code points (&quot;characters&quot;), and these code points are also used for BQN's character <a href="types.html">data type</a>, the representation of a text literal is very similar to its value. In a text literal, the newline character is always represented using the ASCII line feed character, code point 10. A <em>character literal</em> is enclosed with single quotes <code>'</code> and its value is identical to the single character between them. A <em>string literal</em> is enclosed in double quotes <code>&quot;</code>, and any double quotes between them must come in pairs, as a lone double quote marks the end of the literal. The value of a string literal is a rank-1 array whose elements are the characters in between the enclosing quotes, after replacing each pair of double quotes with only one such quote.</p>
+<p>Two types of literal deal with text. As the source code is considered to be a sequence of unicode code points (&quot;characters&quot;), and these code points are also used for BQN's character <a href="types.html">data type</a>, the representation of a text literal is very similar to its value. In a text literal, the newline character is always represented using the ASCII line feed character, code point 10. A <em>character literal</em> is enclosed with single quotes <code><span class='String'>'</span></code> and its value is identical to the single character between them. A <em>string literal</em> is enclosed in double quotes <code><span class='String'>&quot;</span></code>, and any double quotes between them must come in pairs, as a lone double quote marks the end of the literal. The value of a string literal is a rank-1 array whose elements are the characters in between the enclosing quotes, after replacing each pair of double quotes with only one such quote.</p>
 <p>The format of a <em>numeric literal</em> is more complicated. From the <a href="token.html">tokenization rules</a>, a numeric literal consists of a numeric character (one of <code><span class='Number'>¯∞π.0123456789</span></code>) followed by any number of numeric or alphabetic characters. Some numeric literals are <em>valid</em> and indicate a number, while others are invalid and cause an error. The grammar for valid numbers is given below in a <a href="https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form">BNF</a> variant. Only four alphabetic characters are allowed: &quot;i&quot;, which separates the real and imaginary components of a complex number, &quot;e&quot;, which functions as in scientific notation, and the uppercase versions of these letters.</p>
 <pre><span class='Value'>number</span>    <span class='Function'>=</span> <span class='Value'>component</span> <span class='Paren'>(</span> <span class='Paren'>(</span> <span class='String'>&quot;i&quot;</span> <span class='Function'>|</span> <span class='String'>&quot;I&quot;</span> <span class='Paren'>)</span> <span class='Value'>component</span> <span class='Paren'>)</span><span class='Value'>?</span>
 <span class='Value'>component</span> <span class='Function'>=</span> <span class='Value'>mantissa</span> <span class='Paren'>(</span> <span class='Paren'>(</span> <span class='String'>&quot;e&quot;</span> <span class='Function'>|</span> <span class='String'>&quot;E&quot;</span> <span class='Paren'>)</span> <span class='Value'>exponent</span> <span class='Paren'>)</span><span class='Value'>?</span>
diff --git a/docs/spec/token.html b/docs/spec/token.html
index 87ae06ab..010c033e 100644
--- a/docs/spec/token.html
+++ b/docs/spec/token.html
@@ -7,7 +7,7 @@
 <h1 id="specification-bqn-token-formation">Specification: BQN token formation</h1>
 <p>This page describes BQN's token formation rules (token formation is also called scanning). Most tokens in BQN are a single character long, but quoted characters and strings, identifiers, and numbers can consist of multiple characters, and comments, spaces, and tabs are discarded during token formation.</p>
 <p>BQN source code should be considered as a series of unicode code points, which we refer to as &quot;characters&quot;. The separator between lines in a file is considered to be a single character, newline, even though some operating systems such as Windows typically represent it with a two-character CRLF sequence. Implementers should note that not all languages treat unicode code points as atomic, as exposing the UTF-8 or UTF-16 representation instead is common. For a language such as JavaScript that uses UTF-16, the double-struck characters <code><span class='Value'>𝕨</span><span class='Function'>𝕎</span><span class='Value'>𝕩</span><span class='Function'>𝕏</span><span class='Value'>𝕗</span><span class='Function'>𝔽</span><span class='Value'>𝕘</span><span class='Function'>𝔾</span></code> are represented as two 16-bit surrogate characters, but BQN treats them as a single unit.</p>
-<p>A BQN <em>character literal</em> consists of a single character between single quotes, such as <code><span class='String'>'a'</span></code>, and a <em>string literal</em> consists of any number of characters between double quotes, such as <code><span class='String'>&quot;&quot;</span></code> or <code><span class='String'>&quot;abc&quot;</span></code>. Character and string literals take precedence with comments over other tokenization rules, so that <code><span class='Comment'>#</span></code> between quotes does not start a comment and whitespace between quotes is not removed, but a quote within a comment does not start a character literal. Almost any character can be included directly in a character or string literal without escaping. The only exception is the double quote character <code>&quot;</code>, which must be written twice to include it in a string, as otherwise it would end the string instead. Character literals require no escaping at all, as the length is fixed. In particular, literals for the double and single quote characters are written <code><span class='String'>'''</span></code> and <code><span class='String'>'&quot;'</span></code>, while length-1 strings containing these characters are <code><span class='String'>&quot;'&quot;</span></code> and <code><span class='String'>&quot;&quot;&quot;&quot;</span></code>.</p>
+<p>A BQN <em>character literal</em> consists of a single character between single quotes, such as <code><span class='String'>'a'</span></code>, and a <em>string literal</em> consists of any number of characters between double quotes, such as <code><span class='String'>&quot;&quot;</span></code> or <code><span class='String'>&quot;abc&quot;</span></code>. Character and string literals take precedence with comments over other tokenization rules, so that <code><span class='Comment'>#</span></code> between quotes does not start a comment and whitespace between quotes is not removed, but a quote within a comment does not start a character literal. Almost any character can be included directly in a character or string literal without escaping. The only exception is the double quote character <code><span class='String'>&quot;</span></code>, which must be written twice to include it in a string, as otherwise it would end the string instead. Character literals require no escaping at all, as the length is fixed. In particular, literals for the double and single quote characters are written <code><span class='String'>'''</span></code> and <code><span class='String'>'&quot;'</span></code>, while length-1 strings containing these characters are <code><span class='String'>&quot;'&quot;</span></code> and <code><span class='String'>&quot;&quot;&quot;&quot;</span></code>.</p>
 <p>A comment consists of the hash character <code><span class='Comment'>#</span></code> and any following text until (not including) the next newline character. The initial <code><span class='Comment'>#</span></code> must not be part of a string literal started earlier. Comments are ignored entirely and do not form tokens.</p>
 <p>Identifiers and numeric literals share the same token formation rule. These tokens are formed from the <em>numeric characters</em> <code><span class='Number'>¯∞π.0123456789</span></code> and <em>alphabetic characters</em> <code><span class='Modifier'>_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ</span></code> and the oddball <code><span class='Value'>𝕣</span></code>. Any sequence of these characters adjacent to each other forms a single token, which is a <em>numeric literal</em> if it begins with a numeric character and an <em>identifier</em> if it begins with an alphabetic character. Numeric literals are also subject to <a href="literal.html">numeric literal rules</a>, which specify which numeric literals are valid and which numbers they represent. If the token contains <code><span class='Value'>𝕣</span></code> it must be either <code><span class='Value'>𝕣</span></code>, <code><span class='Modifier'>_𝕣</span></code>, or <code><span class='Modifier2'>_𝕣_</span></code> and is considered a special name (see below). As the value taken by this identifier can only be a modifier, the uppercase character <code><span class='Value'>ℝ</span></code> is not allowed.</p>
 <p>Following this step, the whitespace characters space and tab are ignored, and do not form tokens. Only these whitespace characters, and the newline character, which does form a token, are allowed.</p>