author Marshall Lochbaum <mwlochbaum@gmail.com> 2020-07-20 14:05:36 -0400
committer Marshall Lochbaum <mwlochbaum@gmail.com> 2020-07-20 14:05:36 -0400
commit e52d50ed594dd5626523ca7931315e47bde8c9d1
tree d355718d8f81ea0773cd177e5b5eb17e043a5da3 /docs/spec/token.html
parent 883eda3df8e162e2d8a62b4d1ec03eadcf8b8069
Make header id slugs match Github's
Diffstat (limited to 'docs/spec/token.html')
-rw-r--r-- docs/spec/token.html 2
1 files changed, 1 insertions, 1 deletions
diff --git a/docs/spec/token.html b/docs/spec/token.html
index 9a121561..c12fc3e4 100644
--- a/docs/spec/token.html
+++ b/docs/spec/token.html
@@ -1,6 +1,6 @@
<head><link href="../style.css" rel="stylesheet"/></head>
<div class="nav"><a href="https://github.com/mlochbaum/BQN">BQN</a></div>
-<h1 id="specification--bqn-token-formation">Specification: BQN token formation</h1>
+<h1 id="specification-bqn-token-formation">Specification: BQN token formation</h1>
<p>This page describes BQN's token formation rules (token formation is also called scanning). Most tokens in BQN are a single character long, but quoted characters and strings, identifiers, and numbers can consist of multiple characters, and comments, spaces, and tabs are discarded during token formation.</p>
<p>BQN source code should be considered as a series of unicode code points, which we refer to as &quot;characters&quot;. The separator between lines in a file is considered to be a single character, newline, even though some operating systems such as Windows typically represent it with a two-character CRLF sequence. Implementers should note that not all languages treat unicode code points as atomic, as exposing the UTF-8 or UTF-16 representation instead is common. For a language such as JavaScript that uses UTF-16, the double-struck characters <code><span class='Value'>𝕨</span><span class='Function'>𝕎</span><span class='Value'>𝕩</span><span class='Function'>𝕏</span><span class='Value'>𝕗</span><span class='Function'>𝔽</span><span class='Value'>𝕘</span><span class='Function'>𝔾</span></code> are represented as two 16-bit surrogate characters, but BQN treats them as a single unit.</p>
<p>A BQN <em>character literal</em> consists of a single character between single quotes, such as <code><span class='String'>'a'</span></code>, and a <em>string literal</em> consists of any number of characters between double quotes, such as <code><span class='String'>&quot;&quot;</span></code> or <code><span class='String'>&quot;abc&quot;</span></code>. Character and string literals take precedence with comments over other tokenization rules, so that <code><span class='Comment'>#</span></code> between quotes does not start a comment and whitespace between quotes is not removed, but a quote within a comment does not start a character literal. Almost any character can be included directly in a character or string literal without escaping. The only exception is the double quote character <code>&quot;</code>, which must be written twice to include it in a string, as otherwise it would end the string instead. Character literals require no escaping at all, as the length is fixed. In particular, literals for the single and double quote characters are written <code><span class='String'>'''</span></code> and <code><span class='String'>'&quot;'</span></code>, while length-1 strings containing these characters are <code><span class='String'>&quot;'&quot;</span></code> and <code><span class='String'>&quot;&quot;&quot;&quot;</span></code>.</p>
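The quoting rules in the last paragraph of the hunk can be sketched in a few lines of Python. This is an illustrative sketch, not part of the commit or of any BQN implementation: the function name `scan_literals` and the `(kind, value)` token shape are hypothetical, the sketch assumes a comment runs to the end of its line, and it assumes a malformed literal is reported as an error. Python strings are indexed by code point, so the double-struck characters count as single characters here without extra work.

```python
def scan_literals(src: str) -> list[tuple[str, str]]:
    """Collect character literals, string literals and comments from src.

    Hypothetical helper: every other kind of token is skipped.
    """
    tokens = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c == "'":
            # Fixed length: quote, any one character, quote. No escaping.
            if i + 2 < n and src[i + 2] == "'":
                tokens.append(("char", src[i + 1]))
                i += 3
            else:
                # Assumption: a malformed character literal is an error.
                raise SyntaxError(f"bad character literal at index {i}")
        elif c == '"':
            i += 1
            chars = []
            while True:
                if i >= n:
                    # Assumption: an unterminated string is an error.
                    raise SyntaxError("unterminated string literal")
                if src[i] == '"':
                    if i + 1 < n and src[i + 1] == '"':
                        chars.append('"')  # doubled quote: one literal quote
                        i += 2
                    else:
                        i += 1  # lone quote closes the string
                        break
                else:
                    chars.append(src[i])  # '#' and whitespace are kept as-is
                    i += 1
            tokens.append(("string", "".join(chars)))
        elif c == "#":
            # Assumption: a comment extends to the end of the line.
            end = src.find("\n", i)
            if end == -1:
                end = n
            tokens.append(("comment", src[i:end]))
            i = end
        else:
            i += 1  # not a literal or comment; ignored in this sketch
    return tokens
```

Because literals and comments are recognized in the same left-to-right pass, a `#` inside quotes stays in the string and a quote inside a comment stays in the comment, which is the precedence the paragraph describes.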