Separate token and constant documentation into its own page

author: Marshall Lochbaum <mwlochbaum@gmail.com> 2022-07-07 21:23:06 -0400
committer: Marshall Lochbaum <mwlochbaum@gmail.com> 2022-07-07 21:23:26 -0400
commit: 77c6ab5c8435c9fcde7c4742ee0e5eb06341eeff (patch)
tree: 48ff9cf3b9066aea0e38111a9dc5ce92f87ebe96 /docs/doc/token.html
parent: f14c4af888dc678eefe1de323b8fe41f7387e82b (diff)
1 files changed, 41 insertions, 0 deletions
diff --git a/docs/doc/token.html b/docs/doc/token.html
new file mode 100644
index 00000000..75387cde
--- /dev/null
+++ b/docs/doc/token.html
@@ -0,0 +1,41 @@
+<head>
+  <link href="../favicon.ico" rel="shortcut icon" type="image/x-icon"/>
+  <link href="../style.css" rel="stylesheet"/>
+  <title>BQN: Tokens</title>
+</head>
+<div class="nav">(<a href="https://github.com/mlochbaum/BQN">github</a>) / <a href="../index.html">BQN</a> / <a href="index.html">doc</a></div>
+<h1 id="tokens"><a class="header" href="#tokens">Tokens</a></h1>
+<p>A &quot;token&quot; is the smallest part of syntax, much like a word in English. BQN's rules for forming tokens are simpler than those of most programming languages, because most tokens are single characters. There are only a few kinds of multi-character tokens: character and string literals written with <code><span class='String'>'</span></code> and <code><span class='String'>&quot;</span></code>, and words which are numbers, names, or system values.</p>
+<p>Strings (and characters) and comments have starting and ending conditions, but can't overlap. The one that starts first &quot;wins&quot;: it needs to end before any other token or comment can start. These also take precedence over other token rules; that is, characters inside a string or comment don't form other tokens.</p>
+<h2 id="non-tokens"><a class="header" href="#non-tokens">Non-tokens</a></h2>
+<p>Comments, and the horizontal whitespace characters space and tab, don't form tokens, since they don't do anything as far as the program's concerned. However, whitespace can be used to separate adjacent words or strings. Comments can only end with a newline, which doesn't need separating, so they <em>really</em> don't do anything, other than inform the reader about whatever you have to say. Note that newline characters (either LF or CR) are <a href="#separators">separators</a>, which are tokens.</p>
+<h3 id="comments"><a class="header" href="#comments">Comments</a></h3>
+<p>A comment starts with a <code><span class='Comment'>#</span></code> that isn't part of a character or string literal, and continues to the end of the line.</p>
+<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=JyMnIC0gMSAgI1RoaXMgaXMgdGhlIGNvbW1lbnQ=">↗️</a><pre>    <span class='String'>'#'</span> <span class='Function'>-</span> <span class='Number'>1</span>  <span class='Comment'>#This is the comment
+</span>'"'
+</pre>
+<p>Every line of commentary needs its own <code><span class='Comment'>#</span></code>; there's no multi-line comment syntax.</p>
+<h2 id="characters-and-strings"><a class="header" href="#characters-and-strings">Characters and strings</a></h2>
+<p>Strings—lists of characters—are written with double quotes <code><span class='String'>&quot;&quot;</span></code>, and <a href="types.html#characters">characters</a> with single quotes <code><span class='String'>''</span></code> with a single character in between. Only one character ever needs to be escaped: a double quote in a string is written twice. So <code><span class='String'>&quot;&quot;&quot;&quot;</span></code> is a one-character string of <code><span class='String'>&quot;</span></code>, and if two string literals are next to each other, they have to be separated by a space. Character literals don't have even one escape, as the length is already known. Other than the double quote, character and string literals can contain any character directly: newlines, null characters, or other Unicode.</p>
+<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=4omgwqgg4p+oICJzdHIiIOKLhCAicyd0IiJyIiDii4QgJ2MnIOKLhCAnJycg4ouEICciJyDin6kgICAjICIiIGlzIGFuIGVzY2FwZQoK4omhwqgg4p+oICJhIiDii4QgJ2EnIOKfqSAgICMgQSBzdHJpbmcgaXMgYW4gYXJyYXkgYnV0IGEgY2hhcmFjdGVyIGlzbid0">↗️</a><pre>    <span class='Function'>≠</span><span class='Modifier'>¨</span> <span class='Bracket'>⟨</span> <span class='String'>&quot;str&quot;</span> <span class='Separator'>⋄</span> <span class='String'>&quot;s't&quot;&quot;r&quot;</span> <span class='Separator'>⋄</span> <span class='String'>'c'</span> <span class='Separator'>⋄</span> <span class='String'>'''</span> <span class='Separator'>⋄</span> <span class='String'>'&quot;'</span> <span class='Bracket'>⟩</span>   <span class='Comment'># &quot;&quot; is an escape
+</span>⟨ 3 5 1 1 1 ⟩
+
+    <span class='Function'>≡</span><span class='Modifier'>¨</span> <span class='Bracket'>⟨</span> <span class='String'>&quot;a&quot;</span> <span class='Separator'>⋄</span> <span class='String'>'a'</span> <span class='Bracket'>⟩</span>   <span class='Comment'># A string is an array but a character isn't
+</span>⟨ 1 0 ⟩
+</pre>
+<p>But including a null character in your source code is probably not a great idea for other reasons. The null character (code point 0) has a dedicated literal representation <code><span class='String'>@</span></code>. Null can be used with <a href="arithmetic.html#character-arithmetic">character arithmetic</a> to directly convert between characters and numeric code points, which among many other uses allows tricky characters to be entered by code point: for example, a non-breaking space is <code><span class='String'>@</span><span class='Function'>+</span><span class='Number'>160</span></code>.</p>
+<h2 id="words"><a class="header" href="#words">Words</a></h2>
+<p>Numbers and variable names share a token formation rule, and are collectively called words. A word is a number if it starts with a digit or numeric character <code><span class='Number'>¯∞π</span></code>, and a name otherwise.</p>
+<p>Words are formed from digits, letters, and the characters <code><span class='Modifier2'>_</span><span class='Value'>.</span><span class='Number'>¯∞π</span></code>. All these characters stick together, so that you need to separate words with whitespace in order to write them next to each other. But <code><span class='Value'>.</span></code> only counts if it's followed by a digit: otherwise it forms its own token to support <a href="namespace.html#imports">namespace syntax</a> <code><span class='Value'>ns.field</span></code>. A word may be preceded by <code><span class='Value'>•</span></code> to form a system name.</p>
+<p>The character <code><span class='Value'>𝕣</span></code> also sticks with other word-forming characters, but is only allowed to form the special names <code><span class='Value'>𝕣</span></code>, <code><span class='Modifier'>_𝕣</span></code>, and <code><span class='Modifier2'>_𝕣_</span></code>.</p>
+<h3 id="numbers"><a class="header" href="#numbers">Numbers</a></h3>
+<p><a href="types.html#numbers">Numbers</a> are written as decimals, allowing <code><span class='Number'>¯</span></code> for the negative sign (because <code><span class='Function'>-</span></code> is a function) and <code><span class='Value'>e</span></code> or <code><span class='Function'>E</span></code> for scientific notation. They must have digits before and after the decimal point (so, <code><span class='Number'>0.5</span></code> instead of <code><span class='Number'>.5</span></code>), and any exponent must be an integer. Two special numbers <code><span class='Number'>∞</span></code> and <code><span class='Number'>π</span></code> are supported, possibly with a minus sign. If complex numbers are supported (no implementation to date has them), then they can be written with the components separated by <code><span class='Value'>i</span></code> or <code><span class='Function'>I</span></code>.</p>
+<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=4p+oIMKvz4Ag4ouEIDAuNSDii4QgNWXCrzEg4ouEIDEuNUUzIOKLhCDiiJ4g4p+pICAgIyBBIGxpc3Qgb2YgbnVtYmVycw==">↗️</a><pre>    <span class='Bracket'>⟨</span> <span class='Number'>¯π</span> <span class='Separator'>⋄</span> <span class='Number'>0.5</span> <span class='Separator'>⋄</span> <span class='Number'>5e¯1</span> <span class='Separator'>⋄</span> <span class='Number'>1.5E3</span> <span class='Separator'>⋄</span> <span class='Number'>∞</span> <span class='Bracket'>⟩</span>   <span class='Comment'># A list of numbers
+</span>⟨ ¯3.141592653589793 0.5 0.5 1500 ∞ ⟩
+</pre>
+<h3 id="names"><a class="header" href="#names">Names</a></h3>
+<p>A variable name starts with a letter but otherwise can contain anything, including characters like <code><span class='Number'>∞</span></code> and <code><span class='Number'>¯</span></code>. Names represent identifiers according to the rules of <a href="lexical.html">lexical scoping</a>. A somewhat unusual feature of BQN is that identifiers are case- and underscore-insensitive, so that <code><span class='Value'>abc</span></code> is treated as the same name as <code><span class='Modifier'>_a_B_c</span></code>. This works with the <a href="expression.html#role-spellings">role spelling</a> system so that changing the case or adding underscores allows the same variable to be used in different roles.</p>
+<p>The system dot <code><span class='Value'>•</span></code> can only start a word, and must be followed by a name. This accesses a system value such as the debugging display function <code><span class='Function'>•Show</span></code>.</p>
+<h2 id="separators"><a class="header" href="#separators">Separators</a></h2>
+<p>The characters <code><span class='Separator'>⋄</span></code> and <code><span class='Separator'>,</span></code> and newline are completely interchangeable and are used to separate expressions. An expression might be an element in a list or a line in a block. Empty sections—those that consist only of whitespace—are ignored. This means that any number of separators can be used between expressions, and that leading and trailing separators are also allowed. The expressions are evaluated in text order: left to right and top to bottom.</p>
+<p>Both LF and CR are allowed as newline characters, and CRLF functions as a separator too because of the way multiple separators work.</p>