Separate token and constant documentation into its own page

author: Marshall Lochbaum <mwlochbaum@gmail.com> 2022-07-07 21:23:06 -0400
committer: Marshall Lochbaum <mwlochbaum@gmail.com> 2022-07-07 21:23:26 -0400
commit: 77c6ab5c8435c9fcde7c4742ee0e5eb06341eeff (patch)
tree: 48ff9cf3b9066aea0e38111a9dc5ce92f87ebe96 /doc
parent: f14c4af888dc678eefe1de323b8fe41f7387e82b (diff)
7 files changed, 79 insertions, 38 deletions
diff --git a/doc/README.md b/doc/README.md
index 7d0a1fd4..f0c5ae18 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -19,6 +19,7 @@ References:
 - [BQN as combinatory logic](birds.md)
 
 Concepts:
+- [Tokens and constants](token.md)
 - [Expression syntax](expression.md)
   - [Context-free grammar](context.md)
 - [Arrays](array.md)
diff --git a/doc/arithmetic.md b/doc/arithmetic.md
index e51fc53e..bd61ed0e 100644
--- a/doc/arithmetic.md
+++ b/doc/arithmetic.md
@@ -49,7 +49,7 @@ Each of these functions also has a meaning with only one argument, although math
 
         √ 0‿1‿2‿4
 
-Take note of the difference between the function `-`, and the "high minus" character `¯`, which is a part of [numeric notation](syntax.md#constants). Also shown is the number `∞`, which BQN supports along with `¯∞` (but depending on implementation BQN may or may not keep track of `¯0`. Integer optimization loses the distinction so it's best not to rely on it).
+Take note of the difference between the function `-`, and the "high minus" character `¯`, which is a part of [numeric notation](token.md#numbers). Also shown is the number `∞`, which BQN supports along with `¯∞` (but depending on implementation BQN may or may not keep track of `¯0`. Integer optimization loses the distinction so it's best not to rely on it).
 
 The logarithm is written with [Undo](undo.md): `⋆⁼`. As with Power, the default base is *e*, giving a natural logarithm.
 
diff --git a/doc/arrayrepr.md b/doc/arrayrepr.md
index 205c4a44..63dba983 100644
--- a/doc/arrayrepr.md
+++ b/doc/arrayrepr.md
@@ -92,7 +92,7 @@ Now it's time to discuss ways to write arrays in a BQN program. There are three
 
 ### Strings
 
-A **string** consists of a sequence of characters surrounded by double quotes `""`. The only rule for the characters inside is that any double quote must be escaped by repeating it twice; otherwise the string ends at that point.
+A [**string** literal](token.md#characters-and-strings) consists of a sequence of characters surrounded by double quotes `""`. The only rule for the characters inside is that any double quote must be escaped by repeating it twice; otherwise the string ends at that point.
 
         "-'×%""*"
 
@@ -102,7 +102,7 @@ Even special characters like a newline can appear in a string literal, so that s
 
 ### Brackets
 
-**List notation** uses angle brackets `⟨⟩`. The contents are structurally identical to those of a [block](block.md), that is, a list of expressions [separated](syntax.md#separators) by `,` or `⋄` or newlines. Unlike a block, a list doesn't need to have any expressions: `⟨⟩` or `⟨⋄⟩` or `⟨,,⋄,⟩` will create an empty list. Other differences are that a list doesn't introduce a new [scope](lexical.md) and all of the expressions have to result in a value, not [Nothing](expression.md#nothing) (`·`).
+**List notation** uses angle brackets `⟨⟩`. The contents are structurally identical to those of a [block](block.md), that is, a list of expressions [separated](token.md#separators) by `,` or `⋄` or newlines. Unlike a block, a list doesn't need to have any expressions: `⟨⟩` or `⟨⋄⟩` or `⟨,,⋄,⟩` will create an empty list. Other differences are that a list doesn't introduce a new [scope](lexical.md) and all of the expressions have to result in a value, not [Nothing](expression.md#nothing) (`·`).
 
 Entries in a list are evaluated in source order, and the value will be the list of those results. The list has a subject [role](expression.md#syntactic-role), even if it contains expressions with other roles. Any value can be an element.
 
diff --git a/doc/expression.md b/doc/expression.md
index 5fa6cdc7..5359a672 100644
--- a/doc/expression.md
+++ b/doc/expression.md
@@ -2,7 +2,7 @@
 
 # Expression syntax
 
-BQN expressions are the part of [syntax](syntax.md) that describes computations to perform. Programs are mainly made up of expressions with a little organizing material like [blocks](block.md) and [namespaces](namespace.md) around them. This page explains how functions, modifiers, and assignment combine with their inputs. It doesn't describe [constant](syntax.md#constants) and [array](arrayrepr.md#array-literals) literals, which each form a single subject for grammatical purposes.
+BQN expressions are the part of [syntax](syntax.md) that describes computations to perform. Programs are mainly made up of expressions with a little organizing material like [blocks](block.md) and [namespaces](namespace.md) around them. This page explains how functions, modifiers, and assignment combine with their inputs. It doesn't describe [constant](token.md) and [array](arrayrepr.md#array-literals) literals, which each form a single subject for grammatical purposes.
 
 The [first tutorial](../tutorial/expression.md) also covers how to build and read BQN expressions.
 
diff --git a/doc/glossary.md b/doc/glossary.md
index cd68e6c9..cd5b1853 100644
--- a/doc/glossary.md
+++ b/doc/glossary.md
@@ -89,10 +89,10 @@ BQN uses standard terminology for particular sets of numbers, with natural numbe
 * [**Primitive**](primitive.md): One of several fixed operations defined by the language, denoted by a single-character token.
 * **Word**: A sequence of alphabetic or numeric characters.
 * **Name**: A word that starts with an alphabetic character. Names are compared case-insensitively and ignoring underscores `_`.
-* [**Numeric literal**](syntax.md#constants): A word that starts with a numeric character, indicating a number.
+* [**Numeric literal**](token.md#numbers): A word that starts with a numeric character, indicating a number.
 * [**String literal**](arrayrepr.md#strings): A literal written with double quotes `""`, indicating a string.
-* [**Character literal**](syntax.md#constants): A literal written with single quotes `''`, indicating a string.
-* [**Null literal**](syntax.md#constants): The literal `@`, indicating the null character (code point 0).
+* [**Character literal**](token.md#characters-and-strings): A literal written with single quotes `''`, indicating a character.
+* [**Null literal**](token.md#characters-and-strings): The literal `@`, indicating the null character (code point 0).
 
 ## Grammar
 
diff --git a/doc/syntax.md b/doc/syntax.md
index da334d6d..86b88d3a 100644
--- a/doc/syntax.md
+++ b/doc/syntax.md
@@ -15,11 +15,11 @@ Here's a full table of precedence for BQN's glyphs (broader than "operator prece
 |       | [Stranding](#list-and-array-notation) | n-ary         | `‿`
 |       | Modifier                              | Left-to-right | `∘⎉¨´`…          | `↩` in `Fn↩`
 |       | Function                              | Right-to-left | `+↕⊔⍉`…          | `←↩⇐`
-|       | [Separator](#separators)              |               | `⋄,` and newline | `?`
+|       | [Separator](token.md#separators)      |               | `⋄,` and newline | `?`
 |       | [Header](block.md#block-headers)      |               | `:`
 | Low   | [Body](block.md#multiple-bodies)      |               | `;`
 
-While all of BQN's grammar fits into this table somehow, it's not really the whole story because subexpressions including parentheses and blocks might behave like functions or modifiers.
+While all of BQN's grammar fits into this table somehow, it's not really the whole story because subexpressions including parentheses and blocks might behave like functions or modifiers. See [expressions](#expressions) and [blocks](#blocks).
 
 ## Special glyphs
 
@@ -27,17 +27,17 @@ The following glyphs are used for BQN syntax. [Primitives](primitive.md) (built-
 
 Glyph(s)        | Meaning
 ----------------|-----------
-`#`             | [Comment](#comments)
-`'"`            | [Character or string literal](#constants)
-`@`             | [Null character](#constants)
-`¯∞π`           | [Used in numeric literals](#constants)
+`#`             | [Comment](token.md#comments)
+`'"`            | [Character or string literal](token.md#characters-and-strings)
+`@`             | [Null character](token.md#characters-and-strings)
+`¯∞π`           | [Used in numeric literals](token.md#numbers)
 `·`             | [Nothing](expression.md#nothing)
 `()`            | [Expression grouping](expression.md#parentheses)
 `←`             | [Define](expression.md#assignment)
 `⇐`             | [Export](namespace.md#exports)
 `↩`             | [Change](expression.md#assignment)
 `.`             | Namespace [field access](namespace.md#imports)
-`⋄,` or newline | Statement or element [separator](#separators)
+`⋄,` or newline | Statement or element [separator](token.md#separators)
 `⟨⟩`            | [List](#list-and-array-notation)
 `[]`            | [Array](#list-and-array-notation)
 `‿`             | [Strand](#list-and-array-notation) (lightweight list syntax)
@@ -52,27 +52,16 @@ Glyph(s)        | Meaning
 `𝕘𝔾`            | [Right operand of a 2-modifier](#blocks)
 `𝕣`             | [Modifier self-reference](#blocks)
 
-## Comments
+## Tokens
 
-A comment starts with a `#` that isn't part of a character or string literal, and continues to the end of the line.
+*[Full documentation](token.md)*
 
-        '#' - 1  #This is the comment
-
-## Constants
-
-BQN has single-token notation for numbers, strings, and characters.
-
-[Numbers](types.md#numbers) are written as decimals, allowing `¯` for the negative sign (because `-` is a function) and `e` or `E` for scientific notation. They must have digits before and after the decimal point (so, `0.5` instead of `.5`), and any exponent must be an integer. Two special numbers `∞` and `π` are supported, possibly with a minus sign. If complex numbers are supported (no implementation to date has them), then they can be written with the components separated by `i` or `I`.
-
-        ⟨ ¯π ⋄ 0.5 ⋄ 5e¯1 ⋄ 1.5E3 ⋄ ∞ ⟩   # A list of numbers
-
-Strings—lists of characters—are written with double quotes `""`, and [characters](types.md#characters) with single quotes `''` with a single character in between. Only one character ever needs to be escaped: a double quote in a string is written twice. So `""""` is a one-character string of `"`, and if two string literals are next to each other, they have to be separated by a space. Character literals don't have even one escape, as the length is already known. Other than the double quote, character and string literals can contain anything: newlines, null characters, or any other Unicode.
-
-        ≠¨ ⟨ "str" ⋄ "s't""r" ⋄ 'c' ⋄ ''' ⋄ '"' ⟩   # "" is an escape
-
-        ≡¨ ⟨ "a" ⋄ 'a' ⟩   # A string is an array but a character isn't
-
-But including a null character in your source code is probably not a great idea for other reasons. The null character (code point 0) has a dedicated literal representation `@`. Null can be used with [character arithmetic](arithmetic.md#character-arithmetic) to directly convert between characters and numeric code points, which among many other uses allows tricky characters to be entered by code point: for example, a non-breaking space is `@+160`.
+BQN syntax is made up of tokens, which are mostly single characters. But there are a few exceptions:
+- [Comments](token.md#comments) start with `#` and end at the end of the line.
+- [Character literals](token.md#characters-and-strings) start and end with `'`, and have exactly one character in between.
+- [String literals](token.md#characters-and-strings) start and end with `"`. Pairs of quotes `""` in between represent one quote character, and other characters (including `'`) represent themselves.
+- [Numbers](token.md#numbers) support decimal (`.`) and scientific (`e`) notation, plus `π` and `∞`, and use `¯` for a minus sign.
+- [Variable names](token.md#names) allow letters, underscores, and numeric characters. They're matched case-insensitively, with a [spelling system](expression.md#role-spellings) that determines role.
 
 ## Expressions
 
@@ -95,11 +84,7 @@ The double arrow `⇐` is used for functionality relating to [namespaces](namesp
 
 ## Arrays and blocks
 
-Arrays and code blocks can both be represented as sequences of expressions in source code. There are paired bracket representations, using `⟨⟩` for lists, `[]` for arrays, and `{}` for blocks, as well as a shortcut "stranding" notation using `‿` for lists.
-
-### Separators
-
-The characters `⋄` and `,` and newline are completely interchangeable and are used to separate expressions. An expression might be an element in a list or a line in a block. Empty sections—those that consist only of whitespace—are ignored. This means that any number of separators can be used between expressions, and that leading and trailing separators are also allowed. The expressions are evaluated in text order: left to right and top to bottom.
+Arrays and code blocks can both be represented as sequences of expressions in source code. There are paired bracket representations, using `⟨⟩` for lists, `[]` for arrays, and `{}` for blocks, as well as a shortcut "stranding" notation using `‿` for lists. Elements within brackets are divided by [separators](token.md#separators): `,` or `⋄` or a line break.
 
 ### List and array notation
 
diff --git a/doc/token.md b/doc/token.md
new file mode 100644
index 00000000..bfb02746
--- /dev/null
+++ b/doc/token.md
@@ -0,0 +1,55 @@
+*View this file with results and syntax highlighting [here](https://mlochbaum.github.io/BQN/doc/token.html).*
+
+# Tokens
+
+A "token" is the smallest part of syntax, much like a word in English. BQN's rules for forming tokens are simpler than most programming languages, because most of them are single characters. There are only a few kinds of multi-character tokens: character and string literals written with `'` and `"`, and words which are numbers, names, or system values.
+
+Strings (and characters) and comments have starting and ending conditions, but can't overlap. Whichever one starts first "wins": it needs to end before any other token or comment can start. These also take precedence over other token rules, that is, characters inside a string or comment don't form other tokens.
+
+## Non-tokens
+
+Comments, and the horizontal whitespace characters space and tab, don't form tokens, since they don't do anything as far as the program's concerned. However, whitespace can be used to separate adjacent words or strings. Comments can only end with a newline, which doesn't need separating, so they _really_ don't do anything, other than inform the reader about whatever you have to say. Note that newline characters (either LF or CR) are [separators](#separators), which are tokens.
+
+### Comments
+
+A comment starts with a `#` that isn't part of a character or string literal, and continues to the end of the line.
+
+        '#' - 1  #This is the comment
+
+Every line of commentary needs its own `#`; there's no multi-line comment syntax.
+
+## Characters and strings
+
+Strings—lists of characters—are written with double quotes `""`, and [characters](types.md#characters) with single quotes `''` with a single character in between. Only one character ever needs to be escaped: a double quote in a string is written twice. So `""""` is a one-character string of `"`, and if two string literals are next to each other, they have to be separated by a space. Character literals don't have even one escape, as the length is already known. Other than the double quote, character and string literals can contain any character directly: newlines, null characters, or other Unicode.
+
+        ≠¨ ⟨ "str" ⋄ "s't""r" ⋄ 'c' ⋄ ''' ⋄ '"' ⟩   # "" is an escape
+
+        ≡¨ ⟨ "a" ⋄ 'a' ⟩   # A string is an array but a character isn't
+
+But including a null character in your source code is probably not a great idea for other reasons. The null character (code point 0) has a dedicated literal representation `@`. Null can be used with [character arithmetic](arithmetic.md#character-arithmetic) to directly convert between characters and numeric code points, which among many other uses allows tricky characters to be entered by code point: for example, a non-breaking space is `@+160`.
+
+## Words
+
+Numbers and variable names share a token formation rule, and are collectively called words. A word is a number if it starts with a digit or numeric character `¯∞π`, and a name otherwise.
+
+Words are formed from digits, letters, and the characters `_.¯∞π`. All these characters stick together, so that you need to separate words with whitespace in order to write them next to each other. But `.` only counts if it's followed by a digit: otherwise it forms its own token to support [namespace syntax](namespace.md#imports) `ns.field`. A word may be preceded by `•` to form a system name.
+
+The character `𝕣` also sticks with other word-forming characters, but is only allowed to form the special names `𝕣`, `_𝕣`, and `_𝕣_`.
+
+### Numbers
+
+[Numbers](types.md#numbers) are written as decimals, allowing `¯` for the negative sign (because `-` is a function) and `e` or `E` for scientific notation. They must have digits before and after the decimal point (so, `0.5` instead of `.5`), and any exponent must be an integer. Two special numbers `∞` and `π` are supported, possibly with a minus sign. If complex numbers are supported (no implementation to date has them), then they can be written with the components separated by `i` or `I`.
+
+        ⟨ ¯π ⋄ 0.5 ⋄ 5e¯1 ⋄ 1.5E3 ⋄ ∞ ⟩   # A list of numbers
+
+### Names
+
+A variable name starts with a letter but otherwise can contain anything, including characters like `∞` and `¯`. Names represent identifiers according to the rules of [lexical scoping](lexical.md). A somewhat unusual feature of BQN is that identifiers are case- and underscore-insensitive, so that `abc` is treated as the same name as `_a_B_c`. This works with the [role spelling](expression.md#role-spellings) system so that changing the case or adding underscores allows the same variable to be used in different roles.
+
+The system dot `•` can only start a word, and must be followed by a name. This accesses a system value such as the debugging display function `•Show`.
+
+## Separators
+
+The characters `⋄` and `,` and newline are completely interchangeable and are used to separate expressions. An expression might be an element in a list or a line in a block. Empty sections—those that consist only of whitespace—are ignored. This means that any number of separators can be used between expressions, and that leading and trailing separators are also allowed. The expressions are evaluated in text order: left to right and top to bottom.
+
+Both LF and CR are allowed as newline characters, and CRLF functions as a separator too because of the way multiple separators work.