| author | Marshall Lochbaum <mwlochbaum@gmail.com> | 2021-03-26 22:15:00 -0400 |
|---|---|---|
| committer | Marshall Lochbaum <mwlochbaum@gmail.com> | 2021-03-26 22:15:00 -0400 |
| commit | 0bfbb0a20ec6b06cfb0398c473f563a89d541ebf | |
| tree | 387c7629dfa48d28ef946e6e40ef0fc7aeaa5bf4 /spec/token.md | |
| parent | 5b661e364c0925706e00f93134f44c3a3fb765be | |
Add • to the tokenization spec
Diffstat (limited to 'spec/token.md')
| -rw-r--r-- | spec/token.md | 4 |
1 files changed, 3 insertions, 1 deletions
diff --git a/spec/token.md b/spec/token.md
index d97a6a73..0fa53d44 100644
--- a/spec/token.md
+++ b/spec/token.md
@@ -12,7 +12,9 @@ A comment consists of the hash character `#` and any following text until (not i

 Identifiers and numeric literals share the same token formation rule. These tokens are formed from the *numeric characters* `¯∞π0123456789` and *alphabetic characters* `_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ` and the oddball `𝕣`. Additionally, `.` is considered a numeric character if it is followed immediately by a digit (`0123456789`); otherwise it forms its own token. Any sequence of these characters adjacent to each other forms a single token, which is a *numeric literal* if it begins with a numeric character and an *identifier* if it begins with an alphabetic character. If a token begins with an underscore then its first non-underscore character must be alphabetic: for example, `_99` is not a valid token. Numeric literals are also subject to [numeric literal rules](literal.md), which specify which numeric literals are valid and which numbers they represent. If the token contains `𝕣` it must be either `𝕣`, `_𝕣`, or `_𝕣_` and is considered a special name (see below). As the value taken by this identifier can only be a modifier, the uppercase character `ℝ` is not allowed.

-Following this step, the whitespace characters space and tab are ignored, and do not form tokens. Only these whitespace characters, and the newline character, which does form a token, are allowed.
+The *system dot* `•` always attaches to the token containing the next character, which must not be a whitespace character or `#`. This combined token is valid only if its name matches a defined [system value](system.md), ignoring underscores and letter case as with identifiers (but in the unlikely case that system values with numeric names are defined, they need not follow the numeric literal rules). Its role is the same as the role the remainder of the token would have if not preceded by `•`, and it is considered a literal for grammar purposes.
+
+Following these steps, the whitespace characters space and tab are ignored, and do not form tokens. Only these whitespace characters, and the newline character, which does form a token, are allowed.

 Otherwise, a single character forms a token. Only the specified set of characters can be used; others result in an error. The classes of characters are given below.
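
For readers who want to see the rules in this hunk in action, here is a minimal Python sketch of the word-token and system-dot steps. It is an illustration under stated assumptions, not the reference tokenizer: the names `tokenize`, `classify`, and `is_word_char` are hypothetical, comments and string/character literals are not handled, the numeric literal rules and the lookup against defined system values are skipped, the check that single characters belong to the allowed set is omitted, and a newline directly after `•` is rejected along with space and tab (the spec text only says "whitespace character").

```python
NUMERIC = set("¯∞π0123456789")
DIGITS = set("0123456789")
ALPHA = set("_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
ODDBALL = "𝕣"


def is_word_char(src, i):
    """True if src[i] can be part of an identifier or numeric literal."""
    c = src[i]
    if c in NUMERIC or c in ALPHA or c == ODDBALL:
        return True
    # `.` counts as numeric only when a digit follows immediately.
    return c == "." and i + 1 < len(src) and src[i + 1] in DIGITS


def classify(word):
    """Classify a run of word characters, raising on the invalid forms."""
    if ODDBALL in word:
        if word not in ("𝕣", "_𝕣", "_𝕣_"):
            raise ValueError(f"invalid special name: {word!r}")
        return "special"
    if word[0] in NUMERIC or word[0] == ".":
        return "number"  # validity under the literal rules is not checked
    rest = word.lstrip("_")
    # All-underscore words are not addressed by the quoted text; this
    # sketch lets them through as identifiers (an assumption).
    if rest and rest[0] not in ALPHA:
        raise ValueError(f"invalid token: {word!r}")
    return "identifier"


def tokenize(src):
    """Return (kind, text) pairs; assumes no comments or string literals."""
    tokens = []
    i = 0
    while i < len(src):
        if src[i] in " \t":  # space and tab never form tokens
            i += 1
            continue
        start = i
        system = src[i] == "•"
        if system:
            i += 1
            if i >= len(src) or src[i] in " \t\n#":
                raise ValueError("• must be directly followed by a token")
        body = i
        if is_word_char(src, i):
            while i < len(src) and is_word_char(src, i):
                i += 1
            # System names need not follow the numeric literal rules, so
            # the word checks are skipped for them here.
            kind = "system" if system else classify(src[body:i])
        else:
            i += 1  # any other character forms a token by itself
            kind = "newline" if src[body] == "\n" else "char"
            if system:
                kind = "system"
        tokens.append((kind, src[start:i]))
    return tokens


if __name__ == "__main__":
    # `•abc` is a placeholder name, not a claim about any real system value.
    print(tokenize("a←•abc 1.5\n"))
```

In this sketch the `•` is kept as part of the token text and the token is simply tagged `system`; a fuller implementation would also record the role of the remainder, since the commit text says the combined token takes that role.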
