From d2b372b66c473d4dbd51df6870ce5cb8ce9706b5 Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Tue, 25 Aug 2020 18:54:01 -0400
Subject: Highlight _𝕣 and _𝕣_ correctly
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/spec/token.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/spec/token.html b/docs/spec/token.html
index a4b708cd..3d04a407 100644
--- a/docs/spec/token.html
+++ b/docs/spec/token.html
@@ -9,7 +9,7 @@

BQN source code should be considered as a series of Unicode code points, which we refer to as "characters". The separator between lines in a file is considered to be a single character, newline, even though some operating systems such as Windows typically represent it with a two-character CRLF sequence. Implementers should note that not all languages treat Unicode code points as atomic, as exposing the UTF-8 or UTF-16 representation instead is common. For a language such as JavaScript that uses UTF-16, each of the double-struck characters 𝕨𝕎𝕩𝕏𝕗𝔽𝕘𝔾 is represented as a pair of 16-bit surrogate code units, but BQN treats each one as a single unit.
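
The difference between code points and UTF-16 code units can be sketched in Python, whose strings are already indexed by code point. This is only an illustration and not part of the specification; the helper names `code_points` and `utf16_units` are hypothetical.

```python
# Illustrative sketch only; the function names are hypothetical.

def code_points(source: str) -> list[str]:
    """Split source text into BQN "characters" (Unicode code points).

    A CRLF pair is collapsed to a single newline first, so a line
    separator always counts as one character.
    """
    return list(source.replace("\r\n", "\n"))


def utf16_units(ch: str) -> int:
    """Count the 16-bit code units a UTF-16 language would use for ch."""
    return len(ch.encode("utf-16-le")) // 2


chars = code_points("𝕨+𝕩\r\n")
print(len(chars))        # four characters: 𝕨, +, 𝕩, newline
print(utf16_units("𝕨"))  # a surrogate pair in UTF-16
print(utf16_units("+"))  # a single code unit
```

An implementation in a UTF-16 language would need to join each surrogate pair before applying the token rules.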

A BQN character literal consists of a single character between single quotes, such as 'a', and a string literal consists of any number of characters between double quotes, such as "" or "abc". Character literals, string literals, and comments take precedence over other tokenization rules, so that # between quotes does not start a comment and whitespace between quotes is not removed, but a quote within a comment does not start a character or string literal. Almost any character can be included directly in a character or string literal without escaping. The only exception is the double quote character ", which must be written twice to include it in a string, as otherwise it would end the string instead. Character literals require no escaping at all, as the length is fixed. In particular, literals for the single and double quote characters are written ''' and '"', while length-1 strings containing these characters are "'" and """".

A comment consists of the hash character # and any following text until (not including) the next newline character. The initial # must not be part of a string literal started earlier. Comments are ignored entirely and do not form tokens.
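
As a rough illustration of how literals and comments interact, the following Python sketch scans a source string from left to right. It is not the reference implementation: the function name `scan_literals`, the `(kind, text)` output shape, and the choice to raise `SyntaxError` for an unterminated literal are all assumptions made for the example.

```python
# Illustrative sketch only; names and error handling are assumptions.

def scan_literals(src: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs for a source string.

    Character and string literals are decoded, comments are dropped,
    and every other character is passed through as ("other", ch).
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c == "#":
            # Comment: skip up to, but not including, the next newline.
            while i < n and src[i] != "\n":
                i += 1
        elif c == "'":
            # Character literal: exactly one character between quotes,
            # so no escaping is needed, even for ''' itself.
            if i + 2 >= n or src[i + 2] != "'":
                raise SyntaxError("unterminated character literal")
            out.append(("char", src[i + 1]))
            i += 3
        elif c == '"':
            # String literal: a doubled "" stands for one double quote.
            i += 1
            buf = []
            while True:
                if i >= n:
                    raise SyntaxError("unterminated string literal")
                if src[i] == '"':
                    if i + 1 < n and src[i + 1] == '"':
                        buf.append('"')
                        i += 2
                        continue
                    i += 1
                    break
                buf.append(src[i])
                i += 1
            out.append(("string", "".join(buf)))
        else:
            out.append(("other", c))
            i += 1
    return out


print(scan_literals('"a#b" # not a \'literal'))
```

Because the scan is strictly left to right, whichever of a quote or # comes first decides how the following text is read, which matches the precedence described above.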

-

Identifiers and numeric literals share the same token formation rule. These tokens are formed from the numeric characters ¯∞π.0123456789 and alphabetic characters _abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ and the oddball 𝕣. Any sequence of these characters adjacent to each other forms a single token, which is a numeric literal if it begins with a numeric character and an identifier if it begins with an alphabetic character. Numeric literals are also subject to numeric literal rules, which specify which numeric literals are valid and which numbers they represent. If the token contains 𝕣 it must be either 𝕣, _𝕣, or _𝕣_ and is considered a special name (see below). As the value taken by this identifier can only be a modifier, the uppercase character is not allowed.

+

Identifiers and numeric literals share the same token formation rule. These tokens are formed from the numeric characters ¯∞π.0123456789 and alphabetic characters _abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ and the oddball 𝕣. Any sequence of these characters adjacent to each other forms a single token, which is a numeric literal if it begins with a numeric character and an identifier if it begins with an alphabetic character. Numeric literals are also subject to numeric literal rules, which specify which numeric literals are valid and which numbers they represent. If the token contains 𝕣 it must be either 𝕣, _𝕣, or _𝕣_ and is considered a special name (see below). As the value taken by this identifier can only be a modifier, the uppercase character is not allowed.
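
The shared formation rule for identifiers and numeric literals can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `split_words` and `classify_word` are hypothetical, literals and comments are assumed to have been handled already, and the separate numeric literal rules that decide whether a number is valid are not checked.

```python
# Illustrative sketch only; function names are hypothetical and the
# numeric literal validity rules are deliberately left out.

NUMERIC = set("¯∞π.0123456789")
ALPHABETIC = set("_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
SPECIAL = "𝕣"
WORD = NUMERIC | ALPHABETIC | {SPECIAL}


def split_words(src: str) -> list[str]:
    """Group maximal runs of word characters; pass other characters singly."""
    out, run = [], []
    for ch in src:
        if ch in WORD:
            run.append(ch)
        else:
            if run:
                out.append("".join(run))
                run = []
            out.append(ch)
    if run:
        out.append("".join(run))
    return out


def classify_word(word: str) -> str:
    """Classify a word token as "special", "number", or "identifier"."""
    if not word or any(ch not in WORD for ch in word):
        raise ValueError("not a word token")
    if SPECIAL in word:
        # Only the three modifier spellings of 𝕣 are permitted.
        if word not in ("𝕣", "_𝕣", "_𝕣_"):
            raise SyntaxError("invalid use of 𝕣")
        return "special"
    return "number" if word[0] in NUMERIC else "identifier"


for token in split_words("abc+¯2.5"):
    if token[0] in WORD:
        print(token, classify_word(token))
```

Note that the first character alone decides between number and identifier, so a token such as 1a is classified as a numeric literal here and would only be rejected later by the numeric literal rules.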

Following this step, the whitespace characters space and tab are ignored, and do not form tokens. Only these whitespace characters, and the newline character, which does form a token, are allowed.

Otherwise, a single character forms a token. Only the specified set of characters can be used; others result in an error. The classes of characters are given below.

-- cgit v1.2.3