Diffstat (limited to 'docs')
| -rw-r--r-- | docs/spec/index.html | 2 |
| -rw-r--r-- | docs/spec/token.html | 3 |
2 files changed, 3 insertions, 2 deletions
diff --git a/docs/spec/index.html b/docs/spec/index.html
index 4f7d4faa..3b5539ee 100644
--- a/docs/spec/index.html
+++ b/docs/spec/index.html
@@ -5,7 +5,7 @@
 </head>
 <div class="nav"><a href="https://github.com/mlochbaum/BQN">BQN</a> / <a href="../index.html">main</a></div>
 <h1 id="bqn-specification">BQN specification</h1>
-<p>This document, and the others in this directory (linked in the list below) make up the pre-versioning BQN specification. The specification differs from the <a href="../doc/index.html">documentation</a> in that its purpose is only to describe the exact details of BQN's operation in the most quickly accessible way, rather than to explain the central ideas of BQN functionality and how it might be used. The core of BQN, which excludes system-provided values, is now almost completely specified. Two planned features—syntax for system-provided values and an extension to allow low-rank elements in the argument to Join—have not yet been added, and the spec will continue to be edited further to improve clarity and cover any edge cases that have been missed.</p>
+<p>This document, and the others in this directory (linked in the list below) make up the pre-versioning BQN specification. The specification differs from the <a href="../doc/index.html">documentation</a> in that its purpose is only to describe the exact details of BQN's operation in the most quickly accessible way, rather than to explain the central ideas of BQN functionality and how it might be used. The core of BQN, which excludes system-provided values, is now almost completely specified. One planned features—an extension to allow low-rank elements in the argument to Join—has not yet been added, and the spec will continue to be edited further to improve clarity and cover any edge cases that have been missed.</p>
 <p>Under this specification, a language implementation is a <strong>BQN pre-version implementation</strong> if it behaves as specified for all input programs. It is a <strong>BQN pre-version implementation with extensions</strong> if it behaves as specified in all cases where the specification does not require an error, but behaves differently in at least one case where it requires an error. It is a <strong>partial</strong> version of either of these if it doesn't conform to the description but differs from a conforming implementation only by rejecting with an error some programs that the conforming implementation accepts. As the specification is not yet versioned, other instances of the specification define these terms in different ways. An implementation can use one of these term if it conforms to any instance of the pre-versioning BQN specifications that defines them. When versioning is begun, there will be only one specification for each version.</p>
 <p>The following documents are included in the BQN specification. A BQN program is a sequence of <a href="https://en.wikipedia.org/wiki/Unicode">Unicode</a> code points: to evaluate it, it is converted into a sequence of tokens using the token formation rules, then these tokens are arranged in a syntax tree according to the grammar, and then this tree is evaluated according to the evaluation semantics. The program may be evaluated in the presence of additional context such as a filesystem or command-line arguments; this context is presented to the program and manipulated through the system-provided values.</p>
 <ul>
diff --git a/docs/spec/token.html b/docs/spec/token.html
index 7209f982..5a258422 100644
--- a/docs/spec/token.html
+++ b/docs/spec/token.html
@@ -10,7 +10,8 @@
 <p>A BQN <em>character literal</em> consists of a single character between single quotes, such as <code><span class='String'>'a'</span></code>, and a <em>string literal</em> consists of any number of characters between double quotes, such as <code><span class='String'>""</span></code> or <code><span class='String'>"abc"</span></code>. Character and string literals take precedence with comments over other tokenization rules, so that <code><span class='Comment'>#</span></code> between quotes does not start a comment and whitespace between quotes is not removed, but a quote within a comment does not start a character literal. Almost any character can be included directly in a character or string literal without escaping. The only exception is the double quote character <code><span class='String'>"</span></code>, which must be written twice to include it in a string, as otherwise it would end the string instead. Character literals require no escaping at all, as the length is fixed. In particular, literals for the double and single quote characters are written <code><span class='String'>'''</span></code> and <code><span class='String'>'"'</span></code>, while length-1 strings containing these characters are <code><span class='String'>"'"</span></code> and <code><span class='String'>""""</span></code>.</p>
 <p>A comment consists of the hash character <code><span class='Comment'>#</span></code> and any following text until (not including) the next newline character. The initial <code><span class='Comment'>#</span></code> must not be part of a string literal started earlier. Comments are ignored entirely and do not form tokens.</p>
 <p>Identifiers and numeric literals share the same token formation rule. These tokens are formed from the <em>numeric characters</em> <code><span class='Number'>¯∞π0123456789</span></code> and <em>alphabetic characters</em> <code><span class='Modifier'>_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ</span></code> and the oddball <code><span class='Value'>𝕣</span></code>. Additionally, <code><span class='Value'>.</span></code> is considered a numeric character if it is followed immediately by a digit (<code><span class='Number'>0123456789</span></code>); otherwise it forms its own token. Any sequence of these characters adjacent to each other forms a single token, which is a <em>numeric literal</em> if it begins with a numeric character and an <em>identifier</em> if it begins with an alphabetic character. If a token begins with an underscore then its first non-underscore character must be alphabetic: for example, <code><span class='Modifier'>_99</span></code> is not a valid token. Numeric literals are also subject to <a href="literal.html">numeric literal rules</a>, which specify which numeric literals are valid and which numbers they represent. If the token contains <code><span class='Value'>𝕣</span></code> it must be either <code><span class='Value'>𝕣</span></code>, <code><span class='Modifier'>_𝕣</span></code>, or <code><span class='Modifier2'>_𝕣_</span></code> and is considered a special name (see below). As the value taken by this identifier can only be a modifier, the uppercase character <code><span class='Value'>ℝ</span></code> is not allowed.</p>
-<p>Following this step, the whitespace characters space and tab are ignored, and do not form tokens. Only these whitespace characters, and the newline character, which does form a token, are allowed.</p>
+<p>The <em>system dot</em> <code><span class='Value'>•</span></code> always attaches to the token containing the next character, which must not be a whitespace character or <code><span class='Comment'>#</span></code>. This combined token is valid only if its name matches a defined <a href="system.html">system value</a>, ignoring underscores and letter case as with identifiers (but in the unlikely case that system values with numeric names are defined, they need not follow the numeric literal rules). Its role is the same as the role the remainder of the token would have if not preceded by <code><span class='Value'>•</span></code>, and it is considered a literal for grammar purposes.</p>
+<p>Following these steps, the whitespace characters space and tab are ignored, and do not form tokens. Only these whitespace characters, and the newline character, which does form a token, are allowed.</p>
 <p>Otherwise, a single character forms a token. Only the specified set of characters can be used; others result in an error. The classes of characters are given below.</p>
 <table>
 <thead>
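
Note on the token.html hunk: the word-formation and system-dot rules it describes can be illustrated with a rough Python sketch. This is not the reference tokenizer. The names `tokenize`, `classify`, and `is_word_char` and the kind labels are hypothetical. The sketch assumes the input contains no string or character literals and no comments, does not check the numeric literal rules, the set of allowed characters, or whether a system name is actually defined, and treats `•` at the end of input as an error (an assumption; the text only rules out whitespace and `#`).

```python
NUMERIC = set("¯∞π0123456789")
ALPHA = set("_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
DIGITS = set("0123456789")
SPECIAL_NAMES = {"𝕣", "_𝕣", "_𝕣_"}


def is_word_char(src, i):
    """True if src[i] can be part of an identifier or numeric literal."""
    c = src[i]
    if c in NUMERIC or c in ALPHA or c == "𝕣":
        return True
    # "." counts as numeric only when a digit follows immediately.
    return c == "." and i + 1 < len(src) and src[i + 1] in DIGITS


def classify(word):
    """Label a word token (hypothetical labels, not spec terminology)."""
    if "𝕣" in word:
        if word not in SPECIAL_NAMES:
            raise SyntaxError(f"invalid special name: {word}")
        return "special"
    if word[0] in NUMERIC or word[0] == ".":
        return "number"  # validity under the numeric literal rules is not checked
    rest = word.lstrip("_")
    if word[0] == "_" and rest and rest[0] not in ALPHA:
        raise SyntaxError(f"invalid token: {word}")  # e.g. _99
    return "identifier"


def tokenize(src):
    """Split src into (kind, text) pairs; space and tab are dropped."""
    tokens = []
    i = 0
    while i < len(src):
        if src[i] in " \t":
            i += 1
            continue
        start = i
        system = src[i] == "•"
        if system:
            # The system dot attaches to the token containing the next character.
            i += 1
            if i >= len(src) or src[i] in " \t#":
                raise SyntaxError("• must be followed directly by a token")
        j = i
        while j < len(src) and is_word_char(src, j):
            j += 1
        if j > i:
            kind = classify(src[i:j])
        else:
            j = i + 1  # otherwise a single character forms a token
            kind = "other"
        tokens.append(("system" if system else kind, src[start:j]))
        i = j
    return tokens


print(tokenize("•Out x"))  # [('system', '•Out'), ('identifier', 'x')]
```

With this sketch, `a.b` splits into three tokens because the `.` is not followed by a digit, while `1.5` stays a single numeric token.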
