From 77c6ab5c8435c9fcde7c4742ee0e5eb06341eeff Mon Sep 17 00:00:00 2001 From: Marshall Lochbaum Date: Thu, 7 Jul 2022 21:23:06 -0400 Subject: Separate token and constant documentation into its own page --- doc/README.md | 1 + doc/arithmetic.md | 2 +- doc/arrayrepr.md | 4 ++-- doc/expression.md | 2 +- doc/glossary.md | 6 ++--- doc/syntax.md | 47 +++++++++++++------------------------ doc/token.md | 55 ++++++++++++++++++++++++++++++++++++++++++++ docs/doc/arithmetic.html | 2 +- docs/doc/arrayrepr.html | 4 ++-- docs/doc/expression.html | 2 +- docs/doc/glossary.html | 6 ++--- docs/doc/index.html | 3 +++ docs/doc/syntax.html | 47 +++++++++++++++---------------------- docs/doc/token.html | 41 +++++++++++++++++++++++++++++++++ docs/help/character.html | 2 +- docs/help/comment.html | 2 +- docs/help/infinity.html | 2 +- docs/help/minus.html | 2 +- docs/help/nullcharacter.html | 2 +- docs/help/pi.html | 2 +- docs/help/separator.html | 2 +- docs/help/string.html | 2 +- help/character.md | 2 +- help/comment.md | 2 +- help/infinity.md | 2 +- help/minus.md | 2 +- help/nullcharacter.md | 2 +- help/pi.md | 2 +- help/separator.md | 2 +- help/string.md | 2 +- md.bqn | 4 ++-- 31 files changed, 166 insertions(+), 92 deletions(-) create mode 100644 doc/token.md create mode 100644 docs/doc/token.html diff --git a/doc/README.md b/doc/README.md index 7d0a1fd4..f0c5ae18 100644 --- a/doc/README.md +++ b/doc/README.md @@ -19,6 +19,7 @@ References: - [BQN as combinatory logic](birds.md) Concepts: +- [Tokens and constants](token.md) - [Expression syntax](expression.md) - [Context-free grammar](context.md) - [Arrays](array.md) diff --git a/doc/arithmetic.md b/doc/arithmetic.md index e51fc53e..bd61ed0e 100644 --- a/doc/arithmetic.md +++ b/doc/arithmetic.md @@ -49,7 +49,7 @@ Each of these functions also has a meaning with only one argument, although math √ 0‿1‿2‿4 -Take note of the difference between the function `-`, and the "high minus" character `¯`, which is a part of [numeric 
notation](syntax.md#constants). Also shown is the number `∞`, which BQN supports along with `¯∞` (but depending on implementation BQN may or may not keep track of `¯0`. Integer optimization loses the distinction so it's best not to rely on it). +Take note of the difference between the function `-`, and the "high minus" character `¯`, which is a part of [numeric notation](token.md#numbers). Also shown is the number `∞`, which BQN supports along with `¯∞` (but depending on implementation BQN may or may not keep track of `¯0`. Integer optimization loses the distinction so it's best not to rely on it). The logarithm is written with [Undo](undo.md): `⋆⁼`. As with Power, the default base is *e*, giving a natural logarithm. diff --git a/doc/arrayrepr.md b/doc/arrayrepr.md index 205c4a44..63dba983 100644 --- a/doc/arrayrepr.md +++ b/doc/arrayrepr.md @@ -92,7 +92,7 @@ Now it's time to discuss ways to write arrays in a BQN program. There are three ### Strings -A **string** consists of a sequence of characters surrounded by double quotes `""`. The only rule for the characters inside is that any double quote must be escaped by repeating it twice; otherwise the string ends at that point. +A [**string** literal](token.md#characters-and-strings) consists of a sequence of characters surrounded by double quotes `""`. The only rule for the characters inside is that any double quote must be escaped by repeating it twice; otherwise the string ends at that point. "-'×%""*" @@ -102,7 +102,7 @@ Even special characters like a newline can appear in a string literal, so that s ### Brackets -**List notation** uses angle brackets `⟨⟩`. The contents are structurally identical to those of a [block](block.md), that is, a list of expressions [separated](syntax.md#separators) by `,` or `⋄` or newlines. Unlike a block, a list doesn't need to have any expressions: `⟨⟩` or `⟨⋄⟩` or `⟨,,⋄,⟩` will create an empty list. 
Other differences are that a list doesn't introduce a new [scope](lexical.md) and all of the expressions have to result in a value, not [Nothing](expression.md#nothing) (`·`). +**List notation** uses angle brackets `⟨⟩`. The contents are structurally identical to those of a [block](block.md), that is, a list of expressions [separated](token.md#separators) by `,` or `⋄` or newlines. Unlike a block, a list doesn't need to have any expressions: `⟨⟩` or `⟨⋄⟩` or `⟨,,⋄,⟩` will create an empty list. Other differences are that a list doesn't introduce a new [scope](lexical.md) and all of the expressions have to result in a value, not [Nothing](expression.md#nothing) (`·`). Entries in a list are evaluated in source order, and the value will be the list of those results. The list has a subject [role](expression.md#syntactic-role), even if it contains expressions with other roles. Any value can be an element. diff --git a/doc/expression.md b/doc/expression.md index 5fa6cdc7..5359a672 100644 --- a/doc/expression.md +++ b/doc/expression.md @@ -2,7 +2,7 @@ # Expression syntax -BQN expressions are the part of [syntax](syntax.md) that describes computations to perform. Programs are mainly made up of expressions with a little organizing material like [blocks](block.md) and [namespaces](namespace.md) around them. This page explains how functions, modifiers, and assignment combine with their inputs. It doesn't describe [constant](syntax.md#constants) and [array](arrayrepr.md#array-literals) literals, which each form a single subject for grammatical purposes. +BQN expressions are the part of [syntax](syntax.md) that describes computations to perform. Programs are mainly made up of expressions with a little organizing material like [blocks](block.md) and [namespaces](namespace.md) around them. This page explains how functions, modifiers, and assignment combine with their inputs. 
It doesn't describe [constant](token.md) and [array](arrayrepr.md#array-literals) literals, which each form a single subject for grammatical purposes. The [first tutorial](../tutorial/expression.md) also covers how to build and read BQN expressions. diff --git a/doc/glossary.md b/doc/glossary.md index cd68e6c9..cd5b1853 100644 --- a/doc/glossary.md +++ b/doc/glossary.md @@ -89,10 +89,10 @@ BQN uses standard terminology for particular sets of numbers, with natural numbe * [**Primitive**](primitive.md): One of several fixed operations defined by the language, denoted by a single-character token. * **Word**: A sequence of alphabetic or numeric characters. * **Name**: A word that starts with an alphabetic character. Names are compared case-insensitively and ignoring underscores `_`. -* [**Numeric literal**](syntax.md#constants): A word that starts with a numeric character, indicating a number. +* [**Numeric literal**](token.md#numbers): A word that starts with a numeric character, indicating a number. * [**String literal**](arrayrepr.md#strings): A literal written with double quotes `""`, indicating a string. -* [**Character literal**](syntax.md#constants): A literal written with single quotes `''`, indicating a string. -* [**Null literal**](syntax.md#constants): The literal `@`, indicating the null character (code point 0). +* [**Character literal**](token.md#characters-and-strings): A literal written with single quotes `''`, indicating a character. +* [**Null literal**](token.md#characters-and-strings): The literal `@`, indicating the null character (code point 0). 
## Grammar diff --git a/doc/syntax.md b/doc/syntax.md index da334d6d..86b88d3a 100644 --- a/doc/syntax.md +++ b/doc/syntax.md @@ -15,11 +15,11 @@ Here's a full table of precedence for BQN's glyphs (broader than "operator prece | | [Stranding](#list-and-array-notation) | n-ary | `‿` | | Modifier | Left-to-right | `∘⎉¨´`… | `↩` in `Fn↩` | | Function | Right-to-left | `+↕⊔⍉`… | `←↩⇐` -| | [Separator](#separators) | | `⋄,` and newline | `?` +| | [Separator](token.md#separators) | | `⋄,` and newline | `?` | | [Header](block.md#block-headers) | | `:` | Low | [Body](block.md#multiple-bodies) | | `;` -While all of BQN's grammar fits into this table somehow, it's not really the whole story because subexpressions including parentheses and blocks might behave like functions or modifiers. +While all of BQN's grammar fits into this table somehow, it's not really the whole story because subexpressions including parentheses and blocks might behave like functions or modifiers. See [expressions](#expressions) and [blocks](#blocks). ## Special glyphs @@ -27,17 +27,17 @@ The following glyphs are used for BQN syntax. 
[Primitives](primitive.md) (built- Glyph(s) | Meaning ----------------|----------- -`#` | [Comment](#comments) -`'"` | [Character or string literal](#constants) -`@` | [Null character](#constants) -`¯∞π` | [Used in numeric literals](#constants) +`#` | [Comment](token.md#comments) +`'"` | [Character or string literal](token.md#characters-and-strings) +`@` | [Null character](token.md#characters-and-strings) +`¯∞π` | [Used in numeric literals](token.md#numbers) `·` | [Nothing](expression.md#nothing) `()` | [Expression grouping](expression.md#parentheses) `←` | [Define](expression.md#assignment) `⇐` | [Export](namespace.md#exports) `↩` | [Change](expression.md#assignment) `.` | Namespace [field access](namespace.md#imports) -`⋄,` or newline | Statement or element [separator](#separators) +`⋄,` or newline | Statement or element [separator](token.md#separators) `⟨⟩` | [List](#list-and-array-notation) `[]` | [Array](#list-and-array-notation) `‿` | [Strand](#list-and-array-notation) (lightweight list syntax) @@ -52,27 +52,16 @@ Glyph(s) | Meaning `𝕘𝔾` | [Right operand of a 2-modifier](#blocks) `𝕣` | [Modifier self-reference](#blocks) -## Comments +## Tokens -A comment starts with a `#` that isn't part of a character or string literal, and continues to the end of the line. +*[Full documentation](token.md)* - '#' - 1 #This is the comment - -## Constants - -BQN has single-token notation for numbers, strings, and characters. - -[Numbers](types.md#numbers) are written as decimals, allowing `¯` for the negative sign (because `-` is a function) and `e` or `E` for scientific notation. They must have digits before and after the decimal point (so, `0.5` instead of `.5`), and any exponent must be an integer. Two special numbers `∞` and `π` are supported, possibly with a minus sign. If complex numbers are supported (no implementation to date has them), then they can be written with the components separated by `i` or `I`. 
- - ⟨ ¯π ⋄ 0.5 ⋄ 5e¯1 ⋄ 1.5E3 ⋄ ∞ ⟩ # A list of numbers - -Strings—lists of characters—are written with double quotes `""`, and [characters](types.md#characters) with single quotes `''` with a single character in between. Only one character ever needs to be escaped: a double quote in a string is written twice. So `""""` is a one-character string of `"`, and if two string literals are next to each other, they have to be separated by a space. Character literals don't have even one escape, as the length is already known. Other than the double quote, character and string literals can contain anything: newlines, null characters, or any other Unicode. - - ≠¨ ⟨ "str" ⋄ "s't""r" ⋄ 'c' ⋄ ''' ⋄ '"' ⟩ # "" is an escape - - ≡¨ ⟨ "a" ⋄ 'a' ⟩ # A string is an array but a character isn't - -But including a null character in your source code is probably not a great idea for other reasons. The null character (code point 0) has a dedicated literal representation `@`. Null can be used with [character arithmetic](arithmetic.md#character-arithmetic) to directly convert between characters and numeric code points, which among many other uses allows tricky characters to be entered by code point: for example, a non-breaking space is `@+160`. +BQN syntax is made up of tokens, which are mostly single characters. But there are a few exceptions: +- [Comments](token.md#comments) start with `#` and end at the end of the line. +- [Character literals](token.md#characters-and-strings) start and end with `'`, and have exactly one character in between. +- [String literals](token.md#characters-and-strings) start and end with `"`. Pairs of quotes `""` in between represent one quote character, and other characters (including `'`) represent themselves. +- [Numbers](token.md#numbers) support decimal (`.`) and scientific (`e`) notation, plus `π` and `∞`, and use `¯` for a minus sign. +- [Variable names](token.md#names) allow letters, underscores, and numeric characters. 
They're matched case-insensitively, with a [spelling system](expression.md#role-spellings) that determines role. ## Expressions @@ -95,11 +84,7 @@ The double arrow `⇐` is used for functionality relating to [namespaces](namesp ## Arrays and blocks -Arrays and code blocks can both be represented as sequences of expressions in source code. There are paired bracket representations, using `⟨⟩` for lists, `[]` for arrays, and `{}` for blocks, as well as a shortcut "stranding" notation using `‿` for lists. - -### Separators - -The characters `⋄` and `,` and newline are completely interchangeable and are used to separate expressions. An expression might be an element in a list or a line in a block. Empty sections—those that consist only of whitespace—are ignored. This means that any number of separators can be used between expressions, and that leading and trailing separators are also allowed. The expressions are evaluated in text order: left to right and top to bottom. +Arrays and code blocks can both be represented as sequences of expressions in source code. There are paired bracket representations, using `⟨⟩` for lists, `[]` for arrays, and `{}` for blocks, as well as a shortcut "stranding" notation using `‿` for lists. Elements within brackets are divided by [separators](token.md#separators): `,` or `⋄` or a line break. ### List and array notation diff --git a/doc/token.md b/doc/token.md new file mode 100644 index 00000000..bfb02746 --- /dev/null +++ b/doc/token.md @@ -0,0 +1,55 @@ +*View this file with results and syntax highlighting [here](https://mlochbaum.github.io/BQN/doc/token.html).* + +# Tokens + +A "token" is the smallest part of syntax, much like a word in English. BQN's rules for forming tokens are simpler than most programming languages, because most of them are single characters. There are only a few kinds of multi-character tokens: character and string literals written with `'` and `"`, and words which are numbers, names, or system values. 
+ +Strings (and characters) and comments have starting and ending conditions, but can't overlap. Whichever one starts first "wins": it needs to end before any other token or comment can start. These also take precedence over other token rules, that is, characters inside a string or comment don't form other tokens. + +## Non-tokens + +Comments, and the horizontal whitespace characters space and tab, don't form tokens, since they don't do anything as far as the program's concerned. However, whitespace can be used to separate adjacent words or strings. Comments can only end with a newline, which doesn't need separating, so they _really_ don't do anything, other than inform the reader about whatever you have to say. Note that newline characters (either LF or CR) are [separators](#separators), which are tokens. + +### Comments + +A comment starts with a `#` that isn't part of a character or string literal, and continues to the end of the line. + + '#' - 1 #This is the comment + +Every line of commentary needs its own `#`; there's no multi-line comment syntax. + +## Characters and strings + +Strings—lists of characters—are written with double quotes `""`, and [characters](types.md#characters) with single quotes `''` with a single character in between. Only one character ever needs to be escaped: a double quote in a string is written twice. So `""""` is a one-character string of `"`, and if two string literals are next to each other, they have to be separated by a space. Character literals don't have even one escape, as the length is already known. Other than the double quote, character and string literals can contain any character directly: newlines, null characters, or other Unicode. + + ≠¨ ⟨ "str" ⋄ "s't""r" ⋄ 'c' ⋄ ''' ⋄ '"' ⟩ # "" is an escape + + ≡¨ ⟨ "a" ⋄ 'a' ⟩ # A string is an array but a character isn't + +But including a null character in your source code is probably not a great idea for other reasons. 
The null character (code point 0) has a dedicated literal representation `@`. Null can be used with [character arithmetic](arithmetic.md#character-arithmetic) to directly convert between characters and numeric code points, which among many other uses allows tricky characters to be entered by code point: for example, a non-breaking space is `@+160`. + +## Words + +Numbers and variable names share a token formation rule, and are collectively called words. A word is a number if it starts with a digit or numeric character `¯∞π`, and a name otherwise. + +Words are formed from digits, letters, and the characters `_.¯∞π`. All these characters stick together, so that you need to separate words with whitespace in order to write them next to each other. But `.` only counts if it's followed by a digit: otherwise it forms its own token to support [namespace syntax](namespace.md#imports) `ns.field`. A word may be preceded by `•` to form a system name. + +The character `𝕣` also sticks with other word-forming characters, but is only allowed to form the special names `𝕣`, `_𝕣`, and `_𝕣_`. + +### Numbers + +[Numbers](types.md#numbers) are written as decimals, allowing `¯` for the negative sign (because `-` is a function) and `e` or `E` for scientific notation. They must have digits before and after the decimal point (so, `0.5` instead of `.5`), and any exponent must be an integer. Two special numbers `∞` and `π` are supported, possibly with a minus sign. If complex numbers are supported (no implementation to date has them), then they can be written with the components separated by `i` or `I`. + + ⟨ ¯π ⋄ 0.5 ⋄ 5e¯1 ⋄ 1.5E3 ⋄ ∞ ⟩ # A list of numbers + +### Names + +A variable name starts with a letter but otherwise can contain anything, including characters like `∞` and `¯`. Names represent identifiers according to the rules of [lexical scoping](lexical.md). 
A somewhat unusual feature of BQN is that identifiers are case- and underscore-insensitive, so that `abc` is treated as the same name as `_a_B_c`. This works with the [role spelling](expression.md#role-spellings) system so that changing the case or adding underscores allows the same variable to be used in different roles. + +The system dot `•` can only start a word, and must be followed by a name. This accesses a system value such as the debugging display function `•Show`. + +## Separators + +The characters `⋄` and `,` and newline are completely interchangeable and are used to separate expressions. An expression might be an element in a list or a line in a block. Empty sections—those that consist only of whitespace—are ignored. This means that any number of separators can be used between expressions, and that leading and trailing separators are also allowed. The expressions are evaluated in text order: left to right and top to bottom. + +Both LF and CR are allowed as newline characters, and CRLF functions as a separator too because of the way multiple separators work. diff --git a/docs/doc/arithmetic.html b/docs/doc/arithmetic.html index 9e511044..918bd9c5 100644 --- a/docs/doc/arithmetic.html +++ b/docs/doc/arithmetic.html @@ -96,7 +96,7 @@ 0124 ⟨ 0 1 1.414213562373095 2 ⟩ -

Take note of the difference between the function -, and the "high minus" character ¯, which is a part of numeric notation. Also shown is the number ∞, which BQN supports along with ¯∞ (but depending on implementation BQN may or may not keep track of ¯0. Integer optimization loses the distinction so it's best not to rely on it).

+

Take note of the difference between the function -, and the "high minus" character ¯, which is a part of numeric notation. Also shown is the number ∞, which BQN supports along with ¯∞ (but depending on implementation BQN may or may not keep track of ¯0. Integer optimization loses the distinction so it's best not to rely on it).

The logarithm is written with Undo: ⋆⁼. As with Power, the default base is e, giving a natural logarithm.

↗️
    ⋆⁼ 10
 2.302585092994046
diff --git a/docs/doc/arrayrepr.html b/docs/doc/arrayrepr.html
index 159bb8f7..5218009e 100644
--- a/docs/doc/arrayrepr.html
+++ b/docs/doc/arrayrepr.html
@@ -162,7 +162,7 @@
 

The tutorial section here also covers this topic.

Now it's time to discuss ways to write arrays in a BQN program. There are three kinds of literal notation for lists: strings, list notation, and stranding. Strings indicate character lists (with space for the fill) and the other two can combine any sequence of elements. Additionally, there's a square bracket notation that can form higher-rank arrays.

Strings

-

A string consists of a sequence of characters surrounded by double quotes "". The only rule for the characters inside is that any double quote must be escaped by repeating it twice; otherwise the string ends at that point.

+

A string literal consists of a sequence of characters surrounded by double quotes "". The only rule for the characters inside is that any double quote must be escaped by repeating it twice; otherwise the string ends at that point.

↗️
    "-'×%""*"
 "-'×%""*"
 
@@ -171,7 +171,7 @@
 

Even special characters like a newline can appear in a string literal, so that string literals are automatically multi-line.

Brackets

-

List notation uses angle brackets ⟨⟩. The contents are structurally identical to those of a block, that is, a list of expressions separated by , or ⋄ or newlines. Unlike a block, a list doesn't need to have any expressions: ⟨⟩ or ⟨⋄⟩ or ⟨,,⋄,⟩ will create an empty list. Other differences are that a list doesn't introduce a new scope and all of the expressions have to result in a value, not Nothing (·).

+

List notation uses angle brackets ⟨⟩. The contents are structurally identical to those of a block, that is, a list of expressions separated by , or ⋄ or newlines. Unlike a block, a list doesn't need to have any expressions: ⟨⟩ or ⟨⋄⟩ or ⟨,,⋄,⟩ will create an empty list. Other differences are that a list doesn't introduce a new scope and all of the expressions have to result in a value, not Nothing (·).

Entries in a list are evaluated in source order, and the value will be the list of those results. The list has a subject role, even if it contains expressions with other roles. Any value can be an element.

↗️
    @, ˘, "abc"
 ┌─              
diff --git a/docs/doc/expression.html b/docs/doc/expression.html
index 81f2a048..b8f9254c 100644
--- a/docs/doc/expression.html
+++ b/docs/doc/expression.html
@@ -5,7 +5,7 @@
 
 
 

Expression syntax

-

BQN expressions are the part of syntax that describes computations to perform. Programs are mainly made up of expressions with a little organizing material like blocks and namespaces around them. This page explains how functions, modifiers, and assignment combine with their inputs. It doesn't describe constant and array literals, which each form a single subject for grammatical purposes.

+

BQN expressions are the part of syntax that describes computations to perform. Programs are mainly made up of expressions with a little organizing material like blocks and namespaces around them. This page explains how functions, modifiers, and assignment combine with their inputs. It doesn't describe constant and array literals, which each form a single subject for grammatical purposes.

The first tutorial also covers how to build and read BQN expressions.

Overview

BQN expressions consist of subjects, functions, and modifiers arranged in sequence, with parentheses to group parts into subexpressions. Assignment arrows ← and ↩ can also be present and mostly act like functions. Functions can be applied to subjects or grouped into trains, while modifiers can be applied to subjects or functions. The most important kinds of application are:

diff --git a/docs/doc/glossary.html b/docs/doc/glossary.html index 3588a8d8..daf3ec43 100644 --- a/docs/doc/glossary.html +++ b/docs/doc/glossary.html @@ -99,10 +99,10 @@
  • Primitive: One of several fixed operations defined by the language, denoted by a single-character token.
  • Word: A sequence of alphabetic or numeric characters.
  • Name: A word that starts with an alphabetic character. Names are compared case-insensitively and ignoring underscores _.
  • -
  • Numeric literal: A word that starts with a numeric character, indicating a number.
  • +
  • Numeric literal: A word that starts with a numeric character, indicating a number.
  • String literal: A literal written with double quotes "", indicating a string.
  • -
  • Character literal: A literal written with single quotes '', indicating a string.
  • -
  • Null literal: The literal @, indicating the null character (code point 0).
  • +
  • Character literal: A literal written with single quotes '', indicating a character.
  • +
  • Null literal: The literal @, indicating the null character (code point 0).
  • Grammar

      diff --git a/docs/doc/index.html b/docs/doc/index.html index deded4be..4edd0d0e 100644 --- a/docs/doc/index.html +++ b/docs/doc/index.html @@ -24,6 +24,9 @@

    Concepts:

      +
    • Tokens and constants + +
    • Expression syntax
      • Context-free grammar
      • diff --git a/docs/doc/syntax.html b/docs/doc/syntax.html index 94bace3c..ef6c1804 100644 --- a/docs/doc/syntax.html +++ b/docs/doc/syntax.html @@ -56,7 +56,7 @@ -Separator +Separator ⋄, and newline ? @@ -77,7 +77,7 @@ -

        While all of BQN's grammar fits into this table somehow, it's not really the whole story because subexpressions including parentheses and blocks might behave like functions or modifiers.

        +

        While all of BQN's grammar fits into this table somehow, it's not really the whole story because subexpressions including parentheses and blocks might behave like functions or modifiers. See expressions and blocks.

        Special glyphs

        The following glyphs are used for BQN syntax. Primitives (built-in functions and modifiers) are not listed in this table, and have their own page. Digits, characters, and the underscore _ are used for numbers and variable names.

        @@ -90,19 +90,19 @@ - + - + - + - + @@ -130,7 +130,7 @@ - + @@ -186,25 +186,16 @@
        #CommentComment
        '"Character or string literalCharacter or string literal
        @Null characterNull character
        ¯∞πUsed in numeric literalsUsed in numeric literals
        ·
        ⋄, or newlineStatement or element separatorStatement or element separator
        ⟨⟩
        -

        Comments

        -

        A comment starts with a # that isn't part of a character or string literal, and continues to the end of the line.

        -↗️
            '#' - 1  #This is the comment
        -'"'
        -
        -

        Constants

        -

        BQN has single-token notation for numbers, strings, and characters.

        -

        Numbers are written as decimals, allowing ¯ for the negative sign (because - is a function) and e or E for scientific notation. They must have digits before and after the decimal point (so, 0.5 instead of .5), and any exponent must be an integer. Two special numbers ∞ and π are supported, possibly with a minus sign. If complex numbers are supported (no implementation to date has them), then they can be written with the components separated by i or I.

        -↗️
            ⟨ ¯π ⋄ 0.5 ⋄ 5e¯1 ⋄ 1.5E3 ⋄ ∞ ⟩  # A list of numbers
        -⟨ ¯3.141592653589793 0.5 0.5 1500 ∞ ⟩
        -
        -

        Strings—lists of characters—are written with double quotes "", and characters with single quotes '' with a single character in between. Only one character ever needs to be escaped: a double quote in a string is written twice. So """" is a one-character string of ", and if two string literals are next to each other, they have to be separated by a space. Character literals don't have even one escape, as the length is already known. Other than the double quote, character and string literals can contain anything: newlines, null characters, or any other Unicode.

        -↗️
            ≠¨ ⟨ "str" ⋄ "s't""r" ⋄ 'c' ⋄ ''' ⋄ '"' ⟩  # "" is an escape
        -⟨ 3 5 1 1 1 ⟩
        -
        -    ≡¨ ⟨ "a" ⋄ 'a' ⟩  # A string is an array but a character isn't
        -⟨ 1 0 ⟩
        -
        -

        But including a null character in your source code is probably not a great idea for other reasons. The null character (code point 0) has a dedicated literal representation @. Null can be used with character arithmetic to directly convert between characters and numeric code points, which among many other uses allows tricky characters to be entered by code point: for example, a non-breaking space is @+160.

        +

        Tokens

        +

        Full documentation

        +

        BQN syntax is made up of tokens, which are mostly single characters. But there are a few exceptions:

        +
          +
        • Comments start with # and end at the end of the line.
        • +
        • Character literals start and end with ', and have exactly one character in between.
        • +
        • String literals start and end with ". Pairs of quotes "" in between represent one quote character, and other characters (including ') represent themselves.
        • +
        • Numbers support decimal (.) and scientific (e) notation, plus π and ∞, and use ¯ for a minus sign.
        • +
        • Variable names allow letters, underscores, and numeric characters. They're matched case-insensitively, with a spelling system that determines role.
        • +

        Expressions

        Full documentation

        BQN expressions are composed of subjects, functions, and modifiers, with parentheses to group parts into subexpressions. Functions can be applied to subjects or grouped into trains, while modifiers can be applied to subjects or functions. The most important kinds of application are:

        @@ -263,9 +254,7 @@

        Assignment arrows ←, ⇐, and ↩ store expression results in variables: ← and ⇐ create new variables while ↩ modifies existing ones. The general format is Name ← Value, where the two sides have the same role. Additionally, lhs F↩ rhs is a shortened form of lhs ↩ lhs F rhs and lhs F↩ expands to lhs ↩ F lhs.

        The double arrow ⇐ is used for functionality relating to namespaces. It has a few purposes: exporting assignment name⇐value, plain export name⇐, and aliasing ⟨alias⇐field⟩←namespace. A block that uses it for export returns a namespace rather than the result of its last statement. The other namespace-related bit of syntax is field access ns.field.

        Arrays and blocks

        -

        Arrays and code blocks can both be represented as sequences of expressions in source code. There are paired bracket representations, using ⟨⟩ for lists, [] for arrays, and {} for blocks, as well as a shortcut "stranding" notation using ‿ for lists.

        -

        Separators

        -

        The characters ⋄ and , and newline are completely interchangeable and are used to separate expressions. An expression might be an element in a list or a line in a block. Empty sections—those that consist only of whitespace—are ignored. This means that any number of separators can be used between expressions, and that leading and trailing separators are also allowed. The expressions are evaluated in text order: left to right and top to bottom.

        +

        Arrays and code blocks can both be represented as sequences of expressions in source code. There are paired bracket representations, using ⟨⟩ for lists, [] for arrays, and {} for blocks, as well as a shortcut "stranding" notation using ‿ for lists. Elements within brackets are divided by separators: , or ⋄ or a line break.

        List and array notation

        Full documentation

        Lists (1-dimensional arrays) are enclosed in angle brackets ⟨⟩, with the results of the expressions in between being the list's elements. Lists of two elements or more can also be written with the ligature character ‿. This character has higher binding strength than any part of an expression except . for namespace field access. If one of the elements is a compound expression, then it will need to be enclosed in parentheses.

        diff --git a/docs/doc/token.html b/docs/doc/token.html
        new file mode 100644
        index 00000000..75387cde
        --- /dev/null
        +++ b/docs/doc/token.html
        @@ -0,0 +1,41 @@
        +BQN: Tokens

        Tokens

        +

        A "token" is the smallest part of syntax, much like a word in English. BQN's rules for forming tokens are simpler than most programming languages, because most of them are single characters. There are only a few kinds of multi-character tokens: character and string literals written with ' and ", and words which are numbers, names, or system values.

        +

        Strings (and characters) and comments have starting and ending conditions, but can't overlap. The one that starts first "wins": it needs to end before any other token or comment can start. These also take precedence over other token rules, that is, characters inside a string or comment don't form other tokens.

        +

        Non-tokens

        +

        Comments, and the horizontal whitespace characters space and tab, don't form tokens, since they don't do anything as far as the program's concerned. However, whitespace can be used to separate adjacent words or strings. Comments can only end with a newline, which doesn't need separating, so they really don't do anything, other than inform the reader about whatever you have to say. Note that newline characters (either LF or CR) are separators, which are tokens.

        +

        Comments

        +

        A comment starts with a # that isn't part of a character or string literal, and continues to the end of the line.

        +↗️
            '#' - 1  #This is the comment
        +'"'
        +
        +

        Every line of commentary needs its own #; there's no multi-line comment syntax.
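
The interaction between comments and literals described above can be sketched as a small scanner. This is a hypothetical Python illustration, not BQN's actual tokenizer: the function name `strip_comments` is invented here, and the sketch assumes a character literal is always exactly a quote, one character, and a quote.

```python
def strip_comments(src: str) -> str:
    """Remove #-comments while leaving character and string literals intact.

    Illustrative only: malformed literals are passed through rather than
    reported as errors, which a real tokenizer would not do.
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch == "'" and i + 2 < n and src[i + 2] == "'":
            # Character literal: quote, any single character, quote.
            out.append(src[i:i + 3])
            i += 3
        elif ch == '"':
            # String literal: runs to the next double quote that is not doubled.
            j = i + 1
            while j < n:
                if src[j] == '"':
                    if j + 1 < n and src[j + 1] == '"':
                        j += 2  # "" is an escaped quote, keep scanning
                        continue
                    break
                j += 1
            out.append(src[i:j + 1])
            i = j + 1
        elif ch == "#":
            # Comment: skip up to, but not including, the next LF or CR.
            while i < n and src[i] not in "\n\r":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(strip_comments("'#' - 1  #This is the comment")))
```

Because whichever of the three constructs starts first is consumed whole, a `#` inside a literal never opens a comment, and a quote inside a comment never opens a literal.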

        +

        Characters and strings

        +

        Strings—lists of characters—are written with double quotes "", and characters with single quotes '' with a single character in between. Only one character ever needs to be escaped: a double quote in a string is written twice. So """" is a one-character string of ", and if two string literals are next to each other, they have to be separated by a space. Character literals don't have even one escape, as the length is already known. Other than the double quote, character and string literals can contain any character directly: newlines, null characters, or other Unicode.

        +↗️
            ≠¨ "str"‿"s't""r"‿'c'‿'''‿'"'    # "" is an escape
        +⟨ 3 5 1 1 1 ⟩
        +
        +    ≡¨ "a"‿'a'    # A string is an array but a character isn't
        +⟨ 1 0 ⟩
        +
        +

        But including a null character in your source code is probably not a great idea for other reasons. The null character (code point 0) has a dedicated literal representation @. Null can be used with character arithmetic to directly convert between characters and numeric code points, which among many other uses allows tricky characters to be entered by code point: for example, a non-breaking space is @+160.
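
As a rough illustration of the two rules above (the single string escape, and building characters from the null character plus a code point), here is a hedged Python sketch. The helper names `decode_string_literal` and `char_from_code_point` are made up for this example and are not part of BQN.

```python
def decode_string_literal(token: str) -> str:
    """Return the characters denoted by a double-quoted literal.

    The only escape is a doubled double quote; everything else,
    including newlines, is taken literally.
    """
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("not a string literal")
    return token[1:-1].replace('""', '"')


def char_from_code_point(n: int) -> str:
    """Model the arithmetic @+n: the null character (code point 0) plus n."""
    return chr(ord("\0") + n)


print(len(decode_string_literal('"s\'t""r"')))   # length of the decoded string
print(repr(char_from_code_point(160)))           # non-breaking space
```

The second helper mirrors `@+160` from the text: adding a number to the null character lands on the character with that code point.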

        +

        Words

        +

        Numbers and variable names share a token formation rule, and are collectively called words. A word is a number if it starts with a digit or numeric character ¯∞π, and a name otherwise.

        +

        Words are formed from digits, letters, and the characters _.¯∞π. All these characters stick together, so that you need to separate words with whitespace in order to write them next to each other. But . only counts if it's followed by a digit: otherwise it forms its own token to support namespace syntax ns.field. A word may be preceded by • to form a system name.

        +

        The character 𝕣 also sticks with other word-forming characters, but is only allowed to form the special names 𝕣, _𝕣, and _𝕣_.

        +

        Numbers

        +

        Numbers are written as decimals, allowing ¯ for the negative sign (because - is a function) and e or E for scientific notation. They must have digits before and after the decimal point (so, 0.5 instead of .5), and any exponent must be an integer. Two special numbers ∞ and π are supported, possibly with a minus sign. If complex numbers are supported (no implementation to date has them), then they can be written with the components separated by i or I.

        +↗️
            ¯π‿0.5‿5e¯1‿1.5E3‿∞   # A list of numbers
        +⟨ ¯3.141592653589793 0.5 0.5 1500 ∞ ⟩
        +
        +
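
The number rules above can be approximated with a regular expression. The following Python sketch is an assumption-laden illustration: `parse_number` is a hypothetical helper, it covers only the real-number forms described in the text (no complex numbers), and it does not attempt every form the BQN specification may allow.

```python
import math
import re

# Optional high minus, then either ∞, π, or a decimal with an optional
# integer exponent. Digits are required on both sides of the decimal point.
_NUMBER = re.compile(r"(¯?)(?:(∞)|(π)|([0-9]+(?:\.[0-9]+)?)(?:[eE](¯?[0-9]+))?)")


def parse_number(word: str) -> float:
    """Convert a number word such as 5e¯1 or ¯π to a Python float."""
    m = _NUMBER.fullmatch(word)
    if m is None:
        raise ValueError(f"not a number literal in this sketch: {word!r}")
    sign, inf, pi, mantissa, exponent = m.groups()
    if inf:
        value = math.inf
    elif pi:
        value = math.pi
    elif exponent:
        # Reuse Python's own float parser, translating ¯ to an ASCII minus.
        value = float(mantissa + "e" + exponent.replace("¯", "-"))
    else:
        value = float(mantissa)
    return -value if sign else value


print([parse_number(w) for w in ["¯π", "0.5", "5e¯1", "1.5E3", "∞"]])
```

Words such as `.5` or `1e0.5` are rejected by this sketch, matching the requirements that digits surround the decimal point and that the exponent is an integer.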

        Names

        +

        A variable name starts with a letter but otherwise can contain anything, including characters like ∞ and ¯. Names represent identifiers according to the rules of lexical scoping. A somewhat unusual feature of BQN is that identifiers are case- and underscore-insensitive, so that abc is treated as the same name as _a_B_c. This works with the role spelling system so that changing the case or adding underscores allows the same variable to be used in different roles.
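
One way to picture case- and underscore-insensitivity is as a normalization applied before names are compared. The Python sketch below is hypothetical: `normalize_name` is an invented helper, and using `str.lower` for case folding is an assumption, not a statement about how any BQN implementation treats non-ASCII letters.

```python
def normalize_name(name: str) -> str:
    """Map every spelling of an identifier to one canonical key.

    Underscores are dropped and letters are lowercased, so spellings that
    differ only in role (case and underscores) compare equal.
    """
    return name.replace("_", "").lower()


print(normalize_name("abc") == normalize_name("_a_B_c"))
```

A scope could then key its variable table on the normalized form while still reading the role from the original spelling.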

        +

        The system dot • can only start a word, and must be followed by a name. This accesses a system value such as the debugging display function •Show.

        +

        Separators

        +

        The characters ⋄ and , and newline are completely interchangeable and are used to separate expressions. An expression might be an element in a list or a line in a block. Empty sections (those that consist only of whitespace) are ignored. This means that any number of separators can be used between expressions, and that leading and trailing separators are also allowed. The expressions are evaluated in text order: left to right and top to bottom.

        +

        Both LF and CR are allowed as newline characters, and CRLF functions as a separator too because of the way multiple separators work.
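
The separator rules can be sketched as a split that discards empty sections. This Python example is a deliberate simplification and an assumption on several fronts: `split_statements` is a hypothetical name, and the sketch ignores strings, comments, and nested brackets, all of which a real parser has to respect.

```python
import re

# Comma, diamond, LF, and CR are treated as one interchangeable separator class.
_SEPARATOR = re.compile(r"[,⋄\n\r]")


def split_statements(src: str) -> list[str]:
    """Split source on separators, dropping sections that are only whitespace."""
    sections = (part.strip(" \t") for part in _SEPARATOR.split(src))
    return [part for part in sections if part]


print(split_statements("a ← 3 , ⋄ b ← 2"))
print(split_statements("x\r\ny"))   # CRLF is two separators with an empty section between
```

Since empty sections vanish, CRLF needs no special case: it behaves like any other run of separators.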

        diff --git a/docs/help/character.html b/docs/help/character.html
        index 7a29a0f5..4675c644 100644
        --- a/docs/help/character.html
        +++ b/docs/help/character.html
        @@ -6,7 +6,7 @@

        Single Quote (')

        'c': Character

        -

        →full documentation

        +

        →full documentation

        A character literal whose value is the character between quotes. Any character can be used, even ' and newline.

        ↗️
            'a'‿'b'
         "ab"
        diff --git a/docs/help/comment.html b/docs/help/comment.html
        index 70671d81..8b02d745 100644
        --- a/docs/help/comment.html
        +++ b/docs/help/comment.html
        @@ -6,7 +6,7 @@
         
         

        Number Sign (#)

        #: Comment

        -

        →full documentation

        +

        →full documentation

        Create a comment that extends to the end of the line.

        Anything written in comments is ignored.

        ↗️
            1 + 2 # + 3 + 4
        diff --git a/docs/help/infinity.html b/docs/help/infinity.html
        index c607967e..8771e545 100644
        --- a/docs/help/infinity.html
        +++ b/docs/help/infinity.html
        @@ -6,7 +6,7 @@
         
         

        Infinity (∞)

        ∞: Infinity

        -

        →full documentation

        +

        →full documentation

        Mathematical constant Infinity, a numeric literal. Can be negative (¯∞).

        ↗️
            ∞
         ∞
        diff --git a/docs/help/minus.html b/docs/help/minus.html
        index c235aa75..ca43170a 100644
        --- a/docs/help/minus.html
        +++ b/docs/help/minus.html
        @@ -6,7 +6,7 @@
         
         

        Macron (¯)

        ¯: Minus

        -

        →full documentation

        +

        →full documentation

        Prefix before numbers to indicate that they are negative.

        Note that this is not the same as -, since it is part of the number, rather than a primitive that negates its value.

        ↗️
            -123
        diff --git a/docs/help/nullcharacter.html b/docs/help/nullcharacter.html
        index 24bcf67e..491642e8 100644
        --- a/docs/help/nullcharacter.html
        +++ b/docs/help/nullcharacter.html
        @@ -6,7 +6,7 @@
         
         

        Commercial At (@)

        @: Null Character

        -

        →full documentation

        +

        →full documentation

        Null character, code point 0 in ASCII. A shortcut character literal.

        Add to a code point number to get that character.

        ↗️
            @+50
        diff --git a/docs/help/pi.html b/docs/help/pi.html
        index 3a947cb2..7d677211 100644
        --- a/docs/help/pi.html
        +++ b/docs/help/pi.html
        @@ -6,7 +6,7 @@
         
         

        Pi (π)

        π: Pi

        -

        →full documentation

        +

        →full documentation

        The mathematical constant pi, a numeric literal. Can be negative (¯π).

        ↗️
            π
         3.141592653589793
        diff --git a/docs/help/separator.html b/docs/help/separator.html
        index 3c104dbe..931d635e 100644
        --- a/docs/help/separator.html
        +++ b/docs/help/separator.html
        @@ -6,7 +6,7 @@
         
         

        Comma (,) and Diamond (⋄)

        , or ⋄: Separator

        -

        →full documentation

        +

        →full documentation

        Separates statements in blocks, programs, and arrays. Characters , and ⋄ are interchangeable with each other and with newline.

        ↗️
            a ← 3 , b ← 2
         2
        diff --git a/docs/help/string.html b/docs/help/string.html
        index d7d1e74a..18a6db4f 100644
        --- a/docs/help/string.html
        +++ b/docs/help/string.html
        @@ -6,7 +6,7 @@
         
         

        Double Quote (")

        "str": String

        -

        →full documentation

        +

        →full documentation

        Literal notation for a string, or list of characters. Double quotes must be escaped by writing them twice. Any other characters can be included directly.

        ↗️
            2 ⊑ "string"
         'r'
        diff --git a/help/character.md b/help/character.md
        index 7f1e3487..0dd299ff 100644
        --- a/help/character.md
        +++ b/help/character.md
        @@ -3,7 +3,7 @@
         # Single Quote (`'`)
         
         ## `'c'`: Character
        -[→full documentation](../doc/syntax.md#constants)
        +[→full documentation](../doc/token.md#characters-and-strings)
         
         A character literal whose value is the character between quotes. Any character can be used, even `'` and newline.
         
        diff --git a/help/comment.md b/help/comment.md
        index 9e966a50..ac661530 100644
        --- a/help/comment.md
        +++ b/help/comment.md
        @@ -3,7 +3,7 @@
         # Number Sign (`#`)
         
         ## `#`: Comment
        -[→full documentation](../doc/syntax.md#comments)
        +[→full documentation](../doc/token.md#comments)
         
         Create a comment that extends to the end of the line.
         
        diff --git a/help/infinity.md b/help/infinity.md
        index d26b4614..6af8db68 100644
        --- a/help/infinity.md
        +++ b/help/infinity.md
        @@ -3,7 +3,7 @@
         # Infinity (`∞`)
         
         ## `∞`: Infinity
        -[→full documentation](../doc/syntax.md#constants)
        +[→full documentation](../doc/token.md#numbers)
         
         Mathematical constant Infinity, a numeric literal. Can be negative (`¯∞`).
         
        diff --git a/help/minus.md b/help/minus.md
        index 534d553b..525fa293 100644
        --- a/help/minus.md
        +++ b/help/minus.md
        @@ -3,7 +3,7 @@
         # Macron (`¯`)
         
         ## `¯`: Minus
        -[→full documentation](../doc/syntax.md#constants)
        +[→full documentation](../doc/token.md#numbers)
         
         Prefix before numbers to indicate that they are negative.
         
        diff --git a/help/nullcharacter.md b/help/nullcharacter.md
        index 6c0c18ab..95b9d753 100644
        --- a/help/nullcharacter.md
        +++ b/help/nullcharacter.md
        @@ -3,7 +3,7 @@
         # Commercial At (`@`)
         
         ## `@`: Null Character
        -[→full documentation](../doc/syntax.md#constants)
        +[→full documentation](../doc/token.md#characters-and-strings)
         
         Null character, code point 0 in ASCII. A shortcut character literal.
         
        diff --git a/help/pi.md b/help/pi.md
        index e32ff767..8e106419 100644
        --- a/help/pi.md
        +++ b/help/pi.md
        @@ -3,7 +3,7 @@
         # Pi (`π`)
         
         ## `π`: Pi
        -[→full documentation](../doc/syntax.md#constants)
        +[→full documentation](../doc/token.md#numbers)
         
         The mathematical constant pi, a numeric literal. Can be negative (`¯π`).
         
        diff --git a/help/separator.md b/help/separator.md
        index a64720bf..b38fdd3f 100644
        --- a/help/separator.md
        +++ b/help/separator.md
        @@ -3,7 +3,7 @@
         # Comma (`,`) and Diamond (`⋄`)
         
         ## `,` or `⋄`: Separator
        -[→full documentation](../doc/syntax.md#separators)
        +[→full documentation](../doc/token.md#separators)
         
         Separates statements in blocks, programs, and arrays. Characters `,` and `⋄` are interchangeable with each other and with newline.
         
        diff --git a/help/string.md b/help/string.md
        index a8a4741f..94c0a81d 100644
        --- a/help/string.md
        +++ b/help/string.md
        @@ -3,7 +3,7 @@
         # Double Quote (`"`)
         
         ## `"str"`: String
        -[→full documentation](../doc/syntax.md#constants)
        +[→full documentation](../doc/token.md#characters-and-strings)
         
         Literal notation for a string, or list of characters. Double quotes must be escaped by writing them twice. Any other characters can be included directly.
         
        diff --git a/md.bqn b/md.bqn
        index 8aa21c43..fd325102 100644
        --- a/md.bqn
        +++ b/md.bqn
        @@ -273,9 +273,9 @@ Markdown ← {filename𝕊𝕩:
             items ↩ { ∨´indent ?
               # Require indented lines to form a nested list
               Len ← { ! ∧´(⊑¨𝕩)∊"-+*" ⋄ LenBullet ⊑𝕩 }
        -      start ← »⊸< indent
               # Process items recursively
        -      sub ← Len⊸ProcBullet¨ (1-˜indent×+`start) ⊔ items
        +      groups ← 1 -˜ (indent∾1) × +` (¬indent)∾1
        +      sub ← Len⊸ProcBullet⍟(0<≠)¨ groups ⊔ items
               # Append to the first line, which is assumed to stand alone
               (ProcInline¨ indent ¬⊸/ items) JoinLines∘⋈¨ sub
             ;
        -- 
        cgit v1.2.3