<head>
  <link href="../favicon.ico" rel="shortcut icon" type="image/x-icon"/>
  <link href="../style.css" rel="stylesheet"/>
  <title>BQN's context-free grammar</title>
</head>
<div class="nav"><a href="https://github.com/mlochbaum/BQN">BQN</a> / <a href="../index.html">main</a> / <a href="index.html">doc</a></div>
<h1 id="bqns-context-free-grammar">BQN's context-free grammar</h1>
<p>APL has a problem. To illustrate, let's look at an APL expression:</p>
<pre><span class='Value'>a</span> <span class='Value'>b</span> <span class='Value'>c</span> <span class='Value'>d</span> <span class='Value'>e</span>
</pre>
<p>It is impossible to say anything about this sentence! Is <code><span class='Value'>c</span></code> a dyadic operator being applied to <code><span class='Value'>b</span></code> and <code><span class='Value'>d</span></code>, or are <code><span class='Value'>b</span></code> and <code><span class='Value'>d</span></code> two dyadic functions being applied to arrays? In contrast, expressions in C-like or Lisp-like languages show their structure of application:</p>
<pre><span class='Value'>b</span><span class='Paren'>(</span><span class='Value'>a</span><span class='Separator'>,</span> <span class='Value'>d</span><span class='Paren'>(</span><span class='Value'>c</span><span class='Paren'>)(</span><span class='Value'>e</span><span class='Paren'>))</span>
<span class='Paren'>(</span><span class='Value'>b</span> <span class='Value'>a</span> <span class='Paren'>((</span><span class='Value'>d</span> <span class='Value'>c</span><span class='Paren'>)</span> <span class='Value'>e</span><span class='Paren'>))</span>
</pre>
<p>In each case, some values are used as inputs to functions while others are the functions being applied. The result of a function can be used either as an input or as a function again. These expressions correspond to the APL expression where <code><span class='Value'>a</span></code> and <code><span class='Value'>e</span></code> are arrays, <code><span class='Value'>b</span></code> and <code><span class='Value'>c</span></code> are functions, and <code><span class='Value'>d</span></code> is a monadic operator. However, these syntactic classes have to be known to see what the APL expression is doing—they are a form of context that is required for a reader to know the grammatical structure of the expression. In a context-free grammar like that of simple C or Lisp expressions, a value's grammatical role is part of the expression itself, indicated with parentheses: they come after the function in C and before it in Lisp. Of course, a consequence of using parentheses in this way is having a lot of parentheses. BQN uses a different method to annotate grammatical role:</p>
<pre><span class='Value'>a</span> <span class='Function'>B</span> <span class='Function'>C</span> <span class='Modifier'>_d</span> <span class='Value'>e</span>
</pre>
<p>Here, the lowercase spelling indicates that <code><span class='Value'>a</span></code> and <code><span class='Value'>e</span></code> are to be treated as subjects (&quot;arrays&quot; in APL), the uppercase spelling indicates that the variables <code><span class='Function'>B</span></code> and <code><span class='Function'>C</span></code> are used as functions, and the leading underscore marks <code><span class='Modifier'>_d</span></code> as a 1-modifier (&quot;monadic operator&quot;). Like parentheses for function application, the spelling is not inherent to the variable values used, but instead indicates their grammatical role in this particular expression. A variable has no inherent spelling and can be used in any role, so the names <code><span class='Value'>a</span></code>, <code><span class='Function'>A</span></code>, <code><span class='Modifier'>_a</span></code>, and <code><span class='Modifier2'>_a_</span></code> all refer to the exact same variable, but in different roles. Typically we use the lowercase name to refer to the variable in isolation, since all values are nouns when speaking about them in English. While we still don't know anything about what values <code><span class='Value'>a</span></code>, <code><span class='Value'>b</span></code>, <code><span class='Value'>c</span></code>, and so on have, we know how they interact in the line of code above.</p>
<h2 id="is-grammatical-context-really-a-problem">Is grammatical context really a problem?</h2>
<p>Yes, in the sense of <a href="../commentary/problems.html">problems with BQN</a>. A grammar that uses context is harder for humans to read and machines to execute. A particular difficulty is that parts of an expression you don't yet understand can interfere with parts you do, making it difficult to work through an unknown codebase.</p>
<p>One difficulty beginners to APL will encounter is that code in APL at first appears like a string of undifferentiated symbols. For example, a tacit Unique Mask implementation <code><span class='Value'>⍳⍨</span><span class='Function'>=</span><span class='Value'>⍳</span><span class='Modifier2'>∘</span><span class='Function'>≢</span></code> consists of six largely unfamiliar characters with little to distinguish them (in fact, the one obvious bit of structure, the repeated <code><span class='Value'>⍳</span></code>, is misleading as it means different things in each case!). Simply placing parentheses into the expression, like <code><span class='Paren'>(</span><span class='Value'>⍳⍨</span><span class='Paren'>)</span><span class='Function'>=</span><span class='Paren'>(</span><span class='Value'>⍳</span><span class='Modifier2'>∘</span><span class='Function'>≢</span><span class='Paren'>)</span></code>, can be a great help to a beginner, and part of learning APL is to naturally see where the parentheses should go. The equivalent BQN expression, <code><span class='Function'>⊐</span><span class='Modifier'>˜</span><span class='Function'>=↕</span><span class='Modifier2'>∘</span><span class='Function'>≠</span></code>, will likely appear equally intimidating at first, but the path to learning which things apply to which is much shorter: rather than learning the entire list of APL primitives, a beginner just needs to know that superscript characters like <code><span class='Modifier'>˜</span></code> are 1-modifiers and characters like <code><span class='Modifier2'>∘</span></code> with unbroken circles are 2-modifiers before beginning to learn the BQN grammar that will explain how to tie the various parts together.</p>
<p>This sounds like a distant concern to a master of APL or a computer that has no difficulty memorizing a few dozen glyphs. Quite the opposite: the same concern applies to variables whenever you begin work with an unfamiliar codebase! Many APL programmers even enforce variable name conventions to ensure they know the class of a variable. By having such a system built in, BQN keeps you from having to rely on programmers following a style guide, and also allows greater flexibility, including <a href="functional.html">functional programming</a>, as we'll see later.</p>
<p>Shouldn't a codebase define all the variables it uses, so we can see their class from the definition? Not always: consider that in a language with libraries, code might be imported from dependencies. Many APLs also have some dynamic features that can allow a variable to have more than one class, such as the <code><span class='Value'>⍺</span><span class='Gets'>←</span><span class='Function'>⊢</span></code> pattern in a dfn that makes <code><span class='Value'>⍺</span></code> an array in the dyadic case but a function in the monadic case. Regardless, searching for a definition somewhere in the code is certainly a lot more work than knowing the class just from looking! One final difficulty is that even one unknown can delay understanding of an entire expression. Suppose in <code><span class='Function'>A</span> <span class='Function'>B</span> <span class='Value'>c</span></code>, <code><span class='Function'>B</span></code> is a function and <code><span class='Value'>c</span></code> is an array, and both values are known to be constant. If <code><span class='Function'>A</span></code> is known to be a function (even if its value is not yet known), its right argument <code><span class='Function'>B</span> <span class='Value'>c</span></code> can be evaluated ahead of time. But if <code><span class='Function'>A</span></code>'s type isn't known, it's impossible to know if this optimization is worth it, because if it is an array, <code><span class='Function'>B</span></code> will instead be called dyadically.</p>
<h2 id="bqns-spelling-system">BQN's spelling system</h2>
<p>BQN's expression grammar is a simplified version of the typical APL, removing some oddities like niladic functions and the two-glyph Outer Product operator. Every value can be used in any of four syntactic roles:</p>
<table>
<thead>
<tr>
<th>BQN</th>
<th>APL</th>
<th>J</th>
</tr>
</thead>
<tbody>
<tr>
<td>Subject</td>
<td>Array</td>
<td>Noun</td>
</tr>
<tr>
<td>Function</td>
<td>Function</td>
<td>Verb</td>
</tr>
<tr>
<td>1-modifier</td>
<td>Monadic operator</td>
<td>Adverb</td>
</tr>
<tr>
<td>2-modifier</td>
<td>Dyadic operator</td>
<td>Conjunction</td>
</tr>
</tbody>
</table>
<p>Unlike variables, BQN primitives have only one spelling, and a fixed role (but their values can be used in a different role by storing them in variables). Superscript glyphs <code><span class='Modifier'>˜¨˘⁼⌜´˝`</span></code> are used for 1-modifiers, and glyphs <code><span class='Modifier2'>∘○⊸⟜⌾⊘◶⚇⎉⍟</span></code> with an unbroken circle are 2-modifiers. Other primitives are functions. String and numeric literals are subjects.</p>
<p>BQN's variables use another system, where the spelling indicates how the variable's value is used. A variable spelled with a lowercase first letter, like <code><span class='Value'>var</span></code>, is a subject. Spelled with an uppercase first letter, like <code><span class='Function'>Var</span></code>, it is a function. Underscores are placed where operands apply to indicate a 1-modifier <code><span class='Modifier'>_var</span></code> or 2-modifier <code><span class='Modifier2'>_var_</span></code>. Other than the first letter or underscore, variables are case-insensitive.</p>
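<p>As a minimal sketch of this rule, the lines below define one value and then refer to it under different spellings. The names <code><span class='Function'>Double</span></code> and <code><span class='Modifier'>_twice</span></code> are made up for this illustration, and the comments give the results a conforming implementation should produce.</p>
<pre><span class='Function'>Double</span> <span class='Gets'>←</span> <span class='Number'>2</span><span class='Modifier2'>⊸</span><span class='Function'>×</span>        <span class='Comment'># defined with a function spelling</span>
<span class='Function'>Double</span> <span class='Number'>5</span>            <span class='Comment'># function role: 10</span>
<span class='Modifier'>_twice</span> <span class='Gets'>←</span> <span class='Brace'>{</span><span class='Function'>𝔽𝔽</span><span class='Value'>𝕩</span><span class='Brace'>}</span>     <span class='Comment'># a 1-modifier, spelled with a leading underscore</span>
<span class='Function'>Double</span> <span class='Modifier'>_twice</span> <span class='Number'>5</span>     <span class='Comment'># applies Double twice: 20</span>
<span class='Brace'>{</span><span class='Function'>𝕏</span> <span class='Number'>3</span><span class='Brace'>}</span><span class='Modifier'>¨</span> <span class='Bracket'>⟨</span><span class='Value'>double</span><span class='Separator'>,</span> <span class='Function'>-</span><span class='Bracket'>⟩</span>  <span class='Comment'># subject role, stored in a list: results 6 and ¯3</span>
</pre>
<p>In the last line the lowercase <code><span class='Value'>double</span></code> names the same value as <code><span class='Function'>Double</span></code>; only its role in the expression differs.</p>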
<p>The associations between spelling and syntactic role are considered part of BQN's <a href="../spec/token.html">token formation rules</a>.</p>
<p>One rule for typing is also best considered to be a pre-parsing rule like the spelling system: the role of a brace construct <code><span class='Brace'>{}</span></code> with no header is determined by which special arguments it uses: it's a subject if there are none, but a <code><span class='Value'>𝕨</span></code> or <code><span class='Value'>𝕩</span></code> makes it at least a function, an <code><span class='Function'>𝔽</span></code> makes it a 1- or 2-modifier, and a <code><span class='Function'>𝔾</span></code> always makes it a 2-modifier.</p>
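<p>The following sketch shows one headerless block for each case of this rule. The blocks themselves are invented for illustration; the role of each follows only from the special names that appear inside it.</p>
<pre><span class='Brace'>{</span><span class='Number'>3</span><span class='Brace'>}</span>             <span class='Comment'># no special names: a subject, with value 3</span>
<span class='Brace'>{</span><span class='Value'>𝕩</span><span class='Function'>+</span><span class='Number'>1</span><span class='Brace'>}</span> <span class='Number'>3</span>         <span class='Comment'># 𝕩 makes it a function: 4</span>
<span class='Function'>-</span><span class='Brace'>{</span><span class='Function'>𝔽</span><span class='Value'>𝕩</span><span class='Brace'>}</span> <span class='Number'>3</span>         <span class='Comment'># 𝔽 without 𝔾 makes it a 1-modifier: ¯3</span>
<span class='Function'>×</span><span class='Modifier'>˜</span><span class='Brace'>{</span><span class='Function'>𝔽𝔾</span><span class='Value'>𝕩</span><span class='Brace'>}</span><span class='Function'>-</span> <span class='Number'>3</span>      <span class='Comment'># 𝔾 makes it a 2-modifier: ×˜ applied to -3, giving 9</span>
</pre>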
<h2 id="bqns-grammar">BQN's grammar</h2>
<p>A formal treatment is included in <a href="../spec/grammar.html">the spec</a>. BQN's grammar—the ways syntactic roles interact—follows the original APL model (plus trains) closely, with allowances for new features like <a href="arrayrepr.html#list-literals">list notation</a>. In order to keep BQN's syntax context-free, the syntactic role of any expression must be known from its contents, just like tokens.</p>
<p>Here is a table of the APL-derived modifier and function application rules:</p>
<table>
<thead>
<tr>
<th>left</th>
<th>main</th>
<th>right</th>
<th>output</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td><code><span class='Function'>F</span></code></td>
<td><code><span class='Value'>x</span></code></td>
<td>Subject</td>
<td>Monadic function</td>
</tr>
<tr>
<td><code><span class='Value'>w</span></code></td>
<td><code><span class='Function'>F</span></code></td>
<td><code><span class='Value'>x</span></code></td>
<td>Subject</td>
<td>Dyadic function</td>
</tr>
<tr>
<td></td>
<td><code><span class='Function'>F</span></code></td>
<td><code><span class='Function'>G</span></code></td>
<td>Function</td>
<td>2-train</td>
</tr>
<tr>
<td><code><span class='Function'>F</span><span class='Value'>*</span></code></td>
<td><code><span class='Function'>G</span></code></td>
<td><code><span class='Function'>H</span></code></td>
<td>Function</td>
<td>3-train</td>
</tr>
<tr>
<td><code><span class='Function'>F</span><span class='Value'>*</span></code></td>
<td><code><span class='Modifier'>_m</span></code></td>
<td></td>
<td>Function</td>
<td>1-Modifier</td>
</tr>
<tr>
<td><code><span class='Function'>F</span><span class='Value'>*</span></code></td>
<td><code><span class='Modifier2'>_c_</span></code></td>
<td><code><span class='Function'>G</span><span class='Value'>*</span></code></td>
<td>Function</td>
<td>2-Modifier</td>
</tr>
<tr>
<td></td>
<td><code><span class='Modifier2'>_c_</span></code></td>
<td><code><span class='Function'>G</span><span class='Value'>*</span></code></td>
<td>1-Modifier</td>
<td>Partial application</td>
</tr>
<tr>
<td><code><span class='Function'>F</span><span class='Value'>*</span></code></td>
<td><code><span class='Modifier2'>_c_</span></code></td>
<td></td>
<td>1-Modifier</td>
<td>Partial application</td>
</tr>
</tbody>
</table>
<p>A function with an asterisk indicates that a subject can also be used: in these positions there is no difference between function and subject spellings. Modifier applications bind more tightly than functions, and associate left-to-right while functions associate right-to-left.</p>
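<p>A few small expressions, chosen here only as an illustration, show how these binding rules play out with primitives:</p>
<pre><span class='Number'>2</span> <span class='Function'>×</span> <span class='Number'>3</span> <span class='Function'>+</span> <span class='Number'>4</span>        <span class='Comment'># functions associate right to left: 2 × (3 + 4), giving 14</span>
<span class='Number'>3</span> <span class='Function'>-</span><span class='Modifier'>˜</span> <span class='Number'>10</span>          <span class='Comment'># the modifier binds first, then the function is applied: 10 - 3 = 7</span>
<span class='Function'>+</span><span class='Modifier'>´</span><span class='Modifier2'>∘</span><span class='Function'>|</span> <span class='Number'>¯1</span><span class='Ligature'>‿</span><span class='Number'>2</span><span class='Ligature'>‿</span><span class='Number'>¯3</span>     <span class='Comment'># modifiers associate left to right: (+´)∘|, giving 6</span>
<span class='Paren'>(</span><span class='Function'>+</span><span class='Modifier'>´</span><span class='Function'>÷≠</span><span class='Paren'>)</span> <span class='Number'>2</span><span class='Ligature'>‿</span><span class='Number'>4</span><span class='Ligature'>‿</span><span class='Number'>9</span>     <span class='Comment'># 3-train: sum divided by length, giving 5</span>
</pre>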
<p>BQN lists can be written with angle brackets <code><span class='Bracket'>⟨</span><span class='Value'>elt0</span><span class='Separator'>,</span><span class='Value'>elt1</span><span class='Separator'>,</span><span class='Value'>…</span><span class='Bracket'>⟩</span></code> or ligatures <code><span class='Value'>elt0</span><span class='Ligature'>‿</span><span class='Value'>elt1</span><span class='Ligature'>‿</span><span class='Value'>…</span></code>. In either case the elements can have any type, and the result is a subject.</p>
<p>The statements in a block can also be any role, including the return value at the end. These roles have no effect: outside of braces, an immediate block is a subject, a function always returns a subject, and a modifier always returns a function, regardless of how these objects were defined.</p>
<h2 id="mixing-roles">Mixing roles</h2>
<p>BQN's value types align closely with its syntactic roles: functions, 1-modifiers, and 2-modifiers are all types (<em>operation</em> types) as well as roles, while the other types (<em>data</em> types) are split into numbers, characters, and arrays. This is no accident, and usually values will be used in roles that correspond to their underlying type. However, the ability to use a role that doesn't match the type is also useful.</p>
<p>Any type can be passed as an argument to a function, or as an operand, by treating it as a subject. This means that BQN fully supports Lisp-style <a href="functional.html">functional programming</a>, where functions can be used as first-class entities.</p>
<p>It can also be useful to treat a value of a data type as a function, in which case it applies as a constant function. This rule is useful with most built-in modifiers. For example, <code><span class='Function'>F</span><span class='Modifier2'>⎉</span><span class='Number'>1</span></code> uses a constant for the rank even though in general a function can be given, and if <code><span class='Value'>a</span></code> is an array then <code><span class='Value'>a</span><span class='Modifier2'>⌾</span><span class='Paren'>(</span><span class='Value'>b</span><span class='Modifier2'>⊸</span><span class='Function'>/</span><span class='Paren'>)</span></code> inserts the values in <code><span class='Value'>a</span></code> into the positions selected by <code><span class='Value'>b</span></code>, ignoring the old values rather than applying a function to them.</p>
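<p>The constant-function rule can be tried with a few small inputs. The variables <code><span class='Value'>a</span></code>, <code><span class='Value'>b</span></code>, and <code><span class='Value'>m</span></code> below are sample data chosen for this sketch, not values from elsewhere in the documentation.</p>
<pre><span class='Number'>5</span><span class='Modifier'>¨</span> <span class='String'>&quot;abc&quot;</span>             <span class='Comment'># 5 acts as a constant function: a list of three 5s</span>
<span class='Value'>m</span> <span class='Gets'>←</span> <span class='Number'>2</span><span class='Ligature'>‿</span><span class='Number'>3</span><span class='Function'>⥊↕</span><span class='Number'>6</span>
<span class='Function'>+</span><span class='Modifier'>´</span><span class='Modifier2'>⎉</span><span class='Number'>1</span> <span class='Value'>m</span>               <span class='Comment'># the rank operand 1 is a constant: row sums 3 and 12</span>
<span class='Value'>a</span> <span class='Gets'>←</span> <span class='Number'>10</span><span class='Ligature'>‿</span><span class='Number'>20</span>
<span class='Value'>b</span> <span class='Gets'>←</span> <span class='Number'>0</span><span class='Ligature'>‿</span><span class='Number'>1</span><span class='Ligature'>‿</span><span class='Number'>0</span><span class='Ligature'>‿</span><span class='Number'>1</span>
<span class='Value'>a</span><span class='Modifier2'>⌾</span><span class='Paren'>(</span><span class='Value'>b</span><span class='Modifier2'>⊸</span><span class='Function'>/</span><span class='Paren'>)</span> <span class='Number'>1</span><span class='Ligature'>‿</span><span class='Number'>2</span><span class='Ligature'>‿</span><span class='Number'>3</span><span class='Ligature'>‿</span><span class='Number'>4</span>    <span class='Comment'># values of a placed where b is 1: 1, 10, 3, 20</span>
</pre>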
<p>Other mixes of roles are generally not useful. While a combination such as treating a function as a modifier is allowed, attempting to apply it to an operand will fail. Only a 1-modifier can be applied as a 1-modifier and only a 2-modifier can be applied as a 2-modifier. Only a function or data can be applied as a function.</p>
<p>It's also worth noting that a subject may unexpectedly be a function! For example, the result of <code><span class='Value'>𝕨</span><span class='Modifier'>˜</span><span class='Value'>𝕩</span></code> may not always be <code><span class='Value'>𝕨</span></code>. <code><span class='Value'>𝕨</span><span class='Modifier'>˜</span><span class='Value'>𝕩</span></code> is exactly identical to <code><span class='Function'>𝕎</span><span class='Modifier'>˜</span><span class='Value'>𝕩</span></code>, which gives <code><span class='Value'>𝕩</span><span class='Function'>𝕎</span><span class='Value'>𝕩</span></code>. If <code><span class='Function'>𝕎</span></code> is a number, character, or array, that's the same as <code><span class='Value'>𝕨</span></code>, but if it is a function, then it will be applied.</p>
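<p>To make the difference concrete, here is a small sketch that calls the same block with a number and then with a function as its left argument. The name <code><span class='Function'>Mul</span></code> is a placeholder introduced only for this example.</p>
<pre><span class='Number'>3</span> <span class='Brace'>{</span><span class='Value'>𝕨</span><span class='Modifier'>˜</span><span class='Value'>𝕩</span><span class='Brace'>}</span> <span class='Number'>4</span>       <span class='Comment'># 𝕨 is a number, so it acts as a constant: 3</span>
<span class='Function'>Mul</span> <span class='Gets'>←</span> <span class='Function'>×</span>
<span class='Value'>mul</span> <span class='Brace'>{</span><span class='Value'>𝕨</span><span class='Modifier'>˜</span><span class='Value'>𝕩</span><span class='Brace'>}</span> <span class='Number'>4</span>     <span class='Comment'># 𝕨 is a function, so this is 4 Mul 4: 16</span>
</pre>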
<p>The primary way to change the role of a value in BQN is to use a name, including one of the special names for inputs to a brace function or modifier. In particular, you can use <code><span class='Brace'>{</span><span class='Function'>𝔽</span><span class='Brace'>}</span></code> to convert a subject operand into a function. Converting a function to a subject is more difficult. Often an array of functions is wanted, in which case they can be stranded together; otherwise it's probably best to give the function a name. Picking a function out of a list, for example <code><span class='Function'>⊑</span><span class='Bracket'>⟨</span><span class='Function'>+</span><span class='Bracket'>⟩</span></code>, will give it as a subject.</p>
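<p>Both directions of conversion can be sketched in a few lines. The name <code><span class='Value'>plus</span></code> is hypothetical, and the results in the comments assume the behavior described above.</p>
<pre><span class='Number'>3</span><span class='Brace'>{</span><span class='Function'>𝔽</span><span class='Brace'>}</span> <span class='Number'>0</span>          <span class='Comment'># {𝔽} gives its subject operand a function role; applied, it is constant: 3</span>
<span class='Value'>plus</span> <span class='Gets'>←</span> <span class='Function'>⊑</span><span class='Bracket'>⟨</span><span class='Function'>+</span><span class='Bracket'>⟩</span>     <span class='Comment'># picking from a list yields the function as a subject</span>
<span class='Number'>2</span> <span class='Function'>Plus</span> <span class='Number'>3</span>        <span class='Comment'># the name can then be spelled as a function again: 5</span>
</pre>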