1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
|
<head>
<link href="../favicon.ico" rel="shortcut icon" type="image/x-icon"/>
<link href="../style.css" rel="stylesheet"/>
<title>BQN: Lexical scoping</title>
</head>
<div class="nav">(<a href="https://github.com/mlochbaum/BQN">github</a>) / <a href="../index.html">BQN</a> / <a href="index.html">doc</a></div>
<h1 id="lexical-scoping"><a class="header" href="#lexical-scoping">Lexical scoping</a></h1>
<p>BQN uses lexical scope, like most modern functional programming languages including Javascript, Scheme, and Julia, and like Dyalog APL's dfns (tradfns are dynamically scoped). This document describes how lexical scoping works, and a few small details relevant to BQN's version of it.</p>
<p>In short, every <a href="block.html">block</a> is a separate scope that can refer to identifiers in containing scopes. When evaluated, the block makes a variable for each identifier defined in it (including arguments and operands). The blocks that it contains will now access these variables. In the first level of a block, variables must be defined before they can be used, but in child blocks, a variable can be used regardless of where it's defined, as long as the definition is evaluated before the child block is.</p>
<h2 id="scopes"><a class="header" href="#scopes">Scopes</a></h2>
<p>Scoping is a mechanism that allows the same variable name to refer to different variables depending on program context. For example, the following code uses the name <code><span class='Value'>a</span></code> in two ways: once for a value at the top level, and once locally in a function. With scoping, once you write <code><span class='Brace'>{}</span></code> to create a block, you can define any name you want inside without worrying whether it's taken.</p>
<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=YSDihpAgNgpGIOKGkCB7IGEgw5cgMSArIGEg4oaQIPCdlakgfQoKRiA0ICAgICMgU2V0cyBh4oaQNCBpbnRlcm5hbGx5CmEgICAgICAjIFRoZSBvdXRlciBhIGlzIHVuY2hhbmdlZA==">↗️</a><pre> <span class='Value'>a</span> <span class='Gets'>←</span> <span class='Number'>6</span>
<span class='Function'>F</span> <span class='Gets'>←</span> <span class='Brace'>{</span> <span class='Value'>a</span> <span class='Function'>×</span> <span class='Number'>1</span> <span class='Function'>+</span> <span class='Value'>a</span> <span class='Gets'>←</span> <span class='Value'>𝕩</span> <span class='Brace'>}</span>
<span class='Function'>F</span> <span class='Number'>4</span> <span class='Comment'># Sets a←4 internally
</span>20
<span class='Value'>a</span> <span class='Comment'># The outer a is unchanged
</span>6
</pre>
<p>Above, the scope of the first <code><span class='Value'>a</span></code> is the entire program, while the scope of the second <code><span class='Value'>a</span></code> is limited to the body of <code><span class='Function'>F</span></code>. So one form of context is that a name might mean different things depending on which block contains it. But even the exact same instance of a name in the source code might mean multiple things! A second kind of context is which evaluation of a block uses the name.</p>
<p>Without this ability BQN would be pretty limited: for example, an <a href="oop.html">object</a>'s fields are variables. If the variable value didn't depend on what object contained it, there could effectively only be one instance of each object! While it's needed all the time, the most direct way to demonstrate one name meaning multiple things is with recursion. The (fragile) function below labels each element in a nested list structure with its index in the list containing it.</p>
<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=TGFiZWwg4oaQIHsgaeKGkOKGleKJoPCdlakg4ouEIGkg4omNIPCdlYrijZ89wqgg8J2VqSB9CgpMYWJlbCDin6giYWIi4oC/OCwgN+KAvzbigL814p+p">↗️</a><pre> <span class='Function'>Label</span> <span class='Gets'>←</span> <span class='Brace'>{</span> <span class='Value'>i</span><span class='Gets'>←</span><span class='Function'>↕≠</span><span class='Value'>𝕩</span> <span class='Separator'>⋄</span> <span class='Value'>i</span> <span class='Function'>≍</span> <span class='Function'>𝕊</span><span class='Modifier2'>⍟</span><span class='Function'>=</span><span class='Modifier'>¨</span> <span class='Value'>𝕩</span> <span class='Brace'>}</span>
<span class='Function'>Label</span> <span class='Bracket'>⟨</span><span class='String'>"ab"</span><span class='Ligature'>‿</span><span class='Number'>8</span><span class='Separator'>,</span> <span class='Number'>7</span><span class='Ligature'>‿</span><span class='Number'>6</span><span class='Ligature'>‿</span><span class='Number'>5</span><span class='Bracket'>⟩</span>
┌─
╵ 0 1
┌─ ┌─
╵ 0 1 ╵ 0 1 2
┌─ 8 7 6 5
╵ 0 1 ┘
'a' 'b'
┘
┘
┘
</pre>
<p>Each call creates the <a href="range.html">list of indices</a> <code><span class='Value'>i</span></code>, then calls itself using <code><span class='Function'>𝕊</span></code> on each element of <code><span class='Value'>𝕩</span></code> <a href="repeat.html">if</a> it's a list, then <a href="couple.html">couples</a> <code><span class='Value'>i</span></code> to the result. This requires <code><span class='Value'>i</span></code> to be unaffected by other calls to the function, which works because <code><span class='Value'>i</span></code> is scoped not only to the source code location but also to the particular evaluation of the block that creates it.</p>
<p>These examples probably work like you expect—they're meant to highlight the features that scoping should have, in order to help show how less intuitive cases work later on.</p>
<h2 id="visibility"><a class="header" href="#visibility">Visibility</a></h2>
<p>A scope can view and modify (with <code><span class='Gets'>↩</span></code>) variables in other scopes that contain it. We say these variables are visible in the inner scopes. Variables at the top level of a program are visible to all the code in that program, so that we might call them "global". That would be a little misleading though, because for example each file is an entire program, so if one file is imported from another then it can't read the first file's variables.</p>
<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=Y291bnRlciDihpAgMAppbmMg4oaQIDYKQ291bnQg4oaQIHsgY291bnRlciAr4oapIPCdlakgw5cgaW5jIH0KCkNvdW50IDAKQ291bnQgMQpDb3VudCAxCkNvdW50IDU=">↗️</a><pre> <span class='Value'>counter</span> <span class='Gets'>←</span> <span class='Number'>0</span>
<span class='Value'>inc</span> <span class='Gets'>←</span> <span class='Number'>6</span>
<span class='Function'>Count</span> <span class='Gets'>←</span> <span class='Brace'>{</span> <span class='Value'>counter</span> <span class='Function'>+</span><span class='Gets'>↩</span> <span class='Value'>𝕩</span> <span class='Function'>×</span> <span class='Value'>inc</span> <span class='Brace'>}</span>
<span class='Function'>Count</span> <span class='Number'>0</span>
0
<span class='Function'>Count</span> <span class='Number'>1</span>
6
<span class='Function'>Count</span> <span class='Number'>1</span>
12
<span class='Function'>Count</span> <span class='Number'>5</span>
42
</pre>
<p>So the <code><span class='Function'>Count</span></code> function above increments and prints a global <code><span class='Value'>counter</span></code> by a global amount <code><span class='Value'>inc</span></code>, which is visible in <code><span class='Function'>Count</span></code> because it's visible everywhere. Or, not quite… if a block defines its <em>own</em> copy of <code><span class='Value'>inc</span></code>, then it will lose access to the outer one. However, code that comes before that definition can still see the outer variable. So it can copy its value at the start of the block (this won't reflect later changes to the value. Don't shadow variables you want to use!).</p>
<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=eyBpbmPihpAzIOKLhCBpbmMgfSAgIyBpbmMgaXMgc2hhZG93ZWQKCmluYyAgIyBCdXQgaXQncyBzdGlsbCBoZXJlIGluIHRoZSBvdXRlciBzY29wZQoKeyBh4oaQaW5jIOKLhCBpbmPihpAzIOKLhCBhIH0gICMgUmVhZCBiZWZvcmUgc2hhZG93aW5n">↗️</a><pre> <span class='Brace'>{</span> <span class='Value'>inc</span><span class='Gets'>←</span><span class='Number'>3</span> <span class='Separator'>⋄</span> <span class='Value'>inc</span> <span class='Brace'>}</span> <span class='Comment'># inc is shadowed
</span>3
<span class='Value'>inc</span> <span class='Comment'># But it's still here in the outer scope
</span>6
<span class='Brace'>{</span> <span class='Value'>a</span><span class='Gets'>←</span><span class='Value'>inc</span> <span class='Separator'>⋄</span> <span class='Value'>inc</span><span class='Gets'>←</span><span class='Number'>3</span> <span class='Separator'>⋄</span> <span class='Value'>a</span> <span class='Brace'>}</span> <span class='Comment'># Read before shadowing
</span>6
</pre>
<p>Each scope can only define a given name once. Trying to shadow a name that's in the current scope and not a higher one gives an error at compilation.</p>
<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=eyBpbmPihpAzIOKLhCBpbmPihpA0IH0=">↗️</a><pre> <span class='Brace'>{</span> <span class='Value'>inc</span><span class='Gets'>←</span><span class='Number'>3</span> <span class='Separator'>⋄</span> <span class='Value'>inc</span><span class='Gets'>←</span><span class='Number'>4</span> <span class='Brace'>}</span>
ERROR
</pre>
<p>Let's go all in on shadowing and make a modifier that creates its own copies of <code><span class='Value'>counter</span></code> and <code><span class='Value'>inc</span></code>, returning a custom version of the <code><span class='Function'>Count</span></code> function above.</p>
<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=X21ha2VDb3VudCDihpAgeyBjb3VudGVy4oC/aW5j4oaQ8J2VlyDii4QgeyBjb3VudGVyICvihqkg8J2VqSDDlyBpbmMgfSB9CgpDM183IOKGkCAz4oC/NyBfbWFrZUNvdW50ICAjIFN0YXJ0IGF0IDM7IGluYyBieSA3CgpDM183IDAKQzNfNyAxCkNvdW50IDAgICMgT2xkIGNvdW50ZXIgc3RheXMgdGhlIHNhbWU=">↗️</a><pre> <span class='Modifier'>_makeCount</span> <span class='Gets'>←</span> <span class='Brace'>{</span> <span class='Value'>counter</span><span class='Ligature'>‿</span><span class='Value'>inc</span><span class='Gets'>←</span><span class='Value'>𝕗</span> <span class='Separator'>⋄</span> <span class='Brace'>{</span> <span class='Value'>counter</span> <span class='Function'>+</span><span class='Gets'>↩</span> <span class='Value'>𝕩</span> <span class='Function'>×</span> <span class='Value'>inc</span> <span class='Brace'>}</span> <span class='Brace'>}</span>
<span class='Function'>C3_7</span> <span class='Gets'>←</span> <span class='Number'>3</span><span class='Ligature'>‿</span><span class='Number'>7</span> <span class='Modifier'>_makeCount</span> <span class='Comment'># Start at 3; inc by 7
</span>
<span class='Function'>C3_7</span> <span class='Number'>0</span>
3
<span class='Function'>C3_7</span> <span class='Number'>1</span>
10
<span class='Function'>Count</span> <span class='Number'>0</span> <span class='Comment'># Old counter stays the same
</span>42
</pre>
<p>The function <code><span class='Function'>C3_7</span></code> uses the versions of <code><span class='Value'>counter</span></code> and <code><span class='Value'>inc</span></code> created in <code><span class='Modifier'>_makeCount</span></code>, even though it's called not from inside <code><span class='Modifier'>_makeCount</span></code>, but from the top-level program. This is what it means for BQN's scoping to be lexical rather than dynamic. Which identifiers are visible is determined by where the code containing them is located in the source code, not how it's called at runtime. The static nature of lexical scoping makes it much easier to keep track of how variables are used (for compilers, this means optimization opportunities), and for this reason dynamic scoping is very rare in programming languages today.</p>
<h3 id="post-definition"><a class="header" href="#post-definition">Post-definition</a></h3>
<p>In the top level a block, a variable can only be used after it's defined, and the compiler rejects any code that tries to use an undefined variable. But blocks contained in that block see variables it defines regardless of the positioning. Below, <code><span class='Function'>PlusC</span></code> references the variable <code><span class='Value'>c</span></code> that's defined after it (but when code is executed one line at a time like it is here, I still have to put both definitions on the same line so they are compiled together).</p>
<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=UGx1c0Mg4oaQIHsg8J2VqStjIH0g4ouEIGPihpDCrzEKUGx1c0MgNw==">↗️</a><pre> <span class='Function'>PlusC</span> <span class='Gets'>←</span> <span class='Brace'>{</span> <span class='Value'>𝕩</span><span class='Function'>+</span><span class='Value'>c</span> <span class='Brace'>}</span> <span class='Separator'>⋄</span> <span class='Value'>c</span><span class='Gets'>←</span><span class='Number'>¯1</span>
<span class='Function'>PlusC</span> <span class='Number'>7</span>
6
</pre>
<p>But you'll still get an error if the variable is used before its definition is run. Unlike the single-level case, this is a runtime error and only appears when the variable is actually accessed.</p>
<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=eyAyK2QgfSDii4QgZOKGkMKvMg==">↗️</a><pre> <span class='Brace'>{</span> <span class='Number'>2</span><span class='Function'>+</span><span class='Value'>d</span> <span class='Brace'>}</span> <span class='Separator'>⋄</span> <span class='Value'>d</span><span class='Gets'>←</span><span class='Number'>¯2</span>
ERROR
</pre>
<p>Why define things this way? It's easier to see if you imagine the variable used is also a function. It's normal for a function to call other functions defined at the top level, of course. And it would be pretty unpleasant for BQN to enforce a specific ordering for them. It would also make recursive functions impossible except by using <code><span class='Function'>𝕊</span></code>, and mutually recursive ones completely impossible. A simple rule that makes all these things just work smoothly seems much better than any alternative.</p>
<h2 id="closures"><a class="header" href="#closures">Closures</a></h2>
<p>Let's run <code><span class='Modifier'>_makeCount</span></code> from above a few more times.</p>
<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=QzRfMiDihpAgNOKAvzIgX21ha2VDb3VudCAgIyBTdGFydCBhdCA0OyBpbmNyZW1lbnQgYnkgMgpDMV80IOKGkCAx4oC/NCBfbWFrZUNvdW50ICAjICAgICAgICAgIDE7ICAgICAgICAgICAgICA0CgpDNF8yIDAKQzFfNCAwCkM0XzIgMTAKQzFfNCAxMApDNF8yIDAKQzNfNyAwICAjIFRoZSBmaXJzdCBvbmUncyBzdGlsbCBhcm91bmQgdG9v">↗️</a><pre> <span class='Function'>C4_2</span> <span class='Gets'>←</span> <span class='Number'>4</span><span class='Ligature'>‿</span><span class='Number'>2</span> <span class='Modifier'>_makeCount</span> <span class='Comment'># Start at 4; increment by 2
</span> <span class='Function'>C1_4</span> <span class='Gets'>←</span> <span class='Number'>1</span><span class='Ligature'>‿</span><span class='Number'>4</span> <span class='Modifier'>_makeCount</span> <span class='Comment'># 1; 4
</span>
<span class='Function'>C4_2</span> <span class='Number'>0</span>
4
<span class='Function'>C1_4</span> <span class='Number'>0</span>
1
<span class='Function'>C4_2</span> <span class='Number'>10</span>
24
<span class='Function'>C1_4</span> <span class='Number'>10</span>
41
<span class='Function'>C4_2</span> <span class='Number'>0</span>
24
<span class='Function'>C3_7</span> <span class='Number'>0</span> <span class='Comment'># The first one's still around too
</span>10
</pre>
<p>Each result keeps its own counter and the different copies don't interfere with each other. This is because every call to <code><span class='Modifier'>_makeCount</span></code> is isolated, happening in its own world. We say that whenever a block begins execution it creates an <em>environment</em> where all its variables are stored. This environment might even be exposed later on, as a <a href="namespace.html">namespace</a>.</p>
<p>Each counter function has access to the environment containing its <code><span class='Value'>counter</span></code> and <code><span class='Value'>inc</span></code>, even though the block that created that environment (<code><span class='Modifier'>_makeCount</span></code>) has finished execution—it must have finished, since we are now using the function it returns on the last line. There's nothing particularly weird about this; just because a block creates an environment when it starts doesn't mean it has to destroy it when it finishes. From the mathematical perspective, it's easiest to say the environment exists forever, but a practical implementation will perform garbage collection to free environments that are no longer reachable.</p>
<p>Since a function like <code><span class='Function'>C1_4</span></code> maintains access to all the variables it needs to run, we say it <em>encloses</em> those variables, and call it a <em>closure</em>. It doesn't need to modify them. For example, even the following definition of <code><span class='Value'>stdDev</span></code> is a closure.</p>
<pre><span class='Value'>stdDev</span> <span class='Gets'>←</span> <span class='Brace'>{</span>
<span class='Comment'># Arithmetic mean
</span> <span class='Function'>Mean</span> <span class='Gets'>←</span> <span class='Function'>+</span><span class='Modifier'>´</span> <span class='Function'>÷</span> <span class='Function'>≠</span>
<span class='Comment'># Return this function
</span> <span class='Brace'>{</span>
<span class='Function'>Mean</span><span class='Modifier2'>⌾</span><span class='Paren'>(</span><span class='Function'>×</span><span class='Modifier'>˜</span><span class='Paren'>)</span> <span class='Value'>𝕩</span> <span class='Function'>-</span> <span class='Function'>Mean</span> <span class='Value'>𝕩</span>
<span class='Brace'>}</span>
<span class='Brace'>}</span>
</pre>
<p>The variable <code><span class='Function'>Mean</span></code> is visible only within the outer immediate block. The only way it can be accessed is by code in this block: the two calls in the returned function, which will later be renamed <code><span class='Value'>stdDev</span></code>. Nothing in the block modifies it, so its value is constant. It's just a little utility that exists only for code in the block. Making it visible elsewhere is as simple as moving it out of the block, but it's best not to do this without reason. Keeping a variable in the smallest possible scope makes it easier to understand the program, because it reduces the amount of information needed to understand scopes where that variable doesn't apply.</p>
<p>Neither the specification nor a typical implementation keep track of what is and isn't a closure, although an advanced interpreter will probably work with some related properties. The existence of closures is an ordinary feature of lexical scoping and not a special case. However, it's sometimes a useful term for discussing the operation of a program. We might define a closure as a block that can be run and access variables from a parent scope even after the block that created that scope finishes execution.</p>
<h3 id="environments-form-a-tree"><a class="header" href="#environments-form-a-tree">Environments form a tree</a></h3>
<p>So a block has access to every environment that it might need a variable from, for as long as it needs. This idea is a little fuzzy, so let's clarify by describing how an implementation would support figure out what can access where.</p>
<p>The mechanism is that each environment can have a <em>parent</em> environment (the topmost environment, which corresponds to the entire program, has no parent). When a variable is accessed, it might be in the current environment, or its parent, or that environment's parent, and so on. Every environment corresponds to one block in the source code, and its parent corresponds to the parent block, so a compiler can figure out how many levels up it will have to go based on the source code.</p>
<p>We've seen that one block can create many environments. An environment can have only one parent, but many children, so environments form a tree. A forest to be precise, as one execution of BQN can involve multiple programs.</p>
<p>How does an environment know which of the many environments corresponding to the parent scope is its parent? This information is saved when the block is reached in the program and a <em>block instance</em> is created. Unless it's an immediate block, the block instance won't be run right away: a block instance isn't the same as a block evaluation. But each block evaluation starts with a block instance, and that's where it gets the parent environment. Unlike block evaluation, which can happen anywhere, a block instance is created only during evaluation of the parent block. So the saved parent environment is simply the current environment.</p>
<h2 id="mutation"><a class="header" href="#mutation">Mutation</a></h2>
<p>The value of a variable can be modified with <code><span class='Gets'>↩</span></code>. It's similar to definition <code><span class='Gets'>←</span></code> in that it sets the value of the variable, but the way it interacts with scoping is completely different. Defining creates a new variable in the current scope, and modifying refers to an existing variable in the current scope or a parent. In scoping terms, modifying is more like an ordinary variable reference than a definition.</p>
<p>When a variable's modified, functions with access to it see the new value. They have access to the variable, not any particular value that it has.</p>
<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=ZmFjdG9yIOKGkCAzCk11bCDihpAgeyBmYWN0b3Igw5cg8J2VqSB9CgpNdWwgNgpmYWN0b3Ig4oapIDUKTXVsIDYgICAjIEEgbmV3IHJlc3VsdA==">↗️</a><pre> <span class='Value'>factor</span> <span class='Gets'>←</span> <span class='Number'>3</span>
<span class='Function'>Mul</span> <span class='Gets'>←</span> <span class='Brace'>{</span> <span class='Value'>factor</span> <span class='Function'>×</span> <span class='Value'>𝕩</span> <span class='Brace'>}</span>
<span class='Function'>Mul</span> <span class='Number'>6</span>
18
<span class='Value'>factor</span> <span class='Gets'>↩</span> <span class='Number'>5</span>
<span class='Function'>Mul</span> <span class='Number'>6</span> <span class='Comment'># A new result
</span>30
</pre>
<p>Only code with access to a variable can modify it! This means that if none of the code in a variable's scope modifies it, then the variable is a constant in each environment that contains it (not necessarily across environments). That is, constant once it's defined: it's still possible to get an error if the variable is accessed before being defined.</p>
<a class="replLink" title="Open in the REPL" target="_blank" href="https://mlochbaum.github.io/BQN/try.html#code=eyB7IGEgfSDii4QgYeKGkDQgfQ==">↗️</a><pre> <span class='Brace'>{</span> <span class='Brace'>{</span> <span class='Value'>a</span> <span class='Brace'>}</span> <span class='Separator'>⋄</span> <span class='Value'>a</span><span class='Gets'>←</span><span class='Number'>4</span> <span class='Brace'>}</span>
ERROR
</pre>
<p>With lexical scoping, variable mutation automatically leads to mutable data. This is because a function or modifier that depends on the variable value changes its behavior when the variable changes. For further discussion see the documentation on <a href="oop.html#mutability">mutable objects</a>.</p>
|