author Marshall Lochbaum <mwlochbaum@gmail.com> 2020-07-28 15:55:50 -0400
committer Marshall Lochbaum <mwlochbaum@gmail.com> 2020-07-28 16:29:50 -0400
commit 3fd2b860b26878470011fc18cb8351867a5d7639
tree 9b60b59fcea66dc32267eeb3fda86211c5ea35ee /docs/spec
parent 4a3fdd8225e90abb703a0e1cd1f89ff6aa7b7538
Specify variable scoping
Diffstat (limited to 'docs/spec')
-rw-r--r-- docs/spec/evaluate.html 2
-rw-r--r-- docs/spec/index.html 2
-rw-r--r-- docs/spec/scope.html 21
3 files changed, 23 insertions, 2 deletions
diff --git a/docs/spec/evaluate.html b/docs/spec/evaluate.html
index 9f9781d3..38fecd93 100644
--- a/docs/spec/evaluate.html
+++ b/docs/spec/evaluate.html
@@ -11,7 +11,7 @@
<p>The only remaining step before evaluating the <code><span class='Function'>BODY</span></code> is to bind the inputs and other names. Special names are always bound when applicable: <code><span class='Value'>𝕨𝕩𝕤</span></code> if arguments are used, <code><span class='Value'>𝕨</span></code> if there is a left argument, <code><span class='Value'>𝕗𝕘</span></code> if operands are used, and <code><span class='Modifier2'>_</span><span class='Value'>𝕣</span></code> and <code><span class='Modifier2'>_</span><span class='Value'>𝕣</span><span class='Modifier2'>_</span></code> for modifiers and combinators, respectively. Any names in the header are also bound, allowing multiple assignment for arguments.</p>
<p>If there is no left argument, but the <code><span class='Function'>BODY</span></code> contains <code><span class='Value'>𝕨</span></code> at the top level, then it is conceptually re-parsed with <code><span class='Value'>𝕨</span></code> replaced by <code><span class='Nothing'>·</span></code> to give a monadic version before application. As the only effect when this re-parsed form is valid is to change some instances of <code><span class='Value'>arg</span></code> to <code><span class='Value'>nothing</span></code>, this can be achieved efficiently by annotating parts of the AST that depend on <code><span class='Value'>𝕨</span></code> as conditionally-nothing. However, it also causes an error if <code><span class='Value'>𝕨</span></code> is used as an operand or list element, where <code><span class='Value'>nothing</span></code> is not allowed by the grammar.</p>
<h3 id="assignment">Assignment</h3>
-<p>An <em>assignment</em> is one of the four rules containing <code><span class='Function'>ASGN</span></code>. It is evaluated by first evaluating the right-hand-side <code><span class='Value'>subExpr</span></code>, <code><span class='Function'>FuncExpr</span></code>, <code><span class='Modifier'>_m1Expr</span></code>, or <code><span class='Modifier2'>_m2Exp_</span></code> expression, and then storing the result in the left-hand-side identifier or identifiers. The result of the assignment expression is the result of its right-hand side. Except for subjects, only a lone identifier is allowed on the left-hand side and storage is obvious. For subjects, <em>multiple assignment</em> with a list left-hand side is also allowed. Multiple assignment is performed recursively by assigning right-hand-side values to the left-hand-side targets, with single-identifier (<code><span class='Value'>s</span></code>) assignment as the base case. When matching the right-hand side to a list left-hand side, the left hand side is treated as a list of <code><span class='Value'>lhs</span></code> targets. The evaluated right-hand side must be a list (rank-1 array) of the same length, and is matched to these targets element-wise.</p>
+<p>An <em>assignment</em> is one of the four rules containing <code><span class='Function'>ASGN</span></code>. It is evaluated by first evaluating the right-hand-side <code><span class='Value'>subExpr</span></code>, <code><span class='Function'>FuncExpr</span></code>, <code><span class='Modifier'>_m1Expr</span></code>, or <code><span class='Modifier2'>_m2Expr_</span></code> expression, and then storing the result in the left-hand-side identifier or identifiers. The result of the assignment expression is the result of its right-hand side. Except for subjects, only a lone identifier is allowed on the left-hand side and storage sets it equal to the result. For subjects, <em>multiple assignment</em> with a list left-hand side is also allowed. Multiple assignment is performed recursively by assigning right-hand-side values to the left-hand-side targets, with single-identifier (<code><span class='Value'>s</span></code>) assignment as the base case. When matching the right-hand side to a list left-hand side, the left-hand side is treated as a list of <code><span class='Value'>lhs</span></code> targets. The evaluated right-hand side must be a list (rank-1 array) of the same length, and is matched to these targets element-wise.</p>
<p><em>Modified assignment</em> is the subject assignment rule <code><span class='Value'>lhs</span> <span class='Function'>Derv</span> <span class='String'>&quot;↩&quot;</span> <span class='Value'>subExpr</span></code>. In this case, <code><span class='Value'>lhs</span></code> should be evaluated as if it were a <code><span class='Value'>subExpr</span></code> (the syntax is a subset of <code><span class='Value'>subExpr</span></code>), and the result of the function application <code><span class='Value'>lhs</span> <span class='Function'>Derv</span> <span class='Value'>subExpr</span></code> should be assigned to <code><span class='Value'>lhs</span></code>, and is also the result of the modified assignment expression.</p>
<h3 id="expressions">Expressions</h3>
<p>We now give rules for evaluating an <code><span class='Value'>atom</span></code>, <code><span class='Function'>Func</span></code>, <code><span class='Modifier'>_mod1</span></code> or <code><span class='Modifier2'>_mod2_</span></code> expression (the possible options for <code><span class='Function'>ANY</span></code>). A literal or primitive <code><span class='Value'>sl</span></code>, <code><span class='Function'>Fl</span></code>, <code><span class='Modifier'>_ml</span></code>, or <code><span class='Modifier2'>_cl_</span></code> has a fixed value defined by the specification (<a href="literal.html">literals</a> and <a href="primitive.html">built-ins</a>). An identifier <code><span class='Value'>s</span></code>, <code><span class='Function'>F</span></code>, <code><span class='Modifier'>_m</span></code>, or <code><span class='Modifier2'>_c_</span></code> is evaluated by returning its value; because of the scoping rules it must have one when evaluated. A parenthesized expression such as <code><span class='String'>&quot;(&quot;</span> <span class='Modifier'>_modExpr</span> <span class='String'>&quot;)&quot;</span></code> simply returns the result of the interior expression. A braced construct such as <code><span class='Function'>BraceFunc</span></code> is defined by the evaluation of the statements it contains after all parameters are accepted. 
Finally, a list <code><span class='String'>&quot;⟨&quot;</span> <span class='Separator'>⋄</span><span class='Value'>?</span> <span class='Paren'>(</span> <span class='Paren'>(</span> <span class='Function'>EXPR</span> <span class='Separator'>⋄</span> <span class='Paren'>)</span><span class='Value'>*</span> <span class='Function'>EXPR</span> <span class='Separator'>⋄</span><span class='Value'>?</span> <span class='Paren'>)</span><span class='Value'>?</span> <span class='String'>&quot;⟩&quot;</span></code> or <code><span class='Function'>ANY</span> <span class='Paren'>(</span> <span class='String'>&quot;‿&quot;</span> <span class='Function'>ANY</span> <span class='Paren'>)</span><span class='Function'>+</span></code> consists grammatically of a list of expressions. To evaluate it, each expression is evaluated in source order and their results are placed as elements of a rank-1 array. The two forms have identical semantics but different punctuation.</p>
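The recursive matching rule for multiple assignment described in the evaluate.html hunk above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the specification: a BQN list (rank-1 array) is modelled as a Python list, an assignment target is either a name string or a nested Python list of targets, and the helpers `assign` and `assignment` are hypothetical.

```python
# Hypothetical model of BQN multiple assignment (not the spec's own code).
# Assumptions: BQN lists are Python lists; a target is a name (str)
# or a Python list of targets, possibly nested.

def assign(target, value, env):
    """Store value into target within env, recursing into list targets."""
    if isinstance(target, str):
        # Base case: single-identifier assignment.
        env[target] = value
        return
    # List target: the value must be a list of the same length.
    if not isinstance(value, list):
        raise ValueError("multiple assignment requires a list value")
    if len(value) != len(target):
        raise ValueError(
            f"length error: {len(target)} targets, {len(value)} values")
    # Match targets to values element-wise, recursively.
    for t, v in zip(target, value):
        assign(t, v, env)


def assignment(target, value, env):
    """An assignment expression's result is its right-hand side."""
    assign(target, value, env)
    return value
```

For example, `assignment(["a", ["b", "c"]], [1, [2, 3]], env)` sets `a`, `b`, and `c` to 1, 2, and 3 and returns the right-hand side unchanged.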
diff --git a/docs/spec/index.html b/docs/spec/index.html
index fc7941fb..44308925 100644
--- a/docs/spec/index.html
+++ b/docs/spec/index.html
@@ -8,7 +8,7 @@
<li><a href="token.html">Token formation</a></li>
<li><a href="literal.html">Literals</a></li>
<li><a href="grammar.html">Grammar</a></li>
-<li>Scoping</li>
+<li><a href="scope.html">Variable scoping</a></li>
<li><a href="evaluate.html">Evaluation semantics</a></li>
<li>Primitives (<a href="https://github.com/mlochbaum/BQN/blob/master/spec/reference.bqn">reference implementations</a>)</li>
</ul>
diff --git a/docs/spec/scope.html b/docs/spec/scope.html
new file mode 100644
index 00000000..22a05d0e
--- /dev/null
+++ b/docs/spec/scope.html
@@ -0,0 +1,21 @@
+<head><link href="../style.css" rel="stylesheet"/></head>
+<div class="nav"><a href="https://github.com/mlochbaum/BQN">BQN</a></div>
+<h1 id="specification-bqn-variable-scoping">Specification: BQN variable scoping</h1>
+<p>BQN uses lexical scoping for variables, where scopes correspond roughly to block bodies, that is, the parts of a pair of curly braces separated by semicolons. At the top level of a scope, new variables are visible only after they are defined, but in the scopes it contains, all variables defined in that scope are visible. This system is specified more precisely below.</p>
+<p>A running BQN program manipulates variables during its <a href="evaluate.html">execution</a>, but it is important to distinguish these variables from the identifiers that refer to them. As defined in the <a href="token.html">tokenization rules</a>, an identifier is a particular kind of token found in a program's source code. The lexical scoping rules in this page define which identifiers are considered the same; these identifiers will refer to the same variables when the program is run. While each variable has only one identifier, an identifier can refer to any number of variables because a new variable is created for that identifier each time its containing scope is instantiated (that is, each time the contents of the block are evaluated).</p>
+<h2 id="identifier-equivalence-with-lexical-scoping">Identifier equivalence with lexical scoping</h2>
+<p>This section specifies the concept of an identifier's definition, which is a (possibly different) instance of that identifier. The definition determines when identifiers refer to the &quot;same thing&quot;: in concrete terms, identifiers with the same definition all manipulate the same variable in a particular instance of the definition's containing scope.</p>
+<p>A <em>scope</em> is a <code><span class='Function'>PROGRAM</span></code>, <code><span class='Value'>brSub</span></code>, <code><span class='Function'>FCase</span></code>, <code><span class='Function'>FMain</span></code>, <code><span class='Modifier'>_mCase</span></code>, <code><span class='Modifier'>_mMain</span></code>, <code><span class='Modifier2'>_cCase_</span></code>, or <code><span class='Modifier2'>_cMain_</span></code> node as defined by the BQN <a href="grammar.html">grammar</a>. An <em>identifier instance</em> is an <code><span class='Value'>s</span></code>, <code><span class='Function'>F</span></code>, <code><span class='Modifier'>_m</span></code>, or <code><span class='Modifier2'>_c_</span></code> node; its <em>containing scope</em> is the &quot;smallest&quot; scope that contains it, that is, the scope that contains the identifier but does not contain any other scope that also contains the identifier. An identifier instance is <em>defined</em> when it appears in the left-hand side of an <code><span class='Gets'>←</span></code> assignment expression (the leftmost component of one of the four grammatical rules with <code><span class='Function'>ASGN</span></code>, provided that the <code><span class='Function'>ASGN</span></code> node is <code><span class='String'>&quot;←&quot;</span></code>) or in a scope header (a component immediately preceding <code><span class='String'>&quot;:&quot;</span></code>). Each identifier instance in a valid BQN program corresponds to exactly one such defined identifier, called its <em>definition</em>, and two instances are considered to refer to the same identifier if they have the same definition.</p>
+<p>Two identifier instances have the <em>same name</em> if their tokens, as strings, match after removing all underscores <code><span class='Modifier2'>_</span></code> and ignoring case (so that the letters a to z are equal to their uppercase equivalents A to Z for this comparison). However, instances with the same name are not necessarily the same identifier, as they must also have the same definition. A defined identifier is a <em>potential definition</em> of another identifier instance if the two have the same name, and either:</p>
+<ul>
+<li>The defined identifier's containing scope contains the other identifier's containing scope, or</li>
+<li>The two identifiers share the same containing scope, and the defined identifier comes first in program order as defined below, or</li>
+<li>The two identifiers are the same instance (a defined identifier is its own definition).</li>
+</ul>
+<p>The definition for an identifier is chosen from the potential definitions based on their containing scopes: it is the one whose containing scope does not contain or match the containing scope of any other potential definition. If for any identifier there is no definition, then the program is not valid and results in an error. This can occur if the identifier has no potential definition, and also if two potential definitions appear in the same scope. In fact, under this scheme it is never valid to make two definitions with the same name at the top level of a single scope, because both definitions would be potential definitions for the one that comes second in program order. Both definitions have the same containing scope, and any potential definition must contain or match this scope, so no potential definition can be selected.</p>
+<p>The definition of <em>program order</em> for identifier tokens follows the order of BQN <a href="evaluate.html">execution</a>. It corresponds to the order of a particular traversal of the abstract syntax tree for a program. To find the relative ordering of two identifiers in a program, we consider the highest-depth node that they both belong to; in this node they must occur in different components, or that component would be a higher-depth node containing both of them. In most nodes, the program order goes from right to left: components further to the right come earlier in program order. The exceptions are <code><span class='Function'>PROGRAM</span></code>, <code><span class='Function'>BODY</span></code>, <code><span class='Value'>list</span></code>, <code><span class='Value'>subject</span></code> (for stranding), and body case (<code><span class='Function'>FCase</span></code>, <code><span class='Modifier'>_mCase</span></code>, <code><span class='Modifier2'>_cCase_</span></code>, <code><span class='Function'>FMain</span></code>, <code><span class='Modifier'>_mMain</span></code>, <code><span class='Modifier2'>_cMain_</span></code>, <code><span class='Value'>brSub</span></code>, <code><span class='Function'>BrFunc</span></code>, <code><span class='Modifier'>_brMod1</span></code>, and <code><span class='Modifier2'>_brMod2_</span></code>) nodes, in which program order goes from left to right (some assignment target nodes also contain lists or strands, but their ordering is irrelevant because if two identifiers with the same name appear in such a list, then the assignment cannot be a valid definition).</p>
+<h2 id="variables">Variables</h2>
+<p>A <em>variable</em> is an entity that permits two operations: it can be <em>set</em> to a particular value, and its <em>value</em> can be obtained, resulting in the last value it was set to. When either operation is performed it is referred to as <em>accessing</em> the variable.</p>
+<p>When a body in a block is evaluated, a variable is created for each definition (that is, each defined identifier instance) whose containing scope is that body. Whenever another block (the block itself, not its contents) is evaluated during the execution of the block, it is linked to the currently-evaluating block, so that it will use the variables defined in this instance. These links are recursive, so that every instance of a block is linked to exactly one instance of each block that contains it. These links form a tree that is not necessarily related to the call stack of functions and modifiers. Using these links, the variable an identifier refers to is the variable created for that identifier's definition in the linked instance of the definition's containing scope.</p>
+<p>The first access to a variable must be made by its definition (this also means it sets the variable). If a different instance of its identifier accesses it first, then an error results. This can happen because a scope contained in a particular scope can refer to every definition in that scope, including those that come later in program order, and the contained scope could be called before such a definition is run. Because of conditional execution, this property must be checked at run time in general; however, in cases where it is possible to statically determine that a program will always violate it, a BQN implementation can give an error at compile time rather than run time.</p>
+
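The identifier-equivalence rules in scope.html above can be modelled in a few lines of Python. This is a hedged sketch, not a reference implementation: the `Scope` and `Ident` classes and the `normalize`, `is_potential_definition`, and `definition` helpers are hypothetical, and each identifier's position in program order is assumed to be precomputed as an integer `order` (computing it requires the right-to-left traversal, with its left-to-right exceptions, described above; that step is omitted here).

```python
# Hypothetical model of BQN identifier resolution (illustration only).
import string
from dataclasses import dataclass
from typing import List, Optional

# Fold only A-Z onto a-z, as the "same name" rule specifies.
_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def normalize(token: str) -> str:
    """Remove underscores and ignore ASCII letter case."""
    return token.replace("_", "").translate(_FOLD)


@dataclass(eq=False)
class Scope:
    parent: Optional["Scope"] = None

    def contains(self, other: "Scope") -> bool:
        """True if this scope strictly contains `other`."""
        s = other.parent
        while s is not None:
            if s is self:
                return True
            s = s.parent
        return False


@dataclass(eq=False)
class Ident:
    token: str      # identifier token as written in the source
    scope: Scope    # containing scope (smallest enclosing scope)
    order: int      # position in program order (assumed precomputed)
    defined: bool   # True for ← targets and header names


def is_potential_definition(d: Ident, i: Ident) -> bool:
    if not d.defined or normalize(d.token) != normalize(i.token):
        return False
    return (d is i
            or d.scope.contains(i.scope)
            or (d.scope is i.scope and d.order < i.order))


def definition(i: Ident, idents: List[Ident]) -> Ident:
    pots = [d for d in idents if is_potential_definition(d, i)]
    # Keep the potential definition whose containing scope does not
    # contain or match the containing scope of any other one.
    chosen = [d for d in pots
              if not any(o is not d and
                         (d.scope is o.scope or d.scope.contains(o.scope))
                         for o in pots)]
    if len(chosen) != 1:
        raise NameError(f"no definition for {i.token!r}")
    return chosen[0]
```

In this model an identifier in a nested scope resolves to an enclosing scope's definition even when that definition comes later in program order, while a second definition of the same name at the top level of one scope leaves no selectable definition and raises `NameError`.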
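The run-time rules for variables in scope.html can also be sketched in Python, assuming static resolution has already told each identifier instance how many links to follow outward (the hypothetical `depth` argument) to reach its definition's scope. `Variable` and `Frame` are illustrative names, not part of the specification: a `Frame` stands for one instance of a block body, and its `parent` is the link to an instance of the enclosing block, which need not be the caller.

```python
# Hypothetical model of BQN variables and block-instance links.

class Variable:
    """A variable can be set; its value is the last value it was set to."""

    def __init__(self, name):
        self.name = name
        self._defined = False
        self._value = None

    def define(self, value):
        """Access by the variable's definition, which always sets it."""
        self._defined = True
        self._value = value

    def _check(self):
        # The first access must be made by the definition; an earlier
        # access by another instance of the identifier is an error.
        if not self._defined:
            raise RuntimeError(f"{self.name} accessed before its definition ran")

    def get(self):
        """Read the value from a non-defining instance."""
        self._check()
        return self._value

    def set(self, value):
        """Set the value from a non-defining instance (for example with ↩)."""
        self._check()
        self._value = value


class Frame:
    """One instance of a block body, linked to an enclosing block's instance."""

    def __init__(self, definitions, parent=None):
        self.parent = parent
        # A fresh variable for each definition in this body.
        self.vars = {name: Variable(name) for name in definitions}

    def lookup(self, name, depth):
        """Follow `depth` links outward, then take that instance's variable."""
        frame = self
        for _ in range(depth):
            frame = frame.parent
        return frame.vars[name]
```

Two instances of the same outer block each get their own `x`, and an inner block instance linked to one of them sees only that instance's variable; reading a variable through a non-defining instance before its definition has run raises `RuntimeError`, mirroring the run-time check described above.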