VM documentation for new block/body layout

author: Marshall Lochbaum <mwlochbaum@gmail.com> 2021-07-13 21:33:41 -0400
committer: Marshall Lochbaum <mwlochbaum@gmail.com> 2021-07-13 21:33:41 -0400
commit: fe1f2dd52b99b7ad420ad1b11189255c81954f04 (patch)
tree: dd77a63f7ed6243c04b2cf2e0ed24d77f54ec471
parent: c2663b00019222b81d6b3a1bb1e7b9f2c2e84f5c (diff)
2 files changed, 31 insertions, 13 deletions
diff --git a/docs/implementation/vm.html b/docs/implementation/vm.html
index 1a5de057..109416ff 100644
--- a/docs/implementation/vm.html
+++ b/docs/implementation/vm.html
@@ -16,22 +16,30 @@
 <ul>
 <li>A bytecode sequence <code><span class='Value'>code</span></code></li>
 <li>A list <code><span class='Value'>consts</span></code> of constants that can be loaded</li>
-<li>A list <code><span class='Value'>blocks</span></code> of block information, described in the next section</li>
+<li>A list <code><span class='Value'>blocks</span></code> of per-block information, described in the next section</li>
+<li>A list <code><span class='Value'>bodies</span></code> of per-body information, described in the section after</li>
 <li>Optionally, source locations for each instruction</li>
 <li>Optionally, tokenization information</li>
 </ul>
-<h3 id="blocks">Blocks</h3>
-<p>Each block in <code><span class='Value'>blocks</span></code> is a list of the following properties:</p>
+<h4 id="blocks">Blocks</h4>
+<p>Each entry in <code><span class='Value'>blocks</span></code> is a list of the following properties:</p>
 <ul>
 <li>Block type: (0) function/immediate, (1) 1-modifier, (2) 2-modifier</li>
 <li>Block immediateness: (1) immediate or (0) deferred</li>
-<li>Block starting index in <code><span class='Value'>code</span></code></li>
+<li>Index or indices in <code><span class='Value'>bodies</span></code></li>
+</ul>
+<p>Compilation separates blocks so that they are not nested in bytecode. A block consists of bodies, so that all compiled code is contained in some body of a block. The self-hosted compiler compiles the entire program into an immediate block, and the program is run by evaluating this block. Bodies are terminated with a RETN or RETD instruction.</p>
+<p>When the block is evaluated depends on its type and immediateness. An immediate block (0,1) is evaluated as soon as it is pushed; a function (0,0) is evaluated when called on arguments, an immediate modifier (1 or 2, 1) is evaluated when called on operands, and a deferred modifier (1 or 2, 0) creates a derived function when called on operands and is evaluated when this derived function is called on arguments.</p>
+<p>The last property is a single number or, for a deferred block only, possibly a pair of lists. If it is a single number, the block is always evaluated by evaluating the body with that index. If it is a pair, the first list gives the bodies for the monadic case and the second those for the dyadic case. Evaluation of a given valence begins at the first body in the appropriate list and moves to the next one if a header test (SETH instruction) fails.</p>
+<h4 id="bodies">Bodies</h4>
+<p>In source code, the bodies of a block are separated by <code><span class='Value'>;</span></code>. Each entry in <code><span class='Value'>bodies</span></code> is a list containing:</p>
+<ul>
+<li>Starting index in <code><span class='Value'>code</span></code></li>
 <li>Number of variables the block needs to allocate</li>
 <li>Variable names, as indices into the program's symbol list</li>
 <li>A mask indicating which variables are exported</li>
 </ul>
-<p>Compilation separates blocks so that they are not nested in bytecode. All compiled code is contained in some block. The self-hosted compiler compiles the entire program into an immediate block, and the program is run by evaluating this block. Blocks are terminated with the RETN instruction.</p>
-<p>The starting index refers to the position where execution starts in order to evaluate the block. When the block is evaluated depends on its type and immediateness. An immediate block (0,1) is evaluated as soon as it is pushed; a function (0,0) is evaluated when called on arguments, an immediate modifier (1 or 2, 1) is evaluated when called on operands, and a deferred modifier (1 or 2, 0) creates a derived function when called on operands and is evaluated when this derived function is called on arguments.</p>
+<p>The starting index refers to the position in bytecode where execution starts in order to evaluate the body. The bodies of a block always have the same set of special names, but the variables they define are unrelated, so their variable counts can differ. The number of variables includes special names, but the list of names and the export mask don't.</p>
 <p>The program's symbol list is included in the tokenization information <code><span class='Value'>t</span></code>: it is <code><span class='Number'>0</span><span class='Function'>⊑</span><span class='Number'>2</span><span class='Function'>⊑</span><span class='Value'>t</span></code>. Since the entire program (the source code passed in one compiler call) uses this list, namespace field accesses can be performed with indices alone within a program. The symbol list is needed for cross-program access, for example if <code><span class='Function'>•BQN</span></code> returns a namespace.</p>
 <h3 id="instructions">Instructions</h3>
 <p>The following instructions are defined by dzaima/BQN. The ones emitted by the self-hosted BQN compiler are marked in the &quot;used&quot; column. Instructions marked <code><span class='Function'>NS</span></code> are used only in programs with namespaces, and so are not needed to support the compiler or self-hosted runtime.</p>
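A minimal sketch of the body-selection rule described above, written in Python rather than BQN. It assumes a block entry is a Python list `[type, immediate, spec]` with integer fields. The names `candidate_bodies`, `run_block`, and the `try_body` callback are hypothetical and not part of any BQN implementation, and the error raised when no body matches is an assumption, since the text does not specify that case.

```python
from typing import Callable, List, Sequence


def candidate_bodies(block: Sequence, valence: int) -> List[int]:
    """Return the body indices to try, in order, when `block` is evaluated.

    Assumes `block` is one entry of `blocks`: [type, immediate, spec],
    where `spec` is a body index or a pair [monadic_list, dyadic_list].
    `valence` is 1 (monadic) or 2 (dyadic); it is ignored for a single index.
    """
    _type, immediate, spec = block
    if isinstance(spec, int):
        return [spec]  # one body, used for every call
    if immediate:
        raise ValueError("only a deferred block may give a pair of lists")
    monadic, dyadic = spec
    return list(monadic if valence == 1 else dyadic)


def run_block(block: Sequence, valence: int,
              try_body: Callable[[int], bool]) -> int:
    """Try candidate bodies in order until one passes its header tests.

    `try_body` is a hypothetical callback that runs a body and returns
    False if a SETH header test fails, True if the body completes.
    Returns the index of the body that ran.
    """
    for b in candidate_bodies(block, valence):
        if try_body(b):
            return b
    # Behavior when every header test fails is assumed here, not specified.
    raise RuntimeError("no body matched the arguments")
```

For example, `run_block([0, 0, [[1, 2], [3]]], 1, lambda b: b == 2)` tries body 1, treats its header test as failing, and returns 2.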
diff --git a/implementation/vm.md b/implementation/vm.md
index 2cb2f526..36e27185 100644
--- a/implementation/vm.md
+++ b/implementation/vm.md
@@ -19,23 +19,33 @@ dzaima/BQN can interpret bytecode or convert it to [JVM](https://en.wikipedia.or
 The complete bytecode for a program consists of the following:
 * A bytecode sequence `code`
 * A list `consts` of constants that can be loaded
-* A list `blocks` of block information, described in the next section
+* A list `blocks` of per-block information, described in the next section
+* A list `bodies` of per-body information, described in the section after
 * Optionally, source locations for each instruction
 * Optionally, tokenization information
 
-### Blocks
+#### Blocks
 
-Each block in `blocks` is a list of the following properties:
+Each entry in `blocks` is a list of the following properties:
 * Block type: (0) function/immediate, (1) 1-modifier, (2) 2-modifier
 * Block immediateness: (1) immediate or (0) deferred
-* Block starting index in `code`
+* Index or indices in `bodies`
+
+Compilation separates blocks so that they are not nested in bytecode. A block consists of bodies, so that all compiled code is contained in some body of a block. The self-hosted compiler compiles the entire program into an immediate block, and the program is run by evaluating this block. Bodies are terminated with a RETN or RETD instruction.
+
+When the block is evaluated depends on its type and immediateness. An immediate block (0,1) is evaluated as soon as it is pushed; a function (0,0) is evaluated when called on arguments, an immediate modifier (1 or 2, 1) is evaluated when called on operands, and a deferred modifier (1 or 2, 0) creates a derived function when called on operands and is evaluated when this derived function is called on arguments.
+
+The last property is a single number or, for a deferred block only, possibly a pair of lists. If it is a single number, the block is always evaluated by evaluating the body with that index. If it is a pair, the first list gives the bodies for the monadic case and the second those for the dyadic case. Evaluation of a given valence begins at the first body in the appropriate list and moves to the next one if a header test (SETH instruction) fails.
+
+#### Bodies
+
+In source code, the bodies of a block are separated by `;`. Each entry in `bodies` is a list containing:
+* Starting index in `code`
 * Number of variables the block needs to allocate
 * Variable names, as indices into the program's symbol list
 * A mask indicating which variables are exported
 
-Compilation separates blocks so that they are not nested in bytecode. All compiled code is contained in some block. The self-hosted compiler compiles the entire program into an immediate block, and the program is run by evaluating this block. Blocks are terminated with the RETN instruction.
-
-The starting index refers to the position where execution starts in order to evaluate the block. When the block is evaluated depends on its type and immediateness. An immediate block (0,1) is evaluated as soon as it is pushed; a function (0,0) is evaluated when called on arguments, an immediate modifier (1 or 2, 1) is evaluated when called on operands, and a deferred modifier (1 or 2, 0) creates a derived function when called on operands and is evaluated when this derived function is called on arguments.
+The starting index refers to the position in bytecode where execution starts in order to evaluate the body. The bodies of a block always have the same set of special names, but the variables they define are unrelated, so their variable counts can differ. The number of variables includes special names, but the list of names and the export mask don't.
 
 The program's symbol list is included in the tokenization information `t`: it is `0⊑2⊑t`. Since the entire program (the source code passed in one compiler call) uses this list, namespace field accesses can be performed with indices alone within a program. The symbol list is needed for cross-program access, for example if `•BQN` returns a namespace.
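The sketch below, again in Python and under stated assumptions, shows how a reader of `bodies` can use the rule that the variable count includes special names while the name list and export mask do not. The `Body` class, `parse_bodies`, and the sample data are hypothetical; the sample gives two bodies with the same number of special names but different variable counts.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Body:
    start: int            # starting index in `code`
    num_vars: int         # slots to allocate, special names included
    names: List[int]      # symbol-list indices of named variables only
    exported: List[int]   # export mask, parallel to `names`

    @property
    def num_special(self) -> int:
        # `names` excludes special names, so the difference counts them.
        return self.num_vars - len(self.names)

    def exported_symbols(self, symbols: Sequence[str]) -> List[str]:
        # Look up exported variable names in the program's symbol list.
        return [symbols[n] for n, e in zip(self.names, self.exported) if e]


def parse_bodies(bodies: Sequence[Sequence]) -> List[Body]:
    """Convert raw `bodies` entries [start, count, names, mask] to Body objects."""
    out = []
    for start, num_vars, names, mask in bodies:
        if len(names) != len(mask):
            raise ValueError("names and export mask must have equal length")
        if num_vars < len(names):
            raise ValueError("variable count must cover every named variable")
        out.append(Body(start, num_vars, list(names), list(mask)))
    return out
```

With the hypothetical input `[[0, 5, [0, 1], [1, 0]], [12, 4, [2], [0]]]`, both bodies report three special names, and only the first exports a variable.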