aboutsummaryrefslogtreecommitdiff
path: root/docs/implementation/vm.html
blob: 07fd6e95e32eeb8146b24d94e8ae3fb9232799a0 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
<head>
  <link href="../favicon.ico" rel="shortcut icon" type="image/x-icon"/>
  <link href="../style.css" rel="stylesheet"/>
  <title>The BQN virtual machine and runtime</title>
</head>
<div class="nav">(<a href="https://github.com/mlochbaum/BQN">github</a>) / <a href="../index.html">BQN</a> / <a href="index.html">implementation</a></div>
<h1 id="the-bqn-virtual-machine-and-runtime"><a class="header" href="#the-bqn-virtual-machine-and-runtime">The BQN virtual machine and runtime</a></h1>
<p>BQN's self-hosted compiler and runtime mean that only a small amount of native code is needed to run BQN on any given platform. The <a href="https://github.com/mlochbaum/BQN/blob/master/implementation/../docs/bqn.js">Javascript environment</a> requires about 600 lines of Javascript code including system functions and performance improvements; probably around 250 would be required just to run the core language. This makes it fairly easy to port BQN to new platforms, allowing BQN to be <a href="../doc/embed.html">embedded</a> within other programming languages and interact with arrays or functions in those languages.</p>
<p>There's a short <a href="https://www.youtube.com/watch?v=FxU5tZZ1gNc">video introduction</a> to the VM architecture thanks to Asher Mancinelli.</p>
<p>The way data is represented is part of the VM implementation: it can use native arrays or a custom data structure, depending on what the language supports. An initial implementation will be very slow, but can be improved by replacing functions from the BQN-based runtime with native code. As the VM system can be hard to work with if you're not familiar with it, I advise you to contact me to discuss implementing a VM if you are interested.</p>
<h2 id="bytecode"><a class="header" href="#bytecode">Bytecode</a></h2>
<p>The BQN implementation here and dzaima/BQN share a stack-based object code format used to represent compiled code. This format is a list of numbers of unspecified precision (small precision will limit the length of list literals and number of locals per block, blocks, and constants). Previously it was encoded as bytes with the <a href="https://en.wikipedia.org/wiki/LEB128">LEB128</a> format; while it no longer has anything to do with bytes it's called a &quot;bytecode&quot; because this is shorter than &quot;object code&quot;.</p>
<p>The self-hosted compiler uses a simpler, and less capable, format for block and variable data than dzaima/BQN. Only this format is described here; <a href="https://github.com/mlochbaum/BQN/blob/master/implementation/../dc.bqn">dc.bqn</a> adapts it to be compatible with dzaima/BQN.</p>
<p>dzaima/BQN can interpret bytecode or convert it to <a href="https://en.wikipedia.org/wiki/Java_virtual_machine">JVM</a> bytecode, while the Javascript VM previously interpreted bytecode but now always compiles it. Since interpretation is a simpler strategy, it may be helpful to use the <a href="https://github.com/mlochbaum/BQN/blob/f74d9223ef880f2914030c2375f680dcc7e8c92b/bqn.js#L36">old Javascript bytecode interpreter</a> as a reference (for bytecode execution only) when implementing a BQN virtual machine.</p>
<h3 id="components"><a class="header" href="#components">Components</a></h3>
<p>The complete bytecode for a program consists of the following:</p>
<ul>
<li>A bytecode sequence <code><span class='Value'>code</span></code></li>
<li>A list <code><span class='Value'>consts</span></code> of constants that can be loaded</li>
<li>A list <code><span class='Value'>blocks</span></code> of per-block information, described in the next section</li>
<li>A list <code><span class='Value'>bodies</span></code> of per-body information, described in the section after</li>
<li>Optionally, source locations for each instruction</li>
<li>Optionally, tokenization information</li>
</ul>
<h4 id="blocks"><a class="header" href="#blocks">Blocks</a></h4>
<p>Each entry in <code><span class='Value'>blocks</span></code> is a list of the following properties:</p>
<ul>
<li>Block type: (0) function/immediate, (1) 1-modifier, (2) 2-modifier</li>
<li>Block immediateness: (1) immediate or (0) deferred</li>
<li>Index or indices in <code><span class='Value'>bodies</span></code></li>
</ul>
<p>Compilation separates blocks so that they are not nested in bytecode. A block consists of bodies, so that all compiled code is contained in some body of a block. The self-hosted compiler compiles the entire program into an immediate block, and the program is run by evaluating this block. Bodies are terminated with a RETN or RETD instruction.</p>
<p>When the block is evaluated depends on its type and immediateness. An immediate block (0,1) is evaluated as soon as it is pushed; a function (0,0) is evaluated when called on arguments, an immediate modifier (1 or 2, 1) is evaluated when called on operands, and a deferred modifier (1 or 2, 0) creates a derived function when called on operands and is evaluated when this derived function is called on arguments.</p>
<p>The last property can be a single number or a list of lists. A single number indicates the body to be executed, and is used only for blocks with exactly one body. If it's a list of lists, the length is 1 for a block without arguments and 2 or more for a block with arguments (function or deferred modifier). Each element is a list of body indices. After selecting the appropriate list, execution begins at the first body in the appropriate list, moving to the next one if a header test (SETH or PRED instruction) fails. If a test fails but there's no next body, block evaluation is an error.</p>
<p>The five possible cases for a function are monadic, dyadic, inverse monadic (<code><span class='Function'>π•Š</span><span class='Modifier'>⁼</span><span class='Value'>x</span></code>), inverse dyadic (<code><span class='Value'>w</span><span class='Function'>π•Š</span><span class='Modifier'>⁼</span><span class='Value'>x</span></code>), and swapped-inverse dyadic (<code><span class='Value'>x</span><span class='Function'>π•Š</span><span class='Modifier'>˜⁼</span><span class='Value'>w</span></code>). The first two will always be provided, while the remaining three typically don't exist as they have to be specified with undo headers. The smallest length that covers all possible cases will be used.</p>
<h4 id="bodies"><a class="header" href="#bodies">Bodies</a></h4>
<p>Bodies in a block are separated by <code><span class='Value'>;</span></code>. Each entry in <code><span class='Value'>bodies</span></code> is a list containing:</p>
<ul>
<li>Starting index in <code><span class='Value'>code</span></code></li>
<li>Number of variables the block needs to allocate</li>
<li>Variable names, as indices into the program's symbol list</li>
<li>A mask indicating which variables are exported</li>
</ul>
<p>The starting index refers to the position in bytecode where execution starts in order to evaluate the block. Different bodies will always have the same set of special names, but the variables they define are unrelated, so of course they can have different counts. The given number of variables includes special names, but list of names and export mask don't.</p>
<p>The program's symbol list is included in the tokenization information <code><span class='Value'>t</span></code>: it is <code><span class='Number'>0</span><span class='Function'>βŠ‘</span><span class='Number'>2</span><span class='Function'>βŠ‘</span><span class='Value'>t</span></code>. Since the entire program (the source code passed in one compiler call) uses this list, namespace field accesses can be performed with indices alone within a program. The symbol list is needed for cross-program access, for example if <code><span class='Function'>β€’BQN</span></code> returns a namespace.</p>
<h3 id="instructions"><a class="header" href="#instructions">Instructions</a></h3>
<p>The following instructions are defined by dzaima/BQN. The ones emitted by the self-hosted BQN compiler are marked in the &quot;used&quot; column. Instructions marked <code><span class='Function'>NS</span></code> are used only in programs with namespaces, and so are not needed to support the compiler or self-hosted runtime. Similarly, <code><span class='Function'>SETH</span></code> is only needed in programs with destructuring headers.</p>
<table>
<thead>
<tr>
<th align="right">B</th>
<th>Name</th>
<th align="center">Used</th>
<th align="right">Like</th>
<th align="left">Args</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td align="right">00</td>
<td>PUSH</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"><code><span class='Function'>I</span></code></td>
<td>Push object <code><span class='Function'>I</span></code></td>
</tr>
<tr>
<td align="right">01</td>
<td>DFND</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"><code><span class='Function'>I</span></code></td>
<td>Localize and push block <code><span class='Function'>I</span></code></td>
</tr>
<tr>
<td align="right">02</td>
<td>SYSV</td>
<td align="center"></td>
<td align="right">00</td>
<td align="left"><code><span class='Function'>I</span></code></td>
<td>Push dynamic system value <code><span class='Function'>I</span></code></td>
</tr>
<tr>
<td align="right">03</td>
<td></td>
<td align="center"></td>
<td align="right"></td>
<td align="left"><code><span class='Function'>D</span></code></td>
<td>Push return function for lexical depth <code><span class='Function'>D</span></code></td>
</tr>
<tr>
<td align="right">06</td>
<td>POPS</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Pop and discard top of stack</td>
</tr>
<tr>
<td align="right">07</td>
<td>RETN</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Returns top of stack</td>
</tr>
<tr>
<td align="right">08</td>
<td>RETD</td>
<td align="center">NS</td>
<td align="right">07</td>
<td align="left"></td>
<td>Return the running scope's namespace</td>
</tr>
<tr>
<td align="right">0B</td>
<td>ARRO</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"><code><span class='Function'>N</span></code></td>
<td>Create length-<code><span class='Function'>N</span></code> list</td>
</tr>
<tr>
<td align="right">0C</td>
<td>ARRM</td>
<td align="center">X</td>
<td align="right">0B</td>
<td align="left"><code><span class='Function'>N</span></code></td>
<td>Create length-<code><span class='Function'>N</span></code> reference list</td>
</tr>
<tr>
<td align="right">0E</td>
<td></td>
<td align="center"></td>
<td align="right"></td>
<td align="left"></td>
<td>Merge top of stack (for <code><span class='Value'>[]</span></code>)</td>
</tr>
<tr>
<td align="right">10</td>
<td>FN1C</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Monadic function call</td>
</tr>
<tr>
<td align="right">11</td>
<td>FN2C</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Dyadic function call</td>
</tr>
<tr>
<td align="right">12</td>
<td>FN1O</td>
<td align="center">X</td>
<td align="right">10</td>
<td align="left"></td>
<td>Monadic call, checking for <code><span class='Nothing'>Β·</span></code></td>
</tr>
<tr>
<td align="right">13</td>
<td>FN2O</td>
<td align="center">X</td>
<td align="right">11</td>
<td align="left"></td>
<td>Dyadic call, checking for <code><span class='Nothing'>Β·</span></code></td>
</tr>
<tr>
<td align="right">14</td>
<td>TR2D</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Create 2-train</td>
</tr>
<tr>
<td align="right">15</td>
<td>TR3D</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Create 3-train</td>
</tr>
<tr>
<td align="right">16</td>
<td>CHKV</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Error if top of stack is <code><span class='Nothing'>Β·</span></code></td>
</tr>
<tr>
<td align="right">17</td>
<td>TR3O</td>
<td align="center">X</td>
<td align="right">15</td>
<td align="left"></td>
<td>Create 3-train, checking for <code><span class='Nothing'>Β·</span></code></td>
</tr>
<tr>
<td align="right">1A</td>
<td>MD1C</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>1-modifier call</td>
</tr>
<tr>
<td align="right">1B</td>
<td>MD2C</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>2-modifier call</td>
</tr>
<tr>
<td align="right">1C</td>
<td>MD2L</td>
<td align="center"></td>
<td align="right"></td>
<td align="left"></td>
<td>Bind left operand to 2-modifier</td>
</tr>
<tr>
<td align="right">1D</td>
<td>MD2R</td>
<td align="center"></td>
<td align="right"></td>
<td align="left"></td>
<td>Bind right operand to 2-modifier</td>
</tr>
<tr>
<td align="right">20</td>
<td>VARO</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"><code><span class='Function'>D</span></code>, <code><span class='Function'>I</span></code></td>
<td>Push local variable <code><span class='Function'>I</span></code> from <code><span class='Function'>D</span></code> frames up</td>
</tr>
<tr>
<td align="right">21</td>
<td>VARM</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"><code><span class='Function'>D</span></code>, <code><span class='Function'>I</span></code></td>
<td>Push local variable reference <code><span class='Function'>I</span></code> from <code><span class='Function'>D</span></code> frames up</td>
</tr>
<tr>
<td align="right">22</td>
<td>VARU</td>
<td align="center">X</td>
<td align="right">21</td>
<td align="left"><code><span class='Function'>D</span></code>, <code><span class='Function'>I</span></code></td>
<td>Push and clear local variable <code><span class='Function'>I</span></code> from <code><span class='Function'>D</span></code> frames up</td>
</tr>
<tr>
<td align="right">26</td>
<td>DYNO</td>
<td align="center"></td>
<td align="right"></td>
<td align="left"><code><span class='Function'>I</span></code></td>
<td>Push named variable <code><span class='Function'>I</span></code></td>
</tr>
<tr>
<td align="right">27</td>
<td>DYNM</td>
<td align="center"></td>
<td align="right"></td>
<td align="left"><code><span class='Function'>I</span></code></td>
<td>Push named variable <code><span class='Function'>I</span></code> reference</td>
</tr>
<tr>
<td align="right">2A</td>
<td>PRED</td>
<td align="center"></td>
<td align="right">06</td>
<td align="left"></td>
<td>Check predicate and drop</td>
</tr>
<tr>
<td align="right">2B</td>
<td>VFYM</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Convert constant to matcher (for headers)</td>
</tr>
<tr>
<td align="right">2C</td>
<td>NOTM</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Push placeholder assignment matcher</td>
</tr>
<tr>
<td align="right">2F</td>
<td>SETH</td>
<td align="center">X</td>
<td align="right">30</td>
<td align="left"></td>
<td>Test and set header</td>
</tr>
<tr>
<td align="right">30</td>
<td>SETN</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Define variable</td>
</tr>
<tr>
<td align="right">31</td>
<td>SETU</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Change variable</td>
</tr>
<tr>
<td align="right">32</td>
<td>SETM</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Modify variable</td>
</tr>
<tr>
<td align="right">33</td>
<td>SETC</td>
<td align="center">X</td>
<td align="right"></td>
<td align="left"></td>
<td>Monadic modify variable</td>
</tr>
<tr>
<td align="right">40</td>
<td>FLDO</td>
<td align="center">NS</td>
<td align="right"></td>
<td align="left"><code><span class='Function'>I</span></code></td>
<td>Read field <code><span class='Function'>I</span></code> from namespace</td>
</tr>
<tr>
<td align="right">41</td>
<td>FLDM</td>
<td align="center"></td>
<td align="right">40</td>
<td align="left"><code><span class='Function'>I</span></code></td>
<td>Push mutable field <code><span class='Function'>I</span></code> from namespace</td>
</tr>
<tr>
<td align="right">42</td>
<td>ALIM</td>
<td align="center">NS</td>
<td align="right"></td>
<td align="left"><code><span class='Function'>I</span></code></td>
<td>Mutable target aliases field <code><span class='Function'>I</span></code></td>
</tr>
</tbody>
</table>
<p>Stack effects for most instructions are given below. Instructions <code><span class='Function'>FN1O</span></code>, <code><span class='Function'>FN2O</span></code>, and <code><span class='Function'>TR3O</span></code> are identical to <code><span class='Function'>FN1C</span></code>, <code><span class='Function'>FN2C</span></code>, and <code><span class='Function'>TR3D</span></code> except that the indicated values in the higher-numbered instructions may be <code><span class='Nothing'>Β·</span></code>. The non-checking instructions can be implemented using the checking ones, but avoiding the check could improve speed. <code><span class='Function'>VARU</span></code> is identical to <code><span class='Function'>VARM</span></code> but indicates that the local variable's value will never be used again, which may be useful for optimization.</p>
<table>
<thead>
<tr>
<th align="right">B</th>
<th>Name</th>
<th>Stack effect</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td align="right">00</td>
<td>PUSH</td>
<td><code><span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Value'>i</span><span class='Function'>βŠ‘</span><span class='Value'>consts</span><span class='Paren'>)</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">01</td>
<td>DFND</td>
<td><code><span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Value'>i</span><span class='Function'>βŠ‘</span><span class='Value'>blocks</span><span class='Paren'>)</span></code></td>
<td>Also sets block's parent scope</td>
</tr>
<tr>
<td align="right">06</td>
<td>POPS</td>
<td><code><span class='Value'>x</span> <span class='Gets'>β†’</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">07</td>
<td>RETN</td>
<td><code><span class='Value'>x</span> <span class='Gets'>β†’</span> <span class='Value'>x</span></code></td>
<td>Returns from current block</td>
</tr>
<tr>
<td align="right">08</td>
<td>RETD</td>
<td><code><span class='Value'>x?</span> <span class='Gets'>β†’</span> <span class='Value'>n</span></code></td>
<td>Clears stack, dropping 0 or 1 value</td>
</tr>
<tr>
<td align="right">0B</td>
<td>ARRO</td>
<td><code><span class='Value'>x0</span> <span class='Value'>…</span> <span class='Value'>xm</span> <span class='Gets'>β†’</span> <span class='Bracket'>⟨</span><span class='Value'>x0</span> <span class='Value'>…</span> <span class='Value'>xm</span><span class='Bracket'>⟩</span></code></td>
<td><code><span class='Function'>N</span></code> total variables (<code><span class='Value'>m</span><span class='Function'>=</span><span class='Value'>n</span><span class='Function'>-</span><span class='Number'>1</span></code>)</td>
</tr>
<tr>
<td align="right">10</td>
<td>FN1C</td>
<td><code><span class='Value'>𝕩</span> <span class='Value'>𝕀</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Function'>π•Š</span> <span class='Value'>𝕩</span><span class='Paren'>)</span></code></td>
<td>12: <code><span class='Value'>𝕩</span></code> may be <code><span class='Nothing'>Β·</span></code></td>
</tr>
<tr>
<td align="right">11</td>
<td>FN2C</td>
<td><code><span class='Value'>𝕩</span> <span class='Value'>𝕀</span> <span class='Value'>𝕨</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Value'>𝕨</span> <span class='Function'>π•Š</span> <span class='Value'>𝕩</span><span class='Paren'>)</span></code></td>
<td>13: <code><span class='Value'>𝕨</span></code> or <code><span class='Value'>𝕩</span></code> may be <code><span class='Nothing'>Β·</span></code></td>
</tr>
<tr>
<td align="right">14</td>
<td>TR2D</td>
<td><code><span class='Value'>h</span> <span class='Value'>g</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Function'>G</span> <span class='Function'>H</span><span class='Paren'>)</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">15</td>
<td>TR3D</td>
<td><code><span class='Value'>h</span> <span class='Value'>g</span> <span class='Value'>f</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Function'>F</span> <span class='Function'>G</span> <span class='Function'>H</span><span class='Paren'>)</span></code></td>
<td>17: <code><span class='Function'>F</span></code> may be <code><span class='Nothing'>Β·</span></code></td>
</tr>
<tr>
<td align="right">1A</td>
<td>MD1C</td>
<td><code><span class='Value'>𝕣</span> <span class='Value'>𝕗</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Function'>𝔽</span> <span class='Modifier'>_𝕣</span><span class='Paren'>)</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">1B</td>
<td>MD2C</td>
<td><code><span class='Value'>π•˜</span> <span class='Value'>𝕣</span> <span class='Value'>𝕗</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Function'>𝔽</span> <span class='Modifier2'>_𝕣_</span> <span class='Function'>𝔾</span><span class='Paren'>)</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">1C</td>
<td>MD2L</td>
<td><code><span class='Value'>𝕣</span> <span class='Value'>𝕗</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Value'>𝕗</span> <span class='Modifier2'>_𝕣_</span><span class='Paren'>)</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">1D</td>
<td>MD2R</td>
<td><code><span class='Value'>π•˜</span> <span class='Value'>𝕣</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Modifier2'>_𝕣_</span> <span class='Value'>π•˜</span><span class='Paren'>)</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">20</td>
<td>VARO</td>
<td><code><span class='Gets'>β†’</span> <span class='Value'>x</span></code></td>
<td>Local variable value</td>
</tr>
<tr>
<td align="right">21</td>
<td>VARM</td>
<td><code><span class='Gets'>β†’</span> <span class='Value'>r</span></code></td>
<td>Local variable reference</td>
</tr>
<tr>
<td align="right">2B</td>
<td>VFYM</td>
<td><code><span class='Value'>c</span> <span class='Gets'>β†’</span> <span class='Value'>r</span></code></td>
<td>Constant to match reference</td>
</tr>
<tr>
<td align="right">2B</td>
<td>NOTM</td>
<td><code><span class='Gets'>β†’</span> <span class='Value'>r</span></code></td>
<td>Constant to not-reference</td>
</tr>
<tr>
<td align="right">30</td>
<td>SETN</td>
<td><code><span class='Value'>x</span> <span class='Value'>r</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Value'>r</span><span class='Gets'>←</span><span class='Value'>x</span><span class='Paren'>)</span></code></td>
<td><code><span class='Value'>r</span></code> is a reference; 2F: no result</td>
</tr>
<tr>
<td align="right">31</td>
<td>SETU</td>
<td><code><span class='Value'>x</span> <span class='Value'>r</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Value'>r</span><span class='Gets'>↩</span><span class='Value'>x</span><span class='Paren'>)</span></code></td>
<td><code><span class='Value'>r</span></code> is a reference</td>
</tr>
<tr>
<td align="right">32</td>
<td>SETM</td>
<td><code><span class='Value'>x</span> <span class='Value'>f</span> <span class='Value'>r</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Value'>r</span> <span class='Function'>F</span><span class='Gets'>↩</span> <span class='Value'>x</span><span class='Paren'>)</span></code></td>
<td><code><span class='Value'>r</span></code> is a reference</td>
</tr>
<tr>
<td align="right">33</td>
<td>SETC</td>
<td><code><span class='Value'>f</span> <span class='Value'>r</span> <span class='Gets'>β†’</span> <span class='Paren'>(</span><span class='Value'>r</span> <span class='Function'>F</span><span class='Gets'>↩</span><span class='Paren'>)</span></code></td>
<td><code><span class='Value'>r</span></code> is a reference</td>
</tr>
<tr>
<td align="right">40</td>
<td>FLDO</td>
<td><code><span class='Value'>n</span> <span class='Gets'>β†’</span> <span class='Value'>n.i</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">42</td>
<td>ALIM</td>
<td><code><span class='Value'>r</span> <span class='Gets'>β†’</span> <span class='Value'>s</span></code></td>
<td>Reference <code><span class='Value'>s</span></code> gets field <code><span class='Value'>i</span></code> and puts in <code><span class='Value'>r</span></code></td>
</tr>
</tbody>
</table>
<p>Many instructions just call functions or modifiers or otherwise have fairly obvious implementations. Instructions to handle variables and blocks are more complicated (although very typical of bytecode representations for lexically-scoped languages) and are described in more detail below.</p>
<h3 id="local-variables-dfnd-varo-varu-varm-retn"><a class="header" href="#local-variables-dfnd-varo-varu-varm-retn">Local variables: DFND VARO VARU VARM RETN</a></h3>
<p>The bytecode representation is designed with the assumption that variables will be stored in frames, one for each time an instance of a block is run. dzaima/BQN has facilities to give frame slots names, in order to support dynamic execution, but self-hosted BQN doesn't. A new frame is created when the block is evaluated (see <a href="#blocks">#blocks</a>) and in general has to be cleaned up by garbage collection, because a lexical closure might need to refer to the frame even after the corresponding block finishes. Lexical closures can form loops, so simple reference counting can leak memory, but it could be used in addition to less frequent tracing garbage collection or another strategy.</p>
<p>A frame is a mutable list of <em>slots</em> for variable values. It has slots for any special names that are available during the blocks execution followed by the local variables it defines. Special names use the ordering <code><span class='Value'>π•€π•©π•¨π•£π•—π•˜</span></code>; the first three of these are available in non-immediate blocks while <code><span class='Value'>𝕣</span></code> and <code><span class='Value'>𝕗</span></code> are available in modifiers and <code><span class='Value'>π•˜</span></code> in 2-modifiers specifically.</p>
<p>When a block is pushed with <strong>DFND</strong>, an instance of the block is created, with its <em>parent frame</em> set to be the frame of the currently-executing block. Setting the parent frame when the block is first seen, instead of when it's evaluated, is what distinguishes lexical from dynamic scoping. If it's an immediate block, it's evaluated immediately, and otherwise it's pushed onto the stack. When the block is evaluated, its frame is initialized using any arguments passed to it, the next instruction's index is pushed onto the return stack, and execution moves to the first instruction in the block. When the RETN instruction is encountered, an index is popped from the return stack and execution returns to this location. As an alternative to maintaining an explicit return stack, a block can be implemented as a native function that creates a new execution stack and returns the value in it when the <strong>RETN</strong> instruction is reached. This approach uses the implementation language's call stack for the return stack.</p>
<p>Local variables are manipulated with the <strong>VARO</strong> (or <strong>VARU</strong>) and <strong>VARM</strong> instructions, which load the value of a variable and a reference to it (see the next section) respectively. These instructions reference variables by <em>frame depth</em> and <em>slot index</em>. The frame depth indicates in which frame the variable is found: the current frame has depth 0, its block's parent frame has depth 1, and so on. The slot index is an index within that frame.</p>
<p>Slots should be initialized with some indication they are not yet defined. The variable can be defined with SETN only if it hasn't been defined yet, and can be accessed with VARO or VARU or modified with SETU, SETM, or SETC only if it <em>has</em> been defined.</p>
<h3 id="variable-references-arrm-varm-setn-setu-setm-setc"><a class="header" href="#variable-references-arrm-varm-setn-setu-setm-setc">Variable references: ARRM VARM SETN SETU SETM SETC</a></h3>
<p>A <em>variable reference</em> indicates a particular frame slot in a way that's independent of the execution context. For example, it could be a pointer to the slot, or a reference to the frame along with the index of the slot. <strong>VARM</strong> pushes a variable reference to the stack.</p>
<p>A <em>reference list</em> is a list of variable references or reference lists. It's created with the <strong>ARRM</strong> instruction. In the Javascript VM there's no difference between a reference list and an ordinary BQN list other than the contents.</p>
<p>The <strong>SETN</strong>, <strong>SETU</strong>, <strong>SETM</strong>, and <strong>SETC</strong> instructions set a value for a reference. If the reference is to a variable, they simply set its value. For a reference list, the value needs to be destructured. It must be a list of the same length, and each reference in the reference list is set to the corresponding element of the value list.</p>
<p><strong>SETM</strong> and <strong>SETC</strong> additionally need to get the current value of a reference. For a variable reference this is its current value (with an error if it's not defined yet); for a reference list it's a list of the values of each reference in the list.</p>
<table>
<thead>
<tr>
<th>Opcode</th>
<th>Slot must be</th>
<th>Reads value first</th>
</tr>
</thead>
<tbody>
<tr>
<td>SETN</td>
<td>Unset</td>
<td></td>
</tr>
<tr>
<td>SETU</td>
<td>Set</td>
<td></td>
</tr>
<tr>
<td>SETM</td>
<td>Set</td>
<td>Yes</td>
</tr>
<tr>
<td>SETC</td>
<td>Set</td>
<td>Yes</td>
</tr>
</tbody>
</table>
<h3 id="bodies-seth-vfym-notm-pred"><a class="header" href="#bodies-seth-vfym-notm-pred">Bodies: SETH VFYM NOTM PRED</a></h3>
<p><strong>SETH</strong> is a modification of SETN for use in header destructuring. It differs in that it doesn't place its result on the stack (making it more like SETN followed by POPS), and that if the assignment fails because the reference and value don't conform then it moves on to the next eligible body in the block rather than giving an error. <strong>VFYM</strong> converts a BQN value <code><span class='Value'>c</span></code> to a special reference: assigning a value <code><span class='Value'>v</span></code> to it should check if <code><span class='Value'>v</span><span class='Function'>≑</span><span class='Value'>c</span></code> but do no assignment. Only SETH needs to accept these references, and it should treat non-matching values as failing assignment. <strong>NOTM</strong> also creates a special reference, but it takes no inputs and the reference has no effectβ€”it always counts as matching but performs no assignment.</p>
<p><strong>PRED</strong> drops the top value of the stack, but also checks whether it matches the number 1. If it does match, execution continues; if not, evaluation of the current body ends and evaluation moves to the next eligible body.</p>
<h3 id="namespaces-fldo-fldm-alim-retd"><a class="header" href="#namespaces-fldo-fldm-alim-retd">Namespaces: FLDO FLDM ALIM RETD</a></h3>
<p>A <em>namespace</em> is the collection of variables in a particular scope, along with an association mapping some exported <em>symbols</em> (case- and underscore-normalized strings) to a subset of these. It can be represented using a frame for the variables, plus the variable name list and mask of exported variables from that block's properties in the bytecode. <strong>RETD</strong> finishes executing the block and returns the namespace for that execution.</p>
<p>The variable name list is split into a program-level list of names and a list of indices into these names: within-program field accesses can be done with the indices only while cross-program ones must use names. One way to check whether an access is cross-program is to compare the accessor's program-level name list with the one for the accessed namespace as references or pointers. Then a lookup should be performed as appropriate. A persistent lookup table is needed to make this efficient.</p>
<p><strong>FLDO</strong> reads a field from a namespace. The parameter <code><span class='Function'>I</span></code> is a program-level identifier index for this field. The VM must ensure that the field is exported, possibly by leaving unexported identifiers out of the namespace's lookup table. <strong>FLDM</strong> does the same but pushes a reference to the field, to be modified by assignment.</p>
<p><strong>ALIM</strong> is used for aliased assignment such as <code><span class='Bracket'>⟨</span><span class='Value'>a</span><span class='Ligature'>β€Ώ</span><span class='Value'>b</span><span class='Gets'>⇐</span><span class='Value'>c</span><span class='Bracket'>⟩</span><span class='Gets'>←</span><span class='Value'>ns</span></code>. It tags a reference with a namespace field, identified with a program-level index. A value assigned to the tagged reference must be a namespace. The relevant field is extracted, and then stored in the original reference.</p>
<h2 id="runtime"><a class="header" href="#runtime">Runtime</a></h2>
<p>Primitive functions and modifiers used in a program are stored in its <code><span class='Value'>consts</span></code> array. The compiler needs to be passed a <em>runtime</em> with the value of every primitive so that these functions and modifiers are available.</p>
<p>While it's perfectly possible to implement the runtime from scratch, the pseudo-BQN files <a href="https://github.com/mlochbaum/BQN/blob/master/implementation/../src/r0.bqn">r0.bqn</a> and <a href="https://github.com/mlochbaum/BQN/blob/master/implementation/../src/r1.bqn">r1.bqn</a> implement the full runtime in terms of a <em>core runtime</em> consisting of a smaller number of much simpler functions. <a href="https://github.com/mlochbaum/BQN/blob/master/implementation/../src/pr.bqn">pr.bqn</a> converts this file so that it can be compiled. It changes values in the core runtime to primitives and primitives to generated identifiers, so that the first 22 values in the output's <code><span class='Value'>consts</span></code> array are exactly the core runtime, and no other primitives are required. The result is a list of two elements: first the list of all primitive values, and then a function that can be called to pass in two additional core functions used for inferred properties.</p>
<p>The contents of a core runtime are given below. The names given are those used in r1.bqn; the environment only provides a list of values and therefore doesn't need to use the same names. For named functions a description is given. For primitives, the implementation should match the BQN specification for that primitive on the specified domain, or the entire domain if left empty. They won't be called outside that domain except if there are bugs in the BQN sources.</p>
<table>
<thead>
<tr>
<th align="right">Ind</th>
<th>Name</th>
<th>Description / restrictions</th>
</tr>
</thead>
<tbody>
<tr>
<td align="right">0</td>
<td><code><span class='Function'>Type</span></code></td>
<td><code><span class='Function'>β€’Type</span></code></td>
</tr>
<tr>
<td align="right">1</td>
<td><code><span class='Function'>Fill</span></code></td>
<td>Get or set the fill value for array <code><span class='Value'>𝕩</span></code></td>
</tr>
<tr>
<td align="right">2</td>
<td><code><span class='Function'>Log</span></code></td>
<td><code><span class='Function'>⋆</span><span class='Modifier'>⁼</span></code> (natural or base-<code><span class='Value'>𝕨</span></code> logarithm) for atomic arguments</td>
</tr>
<tr>
<td align="right">3</td>
<td><code><span class='Function'>GroupLen</span></code></td>
<td><code><span class='Function'>β‰ </span><span class='Modifier'>Β¨</span><span class='Function'>βŠ”</span><span class='Value'>𝕩</span></code> for a valid list <code><span class='Value'>𝕩</span></code>, with minimum length <code><span class='Value'>𝕨</span></code></td>
</tr>
<tr>
<td align="right">4</td>
<td><code><span class='Function'>GroupOrd</span></code></td>
<td><code><span class='Function'>βˆΎβŠ”</span><span class='Value'>𝕩</span></code> provided <code><span class='Value'>𝕨</span></code> is <code><span class='Value'>l</span> <span class='Function'>GroupLen</span> <span class='Value'>𝕩</span></code> (any <code><span class='Value'>l</span></code>)</td>
</tr>
<tr>
<td align="right">5</td>
<td><code><span class='Function'>!</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">6</td>
<td><code><span class='Function'>+</span></code></td>
<td>On one or two atoms</td>
</tr>
<tr>
<td align="right">7</td>
<td><code><span class='Function'>-</span></code></td>
<td>On one or two atoms</td>
</tr>
<tr>
<td align="right">8</td>
<td><code><span class='Function'>Γ—</span></code></td>
<td>On two atoms</td>
</tr>
<tr>
<td align="right">9</td>
<td><code><span class='Function'>Γ·</span></code></td>
<td>On one or two atoms</td>
</tr>
<tr>
<td align="right">10</td>
<td><code><span class='Function'>⋆</span></code></td>
<td>On one or two atoms</td>
</tr>
<tr>
<td align="right">11</td>
<td><code><span class='Function'>⌊</span></code></td>
<td>On one atom</td>
</tr>
<tr>
<td align="right">12</td>
<td><code><span class='Function'>=</span></code></td>
<td>On one value or two atoms</td>
</tr>
<tr>
<td align="right">13</td>
<td><code><span class='Function'>≀</span></code></td>
<td>On two atoms</td>
</tr>
<tr>
<td align="right">14</td>
<td><code><span class='Function'>β‰’</span></code></td>
<td>For array <code><span class='Value'>𝕩</span></code></td>
</tr>
<tr>
<td align="right">15</td>
<td><code><span class='Function'>β₯Š</span></code></td>
<td>For array <code><span class='Value'>𝕩</span></code> with no <code><span class='Value'>𝕨</span></code> or <code><span class='Value'>𝕨</span><span class='Function'>=</span><span class='Modifier2'>β—‹</span><span class='Paren'>(</span><span class='Function'>Γ—</span><span class='Modifier'>Β΄</span><span class='Paren'>)</span><span class='Function'>β‰’</span><span class='Value'>𝕩</span></code></td>
</tr>
<tr>
<td align="right">16</td>
<td><code><span class='Function'>βŠ‘</span></code></td>
<td>For atom <code><span class='Value'>𝕨</span></code> and list <code><span class='Value'>𝕩</span></code></td>
</tr>
<tr>
<td align="right">17</td>
<td><code><span class='Function'>↕</span></code></td>
<td>For natural number <code><span class='Value'>𝕩</span></code></td>
</tr>
<tr>
<td align="right">18</td>
<td><code><span class='Modifier'>⌜</span></code></td>
<td>On arrays</td>
</tr>
<tr>
<td align="right">19</td>
<td><code><span class='Modifier'>`</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">20</td>
<td><code><span class='Modifier2'>_fillBy_</span></code></td>
<td><code><span class='Function'>𝔽</span></code> with result fill computed using <code><span class='Function'>𝔾</span></code></td>
</tr>
<tr>
<td align="right">21</td>
<td><code><span class='Modifier2'>⊘</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">22</td>
<td><code><span class='Modifier2'>⎊</span></code></td>
<td></td>
</tr>
<tr>
<td align="right">β€”</td>
<td><code><span class='Function'>Decompose</span></code></td>
<td><code><span class='Function'>β€’Decompose</span></code></td>
</tr>
<tr>
<td align="right">β€”</td>
<td><code><span class='Function'>PrimInd</span></code></td>
<td>Index for primitive <code><span class='Value'>𝕩</span></code></td>
</tr>
</tbody>
</table>
<p>To define the final two functions, call the second returned element as a function, with argument <code><span class='Function'>Decompose</span><span class='Ligature'>β€Ώ</span><span class='Function'>PrimInd</span></code>. The function <code><span class='Function'>PrimInd</span></code> gives the index of <code><span class='Value'>𝕩</span></code> in the list of all primitives (defined as <code><span class='Value'>glyphs</span></code> in pr.bqn), or the length of the runtime if <code><span class='Value'>𝕩</span></code> is not a primitive. The two functions are only needed for computing inferred properties, and are defined by default so that every value is assumed to be a primitive, and <code><span class='Function'>PrimInd</span></code> performs a linear search over the returned runtime. If the runtime is used directly, then this means that without setting <code><span class='Function'>Decompose</span><span class='Ligature'>β€Ώ</span><span class='Function'>PrimInd</span></code>, function inferred properties will work slowly and for primitives only; if values from the runtime are wrapped then function inferred properties will not work at all. The compiler uses Under with compound right operands, so <code><span class='Function'>Decompose</span></code> is needed to self-host.</p>
<p>Remember that <code><span class='Function'>+</span></code> and <code><span class='Function'>-</span></code> can also work on characters in some circumstances, under the rules of affine characters. Other arithmetic functions should only accept numbers. <code><span class='Function'>=</span></code> must work on any atoms including functions and modifiers. <code><span class='Function'>≀</span></code> must work on numbers and characters.</p>
<h3 id="grouplen-and-groupord"><a class="header" href="#grouplen-and-groupord">GroupLen and GroupOrd</a></h3>
<p>GroupLen and GroupOrd, short for Group length and Group order, are used to implement <a href="../doc/group.html">Group</a> (<code><span class='Function'>βŠ”</span></code>) and also to grade small-range lists and invert permutations (the inverse of a permutation <code><span class='Value'>p</span></code> is <code><span class='Number'>1</span><span class='Modifier'>Β¨</span><span class='Modifier2'>⊸</span><span class='Function'>GroupOrd</span> <span class='Value'>p</span></code>). Each of these only needs to work on lists of integers. Shown below are efficient implementations using BQN extended with the notation <code><span class='Paren'>(</span><span class='Value'>i</span><span class='Function'>βŠ‘</span><span class='Value'>l</span><span class='Paren'>)</span> <span class='Function'>Fn</span><span class='Gets'>↩</span> <span class='Value'>x</span></code> meaning <code><span class='Value'>l</span> <span class='Gets'>↩</span> <span class='Function'>Fn</span><span class='Modifier2'>⟜</span><span class='Value'>x</span><span class='Modifier2'>⌾</span><span class='Paren'>(</span><span class='Value'>i</span><span class='Modifier2'>⊸</span><span class='Function'>βŠ‘</span><span class='Paren'>)</span> <span class='Value'>l</span></code>, where <code><span class='Function'>Fn</span></code> is <code><span class='Function'>⊒</span></code> if omitted. Since these special assignments only change one element of <code><span class='Value'>l</span></code>, each can be a fast constant-time operation.</p>
<pre><span class='Function'>GroupLen</span> <span class='Gets'>←</span> <span class='Brace'>{</span>
  <span class='Value'>l</span> <span class='Gets'>←</span> <span class='Number'>Β―1</span> <span class='Function'>⌈</span><span class='Modifier'>Β΄</span> <span class='Value'>𝕩</span>
  <span class='Value'>r</span> <span class='Gets'>←</span> <span class='Paren'>(</span><span class='Value'>l</span><span class='Function'>+</span><span class='Number'>1</span><span class='Paren'>)</span> <span class='Function'>β₯Š</span> <span class='Number'>0</span>
  <span class='Brace'>{</span> <span class='Paren'>(</span><span class='Value'>𝕩</span><span class='Function'>βŠ‘</span><span class='Value'>r</span><span class='Paren'>)</span> <span class='Function'>+</span><span class='Gets'>↩</span> <span class='Number'>1</span> <span class='Brace'>}</span><span class='Modifier2'>⍟</span><span class='Paren'>(</span><span class='Number'>0</span><span class='Modifier2'>⊸</span><span class='Function'>≀</span><span class='Paren'>)</span><span class='Modifier'>Β¨</span> <span class='Value'>𝕩</span>
  <span class='Paren'>(</span><span class='Value'>𝕨</span><span class='Function'>βŒˆβ‰ </span><span class='Value'>r</span><span class='Paren'>)</span> <span class='Function'>↑</span> <span class='Value'>r</span>
<span class='Brace'>}</span>

<span class='Function'>GroupOrd</span> <span class='Gets'>←</span> <span class='Brace'>{</span>
  <span class='Comment'># 𝕨 ≑ GroupLen 𝕩
</span>  <span class='Value'>s</span> <span class='Gets'>←</span> <span class='Number'>0</span> <span class='Function'>∾</span> <span class='Function'>+</span><span class='Modifier'>`</span> <span class='Value'>𝕨</span>
  <span class='Value'>r</span> <span class='Gets'>←</span> <span class='Paren'>(</span><span class='Number'>Β―1</span><span class='Function'>βŠ‘</span><span class='Value'>s</span><span class='Paren'>)</span> <span class='Function'>β₯Š</span> <span class='String'>@</span>   <span class='Comment'># Every element will be overwritten
</span>  <span class='Paren'>(</span><span class='Function'>↕≠</span><span class='Value'>𝕩</span><span class='Paren'>)</span> <span class='Brace'>{</span> <span class='Paren'>((</span><span class='Value'>𝕩</span><span class='Function'>βŠ‘</span><span class='Value'>s</span><span class='Paren'>)</span><span class='Function'>βŠ‘</span><span class='Value'>r</span><span class='Paren'>)</span><span class='Gets'>↩</span><span class='Value'>𝕨</span> <span class='Separator'>β‹„</span> <span class='Paren'>(</span><span class='Value'>𝕩</span><span class='Function'>βŠ‘</span><span class='Value'>s</span><span class='Paren'>)</span><span class='Function'>+</span><span class='Gets'>↩</span><span class='Number'>1</span> <span class='Brace'>}</span><span class='Modifier2'>⍟</span><span class='Paren'>(</span><span class='Number'>0</span><span class='Modifier2'>⊸</span><span class='Function'>≀</span><span class='Paren'>)</span><span class='Modifier'>Β¨</span> <span class='Value'>𝕩</span>
  <span class='Value'>r</span>
<span class='Brace'>}</span>
</pre>
<h2 id="compiler-arguments"><a class="header" href="#compiler-arguments">Compiler arguments</a></h2>
<p>The compiler takes the source code as <code><span class='Value'>𝕩</span></code>. The execution environment is passed as <code><span class='Value'>𝕨</span></code>, which can contain up to four values:</p>
<ul>
<li><strong>Runtime</strong>: list of primitive values; see <a href="#runtime">previous section</a></li>
<li><strong>System</strong>: function that takes a list of strings and returns corresponding system values</li>
<li><strong>Variables</strong>: names of existing variables in the scope</li>
<li><strong>Depths</strong>: lexical depth of these variables (default <code><span class='Number'>0</span></code>; <code><span class='Number'>Β―1</span></code> for depth 0 but allowing shadowing)</li>
</ul>
<p>If <code><span class='Value'>𝕨</span></code> has length greater than 4 it's assumed to be the runtime only. If the length is less than 4, empty defaults that don't define any values are used for the missing arguments.</p>
<p>The system-value function is passed a list of unique normalized names, meaning that each name is lowercase and contains no underscores. It should return a corresponding list of system values. Because system values are requested on each program run, a function that has access to context such as <code><span class='Value'>β€’path</span></code> can construct appropriate system values on demand.</p>
<p>The variable list is used to create REPLs, but has other uses as well, such as allowing execution to take place within a surrounding scope. It consists of a list of normalized names. The corresponding depth list indicates the lexical depth of each of these, with 0 and -1 indicating that the variable should exist directly in the top-level scope. A typical interactive REPL uses only the value -1, because it allows variables to be shadowed. It maintains a single top-level environment to be used for all evaluations. When the programmer enters a line, it's compiled, then the environment and list of top-level names is extended according to the result.</p>
<h2 id="assembly"><a class="header" href="#assembly">Assembly</a></h2>
<p>The full BQN implementation is made up of the two components aboveβ€”virtual machine and core runtimeβ€”and the compiled runtime, compiler, and formatter. Since the compiler unlikely to work right away, I suggest initially testing the virtual machine on smaller pieces of code compiled by an existing, working, BQN implementation.</p>
<p>BQN sources are compiled with <a href="https://github.com/mlochbaum/BQN/blob/master/implementation/../src/cjs.bqn">cjs.bqn</a>, which runs under <a href="https://github.com/dzaima/BQN/">dzaima/BQN</a> as a Unix-style script. It has two modes. If given a command-line argument <code><span class='Value'>r</span></code>, <code><span class='Value'>c</span></code>, or <code><span class='Value'>fmt</span></code>, it compiles one of the source files. With any other command-line arguments, it will compile each one, and format it as a single line of output. The output is in a format designed for Javascript, but it can be adjusted to work in other languages either by text replacement on the output or changes to the formatting functions in cjs.bqn.</p>
<h3 id="structure"><a class="header" href="#structure">Structure</a></h3>
<p>The following steps give a working BQN system, assuming a working VM and core runtime:</p>
<ul>
<li>Evaluate the bytecode <code><span class='Value'>$</span> <span class='Value'>src</span><span class='Function'>/</span><span class='Value'>cjs.bqn</span> <span class='Value'>r</span></code>, passing the core runtime <code><span class='Value'>provide</span></code> in the constants array. The result is a BQN list of a full runtime, a function <code><span class='Function'>SetPrims</span></code>, and a function <code><span class='Function'>SetInv</span></code>.</li>
<li>Call <code><span class='Function'>SetPrims</span></code> on a two-element list <code><span class='Bracket'>⟨</span><span class='Function'>Decompose</span><span class='Separator'>,</span> <span class='Function'>PrimInd</span><span class='Bracket'>⟩</span></code>.</li>
<li>Optionally, call <code><span class='Function'>SetInv</span></code> with a function <code><span class='Value'>𝕩</span></code> that updates <code><span class='Function'>Inverse</span></code> and (more optionally) a function <code><span class='Value'>𝕨</span></code> that updates <code><span class='Function'>SwapInverse</span></code>.</li>
<li>Evaluate the bytecode <code><span class='Value'>$</span> <span class='Value'>src</span><span class='Function'>/</span><span class='Value'>cjs.bqn</span> <span class='Value'>c</span></code>, which uses primitives from the runtime in its constants array. This is the compiler.</li>
<li>Evaluate the bytecode <code><span class='Value'>$</span> <span class='Value'>src</span><span class='Function'>/</span><span class='Value'>cjs.bqn</span> <span class='Value'>f</span></code>. This returns a function. Then call it on a four-element list <code><span class='Bracket'>⟨</span><span class='Function'>Type</span><span class='Separator'>,</span> <span class='Function'>Decompose</span><span class='Separator'>,</span> <span class='Function'>Glyph</span><span class='Separator'>,</span> <span class='Function'>FmtNum</span><span class='Bracket'>⟩</span></code> to obtain the two-element list <code><span class='Bracket'>⟨</span><span class='Function'>β€’Fmt</span><span class='Separator'>,</span> <span class='Function'>β€’Repr</span><span class='Bracket'>⟩</span></code>.</li>
</ul>
<p>The compiler takes the runtime as <code><span class='Value'>𝕨</span></code> and source code as <code><span class='Value'>𝕩</span></code>. To evaluate BQN source code, convert it into a BQN string (rank-1 array of characters), pass this string and runtime to the compiler, and evaluate the result as bytecode. Results can be formatted with the formatter for use in a REPL, or used from the implementation language.</p>
<p>Two formatter arguments <code><span class='Function'>Glyph</span></code> and <code><span class='Function'>FmtNum</span></code> are not part of the runtime. <code><span class='Function'>Glyph</span></code> assumes <code><span class='Value'>𝕩</span></code> is a primitive and returns the character (not string) that represents it, and <code><span class='Function'>FmtNum</span></code> assumes <code><span class='Value'>𝕩</span></code> is a number and returns a string representing it.</p>
<h3 id="testing"><a class="header" href="#testing">Testing</a></h3>
<p>I recommend roughly the following sequence of tests to get everything working smoothly. It can be very difficult to figure out where in a VM things went wrong, so it's important to work methodically and make sure each component is all right before moving to the next. In order to run test cases before the compiler runs, I strongly recommend building an automated system to compile the test to bytecode using an existing BQN implementation, and run it with the VM being developed.</p>
<p>Because the compiler works almost entirely with lists of numbers, a correct fill implementation is not needed to run the compiler. Instead, you can define <code><span class='Function'>Fill</span></code> as <code><span class='Number'>0</span><span class='Modifier2'>⊘</span><span class='Function'>⊒</span></code> and <code><span class='Modifier2'>_fillBy_</span></code> as <code><span class='Brace'>{</span><span class='Value'>π•˜</span><span class='Separator'>β‹„</span><span class='Function'>𝔽</span><span class='Brace'>}</span></code> to always use a fill element of 0.</p>
<ul>
<li>Test the virtual machine with the output of <code><span class='Value'>src</span><span class='Function'>/</span><span class='Value'>cjs.bqn</span></code> on the primitive-less test expressions in <a href="https://github.com/mlochbaum/BQN/blob/master/implementation/../test/cases/bytecode.bqn">test/cases/bytecode.bqn</a>.</li>
<li>There isn't currently a test suite for provided functions (although <a href="https://github.com/mlochbaum/BQN/blob/master/implementation/../test/cases/simple.bqn">test/cases/simple.bqn</a> has some suitable tests for arithmetic): your options are to write tests based on knowledge of these functions and primitive tests, or try to load the runtime and work backwards from any failures. The r1 runtime runs code to initialize some primitive lookup tables so failures are likely.</li>
<li>Once the runtime is loaded, begin working through the tests in <a href="https://github.com/mlochbaum/BQN/blob/master/implementation/../test/cases/prim.bqn">test/cases/prim.bqn</a> with the full runtime but no self-hosted compiler.</li>
<li>Now, if you haven't already, add a call to <code><span class='Function'>SetPrims</span></code>. Test for inferred properties: identity, under, and undo.</li>
<li>After primitive and inferred tests pass, try to load the compiler, and run it on a short expression. If it runs, you have a complete (not necessarily correct) system, and remaining tests can be run end-to-end!</li>
<li>Headers and namespace support aren't required to support the runtime or compiler, but they can be tested as you add them with the header and namespace tests. Undo headers are tested with the unhead tests.</li>
</ul>