diff options
Diffstat (limited to 'implementation/codfns.md')
| -rw-r--r-- | implementation/codfns.md | 21 |
1 files changed, 10 insertions, 11 deletions
diff --git a/implementation/codfns.md b/implementation/codfns.md index 45cede0e..89a03491 100644 --- a/implementation/codfns.md +++ b/implementation/codfns.md @@ -8,21 +8,20 @@ The shared goals of BQN and Co-dfns are to implement a compiler for an array lan ## Compilation strategy -Co-dfns development has primarily been focused on the core compiler, and not parsing, code generation, or the runtime. The associated Ph.D. thesis and famous 17 lines figure refer to this section, which transforms the abstract syntax tree (AST) of a program to a lower-level form, and resolves lexical scoping by linking variables to their definitions. While all of Co-dfns is written in APL, other sections aren't necessarily designed to be data-parallel and don't have the same performance guarantees. For example, the parser appears to be written with some sort of parser combinators. In contrast, BQN is entirely written in a data-parallel style. It does not maintain the same clean separation between compiler sections: [token formation](../spec/token.md) and literal evaluation is separated into its own function, but parsing, AST manipulation, and code generation overlap. +Co-dfns development has primarily been focused on the core compiler, and not parsing, code generation, or the runtime. The associated Ph.D. thesis and famous 17 lines figure refer to this section, which transforms the abstract syntax tree (AST) of a program to a lower-level form, and resolves lexical scoping by linking variables to their definitions. While all of Co-dfns is written in APL, other sections aren't necessarily designed to be data-parallel and don't have the same performance guarantees. For example, the parser uses a parsing expression grammar (PEG), a sequential algorithm. In contrast, BQN is entirely written in a data-parallel style. It does not maintain the same clean separation between compiler sections: [token formation](../spec/token.md) and literal evaluation is separated into its own function, but parsing, AST manipulation, and code generation overlap. The core Co-dfns compiler is based on manipulating the syntax tree, which is mostly stored as parent and sibling vectors—that is, lists of indices of other nodes in the tree. BQN is less methodical, but generally treats the source tokens as a simple list. This list is reordered in various ways, allowing operations that behave like tree traversals to be performed with scans under the right ordering. This strategy allows BQN to be much stricter in the kinds of operations it uses. Co-dfns regularly uses `⍣≡` (repeat until convergence) for iteration and creates nested arrays with `⌸` (Key, like [Group](../doc/group.md)), but BQN has only a single instance of iteration to resolve quotes and comments, plus one complex but parallelizable scan for numeric literal processing, and only uses Group to extract identifiers and strings. This means that most primitives, if we count simple reductions and scans as single primitives, are executed a fixed number of times during execution, making complexity analysis even easier than in Co-dfns. -A goal for BQN was to not only write the compiler in BQN but to use BQN for the runtime as much as possible, while Co-dfns uses a conventional runtime written in C with ArrayFire. This goal has largely been achieved, so that the current BQN runtime uses a very small number of basic array operations currently provided by Javascript. However, basic operations may be added in the future for performance reasons. +## Backends and optimization -## Code style +Co-dfns was designed from the beginning to build GPU programs, and outputs code in ArrayFire (a C++ framework), which is then compiled. GPU programming is quite limiting, and as a result Co-dfns has strict limitations in functionality that are slowly being removed. It now has partial support for nested arrays and array ranks higher than 4. BQN is designed with performance in mind, but implementation effort focused on functionality first, so that arbitrary array structures as well as trains and lexical closures have been supported from the beginning. Rather than target a specific language, it outputs object code to be interpreted by a [virtual machine](vm.md). Another goal for BQN was to not only write the compiler in BQN but to use BQN for the runtime as much as possible. The BQN-based runtime uses a small number of basic array operations provided by the VM. The extra abstraction causes this runtime to be very slow, but this can be fixed by overwriting functions from the runtime with natively-implemented ones. -Initially, the BQN compiler was written with a similar style to Co-dfns, primarily because I wanted to quickly have a working BQN implementation and reach parity with the Co-dfns compiler (not the runtime!). This style is now shifting to support several goals that Co-dfns doesn't have: -- Error reporting -- Teaching -- Multiple backends (bytecode, Wasm, machine code) -- Optimization -- Testing and debugging the compiler +Neither BQN nor Co-dfns significantly optimize their output at the time of writing (it could be said that Co-dfns relies on the ArrayFire backend to optimize). BQN does have one optimization, which is to compute variable lifetimes in functions so that the last access to a variable can clear it. Further optimizations often require finding properties such as reachability in a graph of expressions that probably can't be done efficiently in a strict array style. For this and other reasons it would probably be best to structure compiler optimization as a set of additional modules that can be provided during a given compilation. -So far the most notable difference is the addition of per-line comments. Aaron advocates the almost complete separation of code from comments (thesis) in addition to his very terse style as a general programming methodology. I find that this practice makes it hard to connect the documentation to the code and is very slow in providing a summary or reminder of functionality that a comment might. However, I do plan to write long-form material providing the necessary context and explanations required to understand the compiler. +## Error reporting -Co-dfns doesn't check for compilation errors, while BQN has complete error checking and good error messages, and includes source positions in compiler errors as well as in the compiled code for use in runtime errors. I have found that such checking imposes a low cost on compilation speed (perhaps 25%), but a greater cost on the compiler codebase size and legibility. Maintaining source locations is also a small expense relative to other compilation. Aside from that maintenance, more user-friendly error reporting has almost no performance cost for working programs, because error reporting code only needs to be run when compilation fails. This leaves room for potentially very sophisticated error analysis to attempt to track down the root cause of a compilation error, but I haven't yet done any work along these lines. +Co-dfns doesn't check for compilation errors, while BQN has complete error checking and good error messages, and includes source positions in compiler errors as well as in the compiled code for use in runtime errors. Position tracking and error checking add up to a little more than 20% overhead for the compiler, both in runtime and lines of code. And improving the way errors are reported once found has no cost for working programs, because reporting code only needs to be run if there's a compiler error. This leaves room for potentially very sophisticated error analysis to attempt to track down the root cause of a compilation error, but I haven't yet done any work along these lines. + +## Comments + +Aaron advocates the almost complete separation of code from comments (thesis) in addition to his very terse style as a general programming methodology. I find that this practice makes it hard to connect the documentation to the code, and is very slow in providing a summary or reminder of functionality that a comment might. One comment on each line makes a better balance of compactness and faster accessibility in my opinion. However, I do plan to write long-form material providing the necessary context and explanations required to understand the compiler. |
