author     Marshall Lochbaum <mwlochbaum@gmail.com>  2022-05-05 22:20:02 -0400
committer  Marshall Lochbaum <mwlochbaum@gmail.com>  2022-05-05 22:21:50 -0400
commit     e8d8a48b15389d43ada53e4de6c458ce6240c643
tree       e66ca4d57c8fa1f36e3d657e737c7f3c0f41f68f
parent     54c04b9a04619ae4ef6159a2ca9e59ad00b99cc9

Updates based on current state of Co-dfns
Diffstat (limited to 'implementation/codfns.md')
 -rw-r--r--  implementation/codfns.md | 10
 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/implementation/codfns.md b/implementation/codfns.md
index d8e79b4e..57d1952d 100644
--- a/implementation/codfns.md
+++ b/implementation/codfns.md
@@ -2,13 +2,15 @@
# Co-dfns versus BQN's implementation
+*Co-dfns is under active development, so this document might not reflect its current state. Last update: 2022-05-05.*
+
The BQN self-hosted compiler is directly inspired by the [Co-dfns](https://github.com/Co-dfns/Co-dfns) project, a compiler for a subset of [Dyalog APL](../doc/fromDyalog.md). I'm very grateful to Aaron for showing that array-oriented compilation is even possible! In addition to the obvious difference of target language, BQN differs from Co-dfns both in goals and methods.
BQN and Co-dfns share the goal of implementing a compiler for an array language with whole-array operations. This gives the theoretical benefit of a short *critical path*, which in practice means that both compilers can make good use of a GPU or of a CPU's vector instructions simply by being supplied an appropriate runtime (Co-dfns has good runtimes—an ArrayFire program on the GPU and Dyalog APL on the CPU—while CBQN isn't at the same level yet). The two implementations also share a preference for working "close to the metal" by passing around arrays of numbers rather than creating abstract types to work with data. Objects are right out. These choices make for compact source code, and may make the compiler easier to write and understand.
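
To make the "arrays of numbers" point concrete, here is a small sketch (an invented layout, not the representation either compiler actually uses) of a token stream held as parallel numeric vectors instead of token objects:

```
type ← 2‿0‿2‿1   # invented codes: 2 identifier, 0 number, 1 function
val  ← 0‿3‿1‿0   # index into an identifier table, literal list, and so on
pos  ← 0‿2‿4‿6   # source position of each token, kept for error reporting
ids  ← type = 2  # a later pass asks "which tokens are identifiers?" in one step
```

With this layout, a question about the whole token stream is a single array comparison rather than a loop over objects.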
## Compilation strategy
-Co-dfns development has primarily been focused on the core compiler, and not parsing, code generation, or the runtime. The associated Ph.D. thesis and famous 17 lines figure refer to this section, which transforms the abstract syntax tree (AST) of a program to a lower-level form, and resolves lexical scoping by linking variables to their definitions. While all of Co-dfns is written in APL, other sections aren't necessarily designed to be data-parallel and don't have the same performance guarantees. For example, the parser uses a parsing expression grammar (PEG), a sequential algorithm. In contrast, BQN is entirely written in a data-parallel style. It does not maintain the same clean separation between compiler sections: [token formation](../spec/token.md) and literal evaluation is separated into its own function, but parsing, AST manipulation, and code generation overlap.
+Co-dfns development historically focused on the core compiler rather than on parsing, code generation, or the runtime. The associated Ph.D. thesis and the famous figure of 17 lines refer to this section, which transforms the abstract syntax tree (AST) of a program to a lower-level form and resolves lexical scoping by linking variables to their definitions. While all of Co-dfns is written in APL, the other sections aren't necessarily data-parallel, and don't carry the same performance guarantees. As of late 2021, tokenization and most of parsing also follow the core compiler's style, though in my estimation they are looser about nesting. This brings Co-dfns closer to BQN, which is written entirely in a data-parallel style.
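
As an illustration of the data-parallel style (my own sketch, not code from either compiler), token formation can classify every character at once with membership tests and then split tokens using a scan:

```
src ← "ab1+cd2"                  # hypothetical source text
dig ← src ∊ "0123456789"         # 1 where the character is a digit
alp ← src ∊ "abcdefghijklmnopqrstuvwxyz"
cls ← alp + 2×dig                # class per character: 0 other, 1 letter, 2 digit
b   ← cls ≠ ¯1»cls               # token starts: class differs from the previous one
(¯1++`b) ⊔ src                   # group into tokens: ⟨"ab","1","+","cd","2"⟩
```

Every step is a whole-array primitive, so the critical path stays short no matter how long the source is.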
The core Co-dfns compiler is based on manipulating the syntax tree, which is mostly stored as parent and sibling vectors—that is, lists of indices of other nodes in the tree. BQN is less methodical, but generally treats the source tokens as a flat list. This list is reordered in various ways, so that operations which behave like tree traversals can be performed with scans under the right ordering. This strategy allows BQN to be much stricter about the operations it uses: Co-dfns regularly uses `⍣≡` (repeat until convergence) for iteration and creates nested arrays with `⌸` (Key, like [Group](../doc/group.md)), while BQN has only a single instance of iteration, to resolve quotes and comments, plus one complex but parallelizable scan for numeric literal processing, and uses Group only to extract identifiers and strings. As a result, most primitives (counting simple reductions and scans as single primitives) are executed a fixed number of times, making complexity analysis even easier than in Co-dfns.
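
For a taste of the parent-vector approach (the tree below is invented, and this is my illustration rather than Co-dfns source), node depths can be computed from a parent vector by pointer doubling, which needs only logarithmically many passes:

```
p ← 0‿0‿0‿1‿1‿3‿3   # parent index per AST node; the root points to itself
d ← p ≠ ↕≠p          # one edge up to the parent, except at the root
j ← p                # current jump pointers
Step ← {𝕊 𝕩: d +↩ j⊏d ⋄ j ↩ j⊏j}  # add depth at the jump target, then double the jump
Step¨ ↕⌈2⋆⁼≠p        # ⌈log₂ n⌉ doubling steps reach the root from anywhere
d                    # ⟨0 1 1 2 2 3 3⟩
```

Each pass is itself a flat selection, so the whole computation stays data-parallel.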
@@ -20,11 +22,11 @@ Neither BQN nor Co-dfns significantly optimize their output at the time of writi
## Error reporting
-Co-dfns doesn't check for compilation errors, while BQN has complete error checking and good error messages, and includes source positions in compiler errors as well as in the compiled code for use in runtime errors. Position tracking and error checking add up to a little more than 20% overhead for the compiler, both in runtime and lines of code. And improving the way errors are reported once found has no cost for working programs, because reporting code only needs to be run if there's a compiler error. The only thing that really takes advantage of this now is the reporting for bracket matching, which goes over all brackets with a stack-based (not array-oriented or parallel) method.
+Co-dfns initially didn't check for compilation errors, but has begun to add some checks with error messages. It remains behind BQN, which has complete error checking and good error messages, and includes source positions in compiler errors as well as in the compiled code for use in runtime errors. Position tracking and error checking add a little more than 20% overhead to the compiler, both in runtime and in lines of code. Improving how errors are reported once they're found costs nothing for working programs, because the reporting code runs only when there is a compiler error. The only part that really takes advantage of this now is the reporting for bracket matching, which goes over all brackets with a stack-based (not array-oriented or parallel) method.
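
To show the split between checking and reporting, here is a simplified sketch (one bracket pair only, and not the compiler's actual code): a single scan decides *whether* the brackets match, while pointing at the specific offending bracket is what the stack-based reporting pass is for:

```
src ← "(a(b)c)"
dep ← +` (src='(') - src=')'     # running bracket depth across the source
ok  ← (∧´ dep≥0) ∧ 0 = ¯1⊑dep    # never negative, and fully closed at the end
```

Since `ok` is all that a working program ever needs, the costlier reporting only runs on the error path.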
## Comments
-Aaron advocates the almost complete separation of code from comments (thesis) in addition to his very terse style as a general programming methodology. I find that this practice makes it hard to connect the documentation to the code, and is very slow in providing a summary or reminder of functionality that a comment might. One comment on each line makes a better balance of compactness and faster accessibility in my opinion. However, I do plan to write long-form material providing the necessary context and explanations required to understand the compiler.
+Aaron advocates the almost complete separation of code from comments (as in his thesis), along with a very terse style, as a general programming methodology. I find that this practice makes it hard to connect the documentation to the code, and that it's much slower at providing the summary or reminder of functionality that a comment can. In my opinion, one comment per line strikes a better balance of compactness and quick accessibility. Recently Co-dfns has added comments in a similar volume, with a full-line comment for every group of two to three lines.
## Is it a good idea?
@@ -36,7 +38,7 @@ It's important to note that my judgment here applies specifically to implementin
It needs to be said: Aaron is easily one of the most talented programmers I know, and I have something of a knack for arrays myself. At present, building an array compiler requires putting together array operations in new and complicated ways, often with nothing but intuition to hint at which ones to use. It is much harder than making a compiler the normal way. However, there is some reason for hope in the [history](https://en.wikipedia.org/wiki/History_of_compiler_construction) of traditional compilers. It took a team led by John Backus two and a half years to produce the first FORTRAN compiler, and they gave him a Turing award for it! Between the 1950s and 1970s, developments like the LR parser brought compilers from being a challenge for the greatest minds to something accessible to a typical programmer with some effort. I don't believe the change will be so dramatic for array-based compilers, because many advantages in languages and tooling—keep in mind the FORTRAN implementers used assembly—are shared with ordinary programming. But Aaron and I have already discovered some concepts like tree manipulation and depth-based reordering that make it easier to think about compilation, and there are certainly more to be found.
-I find that variable management is a major difficulty in working with the compiler. This is a problem that Aaron doesn't have, because his compiler is 17 lines long. What happens in a larger program is that various properties need to be computed in one place and used in another, making it hard to keep track of how these were computed and what they mean. In BQN, different sections of the compiler use different source orderings (one thing I've expended some effort on is to reduce the number of orderings used). A tree-based compiler would probably have similar problems, unless all the state is going to be transformed at each step, which would perform poorly. Using a variable with one ordering in the wrong place is a frequent source of errors, particularly if the ordering is something like expanding function bodies that has no effect in many small programs. Is there some way to protect against these errors?
+I find that variable management is a major difficulty in working with the compiler. This is a problem that Aaron doesn't (or didn't) have, because his compiler is 17 lines long. In a larger program, various properties need to be computed in one place and used in another, making it hard to keep track of how they were computed and what they mean. In BQN, different sections of the compiler use different source orderings (one thing I've expended some effort on is reducing the number of orderings used). A tree-based compiler would probably have similar problems, unless all of the state were transformed at each step, which would perform poorly. Using a variable with one ordering in the wrong place is a frequent source of errors, particularly if the ordering is something like expanding function bodies that has no effect in many small programs. Is there some way to protect against these errors?
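
A small invented example of the hazard: a property computed in source order must be permuted before it is used alongside a reordered token list, and nothing in the language flags a mismatch.

```
toks ← "a"‿"←"‿"1"‿"⋄"‿"b"   # tokens in source order (hypothetical)
role ← 1‿2‿3‿2‿1              # a per-token property, also in source order
ord  ← 4‿3‿2‿1‿0              # some later pass works in a different order
rt   ← ord ⊏ toks             # reordered tokens
rr   ← ord ⊏ role             # correct: permute the property the same way
# Indexing `role` directly against `rt` evaluates without error but
# silently misaligns tokens with their properties.
```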
#### Does APL Need a Type System?