1 files changed, 50 insertions, 1 deletions
diff --git a/implementation/codfns.md b/implementation/codfns.md
index 89a03491..42523c49 100644
--- a/implementation/codfns.md
+++ b/implementation/codfns.md
@@ -4,7 +4,7 @@
 
 The BQN self-hosted compiler is directly inspired by the [Co-dfns](https://github.com/Co-dfns/Co-dfns) project, a compiler for a subset of [Dyalog APL](../doc/fromDyalog.md). I'm very grateful to Aaron for showing that array-oriented compilation is even possible! In addition to the obvious difference of target language, BQN differs from Co-dfns both in goals and methods.
 
-The shared goals of BQN and Co-dfns are to implement a compiler for an array language with whole-array operations. This provides the theoretical benefit of a short *critical path*, which in practice means that both compilers can make good use of a GPU or a CPU's vector instructions simply by providing an appropriate runtime (however, only Co-dfns has such a runtime—an ArrayFire program on the GPU and Dyalog APL on the CPU). The two implementations also share a preference for working "close to the metal" by passing around arrays of numbers rather than creating abstract types to work with data. Objects are right out. These choices lead to a compact source code implementation, and may have some benefits in terms of how easy it is to write and understand the compiler.
+The shared goals of BQN and Co-dfns are to implement a compiler for an array language with whole-array operations. This provides the theoretical benefit of a short *critical path*, which in practice means that both compilers can make good use of a GPU or a CPU's vector instructions simply by providing an appropriate runtime (Co-dfns has good runtimes—an ArrayFire program on the GPU and Dyalog APL on the CPU—while CBQN isn't at this level yet). The two implementations also share a preference for working "close to the metal" by passing around arrays of numbers rather than creating abstract types to work with data. Objects are right out. These choices lead to a compact source code implementation, and may have some benefits in terms of how easy it is to write and understand the compiler.
 
 ## Compilation strategy
 
@@ -25,3 +25,52 @@ Co-dfns doesn't check for compilation errors, while BQN has complete error check
 ## Comments
 
 Aaron advocates the almost complete separation of code from comments (thesis) in addition to his very terse style as a general programming methodology. I find that this practice makes it hard to connect the documentation to the code, and is very slow in providing a summary or reminder of functionality that a comment might. One comment on each line makes a better balance of compactness and faster accessibility in my opinion. However, I do plan to write long-form material providing the necessary context and explanations required to understand the compiler.
+
+## Is it a good idea?
+
+In short: **no**, I don't think it makes sense to use an array style for a compiler without a good reason. BQN uses it so it can self-host while maintaining good performance; Co-dfns uses it to prove it's possible to implement the core of a compiler with low parallel asymptotic complexity. It could also make a fun hobby project, although it's very draining. If the goal is to produce a working compiler then my opinion is that using the array style will take longer and require more skill, and the resulting compiler will be slower than a traditional one in a low-level language for practical tasks. Improvements in methodology could turn this around, but I'm pessimistic.
+
+### Ease of development
+
+It needs to be said: Aaron is easily one of the most talented programmers I know, and I have something of a knack for arrays myself. At present, building an array compiler requires putting together array operations in new and complicated ways, often with nothing but intuition to hint at which ones to use. It is much harder than making a compiler the normal way. However, there is some reason for hope in the [history](https://en.wikipedia.org/wiki/History_of_compiler_construction) of traditional compilers. It took a team led by John Backus two and a half years to produce the first FORTRAN compiler, and they gave him a Turing award for it! Between the 1950s and 1970s, developments like the LR parser brought compilers from being a challenge for the greatest minds to something accessible to a typical programmer with some effort. I don't believe the change will be so dramatic for array-based compilers, because many advantages in languages and tooling—keep in mind the FORTRAN implementers used assembly—are shared with ordinary programming. But Aaron and I have already discovered some concepts like tree manipulation and depth-based reordering that make it easier to think about compilation, and there are certainly more to be found.
+
+I find that variable management is a major difficulty in working with the compiler. This is a problem that Aaron doesn't have, because his compiler is 17 lines long. What happens in a larger program is that various properties need to be computed in one place and used in another, making it hard to keep track of how these were computed and what they mean. In BQN, different sections of the compiler use different source orderings (one thing I've expended some effort on is to reduce the number of orderings used). A tree-based compiler would probably have similar problems, unless all the state is going to be transformed at each step, which would perform poorly. Using a variable with one ordering in the wrong place is a frequent source of errors, particularly if the ordering is something like expanding function bodies that has no effect in many small programs. Is there some way to protect against these errors?
+
+#### Does APL Need a Type System?
+
+[Here's Aaron's take](https://www.youtube.com/watch?v=z8MVKianh54). Honestly I can't really go along with this: I think he ignores a lot of real distinctions between array and typed functional programming because it's convenient for the point he wants to make. On the other hand, it's abundantly clear that C-style types would be useless for an array compiler, because nearly every variable is a list of integers.
+
+The sort of static guarantee I want is not really a type system but an *axis* system. That is, if I take `a∧b` I want to know that the arithmetic mapping makes sense because the two variables use the same axis. And I want to know that if `a` and `b` are compatible, then so are `i⊏a` and `i⊏b`, but not `a` and `i⊏b`. I could use a form of [Hungarian notation](https://en.wikipedia.org/wiki/Hungarian_notation) for this, and write `ia←i⊏a` and `ib←i⊏b`, but it's inconvenient to rewrite the axis every time the variable appears, and I'd much prefer a computer checking agreement rather than my own fallible self.
+
+### Performance
+
+In his Co-dfns paper Aaron compares to nanopass implementations of his compiler passes. Running on the CPU and using Chez Scheme (not Racket, which is also presented) for nanopass, he finds Co-dfns is up to **10 times faster** for large programs. The GPU is of course slower for small programs and faster for larger ones, breaking even above 100,000 AST nodes—quite a large program. I think comparing the self-hosted BQN compiler to the one in dzaima/BQN shows that this large improvement is caused as much by nanopass being slow as Co-dfns being fast.
+
+The self-hosted compiler running in CBQN achieves full performance at about 1KB of dense source code. On large files it achieves speeds around 2MB/s, slightly less than **half as fast** as dzaima/BQN's compiler. This compiler was written in Java by dzaima in a much shorter time than the self-hosted compiler, and is equivalent for benchmarking purposes. While there are minor differences in syntax accepted and the exact bytecode output, I'm sure that either compiler could be modified to match the other with negligible changes in compilation time. The Java compiler is written with performance in mind, but dzaima has expended only a moderate amount of effort to optimize it.
+
+A few factors other than the speed of the nanopass compiler might partly cause the discrepancy, or otherwise be worth taking into account. I doubt that these can add up to a factor of 20, so I think that nanopass is simply not as fast as more typical imperative compiler methods.
+
+- The CBQN runtime is still suboptimal, with no integer types smaller than 4 bytes (in particular no bit booleans). Small types would improve things but this is limited as the bottleneck shifts over to operations like selection that don't scale well to small types. My estimate is a factor of 3 improvement from improving speed to match Dyalog, and I think more than a factor of 5 is unlikely.
+- On the other hand Java isn't the fastest language for a compiler and a C-based compiler would likely be faster. I don't have an estimate for the size of the difference.
+- Co-dfns and BQN use different compilation strategies. I think that my methods are at least as fast, and scale better to a full compiler.
+- The portion of the compiler implemented by Co-dfns could be better for arrays than other sections, which in fact seems likely to me—I would say parsing is the worst for array relative to scalar programming. I think the whole-compiler comparison is more informative, although if this effect is very strong (I don't think it is), then hybrid array-scalar compilers could make sense.
+- The Co-dfns and nanopass implementations are pass-for-pass equivalent, while BQN and Java are only comparable as a whole. As the passes were chosen to be ideal for Co-dfns, I think this could be slowing down nanopass in a way that doesn't translate to real-world performance.
+
+Overall, it seems safe to say that an ideal array-based compiler would be competitive with a scalar one on the CPU, not vastly better as Aaron's benchmarks suggest. This result is still remarkable! APL and BQN are high-level dynamically-typed languages, and wouldn't be expected to go near the performance of a compiled language like Java. However, it makes it much harder to recommend array-based compilation than the numbers from the Co-dfns paper would.
+
+I stress here that I don't think there's anything wrong about the way Aaron has conducted or presented his research. The considerations described above are speculative even in (partial) hindsight. I think Aaron chose nanopass because of his familiarity with the functional programming compiler literature and because its multi-pass system is more similar to Co-dfns. And I know that he actively sought out other programmers willing to implement the compiler in other ways including imperative methods; apparently these efforts didn't pan out. Aaron even mentioned to me that his outstanding results were something of a problem for him, because reviewers found them unbelievable!
+
+#### What about the GPU?
+
+BQN's compiler could certainly be made to run on a GPU, and it's fascinating that this is possible merely because I stuck to an array-based style. In Co-dfns, Aaron found a maximum factor of 6 improvement by running on the GPU, and this time it's the GPU runtime that we should expect to be slower than Dyalog. So we could expect an array-based compiler to run faster on large source files in this case. The problem is this: who could benefit from this speed?
+
+Probably not BQN. Recall that the BQN compiler runs at 2MB/s. This is fast enough that it almost certainly takes much longer to compile the program than to run it. The exception would be when a lot of code is loaded but not used, which can be solved by splitting the code into smaller files and only using those which are needed.
+
+The programmers who complain about compile times are those who use low-level languages like C++, Rust, and Julia. The things that these compilers spend time on aren't the things that the BQN compiler does! BQN has a rigid syntax, no metaprogramming, and compiles to bytecode. The slow compilers are churning to perform tasks like:
+- Type checking
+- Templates, polymorphism, and other code generation
+- Reachability computations, for optimization
+- Automatic vectorization
+I don't know how to implement these in an array style. I suspect many are simply not possible. They tend to involve relationships between distant parts of the program, and are approached with graph-based methods for which efficient parallel implementations aren't known.
+
+It might be possible to translate a compiler with less optimization such as Go to an array style, partially or completely. But Go compiles very fast, so speeding it up isn't as desirable. In slower languages attacking problems like the above strikes me as many levels beyond the work on Co-dfns or BQN.