Diffstat (limited to 'docs/implementation/codfns.html')
-rw-r--r--  docs/implementation/codfns.html | 4 ++++
1 file changed, 4 insertions(+), 0 deletions(-)
diff --git a/docs/implementation/codfns.html b/docs/implementation/codfns.html
index aab008a5..4e9647fe 100644
--- a/docs/implementation/codfns.html
+++ b/docs/implementation/codfns.html
@@ -53,3 +53,7 @@
 </ul>
 <p>I don't know how to implement these in an array style. I suspect many are simply not possible. They tend to involve relationships between distant parts of the program, and are approached with graph-based methods for which efficient array-style implementations aren't known.</p>
 <p>It might be possible to translate a compiler with less optimization such as Go to an array style, partially or completely. But Go compiles very fast, so speeding it up isn't as desirable. In slower compilers, attacking problems like the above strikes me as many levels beyond the work on Co-dfns or BQN.</p>
+<h2 id="pareas"><a class="header" href="#pareas">Pareas</a></h2>
+<p><a href="https://github.com/Snektron/pareas">Pareas</a> is a GPU-based compiler that takes a simple custom-designed imperative language as input and outputs RISC-V. It's implemented in <a href="https://github.com/diku-dk/futhark">Futhark</a>, an array-oriented ML-family language with GPU support. Pareas and the page you are reading were developed in… parallel, and I became aware of it after writing the pessimistic take on GPU compilers above. I'm very glad this work is being done, but it seems fully compatible with my conclusion that our current data-parallel methods aren't advanced enough to make useful improvements in slow compilers. Both the source and target languages had to be kept simple, and not many optimizations are performed. It does show that APL syntax isn't necessary to develop a GPU compiler. I think it's too different from Co-dfns and BQN to say whether one development style might be more effective, though.</p>
+<p>Pareas seems to focus on applying general methods in a smart way before going special-purpose, even more so than Co-dfns. I think this is absolutely a good choice and the right direction for GPU research. But it does reduce the power available to the compiler writers in some sense, making Pareas appear less capable than it might otherwise—until you account for the flexibility to handle other languages. The main parser targets a class of languages called <em>LLP</em>(<em>q</em>,<em>k</em>), and for simplicity it's restricted further to <em>LLP</em>(1,1). This isn't expressive enough for any serious language, and even with their specialized input language the authors add a second pass using tree manipulation to refine the grammar. In particular, BQN isn't in this class because any part of an expression can be parenthesized an arbitrary number of times, requiring unbounded lookahead (or lookbehind) to determine its role from the outside.</p>
+<p>As it stands, the compiler's performance isn't great, but that's entirely due to slow register allocation. The rest of the compiler is capable of over 1GB/s on their high-end GPU. Sure, it takes megabytes of source to reach these speeds, but that leaves us at "compile just about anything in a fraction of a second", which is very attractive. The register allocation is kind of worrying because they rejected graph coloring for performance reasons and went with a very basic allocator that spills whenever things get tough. Nonetheless I think challenges like more realistic syntax and a good allocator can probably be accommodated while still performing substantially faster than current CPU-based code. But I see that allocator as the tip of an iceberg: it's nowhere near the hardest or scalar-est task for an optimizing compiler, and a few intractable problems can sink the whole thing. Maybe there's room to pass such things off to the CPU. Maybe a linker is a better goal for GPU hosting? Even though I don't expect a GPU gcc on the horizon, I'm excited to see what comes out of this line of research!</p>
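
To make the parenthesization point in the patch concrete, here is a minimal Python sketch (my own illustration; the `role_and_lookahead` helper is hypothetical and not taken from Pareas or the BQN sources) of why no fixed lookahead can assign a role to a parenthesized BQN expression from the outside: the deciding token sits inside an arbitrary number of parentheses, so the number of tokens a parser must examine grows with the nesting depth, and no fixed k in LLP(q,k) suffices.

```python
# Minimal illustration (hypothetical; not Pareas or BQN compiler code):
# the syntactic role of a parenthesized BQN expression is decided by the
# token inside the innermost parentheses, e.g. (((-))) is a function while
# (((5))) is a subject, so the number of tokens needed to classify the
# group from its first '(' grows without bound as nesting deepens.

def role_and_lookahead(tokens):
    """Classify a parenthesized group and report how many tokens were read."""
    seen = 0
    for tok in tokens:
        seen += 1
        if tok == "(":
            continue                      # still no information about the role
        # the first non-parenthesis token fixes the role of the whole group
        is_function = tok in "+-×÷⌊⌈" or (tok.isalpha() and tok.isupper())
        return ("function" if is_function else "subject"), seen
    return None, seen                     # ran out of input before deciding

print(role_and_lookahead("(((-"))     # ('function', 4)
print(role_and_lookahead("((((((5"))  # ('subject', 7)
```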
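And to show what a "very basic allocator that spills whenever things get tough" means next to graph coloring, here is a toy Python sketch of that style of allocator (again my own illustration, with made-up `allocate` and `NUM_REGS` names and precomputed live ranges as input; it is not Pareas's Futhark implementation): registers are handed out greedily in order of first use, and any value that appears while the pool is full goes to a stack slot, whereas a graph-coloring allocator considers the whole interference graph before deciding what to spill.

```python
# Toy register allocator in the "spill when things get tough" style
# (hypothetical sketch, not Pareas code). Each virtual register has a
# live range (start, end); physical registers are handed out greedily,
# and any value that arrives while all registers are busy is spilled
# instead of solving the interference graph as graph coloring would.

NUM_REGS = 4

def allocate(live_ranges):
    """live_ranges: dict of vreg -> (start, end); returns vreg -> 'rN' or 'spill'."""
    assignment = {}
    active = {}                            # physical reg -> end of its current live range
    for vreg, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        # release registers whose values are dead by the time this one starts
        for reg in [r for r, reg_end in active.items() if reg_end < start]:
            del active[reg]
        free = [f"r{i}" for i in range(NUM_REGS) if f"r{i}" not in active]
        if free:
            assignment[vreg] = free[0]
            active[free[0]] = end
        else:
            assignment[vreg] = "spill"     # a smarter allocator could sometimes avoid this
    return assignment

# Five values live at once with only four registers: the last one is spilled.
print(allocate({"a": (0, 9), "b": (1, 9), "c": (2, 9), "d": (3, 9), "e": (4, 9)}))
```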
