| | | |
|---|---|---|
| author | Marshall Lochbaum <mwlochbaum@gmail.com> | 2022-01-25 19:03:40 -0500 |
| committer | Marshall Lochbaum <mwlochbaum@gmail.com> | 2022-01-25 20:24:00 -0500 |
| commit | 29bc342af8527f9bada1d011b17d7fd87d4ebdad (patch) | |
| tree | 4e40b0d315bbc2c308888833df5bdab61cba1637 /docs/implementation/codfns.html | |
| parent | d12d44dd0d6288ca7b41c113b21105abe107a367 (diff) | |
Editing
Diffstat (limited to 'docs/implementation/codfns.html')
| -rw-r--r-- | docs/implementation/codfns.html | 2 |
1 file changed, 1 insertion, 1 deletion
diff --git a/docs/implementation/codfns.html b/docs/implementation/codfns.html
index 65ac1391..ba71fb26 100644
--- a/docs/implementation/codfns.html
+++ b/docs/implementation/codfns.html
@@ -28,7 +28,7 @@
 <p>The sort of static guarantee I want is not really a type system but an <em>axis</em> system. That is, if I take <code><span class='Value'>a</span><span class='Function'>∧</span><span class='Value'>b</span></code> I want to know that the arithmetic mapping makes sense because the two variables use the same axis. And I want to know that if <code><span class='Value'>a</span></code> and <code><span class='Value'>b</span></code> are compatible, then so are <code><span class='Value'>i</span><span class='Function'>⊏</span><span class='Value'>a</span></code> and <code><span class='Value'>i</span><span class='Function'>⊏</span><span class='Value'>b</span></code>, but not <code><span class='Value'>a</span></code> and <code><span class='Value'>i</span><span class='Function'>⊏</span><span class='Value'>b</span></code>. I could use a form of <a href="https://en.wikipedia.org/wiki/Hungarian_notation">Hungarian notation</a> for this, and write <code><span class='Value'>ia</span><span class='Gets'>←</span><span class='Value'>i</span><span class='Function'>⊏</span><span class='Value'>a</span></code> and <code><span class='Value'>ib</span><span class='Gets'>←</span><span class='Value'>i</span><span class='Function'>⊏</span><span class='Value'>b</span></code>, but it's inconvenient to rewrite the axis every time the variable appears, and I'd much prefer a computer checking agreement rather than my own fallible self.</p>
 <h3 id="performance"><a class="header" href="#performance">Performance</a></h3>
 <p>In his Co-dfns paper Aaron compares to nanopass implementations of his compiler passes. Running on the CPU and using Chez Scheme (not Racket, which is also presented) for nanopass, he finds Co-dfns is up to <strong>10 times faster</strong> for large programs. The GPU is of course slower for small programs and faster for larger ones, breaking even above 100,000 AST nodes—quite a large program. I think comparing the self-hosted BQN compiler to the one in dzaima/BQN shows that this large improvement is caused as much by nanopass being slow as Co-dfns being fast.</p>
-<p>The self-hosted compiler running in CBQN reachej full performance at about 1KB of dense source code. On large files it achieves speeds around 3MB/s, about <strong>two-thirds as fast</strong> as dzaima/BQN's compiler. This compiler was written in Java by dzaima in a much shorter time than the self-hosted compiler, and is equivalent for benchmarking purposes. While there are minor differences in syntax accepted and the exact bytecode output, I'm sure that either compiler could be modified to match the other with negligible changes in compilation time. The Java compiler is written with performance in mind, but dzaima has expended only a moderate amount of effort to optimize it.</p>
+<p>The self-hosted compiler running in CBQN reaches full performance at about 1KB of dense source code. On large files it achieves speeds around 3MB/s, about <strong>two-thirds as fast</strong> as dzaima/BQN's compiler. This compiler was written in Java by dzaima in a much shorter time than the self-hosted compiler, and is equivalent for benchmarking purposes. While there are minor differences in syntax accepted and the exact bytecode output, I'm sure that either compiler could be modified to match the other with negligible changes in compilation time. The Java compiler is written with performance in mind, but dzaima has expended only a moderate amount of effort to optimize it.</p>
 <p>A few factors other than the speed of the nanopass compiler might partly cause the discrepancy, or otherwise be worth taking into account. I doubt that these can add up to a factor of 15, so I think that nanopass is simply not as fast as more typical imperative compiler methods.</p>
 <ul>
 <li>The CBQN runtime is still suboptimal, missing SIMD implementations for some primitives used in the compiler. But improvements will be limited for operations like selection that don't vectorize as well. My estimate is a little less than a factor of 2 improvement remaining from improving speed to match Dyalog, and I think more than a factor of 4 is unlikely.</li>
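The axis-system idea in the context above can be sketched in ordinary code. The following is a hypothetical illustration, not anything from the BQN or Co-dfns codebases: each vector carries an axis tag, elementwise operations like `a∧b` require matching axes, and selection `i⊏a` produces a result tagged with the *index* vector's axis, so `i⊏a` and `i⊏b` stay mutually compatible while `a` and `i⊏b` do not. All names (`Axis`, `Vec`, `zip_and`, `select`) are invented for this sketch.

```python
# Hypothetical sketch of an "axis system": runtime axis tags standing in
# for the static checks the text wishes for.
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """A named axis; two vectors agree only if they share one."""
    name: str


@dataclass
class Vec:
    axis: Axis
    data: list


def zip_and(a: Vec, b: Vec) -> Vec:
    """Elementwise a∧b: legal only when both vectors use the same axis."""
    if a.axis != b.axis:
        raise TypeError(f"axis mismatch: {a.axis.name} vs {b.axis.name}")
    return Vec(a.axis, [x and y for x, y in zip(a.data, b.data)])


def select(i: Vec, a: Vec) -> Vec:
    """i⊏a: the result is indexed along i's axis, not a's."""
    return Vec(i.axis, [a.data[j] for j in i.data])


nodes = Axis("nodes")   # axis shared by a and b
picks = Axis("picks")   # axis of the index vector i
a = Vec(nodes, [1, 0, 1])
b = Vec(nodes, [1, 1, 0])
i = Vec(picks, [2, 0])

ia, ib = select(i, a), select(i, b)
assert zip_and(ia, ib).data == [0, 1]  # i⊏a and i⊏b: both on "picks", fine
try:
    zip_and(a, ib)                      # a vs i⊏b: "nodes" vs "picks"
except TypeError as e:
    print(e)                            # the check the text wants automated
```

A static version of the same discipline could use phantom type parameters instead of runtime tags, which is closer to the "computer checking agreement" the text asks for; the runtime form is just easier to show compactly.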
