From 7cdda7ebdf565ee88276592d30691f14306f9bd1 Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Fri, 1 Jul 2022 09:56:13 -0400
Subject: Revisit CBQN versus dzaima/BQN performance comparison

---
 docs/implementation/codfns.html | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'docs/implementation')

diff --git a/docs/implementation/codfns.html b/docs/implementation/codfns.html
index 45dbfc19..b9b830dd 100644
--- a/docs/implementation/codfns.html
+++ b/docs/implementation/codfns.html
@@ -29,10 +29,10 @@

The sort of static guarantee I want is not really a type system but an axis system. That is, if I take a+b I want to know that the arithmetic mapping makes sense because the two variables use the same axis. And I want to know that if a and b are compatible, then so are i⊏a and i⊏b, but not a and i⊏b. I could use a form of Hungarian notation for this, and write ia←i⊏a and ib←i⊏b, but it's inconvenient to rewrite the axis every time the variable appears, and I'd much prefer a computer checking agreement rather than my own fallible self.
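To make the idea concrete, here is a minimal sketch in Rust of the kind of static axis checking described above. It is my own illustration, not anything from BQN or Co-dfns: the marker types Node and Token and the Col/Idx wrappers are hypothetical names invented for the example. A phantom type parameter records each array's axis, so a.add(&b) compiles only when the axes agree, and selecting with an index vector moves the result onto the index's axis, which makes i.select(&a) and i.select(&b) compatible with each other but not with a.

```rust
use std::marker::PhantomData;
use std::ops::Add;

// Hypothetical axis markers: one zero-sized type per kind of entity the compiler indexes.
struct Node;  // one entry per AST node
struct Token; // one entry per source token

// A column of data with one element per entry of axis A.
struct Col<A, T> {
    data: Vec<T>,
    axis: PhantomData<A>,
}

// An index vector arranged along axis A whose values point into axis B.
struct Idx<A, B> {
    data: Vec<usize>,
    axes: PhantomData<(A, B)>,
}

impl<A, T: Copy + Add<Output = T>> Col<A, T> {
    // a + b type-checks only when both columns share the axis A.
    fn add(&self, other: &Col<A, T>) -> Col<A, T> {
        let data = self.data.iter().zip(&other.data).map(|(x, y)| *x + *y).collect();
        Col { data, axis: PhantomData }
    }
}

impl<A, B> Idx<A, B> {
    // i⊏a: selecting from a B-axis column with A-to-B indices yields an A-axis column.
    fn select<T: Copy>(&self, col: &Col<B, T>) -> Col<A, T> {
        let data = self.data.iter().map(|&k| col.data[k]).collect();
        Col { data, axis: PhantomData }
    }
}

fn main() {
    let a: Col<Node, i64> = Col { data: vec![1, 2, 3], axis: PhantomData };
    let b: Col<Node, i64> = Col { data: vec![4, 5, 6], axis: PhantomData };
    let i: Idx<Token, Node> = Idx { data: vec![2, 0], axes: PhantomData };

    let _sum = a.add(&b);                       // ok: both on the Node axis
    let _sel = i.select(&a).add(&i.select(&b)); // ok: both on the Token axis
    // a.add(&i.select(&b));                    // rejected: Node axis vs. Token axis
}
```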

Performance

In his Co-dfns paper Aaron compares to nanopass implementations of his compiler passes. Running on the CPU and using Chez Scheme (not Racket, which is also presented) for nanopass, he finds Co-dfns is up to 10 times faster for large programs. The GPU is of course slower for small programs and faster for larger ones, breaking even above 100,000 AST nodes—quite a large program. I think comparing the self-hosted BQN compiler to the one in dzaima/BQN shows that this large improvement is caused as much by nanopass being slow as by Co-dfns being fast.

-The self-hosted compiler running in CBQN reaches full performance at about 1KB of dense source code. On large files it achieves speeds around 3MB/s, about two-thirds as fast as dzaima/BQN's compiler. This compiler was written in Java by dzaima in a much shorter time than the self-hosted compiler, and is equivalent for benchmarking purposes. While there are minor differences in syntax accepted and the exact bytecode output, I'm sure that either compiler could be modified to match the other with negligible changes in compilation time. The Java compiler is written with performance in mind, but dzaima has expended only a moderate amount of effort to optimize it.

-A few factors other than the speed of the nanopass compiler might partly cause the discrepancy, or otherwise be worth taking into account. I doubt that these can add up to a factor of 15, so I think that nanopass is simply not as fast as more typical imperative compiler methods.

+The self-hosted compiler running in CBQN reaches full performance at about 1KB of dense source code. Handling over 3MB/s, it's around half as fast as dzaima/BQN's compiler (but it's complicated: dbqn is usually slower on the first run, yet on some files it gets up to 3 times faster after hundreds of runs, while using 3GB of memory). This compiler was written in Java by dzaima in a much shorter time than the self-hosted compiler, and is equivalent for benchmarking purposes. While there are minor differences in syntax accepted and the exact bytecode output, I'm sure that either compiler could be modified to match the other with negligible changes in compilation time. The Java compiler is written with performance in mind, but dzaima has expended only a moderate amount of effort to optimize it.

+A few factors other than the speed of the nanopass compiler might partly cause the discrepancy, or otherwise be worth taking into account. I doubt that these can add up to a factor of 20, so I think that nanopass is simply not as fast as more typical imperative compiler methods.
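For context on where a number like 20 comes from, it appears to combine the two measurements quoted above, treating the self-hosted compiler as a CPU stand-in for Co-dfns; this is my reading of the arithmetic, not a figure from the paper:

$$
\frac{\text{dzaima/BQN}}{\text{nanopass}}
\;=\;
\frac{\text{dzaima/BQN}}{\text{self-hosted}}
\times
\frac{\text{self-hosted}}{\text{nanopass}}
\;\approx\; 2 \times 10 \;=\; 20.
$$

Here the second ratio assumes the self-hosted compiler is roughly comparable to Co-dfns on the CPU, which was measured at up to 10 times nanopass; with the earlier two-thirds figure the same arithmetic gives the factor of 15 in the replaced paragraph.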