author    Marshall Lochbaum <mwlochbaum@gmail.com>  2021-08-31 21:36:12 -0400
committer Marshall Lochbaum <mwlochbaum@gmail.com>  2021-08-31 21:36:12 -0400
commit    651003ec166e6288328857153fc8ccb1c5145d3f (patch)
tree      9979e10a13dd7227fdf78487ce0667dcde5aa991 /implementation/kclaims.md
parent    10c7a9a7009e3738312fcf24d572a4f31b26f599 (diff)
Links and further editing
Diffstat (limited to 'implementation/kclaims.md')
-rw-r--r--  implementation/kclaims.md  16
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/implementation/kclaims.md b/implementation/kclaims.md
index 1eee6966..98c22803 100644
--- a/implementation/kclaims.md
+++ b/implementation/kclaims.md
@@ -10,13 +10,13 @@ The points I argue here are narrow. To some extent I'm picking out the craziest
On that note, it's possible *I've* made mistakes, such as incorrectly designing or interpreting benchmarks. If you present me with concrete evidence against something I wrote below, I promise I'll revise this page to include it, even if I just have to quote verbatim because I don't understand a word of it.
-This page has now been [discussed](https://news.ycombinator.com/item?id=28365645) on Hacker News, and I have amended and added information based on comments there and elsewhere.
+This page has now been discussed on [Hacker News](https://news.ycombinator.com/item?id=28365645) and [Lobsters](https://lobste.rs/s/d3plgr/wild_claims_about_k_performance) (not really the intention… just try not to be *too* harsh to the next person who says L1, okay?), and I have amended and added information based on comments there and elsewhere.
## The fastest array language
When you ask what the fastest array language is, chances are someone is there to answer with one of k, kdb, or q. I can't offer benchmarks that contradict this, but I will argue that there's little reason to take these people at their word.
-The reason I have no measurements is that every contract for a commercial K includes an anti-benchmark clause. For example, Shakti's [license](https://shakti.com/download/license) says users cannot "distribute or otherwise make available to any third party any report regarding the performance of the Software benchmarks or any information from such a report". This ["DeWitt Clause"](https://en.wikipedia.org/wiki/DeWitt_Clause) is common in many databases. It takes much more work to refute bad benchmarks than to produce them so perhaps in the high-stakes DBMS world it's not as crazy as it seems—even though it's a terrible outcome. As I would be unable to share the results, I have not taken benchmarks of any commercial K. Or downloaded one for that matter. So I'm limited to a small number of approved benchmarks that have been published, which focus on performance as a database and not a programming language. I also run tests with [ngn/k](https://codeberg.org/ngn/k), which is developed with goals similar to Whitney's K; the author says it's slower than Shakti but not by too much.
+The reason I have no measurements is that every contract for a commercial K includes an anti-benchmark clause. For example, Shakti's [license](https://shakti.com/download/license) says users cannot "distribute or otherwise make available to any third party any report regarding the performance of the Software benchmarks or any information from such a report". This ["DeWitt Clause"](https://en.wikipedia.org/wiki/DeWitt_Clause) is common in commercial databases ([more context](https://danluu.com/anon-benchmark/)). It takes much more work to refute bad benchmarks than to produce them, so perhaps in the high-stakes DBMS world it's not as crazy as it seems—even though it's a terrible outcome. As I would be unable to share the results, I have not taken benchmarks of any commercial K. Or downloaded one for that matter. So I'm limited to a small number of approved benchmarks that have been published, which focus on performance as a database and not a programming language. I also run tests with [ngn/k](https://codeberg.org/ngn/k), which is developed with goals similar to Whitney's K; the author says it's slower than Shakti but not by too much.
The primary reason I don't give any credence to claims that K is the best is that they are always devoid of specifics. Most importantly, the same assertion is made across decades even though performance in J, Dyalog, and NumPy has improved by leaps and bounds in the meantime—I participated in advances of [26%](https://www.dyalog.com/dyalog/dyalog-versions/170/performance.htm) and [10%](https://www.dyalog.com/dyalog/dyalog-versions/180/performance.htm) in overall Dyalog benchmarks in the last two major versions. Has K4 (the engine behind kdb and Q) kept pace? Maybe it's fallen behind since Arthur left, but Shakti K is better? Which other array languages has the poster used? Doesn't matter—*they* are all the same but *K* is better.
@@ -28,11 +28,15 @@ Before getting into array-based versus scalar code, here's a simpler case. It's
I have a suspicion that users sometimes think K is faster than APL because they try out a Fibonacci function or other one-number-at-a-time code. Erm, your boat turns faster than a battleship, congratulations? *Python* beats these languages at interpreted performance. By like a factor of five. The only reason for anyone to think this is relevant is if they have a one-dimensional model where J is "better" than Python, so K is "better" than both.
+*That's CPython, of course. Language names here refer to the commonly used implementations, such as V8 or SpiderMonkey for Javascript.*
+
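To make "one-number-at-a-time" concrete, here is the shape of such a benchmark, sketched in C purely for illustration (this is not code from any particular test suite). Every operation consumes and produces a single number, so an interpreter pays its per-operation overhead at every step:

    #include <stdio.h>

    /* Naive Fibonacci: pure scalar work, with no large array
       for an interpreter to amortize its overhead over. */
    static long fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    int main(void) {
        printf("%ld\n", fib(30));  /* 832040 */
        return 0;
    }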
Popular APL and J implementations interpret source code directly, without even building an AST. This is very slow, and Dyalog has several other pathologies that get in the way as well, like storing the execution stack in the workspace to prevent stack overflows, and the requirement that a user can save a workspace with paused code and resume it *in a later version*. But the overhead is per token executed, and a programmer can avoid the cost by working on large arrays where one token does a whole lot of work. If you want to show a language is faster than APL generally, this is the kind of code to look at.
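Here's a loose sketch of that trade-off in C (my own illustration with made-up names, not anything from Dyalog's source): scalar-style evaluation pays a dispatch per element, standing in for per-token overhead, while an array primitive pays it once and then runs a tight loop.

    #include <stdio.h>
    #include <stddef.h>

    typedef double (*BinOp)(double, double);
    static double add(double a, double b) { return a + b; }

    /* Scalar-style execution: one indirect dispatch per element. */
    static void eval_scalar(BinOp op, const double *x, const double *y,
                            double *r, size_t n) {
        for (size_t i = 0; i < n; i++)
            r[i] = op(x[i], y[i]);
    }

    /* Array primitive: dispatch once, then a loop the compiler can optimize. */
    static void add_arrays(const double *x, const double *y,
                           double *r, size_t n) {
        for (size_t i = 0; i < n; i++)
            r[i] = x[i] + y[i];
    }

    int main(void) {
        double x[4] = {1, 2, 3, 4}, y[4] = {10, 20, 30, 40}, r[4];
        eval_scalar(add, x, y, r, 4);  /* four dispatches */
        add_arrays(x, y, r, 4);        /* one dispatch */
        printf("%g\n", r[3]);          /* 44 */
        return 0;
    }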
-K's design is well-suited to interpreting scalar code because of its simplicity: for example, it has only one kind of user-defined function and doesn't allow lexical closures. Its [grammar](https://k.miraheze.org/wiki/Grammar) is much simpler than [BQN's](../spec/grammar.md), although [it seems](https://news.ycombinator.com/item?id=6118565) that K uses bytecode compilation (ngn/k certainly does), making this irrelevant after one quick compilation pass. As a result, K interpreters can be very fast.
+K's design is well-suited to interpreting scalar code because of its simplicity. It has only one kind of user-defined function and doesn't allow lexical closures. Implementations always compile to bytecode, as Q's [value](https://code.kx.com/q/ref/value/) function shows. Having to keep track of integers versus floats is a drag, but ngn/k is able to use [tagged pointers](https://en.wikipedia.org/wiki/Tagged_pointer) to store smaller integers without an allocation, and I doubt Whitney would miss a trick like that. So K interpreters can be fast.
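A minimal sketch of the tagged-pointer trick, in C and assuming only that heap allocations are at least 2-byte aligned (my illustration, not ngn/k's actual encoding): bit 0 of a genuine pointer is always 0, so setting it can mark the value as an immediate integer that never touches the allocator.

    #include <stdint.h>
    #include <stdio.h>

    typedef uintptr_t Value;  /* either a real pointer or a tagged integer */

    /* The shift keeps the sign; untagging relies on arithmetic right shift
       of negative values, which all mainstream compilers provide. */
    static Value    tag_int(intptr_t n) { return ((uintptr_t)n << 1) | 1; }
    static intptr_t untag_int(Value v)  { return (intptr_t)v >> 1; }
    static int      is_int(Value v)     { return (int)(v & 1); }

    int main(void) {
        Value v = tag_int(-42);  /* no allocation happens here */
        if (is_int(v))
            printf("%ld\n", (long)untag_int(v));  /* prints -42 */
        return 0;
    }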
+
+But K still isn't good at scalar code! It's an interpreter (if a good one) for a dynamically-typed language, and will be slower than compiled languages like C and Go, or JIT-compiled ones like Javascript and Java. A compiler generates code to do what you want, while an interpreter (including a bytecode VM) is code that reads data (the program) to do what you want. Once the code is compiled, the interpreter has an extra step and *has* to be slower.
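To make that extra step concrete, here is a toy bytecode VM in C with hypothetical opcodes (far simpler than any real K or BQN interpreter): the decode-and-dispatch switch runs for every instruction, which is exactly the work a compiler does once, ahead of time.

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_HALT };

    static int run(const int *code) {
        int stack[64], sp = 0;
        for (const int *ip = code; ; ) {
            switch (*ip++) {  /* decode: reading the program as data */
            case OP_PUSH: stack[sp++] = *ip++; break;
            case OP_ADD:  sp--; stack[sp-1] += stack[sp]; break;
            case OP_HALT: return stack[sp-1];
            }
        }
    }

    int main(void) {
        int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT };
        printf("%d\n", run(prog));  /* prints 5 */
        return 0;
    }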
-But K still isn't good at scalar code! It's an interpreter (if a good one) for a dynamically-typed language, and will be slower than compiled languages like C and Go, or JIT-compiled ones like Javascript and Java. A compiler generates code to do what you want, while an interpreter (including a bytecode VM) is code that reads data (the program) to do what you want. Once the code is compiled, the interpreter has an extra step and *has* to be slower. This is why BQN uses compiler-based strategies to speed up execution, first compiling to bytecode and then usually post-processing that bytecode. Compilation is fast enough that it's perfectly fine to compile code every time it's run.
+This is why BQN uses compiler-based strategies to speed up execution, first compiling to [object code](vm.md#bytecode) and then usually further processing it (compilation is fast enough that it's perfectly fine to compile code every time it's run). In particular, CBQN can compile to x86 to get rid of dispatching overhead. K and Q are always described by developers as interpreters, not compilers, and if they do anything like this then they have kept very quiet about it.
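In compiled form the toy program above collapses to straight-line code with no decode step left (again just an illustration, not CBQN's actual x86 output):

    /* What a compiler emits for "push 2; push 3; add":
       the dispatch loop is gone and only the work remains. */
    int run_compiled(void) {
        return 2 + 3;
    }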
## Parallel execution
@@ -44,7 +48,7 @@ A more specific claim about K is that the key to its speed is that the interpret
Let's clarify terms first. The CPU cache is a set of storage areas that are smaller and faster than RAM; memory is copied there when it's used so it will be faster to access it again later. L1 is the smallest and fastest level. On a typical CPU these days it might consist of 64KB of *data* cache for memory to be read and written, and 64KB of *instruction* cache for memory to be executed by the CPU. When I've seen it, the L1 cache claim is specifically about the K interpreter (and not the data it works with) fitting in the cache, so it clearly refers to the instruction cache.
-(Unlike the instruction cache, the data cache is a major factor that makes array languages faster. It's what terms like "cache-friendly" typically refer to. I think the reason KX prefers to talk about the instruction cache is that it allows them to link this well-known consideration to the size of the kdb binary, which is easily measured and clearly different from other products. Anyone can claim to use cache-friendly algorithms.)
+(Unlike the instruction cache, the data cache is a major factor that makes array languages faster. It's what terms like "cache-friendly" typically refer to. I think the reason K users prefer to talk about the instruction cache is that it allows them to link this well-known consideration to the size of the kdb binary, which is easily measured and clearly different from other database products. But [this great article](https://matklad.github.io/2021/07/10/its-not-always-icache.html) discusses how quickly Rust developers jump to blaming the instruction cache, so maybe it's just an explanation that sounds better than it is.)
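To show what the data cache does for array code, here is a standard illustration in generic C (nothing to do with kdb): the same arithmetic over the same matrix can run several times slower purely because of access order.

    #include <stdio.h>

    #define N 1024
    static double m[N][N];

    int main(void) {
        double s = 0;
        /* Row-major: consecutive accesses are adjacent in memory,
           so nearly every load hits the L1 data cache. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];
        /* Column-major: each access jumps N*8 bytes ahead, evicting
           cache lines before their remaining elements are used. */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        printf("%g\n", s);
        return 0;
    }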
A K interpreter will definitely benefit from the instruction cache. Unfortunately, that's where the truth of this claim runs out. Any other interpreter you use will get just about the same benefit, because the most used code will fit in the cache with plenty of room to spare. And the best case you get from a fast core interpreter loop is fast handling of scalar code—exactly the case that array languages typically ignore.
@@ -75,7 +79,7 @@ That's just the whole cost (in cycles) of L1 misses, exactly what we want! First
0.557255985 seconds time elapsed
-Here's the BQN call that builds [CBQN](https://github.com/dzaima/CBQN)'s bytecode sources:
+Here's the BQN call that builds [CBQN](https://github.com/dzaima/CBQN)'s object code sources:
Performance counter stats for './genRuntime /home/marshall/BQN/':