aboutsummaryrefslogtreecommitdiff
path: root/docs/implementation
diff options
context:
space:
mode:
authorMarshall Lochbaum <mwlochbaum@gmail.com>2022-06-03 21:12:39 -0400
committerMarshall Lochbaum <mwlochbaum@gmail.com>2022-06-03 21:12:39 -0400
commitc2f267499984ee478d5d6aa34b30acd1f1fff58e (patch)
treeae354c7cf351c96bff58877162b1aa254e6e23a1 /docs/implementation
parent5c2e7dcc6883f121088b1e6556698a02c4668749 (diff)
New DeWitt clause article listing which databases have it
Diffstat (limited to 'docs/implementation')
-rw-r--r--docs/implementation/kclaims.html2
1 files changed, 1 insertions, 1 deletions
diff --git a/docs/implementation/kclaims.html b/docs/implementation/kclaims.html
index 79731f87..c92318c3 100644
--- a/docs/implementation/kclaims.html
+++ b/docs/implementation/kclaims.html
@@ -12,7 +12,7 @@
<p>This page has now been discussed on <a href="https://news.ycombinator.com/item?id=28365645">Hacker News</a> and <a href="https://lobste.rs/s/d3plgr/wild_claims_about_k_performance">Lobsters</a> (not really the intention… just try not to be <em>too</em> harsh to the next person who says L1 okay?), and I have amended and added information based on comments there and elsewhere.</p>
<h2 id="the-fastest-array-language"><a class="header" href="#the-fastest-array-language">The fastest array language</a></h2>
<p>When you ask what the fastest array language is, chances are someone is there to answer one of k, kdb, or q. I can't offer benchmarks that contradict this, but I will argue that there's little reason to take these people at their word.</p>
-<p>The reason I have no measurements is that every contract for a commercial K includes an anti-benchmark clause. For example, Shakti's <a href="https://shakti.com/download/license">license</a> says users cannot &quot;distribute or otherwise make available to any third party any report regarding the performance of the Software benchmarks or any information from such a report&quot;. This <a href="https://en.wikipedia.org/wiki/DeWitt_Clause">&quot;DeWitt Clause&quot;</a> is common in commercial databases (<a href="https://danluu.com/anon-benchmark/">more context</a>). It takes much more work to refute bad benchmarks than to produce them, so perhaps in the high-stakes DBMS world it's not as crazy as it seems—even though it's a terrible outcome. As I would be unable to share the results, I have not taken benchmarks of any commercial K. Or downloaded one for that matter. So I'm limited to a small number of approved benchmarks that have been published, which focus on performance as a database and not a programming language. I also run tests with <a href="https://codeberg.org/ngn/k">ngn/k</a>, which is developed with goals similar to Whitney's K; the author says it's slower than Shakti but not by too much.</p>
+<p>The reason I have no measurements is that every contract for a commercial K includes an anti-benchmark clause. For example, Shakti's <a href="https://shakti.com/download/license">license</a> says users cannot &quot;distribute or otherwise make available to any third party any report regarding the performance of the Software benchmarks or any information from such a report&quot;. This <a href="https://en.wikipedia.org/wiki/DeWitt_Clause">&quot;DeWitt clause&quot;</a> is common in commercial databases (<a href="https://danluu.com/anon-benchmark/">history</a>; <a href="https://cube.dev/blog/dewitt-clause-or-can-you-benchmark-a-database">survey</a>). It takes much more work to refute bad benchmarks than to produce them, so perhaps in the high-stakes DBMS world it's not as crazy as it seems—even though it's a terrible outcome. As I would be unable to share the results, I have not taken benchmarks of any commercial K. Or downloaded one for that matter. So I'm limited to a small number of approved benchmarks that have been published, which focus on performance as a database and not a programming language. I also run tests with <a href="https://codeberg.org/ngn/k">ngn/k</a>, which is developed with goals similar to Whitney's K; the author says it's slower than Shakti but not by too much.</p>
<p>The primary reason I don't give any credence to claims that K is the best is that they are always devoid of specifics. Most importantly, the same assertion is made across decades even though performance in J, Dyalog, and NumPy has improved by leaps and bounds in the meantime—I participated in advances of <a href="https://www.dyalog.com/dyalog/dyalog-versions/170/performance.htm">26%</a> and <a href="https://www.dyalog.com/dyalog-versions/180/performance.htm">10%</a> in overall Dyalog benchmarks in the last two major versions. Has K4 (the engine behind kdb and Q) kept pace? Maybe it's fallen behind since Arthur left but Shakti K is better? Which other array languages has the poster used? Doesn't matter—<em>they</em> are all the same but <em>K</em> is better.</p>
<p>A related theme I find is equivocating between different kinds of performance. I suspect that for interpreting scalar code K is faster than APL and J but slower than Javascript, and certainly any compiled language. For operations on arrays, maybe it beats Javascript and Java but loses to current Dyalog and tensor frameworks. Simple database queries, Shakti says it's faster than Spark and Postgres but is silent about newer in-memory databases. The most extreme K advocates sweep away all this complexity by comparing K to weaker contenders in each category. Just about any language can be &quot;the best&quot; with this approach.</p>
<p>Before getting into array-based versus scalar code, here's a simpler case. It's well known that K works on one list at a time, that is, if you have a matrix—in K, a list of lists—then applying an operation (say sum) to each row works on each one independently. If the rows are short then there's function overhead for each one. In APL, J, and BQN, the matrix is stored as one unit with a stride. The sum can use one metadata computation for all rows, and there's usually special code for many row-wise functions. I measured that Dyalog is 30 times faster than ngn/k to sum rows of a ten-million by three float (double) matrix, for one fairly representative example. It's fine to say—as many K-ers do—that these cases don't matter or can be avoided in practice; it's dishonest (or ignorant) to claim they don't exist.</p>