From 9b005a4c978c582b362f7fb8e6b086e1b62b8e4f Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Fri, 22 Jul 2022 10:55:36 -0400
Subject: Can't find a reliable attribution of the L1 cache claim to Whitney
---
 docs/implementation/kclaims.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'docs/implementation')

diff --git a/docs/implementation/kclaims.html b/docs/implementation/kclaims.html
index 04612d3e..05e1dbe3 100644
--- a/docs/implementation/kclaims.html
+++ b/docs/implementation/kclaims.html
@@ -26,7 +26,7 @@

Parallel execution

As of 2020, Q supports multithreaded primitives that can run on multiple CPU cores. I think Shakti supports multithreading as well. Oddly enough, J user Monument AI has also been working on their own parallel J engine. So array languages are finally moving to multiple cores (the reason this hasn't happened sooner is probably that array language users often have workloads where they can run one instance on each core, which is easier and tends to be faster than splitting one run across multiple cores). It's interesting, and a potential reason to use K or Q, although it's too recent to be part of the "K is fastest" mythos. Not every K claim is a wild one!
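
To make that tradeoff concrete, here's a minimal sketch in C (my own illustration, not based on any K, Q, or J source) of what splitting one reduction across threads involves: each worker reduces a slice, and the partial results are combined at the end. Running one independent instance per core skips all of this coordination.

/* Hypothetical sketch of a multithreaded sum "primitive" with POSIX
   threads: each worker reduces its own slice, and the main thread
   adds the partial sums. Splitting one reduction like this takes
   real coordination, which is why one-process-per-core is easier. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 4

typedef struct { const double *a; size_t n; double sum; } Slice;

static void *sum_slice(void *arg) {
    Slice *s = arg;
    double t = 0;
    for (size_t i = 0; i < s->n; i++) t += s->a[i];
    s->sum = t;
    return NULL;
}

double parallel_sum(const double *a, size_t n) {
    pthread_t th[NTHREADS];
    Slice sl[NTHREADS];
    size_t chunk = n / NTHREADS;
    for (int i = 0; i < NTHREADS; i++) {
        sl[i].a = a + i * chunk;
        sl[i].n = (i == NTHREADS - 1) ? n - i * chunk : chunk;
        pthread_create(&th[i], NULL, sum_slice, &sl[i]);
    }
    double total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(th[i], NULL);
        total += sl[i].sum;
    }
    return total;
}

int main(void) {
    size_t n = 1u << 20;
    double *a = malloc(n * sizeof *a);
    for (size_t i = 0; i < n; i++) a[i] = 1.0;
    printf("%f\n", parallel_sum(a, n));  /* expect 1048576.000000 */
    free(a);
    return 0;
}

Compile with cc -O2 -pthread; a real implementation would also reuse a thread pool instead of spawning threads per call.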

Instruction cache

-A more specific claim about K is that the key to its speed is that the interpreter, or some part of it, fits in L1 cache. I know Arthur Whitney himself has said this; I can't find that now but here's some material from KX about the "L1/2 cache". Maybe this was a relevant factor in the early days of K around 2000—I'm doubtful. In the 2020s it's ridiculous to say that instruction caching matters.

+A more specific claim about K is that the key to its speed is that the interpreter, or some part of it, fits in L1 cache. This is often attributed to Arthur Whitney, and I also seem to remember reading an interview where he mentioned caching, but I haven't found any publication that backs this up. KX has at least published this article that talks about the "L1/2 cache". Maybe instruction caching was a relevant factor in the early days of K around 2000—I'm doubtful. In the 2020s it's ridiculous to say that it matters.

Let's clarify terms first. The CPU cache is a set of storage areas that are smaller and faster than RAM; memory is copied there when it's used so it will be faster to access it again later. L1 is the smallest and fastest level. On a typical CPU these days it might consist of 64KB of data cache for memory to be read and written, and 64KB of instruction cache for memory to be executed by the CPU. When I've seen it, the L1 cache claim is specifically about the K interpreter (and not the data it works with) fitting in the cache, so it clearly refers to the instruction cache.
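
If you want to see the data side of this in action, here's a rough C microbenchmark (my own sketch; the buffer sizes are assumptions, not measurements of any particular CPU): it streams over buffers of a few sizes, and throughput drops once the working set no longer fits in L1.

/* Rough illustration of the L1 data cache: sum a buffer many times
   at several working-set sizes. A buffer that fits in L1 is read
   noticeably faster per byte than one that spills to L2 and beyond.
   Timing is crude; a real measurement would pin the thread and use
   a proper benchmark harness. */
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double seconds(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + t.tv_nsec * 1e-9;
}

int main(void) {
    /* 16KB fits comfortably in a 64KB L1; 16MB does not. */
    size_t sizes[] = { 16u << 10, 256u << 10, 16u << 20 };
    for (int s = 0; s < 3; s++) {
        size_t n = sizes[s] / sizeof(uint64_t);
        uint64_t *buf = malloc(n * sizeof *buf);
        for (size_t i = 0; i < n; i++) buf[i] = i;
        uint64_t sum = 0;
        size_t passes = (256u << 20) / sizes[s];  /* ~256MB read each */
        double t0 = seconds();
        for (size_t p = 0; p < passes; p++)
            for (size_t i = 0; i < n; i++) sum += buf[i];
        double dt = seconds() - t0;
        printf("%6zu KB: %.2f GB/s (checksum %llu)\n", sizes[s] >> 10,
               passes * (double)sizes[s] / dt / 1e9,
               (unsigned long long)sum);
        free(buf);
    }
    return 0;
}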

(Unlike the instruction cache, the data cache is a major factor that makes array languages faster. It's what terms like "cache-friendly" typically refer to. I think the reason K users prefer to talk about the instruction cache is that it allows them to link this well-known consideration to the size of the kdb binary, which is easily measured and clearly different from other database products. But this great article discusses how easy it is to jump to blaming the ICache in Rust, so maybe it's just an explanation that sounds better than it is.)
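
For a concrete picture of what "cache-friendly" means for array code, here's a classic demonstration in C (again my own sketch, not taken from any array language implementation): summing the same matrix row by row, which is sequential like a flat array operation, versus column by column, which strides through memory and defeats the data cache.

/* The same 4096x4096 sum walked in row-major order (sequential
   memory, what flat array operations do) and in column-major order
   (32KB strides, which defeats the data cache). The arithmetic is
   identical; only the access pattern differs. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 4096

static double now(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + t.tv_nsec * 1e-9;
}

int main(void) {
    double *m = malloc((size_t)N * N * sizeof *m);
    for (size_t i = 0; i < (size_t)N * N; i++) m[i] = 1.0;

    double t0 = now(), s = 0;
    for (int i = 0; i < N; i++)            /* row-major: sequential */
        for (int j = 0; j < N; j++)
            s += m[(size_t)i * N + j];
    double t1 = now(), t = 0;
    for (int j = 0; j < N; j++)            /* column-major: strided */
        for (int i = 0; i < N; i++)
            t += m[(size_t)i * N + j];
    double t2 = now();

    printf("row-major %.3fs, column-major %.3fs (sums %g %g)\n",
           t1 - t0, t2 - t1, s, t);
    free(m);
    return 0;
}

On typical hardware the strided walk is several times slower, despite doing exactly the same additions.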

A K interpreter will definitely benefit from the instruction cache. Unfortunately, that's where the truth of this claim runs out. Any other interpreter you use will get just about the same benefit, because the most used code will fit in the cache with plenty of room to spare. And the best case you get from a fast core interpreter loop is fast handling of scalar code—exactly the case that array languages typically ignore.
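
To put "fits in L1" in perspective, here's a toy bytecode interpreter in C (purely illustrative; it has nothing to do with K's actual internals): the whole dispatch loop compiles to on the order of a hundred bytes of machine code, so the hot loop of essentially any interpreter fits in a 64KB instruction cache with room for hundreds more like it.

/* Toy stack-machine interpreter. The run() loop below is the kind
   of code that lives in the icache: it's tiny, and that's true of
   the core loop of nearly every interpreter, not a distinguishing
   feature of K. This sketch is mine, not K's design. */
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

void run(const int *code) {
    int stack[64], *sp = stack;
    for (;;) {
        switch (*code++) {
        case OP_PUSH:  *sp++ = *code++;        break;
        case OP_ADD:   sp--; sp[-1] += sp[0];  break;
        case OP_MUL:   sp--; sp[-1] *= sp[0];  break;
        case OP_PRINT: printf("%d\n", sp[-1]); break;
        case OP_HALT:  return;
        }
    }
}

int main(void) {
    /* (2 + 3) * 4 */
    int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                   OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT };
    run(prog);  /* prints 20 */
    return 0;
}

And running scalar code through a loop like this, one operation per dispatch, is exactly the workload that array languages hand off to whole-array primitives instead.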

--
cgit v1.2.3