authorMarshall Lochbaum <mwlochbaum@gmail.com>2022-03-10 21:41:32 -0500
committerMarshall Lochbaum <mwlochbaum@gmail.com>2022-03-10 21:41:32 -0500
commitc66d51a9515a887e1ea8a35e6ebd16109ceaf7dd (patch)
treec757660446552034a3e218ba43a8ab8b38418d4d /docs/implementation
parent0539dbf1c8ed11e32f2a111c5d6da928c0b61f9f (diff)
Link to ktye's K compiler
Diffstat (limited to 'docs/implementation')
 docs/implementation/kclaims.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/implementation/kclaims.html b/docs/implementation/kclaims.html
index d99d8ea4..8627b228 100644
--- a/docs/implementation/kclaims.html
+++ b/docs/implementation/kclaims.html
@@ -22,7 +22,7 @@
<p>Popular APL and J implementations interpret source code directly, without even building an AST. This is very slow, and Dyalog has several other pathologies that get in the way as well, like storing the execution stack in the workspace to prevent stack overflows, and the requirement that a user can save a workspace with paused code and resume it <em>in a later version</em>. But the overhead is per token executed, and a programmer can avoid the cost by working on large arrays where one token does a whole lot of work. If you want to show a language is faster than APL generally, this is the kind of code to look at.</p>
<p>K's design is well-suited to interpreting scalar code because of its simplicity. It has only one kind of user-defined function and doesn't allow lexical closures. Implementations always compile to bytecode, as Q's <a href="https://code.kx.com/q/ref/value/">value</a> function shows, for example. Having to keep track of integers versus floats is a drag, but ngn/k is able to use <a href="https://en.wikipedia.org/wiki/Tagged_pointer">tagged pointers</a> to store smaller integers without an allocation, and I doubt Whitney would miss a trick like that. So K interpreters can be fast.</p>
<p>But K still isn't good at scalar code! It's an interpreter (if a good one) for a dynamically-typed language, and will be slower than compiled languages like C and Go, or JIT-compiled ones like JavaScript and Java. A compiler generates code to do what you want, while an interpreter (including a bytecode VM) is code that reads data (the program) to do what you want. Once the code is compiled, the interpreter has an extra step and <em>has</em> to be slower.</p>
-<p>This is why BQN uses compiler-based strategies to speed up execution, first compiling to <a href="vm.html#bytecode">object code</a> and then usually further processing it (compilation is fast enough that it's perfectly fine to compile code every time it's run). In particular, CBQN can compile to x86 to get rid of dispatching overhead. K and Q are always described by developers as interpreters, not compilers, and if they do anything like this then they have kept very quiet about it.</p>
+<p>This is why BQN uses compiler-based strategies to speed up execution, first compiling to <a href="vm.html#bytecode">object code</a> and then usually further processing it (compilation is fast enough that it's perfectly fine to compile code every time it's run). In particular, CBQN can compile to x86 to get rid of dispatching overhead. And ktye's somewhat obscure K implementation now has <a href="https://github.com/ktye/i/tree/master/kom">an ahead-of-time compiler</a> targeting C, which is great news. Commercial K and Q are always described by developers as interpreters, not compilers, and if they do anything like this then they have kept very quiet about it.</p>
<h2 id="parallel-execution"><a class="header" href="#parallel-execution">Parallel execution</a></h2>
<p>As of 2020, Q supports <a href="https://code.kx.com/q/kb/mt-primitives/">multithreaded primitives</a> that can run on multiple CPU cores. I think Shakti supports multi-threading as well. Oddly enough, J user Monument AI has also been working on their own parallel <a href="https://www.monument.ai/m/parallel">J engine</a>. So array languages are finally moving to multiple cores (the reason this hasn't happened sooner is probably that array language users often have workloads where they can run one instance on each core, which is easier and tends to be faster than splitting one run across multiple cores). It's interesting, and a potential reason to use K or Q, although it's too recent to be part of the &quot;K is fastest&quot; mythos. Not every K claim is a wild one!</p>
<h2 id="instruction-cache"><a class="header" href="#instruction-cache">Instruction cache</a></h2>