author    Marshall Lochbaum <mwlochbaum@gmail.com>  2022-06-27 22:00:55 -0400
committer Marshall Lochbaum <mwlochbaum@gmail.com>  2022-06-27 22:00:55 -0400
commit    8389e763344637c01d0d7161091e5f2cd9b14251 (patch)
tree      5406113f6711c6b1222650228cba453c2e4641fa /docs/implementation
parent    b6185d5029e2adcc721c0cc2097f591d9a09f135 (diff)
Yet still more editing
Diffstat (limited to 'docs/implementation')
-rw-r--r--  docs/implementation/compile/dynamic.html  2
-rw-r--r--  docs/implementation/kclaims.html           2
2 files changed, 2 insertions, 2 deletions
diff --git a/docs/implementation/compile/dynamic.html b/docs/implementation/compile/dynamic.html
index 62c8ec2b..30a84d38 100644
--- a/docs/implementation/compile/dynamic.html
+++ b/docs/implementation/compile/dynamic.html
@@ -51,7 +51,7 @@
<p>A simple and widely-used strategy to reduce slowdown due to dynamic compilation is to compile blocks in a separate thread from the one that runs them. The new code needs to be added in a thread-safe manner, which is not hard as the set of optimized implementations is a small lookup table of some sort with only one writer.</p>
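<p>As a minimal sketch of this single-writer pattern (hypothetical C, not drawn from any particular implementation): the compilation thread publishes each entry with a release store, and executing threads read with an acquire load, so no lock is needed.</p>
<pre>
#include &lt;stdatomic.h&gt;
#include &lt;stddef.h&gt;

/* Hypothetical sketch: a fixed-size table mapping block IDs to
   optimized implementations. One compiler thread writes; any
   number of executing threads read. */
typedef void (*OptimizedFn)(void);

#define TABLE_SIZE 1024
static _Atomic(OptimizedFn) opt_table[TABLE_SIZE];

/* Compilation thread only (single writer, so no lock): the release
   store makes the generated code fully visible before any reader
   can observe the pointer. */
void publish_optimized(size_t block_id, OptimizedFn fn) {
    atomic_store_explicit(&opt_table[block_id], fn, memory_order_release);
}

/* Executing threads: returns NULL until the optimized version has
   been published, in which case the caller falls back to the
   baseline interpreter. */
OptimizedFn lookup_optimized(size_t block_id) {
    return atomic_load_explicit(&opt_table[block_id], memory_order_acquire);
}
</pre>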
<p>If the implementation is able to make use of all available threads (possible when working with large arrays), then it's still important to minimize compilation time as that thread could be put to better use. If there are idle threads then the only costs of compilation overhead are minor: the optimized code can't be put to use as quickly, and there is more power draw and possible downclocking.</p>
<h3 id="anticipation"><a class="header" href="#anticipation">Anticipation</a></h3>
-<p>The <a href="#hot-paths">hot path</a> strategy depends on targetting code for optimization based on history. Anticipation would identify in advance what code will take longer to run, and allocate a fraction of the time taken for optimizing that code. This is most useful for code that runs a small number of times on large arrays. An example where anticipation would be very important is for a programmer trying experimental one-off queries on a large dataset.</p>
+<p>The <a href="#hot-paths">hot path</a> strategy depends on targeting code for optimization based on history. Anticipation would identify in advance what code will take longer to run, and allocate a fraction of the time taken for optimizing that code. This is most useful for code that runs a small number of times on large arrays. An example where anticipation would be very important is for a programmer trying experimental one-off queries on a large dataset.</p>
<p>The end result seems similar to that obtained by thunks as discussed at Dyalog '18 (<a href="https://dyalog.tv/Dyalog18/?v=-6no6N3i9Tg">video</a>, <a href="https://www.dyalog.com/user-meetings/uploads/conference/dyalog18/presentations/D15_The_Interpretive_Advantage.zip">slides</a>). A thunk runs as part of a primitive, detecting that computing the result will be slow and outputting a deferred computation instead. Anticipation is more powerful because it can scan ahead in the bytecode instead of deciding as primitives are called whether or not to expand the thunk.</p>
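<p>A minimal C sketch of the thunk idea, with invented names (the real Dyalog mechanism is internal): a primitive that predicts a slow result returns a deferred computation, which is forced only if the value is actually needed.</p>
<pre>
#include &lt;stdbool.h&gt;
#include &lt;stdlib.h&gt;

typedef struct Array Array;
typedef Array *(*ComputeFn)(void *args);

/* Either a materialized array or a deferred computation. */
typedef struct {
    bool forced;       /* has the computation run yet? */
    Array *result;     /* valid once forced */
    ComputeFn compute; /* the deferred primitive application */
    void *args;        /* captured operands */
} Thunk;

/* A primitive that detects its result would be slow to compute can
   return a thunk instead of computing immediately. */
Thunk *defer(ComputeFn compute, void *args) {
    Thunk *t = malloc(sizeof(Thunk));
    t->forced = false;
    t->result = NULL;
    t->compute = compute;
    t->args = args;
    return t;
}

/* Force the thunk when (and if) the value is really required. */
Array *force(Thunk *t) {
    if (!t->forced) {
        t->result = t->compute(t->args);
        t->forced = true;
    }
    return t->result;
}
</pre>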
<p>Anticipation attempts to improve program speed while bounding the added overhead. For example, it might be constrained to add no more than 5% to the time to first program output, relative to base-level interpretation. The idea is to exit normal interpretation as soon as a large enough lower bound is established on this time, for example if an operation would create a large array. At this point it begins analysis, which will involve at least some shape propagation and probably increase the lower bound and optimization budget.</p>
<p>Optimization level can be gated based on the ratio of expected time to code length (which presumably controls cost of optimization). But optimization doesn't need to be performed all at once: upcoming code should be run as soon as it can be optimized at an appropriate level, in order to have more information available for later operations. Optimization might include primitive combinations or intermediate data formats, so it's important to check how the results will be used before running expensive code.</p>
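<p>A toy C sketch of this budgeting, with invented names and thresholds: the budget is a fixed fraction of the established runtime lower bound, and the optimization level is gated on expected time per unit of code length.</p>
<pre>
#include &lt;stddef.h&gt;

typedef enum { OPT_NONE, OPT_LIGHT, OPT_FULL } OptLevel;

/* Once base interpretation establishes a lower bound on remaining
   runtime (say, an operation is about to create a large array),
   a fraction of that bound becomes the optimization budget, here
   matching the 5% overhead target from above. */
double optimization_budget(double runtime_lower_bound) {
    return 0.05 * runtime_lower_bound;
}

/* Gate the optimization level on expected time per unit of code,
   since compilation cost presumably scales with code length.
   Thresholds are placeholders. */
OptLevel choose_level(double expected_time, size_t code_len) {
    double ratio = expected_time / (double)code_len;
    if (ratio > 1e-3) return OPT_FULL;
    if (ratio > 1e-5) return OPT_LIGHT;
    return OPT_NONE;
}
</pre>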
diff --git a/docs/implementation/kclaims.html b/docs/implementation/kclaims.html
index 0eb0e756..0bc4c005 100644
--- a/docs/implementation/kclaims.html
+++ b/docs/implementation/kclaims.html
@@ -22,7 +22,7 @@
<p>Popular APL and J implementations interpret source code directly, without even building an AST. This is very slow, and Dyalog has several other pathologies that get in the way as well. Like storing the execution stack in the workspace to prevent stack overflows, and the requirement that a user can save a workspace with paused code and resume it <em>in a later version</em>. But the overhead is per token executed, and a programmer can avoid the cost by working on large arrays where one token does a whole lot of work. If you want to show a language is faster than APL generally, this is the kind of code to look at.</p>
<p>K's design is well-suited to interpreting scalar code because of its simplicity. It has only one kind of user-defined function and doesn't allow lexical closures. Implementations always compile to bytecode, as Q's <a href="https://code.kx.com/q/ref/value/">value</a> function shows. Having to keep track of integers versus floats is a drag, but ngn/k is able to use <a href="https://en.wikipedia.org/wiki/Tagged_pointer">tagged pointers</a> to store smaller integers without an allocation, and I doubt Whitney would miss a trick like that. So K interpreters can be fast.</p>
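<p>A minimal C sketch of the tagged-pointer trick (ngn/k's actual encoding differs): heap pointers are aligned, so their low bits are zero, which leaves a low bit free to mark a word as a small integer stored inline.</p>
<pre>
#include &lt;stdint.h&gt;

typedef uintptr_t Value;

#define INT_TAG 1u

/* Store a small integer directly in the word, no allocation. */
static inline Value box_int(intptr_t i) {
    return ((uintptr_t)i << 1) | INT_TAG; /* shift in the tag bit */
}

static inline int is_int(Value v) {
    return (v & INT_TAG) != 0;
}

static inline intptr_t unbox_int(Value v) {
    return (intptr_t)v >> 1; /* arithmetic shift restores the sign */
}

static inline void *as_pointer(Value v) {
    return (void *)v; /* untagged: an ordinary heap pointer */
}
</pre>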
<p>But K still isn't good at scalar code! It's an interpreter (if a good one) for a dynamically-typed language, and will be slower than compiled languages like C and Go, or JIT-compiled ones like Javascript and Java. A compiler generates code to do what you want, while an interpreter (including a bytecode VM) is code that reads data (the program) to do what you want. Once the code is compiled, the interpreter has an extra step and <em>has</em> to be slower.</p>
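<p>A toy C contrast of the two models (invented for illustration): the bytecode VM reads the program as data and dispatches on every instruction, while the compiled version is just the work itself, with no fetch/decode step left.</p>
<pre>
#include &lt;stddef.h&gt;

/* A bytecode VM must fetch and dispatch per instruction. */
long run_vm(const unsigned char *code, long a, long b) {
    long acc = 0;
    for (size_t pc = 0; ; pc++) {
        switch (code[pc]) {        /* per-instruction dispatch */
            case 0: acc = a + b; break;
            case 1: acc = acc * 2; break;
            case 2: return acc;    /* halt */
        }
    }
}

/* Compiled code for the same program {0,1,2} is just the work: */
long run_compiled(long a, long b) {
    return (a + b) * 2;
}
</pre>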
-<p>This is why BQN uses compiler-based strategies to speed up execution, first compiling to <a href="vm.html#bytecode">object code</a> and then usually further processing it (compilation is fast enough that it's perfectly fine to compile code every time it's run). In particular, CBQN can compile to x86 to get rid of dispatching overhead. And ktye's somewhat obscure K implementation now has <a href="https://github.com/ktye/i/tree/master/kom">an ahead-of-time compiler</a> targetting C, which is great news. Commercial K and Q are always described by developers as interpreters, not compilers, and if they do anything like this then they have kept very quiet about it.</p>
+<p>This is why BQN uses compiler-based strategies to speed up execution, first compiling to <a href="vm.html#bytecode">object code</a> and then usually further processing it (compilation is fast enough that it's perfectly fine to compile code every time it's run). In particular, CBQN can compile to x86 to get rid of dispatching overhead. And ktye's somewhat obscure K implementation now has <a href="https://github.com/ktye/i/tree/master/kom">an ahead-of-time compiler</a> targeting C, which is great news. Commercial K and Q are always described by developers as interpreters, not compilers, and if they do anything like this then they have kept very quiet about it.</p>
<h2 id="parallel-execution"><a class="header" href="#parallel-execution">Parallel execution</a></h2>
<p>As of 2020, Q supports <a href="https://code.kx.com/q/kb/mt-primitives/">multithreaded primitives</a> that can run on multiple CPU cores. I think Shakti supports multi-threading as well. Oddly enough, J user Monument AI has also been working on their own parallel <a href="https://www.monument.ai/m/parallel">J engine</a>. So array languages are finally moving to multiple cores (the reason this hasn't happened sooner is probably that array language users often have workloads where they can run one instance on each core, which is easier and tends to be faster than splitting one run across multiple cores). It's interesting, and a potential reason to use K or Q, although it's too recent to be part of the &quot;K is fastest&quot; mythos. Not every K claim is a wild one!</p>
<h2 id="instruction-cache"><a class="header" href="#instruction-cache">Instruction cache</a></h2>