From 3dcd003bea2356e15cafc267d775eaa188bf8f46 Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Wed, 16 Mar 2022 10:19:45 -0400
Subject: Include a perf measurement of the markdown generator

---
 docs/implementation/kclaims.html | 36 ++++++++++++++++++++++++------------
 implementation/kclaims.md        | 35 +++++++++++++++++++++++------------
 2 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/docs/implementation/kclaims.html b/docs/implementation/kclaims.html
index 8627b228..837ac1c2 100644
--- a/docs/implementation/kclaims.html
+++ b/docs/implementation/kclaims.html
@@ -51,16 +51,26 @@
        0.557255985 seconds time elapsed
 
-Here's the BQN call that builds CBQN's object code sources:
+Here are the BQN calls that build CBQN's object code sources, and this website:
 
 Performance counter stats for './genRuntime /home/marshall/BQN/':
 
-       241,224,322      cycles:u
-         5,452,372      icache_16b.ifdata_stall:u
-           829,146      cache-misses:u
-         6,954,143      L1-dcache-load-misses:u
-         1,291,804      L1-icache-load-misses:u
+       232,456,331      cycles:u
+         4,482,531      icache_16b.ifdata_stall:u
+           707,909      cache-misses:u
+         5,058,125      L1-dcache-load-misses:u
+         1,315,281      L1-icache-load-misses:u
 
-       0.098228740 seconds time elapsed
+       0.103811282 seconds time elapsed
+
+ Performance counter stats for './gendocs':
+
+     5,633,327,936      cycles:u
+       494,293,472      icache_16b.ifdata_stall:u
+         8,755,069      cache-misses:u
+        37,565,924      L1-dcache-load-misses:u
+       265,985,526      L1-icache-load-misses:u
+
+       2.138414849 seconds time elapsed
 

And the Python-based font tool I use to build font samples for this site:

 Performance counter stats for 'pyftsubset […more stuff]':
@@ -74,15 +84,17 @@
        0.215698059 seconds time elapsed
 

Dividing the stall number by total cycles gives us percentage of program time that can be attributed to L1 instruction misses.

-↗️    "J"‿"BQN"‿"Python" ≍˘ 100 × 56‿5.4‿25 ÷ 1_457‿241‿499
+↗️    l ← "J"‿"BQN"‿"BQN"‿"Python"
+    l ≍˘ 100 × 56‿4.5‿494‿25 ÷ 1_457‿232‿5_633‿499
 ┌─                            
 ╵ "J"      3.843514070006863  
-  "BQN"    2.240663900414938  
+  "BQN"    1.939655172413793  
+  "BQN"    8.76974968933073   
   "Python" 5.01002004008016   
                              ┘
 
-So, roughly 4%, 2%, and 5%. The cache miss counts are also broadly in line with these numbers. Note that full cache misses are pretty rare, so that most misses just hit L2 or L3 and don't suffer a large penalty. Also note that instruction cache misses are mostly lower than data misses, as expected.
-Don't get me wrong, I'd love to improve performance even by 2%. But it's not exactly world domination, is it? And it doesn't matter how cache-friendly K is, that's the absolute limit.
+So, roughly 4%, 2 to 9%, and 5%. The cache miss counts are also broadly in line with these numbers. Note that full cache misses are pretty rare, so that most misses just hit L2 or L3 and don't suffer a large penalty. Also note that instruction cache misses are mostly lower than data misses, as expected.
+Don't get me wrong, I'd love to improve performance even by 2%. But it's not exactly world domination, is it? The perf results are an upper bound for how much these programs could be sped up with better treatment of the instruction cache. If K is faster by more than that, it's because of other optimizations.
For comparison, here's ngn/k (which does aim for a small executable) running one of its unit tests—test 19 in the a20/ folder, chosen because it's the longest-running of those tests.

 Performance counter stats for '../k 19.k':
 
@@ -94,4 +106,4 @@
 
        1.245378356 seconds time elapsed
 
-The stalls are less than 1% here, so maybe the smaller executable is paying off in some way. I can't be sure, because the programs being run are very different: 19.k is 10 lines while the others are hundreds of lines long. But I don't have a longer K program handy to test with (and you could always argue the result doesn't apply to Whitney's K anyway). Again, it doesn't matter much: the point is that the absolute most the other interpreters could gain from being more L1-friendly is about 5% on those fairly representative programs.
+The stalls are less than 1% here, so maybe the smaller executable is paying off in some way. I can't be sure, because the programs being run are very different: 19.k is 10 lines while the others are hundreds of lines long. But I don't have a longer K program handy to test with (and you could always argue the result doesn't apply to Whitney's K anyway). Again, it doesn't matter much: the point is that the absolute most the other interpreters could gain from being more L1-friendly is about 10% on those fairly representative programs.
diff --git a/implementation/kclaims.md b/implementation/kclaims.md
index 5aa085ad..c8a2debd 100644
--- a/implementation/kclaims.md
+++ b/implementation/kclaims.md
@@ -79,17 +79,27 @@ That's just the whole cost (in cycles) of L1 misses, exactly what we want! First
        0.557255985 seconds time elapsed
 
-Here's the BQN call that builds [CBQN](https://github.com/dzaima/CBQN)'s object code sources:
+Here are the BQN calls that build [CBQN](https://github.com/dzaima/CBQN)'s object code sources, and this website:
 
     Performance counter stats for './genRuntime /home/marshall/BQN/':
 
-       241,224,322      cycles:u
-         5,452,372      icache_16b.ifdata_stall:u
-           829,146      cache-misses:u
-         6,954,143      L1-dcache-load-misses:u
-         1,291,804      L1-icache-load-misses:u
+       232,456,331      cycles:u
+         4,482,531      icache_16b.ifdata_stall:u
+           707,909      cache-misses:u
+         5,058,125      L1-dcache-load-misses:u
+         1,315,281      L1-icache-load-misses:u
 
-       0.098228740 seconds time elapsed
+       0.103811282 seconds time elapsed
+
+    Performance counter stats for './gendocs':
+
+     5,633,327,936      cycles:u
+       494,293,472      icache_16b.ifdata_stall:u
+         8,755,069      cache-misses:u
+        37,565,924      L1-dcache-load-misses:u
+       265,985,526      L1-icache-load-misses:u
+
+       2.138414849 seconds time elapsed
 
 And the Python-based font tool I use to build [font samples](https://mlochbaum.github.io/BQN/fonts.html) for this site:
 
@@ -103,13 +113,14 @@ And the Python-based font tool I use to build [font samples](https://mlochbaum.g
        0.215698059 seconds time elapsed
 
-Dividing the stall number by total cycles gives us percentage of program time that can be attributed to L1 instruction misses.
+Dividing the stall number by total cycles gives us percentage of program time that can be attributed to L1 instruction misses.
 
-        "J"‿"BQN"‿"Python" ≍˘ 100 × 56‿5.4‿25 ÷ 1_457‿241‿499
+        l ← "J"‿"BQN"‿"BQN"‿"Python"
+        l ≍˘ 100 × 56‿4.5‿494‿25 ÷ 1_457‿232‿5_633‿499
 
-So, roughly 4%, 2%, and 5%. The cache miss counts are also broadly in line with these numbers. Note that full cache misses are pretty rare, so that most misses just hit L2 or L3 and don't suffer a large penalty. Also note that instruction cache misses are mostly lower than data misses, as expected.
+So, roughly 4%, 2 to 9%, and 5%. The cache miss counts are also broadly in line with these numbers. Note that full cache misses are pretty rare, so that most misses just hit L2 or L3 and don't suffer a large penalty. Also note that instruction cache misses are mostly lower than data misses, as expected.
 
-Don't get me wrong, I'd love to improve performance even by 2%. But it's not exactly world domination, is it? And it doesn't matter how cache-friendly K is, that's the absolute limit.
+Don't get me wrong, I'd love to improve performance even by 2%. But it's not exactly world domination, is it? The perf results are an upper bound for how much these programs could be sped up with better treatment of the instruction cache. If K is faster by more than that, it's because of other optimizations.
 
 For comparison, here's [ngn/k](https://codeberg.org/ngn/k) (which does aim for a small executable) running one of its unit tests—test 19 in the a20/ folder, chosen because it's the longest-running of those tests.
 
@@ -123,4 +134,4 @@ For comparison, here's [ngn/k](https://codeberg.org/ngn/k) (which does aim for a
        1.245378356 seconds time elapsed
 
-The stalls are less than 1% here, so maybe the smaller executable is paying off in some way. I can't be sure, because the programs being run are very different: `19.k` is 10 lines while the others are hundreds of lines long. But I don't have a longer K program handy to test with (and you could always argue the result doesn't apply to Whitney's K anyway). Again, it doesn't matter much: the point is that the absolute most the other interpreters could gain from being more L1-friendly is about 5% on those fairly representative programs.
+The stalls are less than 1% here, so maybe the smaller executable is paying off in some way. I can't be sure, because the programs being run are very different: `19.k` is 10 lines while the others are hundreds of lines long. But I don't have a longer K program handy to test with (and you could always argue the result doesn't apply to Whitney's K anyway). Again, it doesn't matter much: the point is that the absolute most the other interpreters could gain from being more L1-friendly is about 10% on those fairly representative programs.
-- cgit v1.2.3
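As a sanity check on the BQN one-liner in the patch (the patch itself uses BQN; this is a Python restatement, and the run labels are mine), the stall percentages divide `icache_16b.ifdata_stall` by `cycles`, both rounded to millions from the perf output above:

```python
# Stall cycles and total cycles in millions, from the perf runs in the patch.
runs = {
    "J":              (56,  1_457),
    "BQN genRuntime": (4.5,   232),
    "BQN gendocs":    (494, 5_633),
    "Python":         (25,    499),
}

# Percentage of cycles stalled on L1 instruction-cache fetch misses.
pct = {name: 100 * stall / cycles for name, (stall, cycles) in runs.items()}

for name, p in pct.items():
    print(f"{name:16} {p:5.2f}%")  # J 3.84%, BQN 1.94% and 8.77%, Python 5.01%
```

This reproduces the "roughly 4%, 2 to 9%, and 5%" figures quoted in the text.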