Diffstat (limited to 'docs')
| -rw-r--r-- | docs/implementation/primitive/replicate.html | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/docs/implementation/primitive/replicate.html b/docs/implementation/primitive/replicate.html
index 9e07a754..64cf73a7 100644
--- a/docs/implementation/primitive/replicate.html
+++ b/docs/implementation/primitive/replicate.html
@@ -7,7 +7,7 @@
 <h1 id="implementation-of-indices-and-replicate"><a class="header" href="#implementation-of-indices-and-replicate">Implementation of Indices and Replicate</a></h1>
 <p>The replicate family of functions contains not just primitives but powerful tools for implementing other functionality. The most important is converting <a href="#booleans-to-indices">bits to indices</a>: AVX-512 extensions implement this natively for various index sizes, and even with no SIMD support at all there are surprisingly fast table-based algorithms for it.</p>
 <p><a href="#replicate">General replication</a> is more complex. Branching will slow many useful cases down considerably when using the obvious solution. However, branch-free techniques introduce overhead for larger replication amounts. Hybridizing these seems to be the only way, but it's finicky.</p>
-<p>Replicate by a <a href="#constant-replicate">constant amount</a> (so <code><span class='Value'>𝕨</span></code> is a single number) is not too common in itself, but it's notable because it can be the fastest way to implement outer products and scalar dyadics with prefix agreement.</p>
+<p>Replicate by a <a href="#constant-replicate">constant amount</a> (so <code><span class='Value'>𝕨</span></code> is a single number) is not too common in itself, but it's notable because it can be the fastest way to implement outer products and arithmetic with prefix agreement.</p>
 <h2 id="indices"><a class="header" href="#indices">Indices</a></h2>
 <p>Branchless algorithms are fastest, but with unbounded values in <code><span class='Value'>𝕨</span></code> a fully branchless algorithm is impossible because you can't write an arbitrary amount of memory without branching. So the best algorithms depend on bounding <code><span class='Value'>𝕨</span></code>. Fortunately the most useful case is that <code><span class='Value'>𝕨</span></code> is boolean.</p>
 <h3 id="booleans-to-indices"><a class="header" href="#booleans-to-indices">Booleans to indices</a></h3>
