From 0745f57691ea59f91155b2a014be80529b97701f Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Wed, 6 Oct 2021 08:24:44 -0400
Subject: Don't say "scalar dyadics"

---
 docs/implementation/primitive/replicate.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/implementation/primitive/replicate.html b/docs/implementation/primitive/replicate.html
index 9e07a754..64cf73a7 100644
--- a/docs/implementation/primitive/replicate.html
+++ b/docs/implementation/primitive/replicate.html
@@ -7,7 +7,7 @@

Implementation of Indices and Replicate

The replicate family of functions contains not just primitives but powerful tools for implementing other functionality. The most important is converting bits to indices: AVX-512 extensions implement this natively for various index sizes, and even with no SIMD support at all there are surprisingly fast table-based algorithms for it.

General replication is more complex. Branching will slow many useful cases down considerably when using the obvious solution. However, branch-free techniques introduce overhead for larger replication amounts. Hybridizing these seems to be the only way, but it's finicky.

-Replicate by a constant amount (so 𝕨 is a single number) is not too common in itself, but it's notable because it can be the fastest way to implement outer products and scalar dyadics with prefix agreement.

+Replicate by a constant amount (so 𝕨 is a single number) is not too common in itself, but it's notable because it can be the fastest way to implement outer products and arithmetic with prefix agreement.

Indices

Branchless algorithms are fastest, but with unbounded values in 𝕨 a fully branchless algorithm is impossible because you can't write an arbitrary amount of memory without branching. So the best algorithms depend on bounding 𝕨. Fortunately the most useful case is that 𝕨 is boolean.

Booleans to indices

-- cgit v1.2.3
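
Note on the "Indices" context paragraph in the hunk above: it says that bounding 𝕨 is what makes a branchless algorithm possible, and that the boolean case is the most useful one. The following is a minimal C sketch of that idea. It is not taken from the patched file and is not claimed to be the implementation the document describes. The function name `bools_to_indices`, the one-byte-per-boolean input layout, and the 32-bit index type are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patched document).
 * Writes the positions of the 1s in w[0..n) to out and returns how many
 * were found. Assumes every w[i] is exactly 0 or 1, and that out has room
 * for n entries, because slots past the final count are used as scratch. */
size_t bools_to_indices(const uint8_t *w, size_t n, uint32_t *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out[k] = (uint32_t)i; /* unconditional store, no branch on w[i] */
        k += w[i];            /* keep the store only when w[i] is 1 */
    }
    return k;
}
```

Each iteration writes at most one element, which is why the loop body needs no data-dependent branch. With unbounded values in 𝕨 the number of elements written per input would vary, and that is the case the paragraph says cannot be made fully branchless.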