diff options
Diffstat (limited to 'docs/implementation/primitive')
| -rw-r--r-- | docs/implementation/primitive/random.html | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/docs/implementation/primitive/random.html b/docs/implementation/primitive/random.html index b5a93a9a..5c0f4db6 100644 --- a/docs/implementation/primitive/random.html +++ b/docs/implementation/primitive/random.html @@ -11,5 +11,5 @@ <p>Other choices are <a href="https://prng.di.unimi.it/">xoshiro++</a> and <a href="https://www.pcg-random.org/">PCG</a>. The authors of these algorithms (co-author for xoshiro) hate each other very much and have spent quite some time slinging mud at each other. As far as I can tell they both have the normal small bias in favor of their own algorithms but are wildly unfair towards the other side, choosing misleading examples and inflating minor issues. I think both generators are good but find the case for xoshiro a little more convincing, and I think it's done better in third-party benchmarks.</p> <h2 id="simple-random-sample">Simple random sample</h2> <p>A <a href="https://en.wikipedia.org/wiki/Simple_random_sample">simple random sample</a> from a set is a subset with a specified size, chosen so that each subset of that size has equal probability. <code><span class='Value'>rand.</span><span class='Function'>Deal</span></code> gets a sample of size <code><span class='Value'>𝕨</span></code> from the set <code><span class='Function'>↕</span><span class='Value'>𝕩</span></code> with elements in a uniformly random order, and <code><span class='Value'>rand.</span><span class='Function'>Subset</span></code> does the same but sorts the elements.</p> -<p><code><span class='Function'>Deal</span></code> uses a <a href="https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle">Knuth shuffle</a>, stopping after the first <code><span class='Value'>𝕨</span></code> elements have been shuffled, as the algorithm won't touch them again. Usually it creates <code><span class='Function'>↕</span><span class='Value'>𝕩</span></code> explicitly for this purpose, but if <code><span class='Value'>𝕨</span></code> is very small then initializing it is too slow. In this case we initialize <code><span class='Function'>↕</span><span class='Value'>𝕨</span></code>, but use a "hash" table with an identity hash—the numbers are already random—for <code><span class='Value'>𝕨</span><span class='Function'>↓↕</span><span class='Value'>𝕩</span></code>. The default is that every value in the table is equal to its key, so that only entries where a swap has happened need to be stored. The hash table is the same design as for self-comparison functions, with open addressing and linear probing.</p> +<p><code><span class='Function'>Deal</span></code> uses a <a href="https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle">Knuth shuffle</a>, stopping after the first <code><span class='Value'>𝕨</span></code> elements have been shuffled, as the algorithm won't touch them again. Usually it creates <code><span class='Function'>↕</span><span class='Value'>𝕩</span></code> explicitly for this purpose, but if <code><span class='Value'>𝕨</span></code> is very small then initializing it is too slow. In this case we initialize <code><span class='Function'>↕</span><span class='Value'>𝕨</span></code>, but use a "hash" table with an identity hash—the numbers are already random—for <code><span class='Value'>𝕨</span><span class='Function'>↓↕</span><span class='Value'>𝕩</span></code>. The default is that every value in the table is equal to its key, so that only entries where a swap has happened need to be stored. The hash table is the same design as for self-search functions, with open addressing and linear probing.</p> <p><code><span class='Function'>Subset</span></code> uses <a href="https://math.stackexchange.com/questions/178690/whats-the-proof-of-correctness-for-robert-floyds-algorithm-for-selecting-a-sin">Floyd's method</a>, which is sort of a modification of shuffling where only the selected elements need to be stored, not what they were swapped with. This requires a lookup structure that can be updated efficiently and output all elements in sorted order. The choices are a bitset for large <code><span class='Value'>𝕨</span></code> and another not-really-hash table for small <code><span class='Value'>𝕨</span></code>. The table uses a right shift—that is, division by a power of two—as a hash so that hashing preserves the ordering, and inserts like an insertion sort: any larger entries are pushed forward. Really this is an online sorting algorithm, that works because we know the input distribution is well-behaved (it degrades to quadratic performance only in very unlikely cases). When <code><span class='Value'>𝕨</span><span class='Function'>></span><span class='Value'>𝕩</span><span class='Function'>÷</span><span class='Number'>2</span></code>, we always use a bitset, but select <code><span class='Value'>𝕩</span><span class='Function'>-</span><span class='Value'>𝕨</span></code> elements and invert the selection.</p> |
