From 85878912035fd3fb3582db60ef1fc06b459fe67a Mon Sep 17 00:00:00 2001 From: Marshall Lochbaum Date: Thu, 15 Jul 2021 08:35:30 -0400 Subject: Rename self-comparison to self-search --- doc/README.md | 2 +- doc/indices.md | 2 +- doc/leading.md | 2 +- doc/match.md | 2 +- doc/selfcmp.md | 14 +++++++------- docs/doc/index.html | 2 +- docs/doc/indices.html | 2 +- docs/doc/leading.html | 2 +- docs/doc/match.html | 2 +- docs/doc/selfcmp.html | 16 ++++++++-------- docs/implementation/primitive/random.html | 2 +- implementation/primitive/random.md | 2 +- 12 files changed, 25 insertions(+), 25 deletions(-) diff --git a/doc/README.md b/doc/README.md index e3736ac2..fc8e7000 100644 --- a/doc/README.md +++ b/doc/README.md @@ -51,7 +51,7 @@ Primitives: - [Scan](scan.md) (`` ` ``) - [Search functions](search.md) (`⊐⊒∊`) - [Select](select.md) (`⊏`) -- [Self-comparison functions](selfcmp.md) (`⊐⊒∊⍷`) +- [Self-search functions](selfcmp.md) (`⊐⊒∊⍷`) - [Shift functions](shift.md) (`»«`) - [Solo, Couple, and Merge](couple.md) (`≍>`) - [Transpose](transpose.md) (`⍉`) diff --git a/doc/indices.md b/doc/indices.md index 62122175..30a0157a 100644 --- a/doc/indices.md +++ b/doc/indices.md @@ -34,7 +34,7 @@ Unlike `/` and `⊔`, `↕` and `⊑` do use list element indices. For `↕` thi One of the successes of the [leading axis model](https://aplwiki.com/wiki/Leading_axis_theory) is to introduce a kind of index for multidimensional arrays that is easier to work with than list indices. The model introduces [cells](https://aplwiki.com/wiki/Cell), where a cell index is a list of any length up to the containing array's rank. General cell indices are discussed in the next section; first we introduce a special case, indices into major cells or ¯1-cells. These cells naturally form a list, so the index of a major cell is a single number. Such an index can also be considered to select along the first axis, since an index along any axis is a single number. -[Ordering](order.md) functions `⍋⍒` and [search](search.md)/[self-comparison](selfcmp.md) functions `⊐⊒` that depend on cell ordering only really make sense with major cell indices: while other indices have an ordering, it's not very natural. Note that `⊐` only uses the ordering in an incidental way, because it's defined to return the *first* index where a cell in `𝕩` is found. A mathematician would be more interested in a "pre-image" function that returns the set of all indices where a particular value appears. However, programming usefulness and consistency with the other search functions makes searching for the first index a reasonable choice. +[Ordering](order.md) functions `⍋⍒` and [search](search.md)/[self-search](selfcmp.md) functions `⊐⊒` that depend on cell ordering only really make sense with major cell indices: while other indices have an ordering, it's not very natural. Note that `⊐` only uses the ordering in an incidental way, because it's defined to return the *first* index where a cell in `𝕩` is found. A mathematician would be more interested in a "pre-image" function that returns the set of all indices where a particular value appears. However, programming usefulness and consistency with the other search functions makes searching for the first index a reasonable choice. Only one other function—but an important one!—deals with cells rather than elements: [Select](select.md) (`⊏`). Select [allows](leading.md#multiple-axes) either a simple first-axis case where `𝕨` has depth 1 or less (a depth-0 argument is automatically enclosed), and a multi-axis case where it is a list of depth-1 elements. In each case the depth-1 arrays index along a single axis. diff --git a/doc/leading.md b/doc/leading.md index 2f75ee3d..b2f1f1b5 100644 --- a/doc/leading.md +++ b/doc/leading.md @@ -37,7 +37,7 @@ In these three cases above, the results are the same as you would get from [tran ### Comparing cells -The functions in the last section manipulate cells in the same way regardless of what data they contain. Other functions compare cells to each other, either testing whether they match or how they are ordered relative to one another. The two [Grade](order.md) functions `⍋⍒`, and the [self-comparison](selfcmp.md) functions Classify (`⊐`), Mark Firsts (`∊`), and Occurrence Count (`⊒`), each give a list result, with one number for each cell. We can see below that Occurrence Count returns the same results even as we make the argument cells more complicated, because the changes made preserve the matching of cells. +The functions in the last section manipulate cells in the same way regardless of what data they contain. Other functions compare cells to each other, either testing whether they match or how they are ordered relative to one another. The two [Grade](order.md) functions `⍋⍒`, and the [self-search](selfcmp.md) functions Classify (`⊐`), Mark Firsts (`∊`), and Occurrence Count (`⊒`), each give a list result, with one number for each cell. We can see below that Occurrence Count returns the same results even as we make the argument cells more complicated, because the changes made preserve the matching of cells. s ← "abracadabra" ⊒ s diff --git a/doc/match.md b/doc/match.md index fdd2e6cc..5912b292 100644 --- a/doc/match.md +++ b/doc/match.md @@ -7,7 +7,7 @@ The primitive Match (`≡`) tests whether its two argument arrays are considered "abc" ≡ 'a'‿'b'‿'c' 4 ≢ <4 -Match always gives the same result as [Equals](arithmetic.md#comparisons) (`=`) when both arguments are atoms, but the two functions are extended to arrays differently: while the pervasive Equals maps over array arguments to return an array of results, Match compares them in totality and always returns one boolean (it never gives an error). Match is the basis for BQN's [search](search.md) and [self-comparison](selfcmp.md) functions. +Match always gives the same result as [Equals](arithmetic.md#comparisons) (`=`) when both arguments are atoms, but the two functions are extended to arrays differently: while the pervasive Equals maps over array arguments to return an array of results, Match compares them in totality and always returns one boolean (it never gives an error). Match is the basis for BQN's [search](search.md) and [self-search](selfcmp.md) functions. "abc" = "acc" "abc" ≡ "acc" diff --git a/doc/selfcmp.md b/doc/selfcmp.md index fdb43090..e4f1616c 100644 --- a/doc/selfcmp.md +++ b/doc/selfcmp.md @@ -1,6 +1,6 @@ *View this file with results and syntax highlighting [here](https://mlochbaum.github.io/BQN/doc/selfcmp.html).* -# Self-comparison functions +# Self-search functions -BQN has four self-comparison functions, Classify (`⊐`), Occurrence Count (`⊒`), Mark Firsts (`∊`), and Deduplicate (`⍷`). Each of these is a monadic function that obtains its result by comparing each major cell of the argument (which must have rank at least 1) to the earlier major cells with [match](match.md). For example, Mark Firsts indicates the cells that don't match any earlier cell, making them the first of their kind. +BQN has four self-search functions, Classify (`⊐`), Occurrence Count (`⊒`), Mark Firsts (`∊`), and Deduplicate (`⍷`). Each of these is a monadic function that obtains its result by comparing each major cell of the argument (which must have rank at least 1) to the earlier major cells with [match](match.md). For example, Mark Firsts indicates the cells that don't match any earlier cell, making them the first of their kind. ∊ "abaacb" -When the argument is a list, its major cells are units and thus contain one element each, so it's just as valid to say that a self-comparison function compares elements of its argument. Only with a higher-rank argument does the major cell nature become apparent. +When the argument is a list, its major cells are units and thus contain one element each, so it's just as valid to say that a self-search function compares elements of its argument. Only with a higher-rank argument does the major cell nature become apparent. ⊢ arr ← >"abc"‿"dcb"‿"abc"‿"bcd"‿"dcb" ∊ arr -The result has one number for each major cell, or in other words is a list with the same length as its argument. Three self-comparison functions follow this pattern, but Deduplicate (`⍷`) is different: it returns an array of the same rank but possibly a shorter length than the argument. +The result has one number for each major cell, or in other words is a list with the same length as its argument. Three self-search functions follow this pattern, but Deduplicate (`⍷`) is different: it returns an array of the same rank but possibly a shorter length than the argument. ## Classify -Classify is the universal self-comparison function, in that it preserves all the self-comparison information in its argument. It gives each different cell value a natural number, ordered by first appearance. +Classify is the universal self-search function, in that it preserves all the self-search information in its argument. It gives each different cell value a natural number, ordered by first appearance. ⊐ 5‿6‿2‿2‿5‿1 @@ -60,7 +60,7 @@ Classify is the universal self-comparison function, in that it preserves all the ≍⟜⊐ 5‿6‿2‿2‿5‿1 -Applying Classify before another self-comparison function will never change the result, except in the case of Deduplicate (`⍷`), which constructs its result from cells in the argument. In particular, Classify is [idempotent](https://en.wikipedia.org/wiki/Idempotent), meaning that applying it twice is the same as applying it once. +Applying Classify before another self-search function will never change the result, except in the case of Deduplicate (`⍷`), which constructs its result from cells in the argument. In particular, Classify is [idempotent](https://en.wikipedia.org/wiki/Idempotent), meaning that applying it twice is the same as applying it once. ∊ "dbaedcbcecbcd" ∊ ⊐ "dbaedcbcecbcd" @@ -89,7 +89,7 @@ Applying both Classify and Deduplicate gives an array that has both properties ( *See the [APL Wiki page](https://aplwiki.com/wiki/Unique_Mask) on this function as well.* -Mark Firsts (`∊`) is the simplest self-comparison function: it returns `0` for any major cell of the argument that is a duplicate of an earlier cell and `1` for a major cell that's the first with its value. To implement [Deduplicate](#deduplicate) in terms of Mark Firsts, just filter out the duplicates with `∊⊸/`. +Mark Firsts (`∊`) is the simplest self-search function: it returns `0` for any major cell of the argument that is a duplicate of an earlier cell and `1` for a major cell that's the first with its value. To implement [Deduplicate](#deduplicate) in terms of Mark Firsts, just filter out the duplicates with `∊⊸/`. ∊ 3‿1‿4‿1‿5‿9‿2‿6‿5 diff --git a/docs/doc/index.html b/docs/doc/index.html index c1eaf56b..99a75305 100644 --- a/docs/doc/index.html +++ b/docs/doc/index.html @@ -57,7 +57,7 @@
  • Scan (`)
  • Search functions (⊐⊒∊)
  • Select ()
  • -
  • Self-comparison functions (⊐⊒∊⍷)
  • +
  • Self-search functions (⊐⊒∊⍷)
  • Shift functions (»«)
  • Solo, Couple, and Merge (≍>)
  • Transpose ()
  • diff --git a/docs/doc/indices.html b/docs/doc/indices.html index 42938172..7f342cca 100644 --- a/docs/doc/indices.html +++ b/docs/doc/indices.html @@ -92,7 +92,7 @@

    Unlike / and , and do use list element indices. For this is because the output format can be controlled by the argument format: if passed a single number, the result uses atomic indices (so it's a numeric list); if passed a list, it uses list indices and the result has depth 2 (the result depth is always one greater than the argument depth). For , list indices are chosen because Select () handles atomic indices well already. When selecting multiple elements from a list, they would typically have to be placed in an array, which is equivalent to with a numeric list 𝕨. An atom 𝕨 in Pick is converted to a list, so it can be used to select a single element if only one is wanted. To select multiple elements, uses each depth-1 array in 𝕨 as an index and replaces it with that element from the right argument. Because this uses elements as elements (not cells), it is impossible to have conformability errors where elements do not fit together (invalid index errors are of course still possible). Atoms also cannot be used in this context, as it would create ambiguity: is a one-element list an index, or does it contain an index?

    Major cell indices

    One of the successes of the leading axis model is to introduce a kind of index for multidimensional arrays that is easier to work with than list indices. The model introduces cells, where a cell index is a list of any length up to the containing array's rank. General cell indices are discussed in the next section; first we introduce a special case, indices into major cells or ¯1-cells. These cells naturally form a list, so the index of a major cell is a single number. Such an index can also be considered to select along the first axis, since an index along any axis is a single number.

    -

    Ordering functions ⍋⍒ and search/self-comparison functions ⊐⊒ that depend on cell ordering only really make sense with major cell indices: while other indices have an ordering, it's not very natural. Note that only uses the ordering in an incidental way, because it's defined to return the first index where a cell in 𝕩 is found. A mathematician would be more interested in a "pre-image" function that returns the set of all indices where a particular value appears. However, programming usefulness and consistency with the other search functions makes searching for the first index a reasonable choice.

    +

    Ordering functions ⍋⍒ and search/self-search functions ⊐⊒ that depend on cell ordering only really make sense with major cell indices: while other indices have an ordering, it's not very natural. Note that only uses the ordering in an incidental way, because it's defined to return the first index where a cell in 𝕩 is found. A mathematician would be more interested in a "pre-image" function that returns the set of all indices where a particular value appears. However, programming usefulness and consistency with the other search functions makes searching for the first index a reasonable choice.

    Only one other function—but an important one!—deals with cells rather than elements: Select (). Select allows either a simple first-axis case where 𝕨 has depth 1 or less (a depth-0 argument is automatically enclosed), and a multi-axis case where it is a list of depth-1 elements. In each case the depth-1 arrays index along a single axis.

    General cell indices

    BQN does not use general cell indices directly, but it is useful to consider how they might work, and how a programmer might implement functions that use them in BQN if needed. The functions /, , and are the ones that can work with indices for multidimensional arrays but don't already. Here we will examine how multidimensional versions would work.

    diff --git a/docs/doc/leading.html b/docs/doc/leading.html index 69f04d77..11e7a567 100644 --- a/docs/doc/leading.html +++ b/docs/doc/leading.html @@ -81,7 +81,7 @@ ⟨ 3 2 1 ⟩

    Comparing cells

    -

    The functions in the last section manipulate cells in the same way regardless of what data they contain. Other functions compare cells to each other, either testing whether they match or how they are ordered relative to one another. The two Grade functions ⍋⍒, and the self-comparison functions Classify (), Mark Firsts (), and Occurrence Count (), each give a list result, with one number for each cell. We can see below that Occurrence Count returns the same results even as we make the argument cells more complicated, because the changes made preserve the matching of cells.

    +

    The functions in the last section manipulate cells in the same way regardless of what data they contain. Other functions compare cells to each other, either testing whether they match or how they are ordered relative to one another. The two Grade functions ⍋⍒, and the self-search functions Classify (), Mark Firsts (), and Occurrence Count (), each give a list result, with one number for each cell. We can see below that Occurrence Count returns the same results even as we make the argument cells more complicated, because the changes made preserve the matching of cells.

    ↗️
        s  "abracadabra"
          s
     ⟨ 0 0 0 1 0 2 0 3 1 1 4 ⟩
    diff --git a/docs/doc/match.html b/docs/doc/match.html
    index 03da6ad6..dbf0b301 100644
    --- a/docs/doc/match.html
    +++ b/docs/doc/match.html
    @@ -11,7 +11,7 @@
         4  <4
     1
     
    -

    Match always gives the same result as Equals (=) when both arguments are atoms, but the two functions are extended to arrays differently: while the pervasive Equals maps over array arguments to return an array of results, Match compares them in totality and always returns one boolean (it never gives an error). Match is the basis for BQN's search and self-comparison functions.

    +

    Match always gives the same result as Equals (=) when both arguments are atoms, but the two functions are extended to arrays differently: while the pervasive Equals maps over array arguments to return an array of results, Match compares them in totality and always returns one boolean (it never gives an error). Match is the basis for BQN's search and self-search functions.

    ↗️
        "abc" = "acc"
     ⟨ 1 0 1 ⟩
         "abc"  "acc"
    diff --git a/docs/doc/selfcmp.html b/docs/doc/selfcmp.html
    index 9188c69c..bc99ef43 100644
    --- a/docs/doc/selfcmp.html
    +++ b/docs/doc/selfcmp.html
    @@ -1,10 +1,10 @@
     
       
       
    -  BQN: Self-comparison functions
    +  BQN: Self-search functions
     
     
    -

    Self-comparison functions

    +

    Self-search functions

    @@ -131,11 +131,11 @@ -

    BQN has four self-comparison functions, Classify (), Occurrence Count (), Mark Firsts (), and Deduplicate (). Each of these is a monadic function that obtains its result by comparing each major cell of the argument (which must have rank at least 1) to the earlier major cells with match. For example, Mark Firsts indicates the cells that don't match any earlier cell, making them the first of their kind.

    +

    BQN has four self-search functions, Classify (), Occurrence Count (), Mark Firsts (), and Deduplicate (). Each of these is a monadic function that obtains its result by comparing each major cell of the argument (which must have rank at least 1) to the earlier major cells with match. For example, Mark Firsts indicates the cells that don't match any earlier cell, making them the first of their kind.

    ↗️
         "abaacb"
     ⟨ 1 1 0 0 1 0 ⟩
     
    -

    When the argument is a list, its major cells are units and thus contain one element each, so it's just as valid to say that a self-comparison function compares elements of its argument. Only with a higher-rank argument does the major cell nature become apparent.

    +

    When the argument is a list, its major cells are units and thus contain one element each, so it's just as valid to say that a self-search function compares elements of its argument. Only with a higher-rank argument does the major cell nature become apparent.

    ↗️
         arr  >"abc""dcb""abc""bcd""dcb"
     ┌─     
     ╵"abc  
    @@ -147,9 +147,9 @@
          arr
     ⟨ 1 1 0 1 0 ⟩
     
    -

    The result has one number for each major cell, or in other words is a list with the same length as its argument. Three self-comparison functions follow this pattern, but Deduplicate () is different: it returns an array of the same rank but possibly a shorter length than the argument.

    +

    The result has one number for each major cell, or in other words is a list with the same length as its argument. Three self-search functions follow this pattern, but Deduplicate () is different: it returns an array of the same rank but possibly a shorter length than the argument.

    Classify

    -

    Classify is the universal self-comparison function, in that it preserves all the self-comparison information in its argument. It gives each different cell value a natural number, ordered by first appearance.

    +

    Classify is the universal self-search function, in that it preserves all the self-search information in its argument. It gives each different cell value a natural number, ordered by first appearance.

    ↗️
         562251
     ⟨ 0 1 2 2 0 3 ⟩
     
    @@ -160,7 +160,7 @@ 0 1 2 2 0 3 ┘
    -

    Applying Classify before another self-comparison function will never change the result, except in the case of Deduplicate (), which constructs its result from cells in the argument. In particular, Classify is idempotent, meaning that applying it twice is the same as applying it once.

    +

    Applying Classify before another self-search function will never change the result, except in the case of Deduplicate (), which constructs its result from cells in the argument. In particular, Classify is idempotent, meaning that applying it twice is the same as applying it once.

    ↗️
           "dbaedcbcecbcd"
     ⟨ 1 1 1 1 0 1 0 0 0 0 0 0 0 ⟩
           "dbaedcbcecbcd"
    @@ -208,7 +208,7 @@
     

    Applying both Classify and Deduplicate gives an array that has both properties (this isn't the case for all pairs of projections—we need to know that Classify maintains the uniqueness property for Deduplicate and vice-versa). It has no duplicate major cells, and it's a list of natural numbers that starts with 0 and never goes up by more than one. Taken together, these are a tight constraint! The first element of the argument has to be 0. The next can't be 0 because it's already appeared, but it can't be more than one higher—it has to be 1. The next can't be 0 or 1, and has to be 2. And so on. So the result is always n for some n. In fact it's possible to determine the length as well, by noting that each function preserves the number of unique major cells in its argument. Classify does this because distinct numbers in the output correspond exactly to distinct major cells in the input; Deduplicate does this because it only removes duplicate cells, not distinct ones. So the final result is n, where n is the number of unique major cells in the argument.

    Mark Firsts

    See the APL Wiki page on this function as well.

    -

    Mark Firsts () is the simplest self-comparison function: it returns 0 for any major cell of the argument that is a duplicate of an earlier cell and 1 for a major cell that's the first with its value. To implement Deduplicate in terms of Mark Firsts, just filter out the duplicates with /.

    +

    Mark Firsts () is the simplest self-search function: it returns 0 for any major cell of the argument that is a duplicate of an earlier cell and 1 for a major cell that's the first with its value. To implement Deduplicate in terms of Mark Firsts, just filter out the duplicates with /.

    ↗️
           314159265
     ⟨ 1 1 1 0 1 1 1 1 0 ⟩
     
    diff --git a/docs/implementation/primitive/random.html b/docs/implementation/primitive/random.html
    index b5a93a9a..5c0f4db6 100644
    --- a/docs/implementation/primitive/random.html
    +++ b/docs/implementation/primitive/random.html
    @@ -11,5 +11,5 @@
     

    Other choices are xoshiro++ and PCG. The authors of these algorithms (co-author for xoshiro) hate each other very much and have spent quite some time slinging mud at each other. As far as I can tell they both have the normal small bias in favor of their own algorithms but are wildly unfair towards the other side, choosing misleading examples and inflating minor issues. I think both generators are good but find the case for xoshiro a little more convincing, and I think it's done better in third-party benchmarks.

    Simple random sample

    A simple random sample from a set is a subset with a specified size, chosen so that each subset of that size has equal probability. rand.Deal gets a sample of size 𝕨 from the set 𝕩 with elements in a uniformly random order, and rand.Subset does the same but sorts the elements.

    -

    Deal uses a Knuth shuffle, stopping after the first 𝕨 elements have been shuffled, as the algorithm won't touch them again. Usually it creates 𝕩 explicitly for this purpose, but if 𝕨 is very small then initializing it is too slow. In this case we initialize 𝕨, but use a "hash" table with an identity hash—the numbers are already random—for 𝕨↓↕𝕩. The default is that every value in the table is equal to its key, so that only entries where a swap has happened need to be stored. The hash table is the same design as for self-comparison functions, with open addressing and linear probing.

    +

    Deal uses a Knuth shuffle, stopping after the first 𝕨 elements have been shuffled, as the algorithm won't touch them again. Usually it creates 𝕩 explicitly for this purpose, but if 𝕨 is very small then initializing it is too slow. In this case we initialize 𝕨, but use a "hash" table with an identity hash—the numbers are already random—for 𝕨↓↕𝕩. The default is that every value in the table is equal to its key, so that only entries where a swap has happened need to be stored. The hash table is the same design as for self-search functions, with open addressing and linear probing.

    Subset uses Floyd's method, which is sort of a modification of shuffling where only the selected elements need to be stored, not what they were swapped with. This requires a lookup structure that can be updated efficiently and output all elements in sorted order. The choices are a bitset for large 𝕨 and another not-really-hash table for small 𝕨. The table uses a right shift—that is, division by a power of two—as a hash so that hashing preserves the ordering, and inserts like an insertion sort: any larger entries are pushed forward. Really this is an online sorting algorithm, that works because we know the input distribution is well-behaved (it degrades to quadratic performance only in very unlikely cases). When 𝕨>𝕩÷2, we always use a bitset, but select 𝕩-𝕨 elements and invert the selection.

    diff --git a/implementation/primitive/random.md b/implementation/primitive/random.md index a1be7e8b..afab34c0 100644 --- a/implementation/primitive/random.md +++ b/implementation/primitive/random.md @@ -14,6 +14,6 @@ Other choices are [xoshiro++](https://prng.di.unimi.it/) and [PCG](https://www.p A [simple random sample](https://en.wikipedia.org/wiki/Simple_random_sample) from a set is a subset with a specified size, chosen so that each subset of that size has equal probability. `rand.Deal` gets a sample of size `𝕨` from the set `↕𝕩` with elements in a uniformly random order, and `rand.Subset` does the same but sorts the elements. -`Deal` uses a [Knuth shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle), stopping after the first `𝕨` elements have been shuffled, as the algorithm won't touch them again. Usually it creates `↕𝕩` explicitly for this purpose, but if `𝕨` is very small then initializing it is too slow. In this case we initialize `↕𝕨`, but use a "hash" table with an identity hash—the numbers are already random—for `𝕨↓↕𝕩`. The default is that every value in the table is equal to its key, so that only entries where a swap has happened need to be stored. The hash table is the same design as for self-comparison functions, with open addressing and linear probing. +`Deal` uses a [Knuth shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle), stopping after the first `𝕨` elements have been shuffled, as the algorithm won't touch them again. Usually it creates `↕𝕩` explicitly for this purpose, but if `𝕨` is very small then initializing it is too slow. In this case we initialize `↕𝕨`, but use a "hash" table with an identity hash—the numbers are already random—for `𝕨↓↕𝕩`. The default is that every value in the table is equal to its key, so that only entries where a swap has happened need to be stored. The hash table is the same design as for self-search functions, with open addressing and linear probing. `Subset` uses [Floyd's method](https://math.stackexchange.com/questions/178690/whats-the-proof-of-correctness-for-robert-floyds-algorithm-for-selecting-a-sin), which is sort of a modification of shuffling where only the selected elements need to be stored, not what they were swapped with. This requires a lookup structure that can be updated efficiently and output all elements in sorted order. The choices are a bitset for large `𝕨` and another not-really-hash table for small `𝕨`. The table uses a right shift—that is, division by a power of two—as a hash so that hashing preserves the ordering, and inserts like an insertion sort: any larger entries are pushed forward. Really this is an online sorting algorithm, that works because we know the input distribution is well-behaved (it degrades to quadratic performance only in very unlikely cases). When `𝕨>𝕩÷2`, we always use a bitset, but select `𝕩-𝕨` elements and invert the selection. -- cgit v1.2.3