diff options
| author | Marshall Lochbaum <mwlochbaum@gmail.com> | 2022-02-13 11:28:04 -0500 |
|---|---|---|
| committer | Marshall Lochbaum <mwlochbaum@gmail.com> | 2022-02-13 11:28:04 -0500 |
| commit | cde6f699cd4b76b3b193a8d5bd39242261196fbd (patch) | |
| tree | f247073605808c90ba352ee685bb2835f461b6e6 | |
| parent | 1853387a0eca09eb4a9778aad64488602e29dca0 (diff) | |
Comments on interpolation search
| -rw-r--r-- | docs/implementation/primitive/sort.html | 3 | ||||
| -rw-r--r-- | implementation/primitive/sort.md | 6 |
2 files changed, 9 insertions, 0 deletions
diff --git a/docs/implementation/primitive/sort.html b/docs/implementation/primitive/sort.html index 8e69a9ea..12160286 100644 --- a/docs/implementation/primitive/sort.html +++ b/docs/implementation/primitive/sort.html @@ -58,3 +58,6 @@ <p>It's cheap and sometimes worthwhile to trim <code><span class='Value'>π¨</span></code> down to the range of <code><span class='Value'>π©</span></code>. After finding the range of <code><span class='Value'>π©</span></code>, binary cut <code><span class='Value'>π¨</span></code> to a smaller list that contains the range. Stop when the middle element fits inside the range, and search each half of <code><span class='Value'>π¨</span></code> for the appropriate endpoint of the range.</p> <p>If <code><span class='Value'>π©</span></code> is small-range, then a lookup table method is possible. Check the length of <code><span class='Value'>π¨</span></code> because if it's too large then this method is slowerβbinary search doesn't have to hit every element! The approach is simply to create a table of the number of elements in <code><span class='Value'>π¨</span></code> with each value, then take a prefix sum. In BQN, <code><span class='Value'>π©</span><span class='Function'>β+</span><span class='Modifier'>`</span><span class='Paren'>(</span><span class='Number'>1</span><span class='Function'>+β</span><span class='Modifier'>Β΄</span><span class='Value'>π©</span><span class='Paren'>)</span><span class='Function'>β/</span><span class='Modifier'>βΌ</span><span class='Value'>π¨</span></code>, assuming a minimum of 0.</p> <p><a href="#partitioning">Partitioning</a> allows one pivot <code><span class='Value'>t</span></code> from <code><span class='Value'>π¨</span></code> to be compared with all of <code><span class='Value'>π©</span></code> at once. Although the comparison <code><span class='Value'>t</span><span class='Function'>β€</span><span class='Value'>π©</span></code> can be vectorized, the overhead of partitioning still makes this method a little slower per-comparison than sequential binary search <em>when</em> <code><span class='Value'>π¨</span></code> <em>fits in L1 cache</em>. For larger <code><span class='Value'>π¨</span></code> (and randomly positioned <code><span class='Value'>π©</span></code>) cache churn is a huge cost and partitioning can be many times faster. It should be performed recursively, switching to sequential binary search when <code><span class='Value'>π¨</span></code> is small enough. Unlike quicksort there is no difficulty in pivot selection: always take it from the middle of <code><span class='Value'>π¨</span></code> as in a normal binary search. However, there is a potential issue with memory. If <code><span class='Value'>π©</span></code> is unbalanced with respect to <code><span class='Value'>π¨</span></code>, then the larger part can be nearly the whole length of <code><span class='Value'>π©</span></code> (if it's all of <code><span class='Value'>π©</span></code> partitioning isn't actually needed and it doesn't need to be saved). This can require close to <code><span class='Number'>2</span><span class='Function'>β</span><span class='Modifier'>βΌ</span><span class='Function'>β </span><span class='Value'>π¨</span></code> saved partitions of length <code><span class='Function'>β </span><span class='Value'>π©</span></code>, while the expected use would be a total length <code><span class='Function'>β </span><span class='Value'>π©</span></code>.</p> +<h4 id="interpolation-search"><a class="header" href="#interpolation-search">Interpolation search?</a></h4> +<p>Binary search is the optimal approach for truly unknown data. However, if the searched array has an approximately uniform distribution, giving an approximately linear relationship between index and value, <a href="https://en.wikipedia.org/wiki/Interpolation_search">interpolation search</a> uses asymptotically fewer comparisons. Because of the high overhead of index computation (a division!!), it could only ever be useful for large <code><span class='Value'>π¨</span></code>, and <code><span class='Value'>π©</span></code> small enough that partitioning isn't viable.</p> +<p>Interpolation is generally held to be counterproductive because of how badly it fails for non-uniform data. But the continuous-domain equivalent, bracketed root-finding, is better studied, with hybrid approaches developed to fix this. The recently published <a href="https://en.wikipedia.org/wiki/ITP_Method">ITP method</a> (2020) gets the advantages of interpolation with the same maximum number of comparisons as binary search, plus a configurable constant (even if set to 0, it can take advantage of the gap between the length and the next power of 2, but 1 is a better setting). And once the search range is small enough that memory access stops being very expensive, it should switch to binary search and avoid the division.</p> diff --git a/implementation/primitive/sort.md b/implementation/primitive/sort.md index 36b9c916..beb99a58 100644 --- a/implementation/primitive/sort.md +++ b/implementation/primitive/sort.md @@ -100,3 +100,9 @@ It's cheap and sometimes worthwhile to trim `π¨` down to the range of `π©`. If `π©` is small-range, then a lookup table method is possible. Check the length of `π¨` because if it's too large then this method is slowerβbinary search doesn't have to hit every element! The approach is simply to create a table of the number of elements in `π¨` with each value, then take a prefix sum. In BQN, ``π©β+`(1+βΒ΄π©)β/βΌπ¨``, assuming a minimum of 0. [Partitioning](#partitioning) allows one pivot `t` from `π¨` to be compared with all of `π©` at once. Although the comparison `tβ€π©` can be vectorized, the overhead of partitioning still makes this method a little slower per-comparison than sequential binary search *when* `π¨` *fits in L1 cache*. For larger `π¨` (and randomly positioned `π©`) cache churn is a huge cost and partitioning can be many times faster. It should be performed recursively, switching to sequential binary search when `π¨` is small enough. Unlike quicksort there is no difficulty in pivot selection: always take it from the middle of `π¨` as in a normal binary search. However, there is a potential issue with memory. If `π©` is unbalanced with respect to `π¨`, then the larger part can be nearly the whole length of `π©` (if it's all of `π©` partitioning isn't actually needed and it doesn't need to be saved). This can require close to `2ββΌβ π¨` saved partitions of length `β π©`, while the expected use would be a total length `β π©`. + +#### Interpolation search? + +Binary search is the optimal approach for truly unknown data. However, if the searched array has an approximately uniform distribution, giving an approximately linear relationship between index and value, [interpolation search](https://en.wikipedia.org/wiki/Interpolation_search) uses asymptotically fewer comparisons. Because of the high overhead of index computation (a division!!), it could only ever be useful for large `π¨`, and `π©` small enough that partitioning isn't viable. + +Interpolation is generally held to be counterproductive because of how badly it fails for non-uniform data. But the continuous-domain equivalent, bracketed root-finding, is better studied, with hybrid approaches developed to fix this. The recently published [ITP method](https://en.wikipedia.org/wiki/ITP_Method) (2020) gets the advantages of interpolation with the same maximum number of comparisons as binary search, plus a configurable constant (even if set to 0, it can take advantage of the gap between the length and the next power of 2, but 1 is a better setting). And once the search range is small enough that memory access stops being very expensive, it should switch to binary search and avoid the division. |
