docs/implementation/primitive/sort.html


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

<head>
  <link href="../../favicon.ico" rel="shortcut icon" type="image/x-icon"/>
  <link href="../../style.css" rel="stylesheet"/>
  <title>BQN: Implementation of ordering functions</title>
</head>
<div class="nav"><a href="https://github.com/mlochbaum/BQN">BQN</a> / <a href="../../index.html">main</a> / <a href="../index.html">implementation</a> / <a href="index.html">primitive</a></div>
<h1 id="implementation-of-ordering-functions">Implementation of ordering functions</h1>
<p>The <a href="../../doc/order.html">ordering functions</a> are Sort (<code><span class='Function'>∧∨</span></code>), Grade (<code><span class='Function'>⍋⍒</span></code>), and Bins (<code><span class='Function'>⍋⍒</span></code>). Although these are well-studied—particularly sorting, and then binary search or &quot;predecessor search&quot;—there are many recent developments, as well as techniques that I have not found in the literature. The three functions are closely related but have important differences in what algorithms are viable. Sorting is a remarkably deep problem with different algorithms able to do a wide range of amazing things, and sophisticated ways to combine those. It is by no means solved. In comparison, Bins is pretty tame.</p>
<p>There's a large divide between ordering compound data and simple data. For compound data comparisons are expensive, and the best algorithm will generally be the one that uses the fewest comparisons. For simple data they fall somewhere between cheap and extremely cheap, and fancy branchless and vectorized algorithms are the best.</p>
<h2 id="on-quicksort-versus-merge-sort">On quicksort versus merge sort</h2>
<p>Merge sort is better. It is deterministic, stable, and has optimal worst-case performance. Its pattern handling is better: while merge sort handles &quot;horizontal&quot; patterns and quicksort does &quot;vertical&quot; ones, merge sort gets useful work out of <em>any</em> sequence of runs but in-place quicksort will quickly mangle its analogue until it may as well be random.</p>
<p>But that doesn't mean merge sort is always faster. Quicksort seems to work a little better branchlessly. For sorting, quicksort's partitioning can reduce the range of the data enough to use an extremely quick counting sort. Partitioning is also a natural fit for binary search, where it's mandatory for sensible cache behavior with large enough arguments. So it can be useful. But it doesn't merge, and can't easily be made to merge, and that's a shame.</p>
<p>The same applies to the general categories of partitioning sorts (quicksort, radix sort, samplesort) and merging sorts (mergesort, timsort, multimerges). Radix sorts are definitely the best for some types and lengths, although the scattered accesses make their performance unpredictable and I think overall they're not worth it.</p>
<h2 id="on-binary-search">On binary search</h2>
<p>Binary searches are very easy to get wrong. Do not write <code><span class='Paren'>(</span><span class='Value'>hi</span><span class='Function'>+</span><span class='Value'>lo</span><span class='Paren'>)</span><span class='Function'>/</span><span class='Number'>2</span></code>: it's not safe from overflows. I always follow the pattern given in the first code block <a href="https://pvk.ca/Blog/2015/11/29/retrospective-on-binary-search-and-on-compression-slash-compilation/">here</a>. This code will never access the value <code><span class='Value'>*base</span></code>, so it should be considered a search on the <code><span class='Value'>n</span><span class='Function'>-</span><span class='Number'>1</span></code> values beginning at <code><span class='Value'>base</span><span class='Function'>+</span><span class='Number'>1</span></code> (the perfect case is when the number of values is one less than a power of two, which is in fact how it has to go). It's branchless and always takes the same number of iterations. To get a version that stops when the answer is known, subtract <code><span class='Value'>n%</span><span class='Number'>2</span></code> from <code><span class='Value'>n</span></code> in the case that <code><span class='Value'>*mid</span> <span class='Function'>&lt;</span> <span class='Value'>x</span></code>.</p>
<h2 id="compound-data">Compound data</h2>
<p>Array comparisons are expensive. The goal here is almost entirely to minimize the number of comparisons. Which is a much less complex goal than to get the most out of modern hardware, so the algorithms here are simpler.</p>
<p>For <strong>Sort</strong> and <strong>Grade</strong>, use Timsort. It's time-tested and shows no signs of weakness (but do be sure to pick up a fix for the bug discovered in 2015 in formal verification). Hardly different from optimal comparison numbers on random data, and outstanding pattern handling. Grade can either by selecting from the original array to order indices or by moving the data around in the same order as the indices. I think the second of these ends up being substantially better for small-ish elements.</p>
<p>For <strong>Bins</strong>, use a branching binary search: see <a href="#on-binary-search">On binary search</a> above. But there are also interesting (although, I expect, rare) cases where only one argument is compound. Elements of this argument should be reduced to fit the type of the other argument, then compared to multiple elements. For the right argument, this just means reducing before doing whatever binary search is appropriate to the left argument. If the left argument is compound, its elements should be used as partitions. Then switch back to binary search only when the partitions get very small—probably one element.</p>
<h2 id="simple-data">Simple data</h2>
<p>The name of the game here is &quot;branchless&quot;.</p>
<p>Sorting algorithms of interest are counting/bucket sort, <a href="https://github.com/orlp/pdqsort">pdqsort</a> (some improvements of my own to be described here later), and <a href="https://github.com/scandum/quadsort">quadsort</a> (good benchmarks but I haven't properly looked into it).</p>
<p>Vector binary search described in &quot;Sub-nanosecond Searches&quot; (<a href="https://dyalog.tv/Dyalog18/?v=paxIkKBzqBU">video</a>, <a href="https://www.dyalog.com/user-meetings/uploads/conference/dyalog18/presentations/D15_The_Interpretive_Advantage.zip">slides</a>).</p>