From a0efe2b43544ec8edb01b3bab2ed6efe7ba5f6f2 Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Sat, 16 Jan 2021 22:26:59 -0500
Subject: Commentary on sorting functions

---
 docs/spec/primitive.html | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/docs/spec/primitive.html b/docs/spec/primitive.html
index 4e9ef995..61b190b3 100644
--- a/docs/spec/primitive.html
+++ b/docs/spec/primitive.html
@@ -135,3 +135,13 @@
  • Progressive Index of (⊒) processes non-principal cells in ravel order, and gives the smallest index of a principal argument cell that matches the cell and whose index has not already been included in the result. Again, ≠𝕨 is returned for a given cell if there is no valid cell.
  • Find (⍷) indicates positions where 𝕨 appears as a contiguous subarray of a =𝕨-cell of 𝕩. It has one result element for each such subarray of 𝕩, whose value is 1 if that subarray matches 𝕨 and 0 otherwise.
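For lists of atoms, where the principal cells of 𝕨 are just its elements, the two search functions above can be mimicked in a few lines of Python. This is a minimal sketch under that assumption only: the names `progressive_index_of` and `find` are hypothetical, and Python's `==` stands in for BQN's match.

```python
def progressive_index_of(w, x):
    """Sketch of w ⊒ x for lists of atoms: each element of x takes the smallest
    index of a matching element of w that no earlier element of x has used.
    len(w) (that is, ≠𝕨) marks elements with no such index."""
    used = [False] * len(w)
    result = []
    for item in x:  # ravel order
        for i, candidate in enumerate(w):
            if not used[i] and candidate == item:
                used[i] = True
                result.append(i)
                break
        else:  # no unused matching element was found
            result.append(len(w))
    return result


def find(w, x):
    """Sketch of w ⍷ x for lists: one result per contiguous slice of x with
    length len(w), which is 1 if that slice matches w and 0 otherwise."""
    n = len(w)
    return [int(x[k:k + n] == w) for k in range(len(x) - n + 1)]


print(progressive_index_of([3, 1, 3], [3, 3, 3, 1]))  # [0, 2, 3, 1]
print(find([1, 2], [1, 2, 1, 2, 3]))                   # [1, 0, 1, 0]
```

In the first call the third 3 gets 3 (that is, ≠𝕨) because both 3s in 𝕨 have already been used.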


    Sorting


    Sorting functions are those that depend on BQN's array ordering. There are three kinds of sorting function, with two functions of each kind: one with an upward-pointing glyph that uses an ascending ordering (these function names are suffixed with "Up"), and one with a downward-pointing glyph and the reverse, descending, ordering ("Down"). Below, these three kinds of function are described, then the ordering rules. Except for the right argument of Bins, all arguments must have rank at least 1.


    Sort (∧∨) reorders the major cells of its argument so that, for any two major cells of the result, the one with the lower index either comes earlier in the ordering than the one with the higher index or matches it.


    Grade (⍋⍒) returns a permutation describing the way the argument array would be sorted. For this reason the reference implementations simply define Sort to be selection by the grade. One way to define Grade is as a sorted version of the index list ↕≠𝕩. An index i is ordered according to the corresponding major cell i⊏𝕩. However, ties in the ordering are broken by ordering the index values themselves, so that no two indices are ever considered equal and the result of sorting is well-defined (for Sort this is not an issue, since matching cells are truly interchangeable). This property means that a stable sorting algorithm must be used to implement Grade functions. While cells might be ordered ascending or descending, indices are always ordered ascending: for example, index i is placed before index j if either i⊏𝕩 comes earlier in the ordering than j⊏𝕩, or i⊏𝕩 matches j⊏𝕩 and i<j.
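A minimal Python sketch of Grade and of Sort as selection by the grade, assuming the argument is a list of plain numbers so that Python's `<` can stand in for BQN's ordering. The helper names are hypothetical. It relies on Python's `sorted` being stable, which the Python documentation guarantees also when `reverse=True` is used, so tied indices stay in ascending order as required for both directions.

```python
def grade_up(x):
    """Sketch of ⍋𝕩 for a list of numbers: the indices ↕≠𝕩 sorted by their cells.
    sorted() is stable, so indices of tied cells stay in ascending order."""
    return sorted(range(len(x)), key=lambda i: x[i])


def grade_down(x):
    """Sketch of ⍒𝕩: cells in descending order, ties still by ascending index.
    reverse=True keeps sorted() stable, so equal keys keep their original order."""
    return sorted(range(len(x)), key=lambda i: x[i], reverse=True)


def sort_by_grade(x, grade):
    """Sort defined as selection by the grade: the cells of x in grade order."""
    return [x[i] for i in grade(x)]


x = [3, 1, 3, 2]
print(grade_up(x))                  # [1, 3, 0, 2]
print(grade_down(x))                # [0, 2, 3, 1]
print(sort_by_grade(x, grade_up))   # [1, 2, 3, 3]
print(sort_by_grade(x, grade_down)) # [3, 3, 2, 1]
```

The two 3s tie, so index 0 comes before index 2 in both grades, even though the cells themselves are ordered in opposite directions.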


    Bins (⍋⍒) requires 𝕨 to be ordered in the sense of Sort (with the same direction). Like a dyadic search function, it then works on cells of 𝕩 with the same rank as major cells of 𝕨, so the rank of 𝕩 cannot be less than (=𝕨)-1. For each of these cells, it identifies where in the ordering given by 𝕨 the cell belongs: that is, the index of the first major cell of 𝕨 that is ordered later than it, or ≠𝕨 if no such cell exists. Equivalently, the result value for a cell of 𝕩 is the number of major cells of 𝕨 that match it or come earlier in the ordering.
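Because 𝕨 must already be sorted, the cells of 𝕨 that match or come earlier than a given cell form a prefix of 𝕨, so Bins can be computed with a binary search for the first cell ordered later. The sketch below assumes lists of numbers; `bins`, `bins_up` and `bins_down` are hypothetical names, and the `earlier` callback is an illustrative way to encode the direction of the ordering.

```python
def bins(w, x, earlier):
    """For each v in x, count the cells of w that match v or come earlier.
    earlier(a, b) is True when a is strictly earlier than b in the ordering.
    Assumes w is sorted, so the cells ordered later than v form a suffix of w."""
    result = []
    for v in x:
        lo, hi = 0, len(w)
        # Binary search for the first index whose cell is ordered later than v.
        while lo < hi:
            mid = (lo + hi) // 2
            if earlier(v, w[mid]):
                hi = mid
            else:
                lo = mid + 1
        result.append(lo)  # len(w) if no cell is later than v
    return result


def bins_up(w, x):
    """Sketch of 𝕨⍋𝕩 for ascending numeric w."""
    return bins(w, x, lambda a, b: a < b)


def bins_down(w, x):
    """Sketch of 𝕨⍒𝕩 for descending numeric w."""
    return bins(w, x, lambda a, b: a > b)


print(bins_up([1, 3, 3, 5], [0, 3, 4, 6]))    # [0, 3, 3, 4]
print(bins_down([5, 3, 3, 1], [6, 3, 2, 0]))  # [0, 3, 3, 4]
```

In the ascending case each result is the number of elements of 𝕨 that are at most the cell; a cell equal to some elements of 𝕨 lands after all of them.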


    BQN's array ordering is an extension to arrays of the number and character ordering given by ≤. In this system, any two arrays whose atoms are all numbers or characters can be compared with each other. Furthermore, some arrays that contain incomparable atoms (operations) might still be comparable, if the result of the comparison can be decided before these atoms are reached. Array ordering does not depend on the fill elements of the two arguments.


    Here we define the array ordering using the terms "smaller" and "larger". For functions ∧⍋, "earlier" means "smaller" and "later" means "larger", while ∨⍒ use the opposite definition, reversing the ordering.


    To compare two arrays, BQN first attempts to compare elements at corresponding indices, where two indices are considered to correspond if one is a suffix of the other and the remaining leading entries of the longer one are all 0. Elements are accessed in ravel order, that is, beginning at the all-zero index and first increasing the final number in the index, then the second-to-last, and so on. They are compared, using array comparison if necessary, until either a non-matching pair of elements is found, in which case the ordering of this pair determines the ordering of the arrays, or one array has an index with no corresponding index in the other array. For example, comparing 4‿3‿2⥊1 with 2‿5⥊1 stops when index 0‿2 in 2‿5⥊1 is reached, because the corresponding index 0‿0‿2 is out of range. The index 0‿2‿0 in the other array also has no corresponding index, but comes later in the index ordering. In this case, the array that lacks the index in question is considered smaller, so here 4‿3‿2⥊1 is smaller than 2‿5⥊1.
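The index walk in the example can be checked mechanically. The hypothetical helper below looks only at shapes, which is enough here because every element of both arrays is 1, so no pair of elements ever differs. Correspondence is modeled by extending the shorter index with leading zeros (as 0‿2 corresponds to 0‿0‿2), which is the same as padding the lower-rank shape with leading 1s.

```python
from itertools import product


def first_unmatched_index(shape_a, shape_b):
    """Hypothetical helper: walk indices in ravel order and return the first one
    that exists in only one of the two arrays, with the name of the array that
    lacks it (that array is the smaller one). Element values are ignored, so
    this only models arrays whose corresponding elements all match."""
    r = max(len(shape_a), len(shape_b))
    # Extending an index with leading zeros = padding the shape with leading 1s.
    pad_a = (1,) * (r - len(shape_a)) + tuple(shape_a)
    pad_b = (1,) * (r - len(shape_b)) + tuple(shape_b)
    box = [max(p, q) for p, q in zip(pad_a, pad_b)]
    for idx in product(*(range(n) for n in box)):  # last axis varies fastest
        in_a = all(i < n for i, n in zip(idx, pad_a))
        in_b = all(i < n for i, n in zip(idx, pad_b))
        if in_a != in_b:
            return idx, ("b" if in_a else "a")
    return None  # no such index: rank and shape decide instead


print(first_unmatched_index((4, 3, 2), (2, 5)))  # ((0, 0, 2), 'a')
```

The result says that padded index 0‿0‿2 (index 0‿2 of 2‿5⥊1) is the first one without a partner, and that the first argument, 4‿3‿2⥊1, lacks it.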


    If two arrays have the same shape (ignoring leading 1s) and all of their elements match, or if they are both empty, then the element-by-element comparison will not find any differences. In this case, the arrays are compared first by rank, with the higher-rank array considered larger, and then by shape, beginning with the leading axis.


    To compare two atoms, array ordering uses ≤: if 𝕨≤𝕩, then 𝕨 matches 𝕩 if 𝕩≤𝕨 as well, and otherwise 𝕨 is smaller than 𝕩 (and 𝕩 is larger than 𝕨). To compare an atom to an array, the atom is promoted to an array by enclosing it; however, if the enclosed atom matches the array, then the atom is considered smaller.
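Putting the rules of this section together, here is a compact Python sketch of the comparison, returning -1, 0 or 1 as the first argument is smaller than, matches, or is larger than the second. It is an illustration under stated assumptions, not the reference implementation: `Arr` is a hypothetical stand-in for a BQN array (a shape plus its ravel), numbers are Python `int`/`float`, characters are one-character `str`, numbers are assumed to order before characters with Python's `<` standing in for ≤ within each type, and operations (incomparable atoms) are not modeled.

```python
from itertools import product


class Arr:
    """Hypothetical stand-in for a BQN array: a shape plus its elements in ravel order."""

    def __init__(self, shape, ravel):
        self.shape = tuple(shape)
        self.ravel = list(ravel)


def compare_atoms(a, b):
    # Assumption: numbers (int/float) order before characters (1-char str);
    # within each type, Python's < stands in for BQN's ≤.
    ka, kb = (isinstance(a, str), a), (isinstance(b, str), b)
    return (ka > kb) - (ka < kb)


def ravel_position(idx, shape):
    """Row-major position of a multidimensional index."""
    pos = 0
    for i, n in zip(idx, shape):
        pos = pos * n + i
    return pos


def compare(a, b):
    """Return -1, 0 or 1 as a is smaller than, matches, or is larger than b."""
    a_is_arr, b_is_arr = isinstance(a, Arr), isinstance(b, Arr)
    if not a_is_arr and not b_is_arr:
        return compare_atoms(a, b)
    if not a_is_arr:
        # Atom vs array: enclose the atom; if that matches, the atom is smaller.
        c = compare(Arr((), [a]), b)
        return -1 if c == 0 else c
    if not b_is_arr:
        return -compare(b, a)

    # Pad the lower-rank shape with leading 1s, so its indices gain leading 0s.
    r = max(len(a.shape), len(b.shape))
    pa = (1,) * (r - len(a.shape)) + a.shape
    pb = (1,) * (r - len(b.shape)) + b.shape
    box = [max(p, q) for p, q in zip(pa, pb)]

    # Walk indices in ravel order until elements differ or one array lacks the index.
    for idx in product(*(range(n) for n in box)):
        in_a = all(i < n for i, n in zip(idx, pa))
        in_b = all(i < n for i, n in zip(idx, pb))
        if in_a and in_b:
            c = compare(a.ravel[ravel_position(idx, pa)],
                        b.ravel[ravel_position(idx, pb)])
            if c != 0:
                return c
        elif in_a or in_b:
            return 1 if in_a else -1  # the array lacking the index is smaller

    # No difference found: higher rank is larger, then shapes from the leading axis.
    ka, kb = (len(a.shape), a.shape), (len(b.shape), b.shape)
    return (ka > kb) - (ka < kb)


print(compare(Arr((4, 3, 2), [1] * 24), Arr((2, 5), [1] * 10)))  # -1
print(compare(5, Arr((), [5])))                                  # -1
```

A list of such values could then be sorted ascending with `sorted(values, key=functools.cmp_to_key(compare))`.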
