From f113d9f57bdae219c87887c5e0781a5c824dc8e4 Mon Sep 17 00:00:00 2001 From: Marshall Lochbaum Date: Wed, 14 Jul 2021 21:10:09 -0400 Subject: Document search functions on lists --- docs/doc/index.html | 1 + docs/doc/primitive.html | 8 +++--- docs/doc/search.html | 72 +++++++++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 75 insertions(+), 6 deletions(-) (limited to 'docs/doc') diff --git a/docs/doc/index.html b/docs/doc/index.html index e32dd8f8..c1eaf56b 100644 --- a/docs/doc/index.html +++ b/docs/doc/index.html @@ -55,6 +55,7 @@
  • Repeat (⍟)
  • Reverse and Rotate (⌽)
  • Scan (`)
  • +
  • Search functions (⊐⊒∊)
  • Select (⊏)
  • Self-comparison functions (⊐⊒∊⍷)
  • Shift functions (»«)
  • diff --git a/docs/doc/primitive.html b/docs/doc/primitive.html index 46f8d444..e8eedc23 100644 --- a/docs/doc/primitive.html +++ b/docs/doc/primitive.html @@ -206,18 +206,18 @@ -Classify* () -Index of +Classify* +Index of Occurrence Count* -Progressive Index of* +Progressive Index of* Mark Firsts -Member of +Member of diff --git a/docs/doc/search.html b/docs/doc/search.html index 118cd35a..285f8b46 100644 --- a/docs/doc/search.html +++ b/docs/doc/search.html @@ -6,6 +6,15 @@

    Search functions

    + + + + + + + + + @@ -15,7 +24,7 @@ - + @@ -25,7 +34,7 @@ - + @@ -124,3 +133,62 @@

    The searched-for argument is 𝕩 in Index-of functions (⊐⊒) and 𝕨 in Member of (∊). Bins Up and Down (⍋⍒) are ordering functions but follow the same pattern as Index-of. The searched-for argument is split into cells, but not necessarily major cells: instead, the cells used match the rank of a major cell of the other (searched-in) argument. In the most common case, when the searched-in argument is a list, 0-cells are used for the search (we might also say elements, as it gives the same result).

    The result is always an array containing one number for each searched-for cell. For Index of and Member of, every result is computed independently; for Progressive Index of the result for a cell can depend on earlier cells, in index order.

    +

    Member of

    +

    The simplest of the search functions, Member of (∊) returns 1 if an entry in 𝕨 matches some entry in 𝕩, and 0 if it doesn't.

    +↗️
        "green"‿"bricks"‿"cow"‿"blue" ∊ "red"‿"green"‿"blue"
    +⟨ 1 0 0 1 ⟩
    +
    +

    The result is independent of the ordering of 𝕩: all that matters is which cells it contains.
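
    As a small illustration (the reordered, duplicated right argument is made up here and is not part of the original page), shuffling 𝕩 and repeating one of its entries should leave the result unchanged:

    ```bqn
        # Same 𝕨 as the example above; 𝕩 is reordered and lists "blue" twice
        "green"‿"bricks"‿"cow"‿"blue" ∊ "blue"‿"red"‿"green"‿"blue"
    ⟨ 1 0 0 1 ⟩
    ```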

    +

    Member of can be used in a train to compute the set intersection and difference of two arrays. For example, ∊/⊣ uses 𝕨∊𝕩 to filter 𝕨 (from 𝕨⊣𝕩), giving an intersection.

    +↗️
        "initial set" (∊/⊣) "intersect"     # Keep 𝕩
    +"initiset"
    +
    +    "initial set" (¬∊/⊣) "difference"  # Remove 𝕩
    +"tal st"
    +
    +

    These are the APL functions Intersect (∩) and Without (~). Really, only 𝕩 is treated like a set, while the ordering and multiplicity of elements of 𝕨 are maintained. I think the explicit implementations show this well, since 𝕩 is only used as the right argument to ∊, and prefer this clarity to the brevity of a single symbol.

    +

    Index of

    +

    Index of (⊐) returns, for each entry of 𝕩, the index of its first occurrence in 𝕨, or ≠𝕨 if the entry doesn't appear in 𝕨 at all.

    +↗️
        "zero"‿"one"‿"two"‿"three" ⊐ "one"‿"eight"‿"two"
    +⟨ 1 4 2 ⟩
    +
    +

    𝕩∊𝕨 is the same as (𝕨⊐𝕩)<≠𝕨. Note the reversal of arguments! In both ∊ and ⊐, the open side points to the searched-in argument and the closed side points to the searched-for argument. Relatedly, in Select (⊏), the open side points to the selected argument, which is more like the searched-in argument in that its cells are generally accessed out of order (the searched-for argument is most like the selection result 𝕨⊏𝕩).
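
    A quick check of the relationship between Member of and Index of, reusing the lists from the Index of example. The names w and x are chosen here purely for illustration:

    ```bqn
        w ← "zero"‿"one"‿"two"‿"three"   # searched-in list (hypothetical name)
        x ← "one"‿"eight"‿"two"          # searched-for list (hypothetical name)

        x ∊ w          # Member of: searched-for argument on the left
    ⟨ 1 0 1 ⟩

        (w ⊐ x) < ≠w   # Index of: an index below the length means "found"
    ⟨ 1 0 1 ⟩
    ```

    The "eight" entry gets index 4 from Index of, which equals ≠w, so the comparison turns it into 0.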

    +

    Index of always returns exactly one number, even if there are multiple matches, or no matches at all. To find the indices of all matches, start with Match Each, then Indices (I didn't mean for it to sound so repetitive! It just happened!).

    +↗️
        / "letters" ≡¨< 'e'        # Many to one
    +⟨ 1 4 ⟩
    +
        "letters" (<∘/˘≡⌜˜) "let"  # Many to many
    +⟨ ⟨ 0 ⟩ ⟨ 1 4 ⟩ ⟨ 2 3 ⟩ ⟩
    +
    +
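
    For contrast, a sketch of what plain Index of reports on the same arguments: only the first match of each letter, so the later 'e' and 't' positions do not show up.

    ```bqn
        "letters" ⊐ "let"   # One index per searched-for letter
    ⟨ 0 1 2 ⟩
    ```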

    Progressive Index of

    +

    Progressive Index of (⊒), as the name and glyph suggest, is a more sophisticated variant of Index of. Like Index of, it returns either ≠𝕨 or an index of a cell from 𝕨 that matches the given cell of 𝕩. Unlike Index of, no index except ≠𝕨 can ever be repeated. Progressive Index of returns the index of the first unused match, provided there's still one left.

    +↗️
        "aaa" ⊒ "aaaaa"
    +⟨ 0 1 2 3 3 ⟩
    +
        "aaabb" ⊒ "ababababab"
    +⟨ 0 3 1 4 2 5 5 5 5 5 ⟩
    +
    +
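
    To see what "progressive" adds, it may help to run ordinary Index of on the same arguments. This comparison is an illustration added here: every 'a' maps to index 0 and every 'b' to index 3, and no entry is ever reported as missing.

    ```bqn
        "aaabb" ⊐ "ababababab"   # Non-progressive: indices repeat freely
    ⟨ 0 3 0 3 0 3 0 3 0 3 ⟩
    ```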

    Above we said that 𝕩∊𝕨 is (𝕨⊐𝕩)<≠𝕨, so that ⊐˜<≠∘⊢ is an implementation of Member of. The corresponding ⊒˜<≠∘⊢ implements progressive member of, that is, membership on multisets. So if 𝕩 contains two copies of 'a', only the first two instances of 'a' in 𝕨 are considered to belong to it. And just as membership is useful for set intersection and difference, progressive membership gives multiset versions of these.

    +↗️
        "aabbcc" (⊐˜<≠∘⊢) "baa"
    +⟨ 1 1 1 1 0 0 ⟩
    +
        "aabbcc" (⊒˜<≠∘⊢) "baa"
    +⟨ 1 1 1 0 0 0 ⟩
    +
        "aabbcc" ((⊒˜=≠∘⊢)/⊣) "baa"  # Multiset difference
    +"bcc"
    +
    +
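
    The multiset intersection can be sketched the same way, by filtering with progressive membership itself rather than its negation. This variant is an illustration added here, not an example from the original page:

    ```bqn
        # Keep the cells of 𝕨 that find an unused match in 𝕩
        "aabbcc" ((⊒˜<≠∘⊢)/⊣) "baa"  # Multiset intersection
    "aab"
    ```

    Only one 'b' survives because 𝕩 contains a single 'b' to match against.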

    This primitive gives an interesting way to implement the ordinals pattern that might be easier to understand than the APL classic ⍋⍋ (it's probably a little slower though). The idea is to use the sorted array as the left argument to ⊒. Now the index returned for each cell is just where it ended up in that sorted order. If we used ordinary Index of then equal cells would share the smallest index; Progressive Index of means ties are broken in favor of earlier cells.

    +↗️
        ⍋∘⍋ "adebcedba"
    +⟨ 0 5 7 2 4 8 6 3 1 ⟩
    +
        ∧⊸⊒ "adebcedba"
    +⟨ 0 5 7 2 4 8 6 3 1 ⟩
    +
        ∧⊸⊐ "adebcedba"  # Ties included
    +⟨ 0 5 7 2 4 7 5 2 0 ⟩
    +
    +
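
    Spelling out the intermediate step may make the idea easier to follow. In this sketch the name s is introduced only for illustration; it holds the sorted array that serves as the left argument:

    ```bqn
        s ← ∧ "adebcedba"   # Sort Up; s is a hypothetical name
        s
    "aabbcddee"

        s ⊒ "adebcedba"     # Each cell takes the first unused slot in sorted order
    ⟨ 0 5 7 2 4 8 6 3 1 ⟩
    ```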

    Here's a goofy code golf tip: if the two arguments to Progressive Index of are the same, then every cell will be matched to itself, because all the previous indices are taken but the current one does match. So ⊒˜ is the same as ↕∘≠.

    +↗️
        ⊒˜ "anything at all"
    +⟨ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ⟩
    +
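
    One way to check the claimed equivalence on this input is to compare both results with Match. The train below is a sketch added here for illustration:

    ```bqn
        (⊒˜ ≡ ↕∘≠) "anything at all"   # Do the two forms agree on this list?
    1
    ```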
    -- cgit v1.2.3