From f113d9f57bdae219c87887c5e0781a5c824dc8e4 Mon Sep 17 00:00:00 2001 From: Marshall Lochbaum Date: Wed, 14 Jul 2021 21:10:09 -0400 Subject: Document search functions on lists --- docs/doc/search.html | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 70 insertions(+), 2 deletions(-) (limited to 'docs/doc/search.html') diff --git a/docs/doc/search.html b/docs/doc/search.html index 118cd35a..285f8b46 100644 --- a/docs/doc/search.html +++ b/docs/doc/search.html @@ -6,6 +6,15 @@

Search functions

+ + + + + + + + + @@ -15,7 +24,7 @@ - + @@ -25,7 +34,7 @@ - + @@ -124,3 +133,62 @@

The searched-for argument is ๐•ฉ in Index-of functions (โАโŠ’) and ๐•จ in Member of (โˆŠ). Bins Up and Down (โ‹โ’) are ordering functions but follow the same pattern as Index-of. It's split into cells, but not necessarily major cells: instead, the cells used match the rank of a major cell of the other (searched-in) argument. In the most common case, when the searched-in argument is a list, 0-cells are used for the search (we might also say elements, as it gives the same result).

The result is always an array containing one number for each searched-for cell. For Index of and Member of, every result is computed independently; for Progressive Index of the result for a cell can depend on earlier cells, in index order.

+

Member of

+

The simplest of the search functions, Member of (โˆŠ) returns 1 if an entry in ๐•จ matches some entry in ๐•ฉ, and 0 if it doesn't.

+โ†—๏ธ
    "green"โ€ฟ"bricks"โ€ฟ"cow"โ€ฟ"blue" โˆŠ "red"โ€ฟ"green"โ€ฟ"blue"
+โŸจ 1 0 0 1 โŸฉ
+
+

The result is independent of the ordering of ๐•ฉ: all that matters is which cells it contains.

+

Member of can be used in a train to compute the set intersection and difference of two arrays. For example, โˆŠ/โŠฃ uses ๐•จโˆŠ๐•ฉ to filter ๐•จ (from ๐•จโŠฃ๐•ฉ), giving an intersection.

+โ†—๏ธ
    "initial set" (โˆŠ/โŠฃ) "intersect"     # Keep ๐•ฉ
+"initiset"
+
+    "initial set" (ยฌโˆ˜โˆŠ/โŠฃ) "difference"  # Remove ๐•ฉ
+"tal st"
+
+

These are the APL functions Intersect (โˆฉ) and Without (~). Really, only ๐•ฉ is treated like a set, while the ordering and multiplicity of elements of ๐•จ are maintained. I think the explicit implementations show this well, since ๐•ฉ is only used as the right argument to โˆŠ, and prefer this clarity to the brevity of a single symbol.

+

Index of

+

Index of (โА) returns the index of the first occurrence of each entry in ๐•จ, or โ‰ ๐•จ if an entry doesn't appear in ๐•จ at all.

+โ†—๏ธ
    "zero"โ€ฟ"one"โ€ฟ"two"โ€ฟ"three" โА "one"โ€ฟ"eight"โ€ฟ"two"
+โŸจ 1 4 2 โŸฉ
+
+

๐•ฉโˆŠ๐•จ is the same as (๐•จโА๐•ฉ)<โ‰ ๐•จ. Note the reversal of arguments! In both โˆŠ and โА, the open side points to the searched-in argument and the closed side points to the searched-for argument. Relatedly, in Select (โŠ), the open side points to the selected argument, which is more like the searched-in argument in that its cells are generally accessed out of order (the searched-for argument is most like the selection result ๐•จโŠ๐•ฉ).

+

Index of always returns exactly one number, even if there are multiple matches, or no matches at all. To find the indices of all matches, start with Match Each, then Indices (I didn't mean for it to sound so repetitive! It just happened!).

+โ†—๏ธ
    / "letters" โ‰กยจ< 'e'        # Many to one
+โŸจ 1 4 โŸฉ
+
+    "letters" (<โˆ˜/ห˜โ‰กโŒœหœ) "let"  # Many to many
+โŸจ โŸจ 0 โŸฉ โŸจ 1 4 โŸฉ โŸจ 2 3 โŸฉ โŸฉ
+
+

Progressive Index of

+

Progressive Index of (โŠ’), as the name and glyph suggest, is a more sophisticated variant of Index of. Like Index of, it returns either โ‰ ๐•จ or an index of a cell from ๐•จ that matches the given cell of ๐•ฉ. Unlike Index of, no index except โ‰ ๐•จ can ever be repeated. Progressive Index of returns the index of the first unused match, provided there's still one left.

+โ†—๏ธ
    "aaa" โŠ’ "aaaaa"
+โŸจ 0 1 2 3 3 โŸฉ
+
+    "aaabb" โŠ’ "ababababab"
+โŸจ 0 3 1 4 2 5 5 5 5 5 โŸฉ
+
+

Above we said that ๐•ฉโˆŠ๐•จ is (๐•จโА๐•ฉ)<โ‰ ๐•จ, so that โАหœ<โ‰ โˆ˜โŠข is an implementation of Member of. The corresponding โŠ’หœ<โ‰ โˆ˜โŠข implements progressive member of, that is, membership on multisets. So if ๐•ฉ contains two copies of 'a', only the first to instances of 'a' in ๐•จ are considered to belong to it. And like membership is useful for set intersection and difference, progressive membership gives multiset versions of these.

+โ†—๏ธ
    "aabbcc" (โАหœ<โ‰ โˆ˜โŠข) "baa"
+โŸจ 1 1 1 1 0 0 โŸฉ
+
+    "aabbcc" (โŠ’หœ<โ‰ โˆ˜โŠข) "baa"
+โŸจ 1 1 1 0 0 0 โŸฉ
+
+    "aabbcc" ((โŠ’หœ=โ‰ โˆ˜โŠข)/โŠฃ) "baa"  # Multiset difference
+"bcc"
+
+

This primitive gives an interesting way to implement the ordinals pattern that might be easier to understand than the APL classic โ‹โ‹ (it's probably a little slower though). The idea is to use the sorted array as the left argument to โŠ’. Now the index returned for each cell is just where it ended up in that sorted order. If we used ordinary Index of then equal cells would share the smallest index; Progressive Index of means ties are broken in favor of earlier cells.

+โ†—๏ธ
    โ‹โˆ˜โ‹ "adebcedba"
+โŸจ 0 5 7 2 4 8 6 3 1 โŸฉ
+
+    โˆงโŠธโŠ’ "adebcedba"
+โŸจ 0 5 7 2 4 8 6 3 1 โŸฉ
+
+    โˆงโŠธโА "adebcedba"  # Ties included
+โŸจ 0 5 7 2 4 7 5 2 0 โŸฉ
+
+

Here's a goofy code golf tip: if the two arguments to Progressive Index of are the same, then every cell will be matched to itself, because all the previous indices are taken but the current one does match. So โŠ’หœ is the same as โ†•โˆ˜โ‰ .

+โ†—๏ธ
    โŠ’หœ "anything at all"
+โŸจ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 โŸฉ
+
-- cgit v1.2.3