From 247a554649e7cd0f165c716d4ec4e9a984349e56 Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Fri, 16 Jul 2021 22:53:01 -0400
Subject: Actually add the files

---
 doc/find.md        | 41 ++++++++++++++++++++++++++++
 docs/doc/find.html | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 121 insertions(+)
 create mode 100644 doc/find.md
 create mode 100644 docs/doc/find.html

diff --git a/doc/find.md b/doc/find.md
new file mode 100644
index 00000000..b91a5930
--- /dev/null
+++ b/doc/find.md
@@ -0,0 +1,41 @@
+*View this file with results and syntax highlighting [here](https://mlochbaum.github.io/BQN/doc/find.html).*
+
+# Find
+
+Find (`⍷`) searches for occurrences of an array `𝕨` within `𝕩`. The result contains a boolean for each possible location, which is 1 if `𝕨` was found there and 0 if not.
+
+    "xx" ⍷ "xxbdxxxcx"
+
+More precisely, `𝕨` needs to [match](match.md) a contiguous selection from `𝕩`, which for strings means a substring. These subarrays of `𝕩` are also exactly the cells in the result of [Windows](windows.md). In fact we can use Windows to see all the arrays `𝕨` will be compared against.
+
+    2 ↕ "xxbdxxxcx"
+
+    "xx"⊸≡˘ 2 ↕ "xxbdxxxcx"
+
+Like Windows, the result usually doesn't have the same dimensions as `𝕩`. This is easier to see when `𝕨` is longer. It differs from APL's version, which includes trailing 0s in order to maintain the same length. Bringing the size up to that of `𝕩` is easy enough with [Take](take.md) (`↑`), while shortening a padded result would be harder.
+
+    "string" ⍷ "substring"
+
+    "string" (≢∘⊢↑⍷) "substring"  # APL style
+
+If `𝕨` is larger than `𝕩`, the result is empty, and there's no error even in cases where Windows would fail. One place this tends to come up is when applying [First](pick.md) (`⊑`) to the result: `⊑⍷` tests whether `𝕨` appears in `𝕩` at the first position, that is, whether it's a prefix of `𝕩`. If `𝕨` is longer than `𝕩` it shouldn't be a prefix, so 0 is appropriate.
+
+    "loooooong" ⍷ "short"
+
+    9 ↕ "short"
+
+    ⊑ "loooooong" ⍷ "short"
+
+This pattern also works in the high-rank case discussed below, testing whether `𝕨` is a multi-dimensional prefix starting at the lowest-index corner of `𝕩`.
+
+### Higher ranks
+
+If `𝕨` and `𝕩` are two-dimensional then Find does a two-dimensional search. The cells used are also found in `𝕨≢⊸↕𝕩`. For example, the bottom-right corner of `𝕩` below matches `𝕨`, so there's a 1 in the bottom-right corner of the result.
+
+    ⊢ a ← 7 (4|⋆˜)⌜○↕ 9   # Array with patterns
+
+    (0‿3‿0≍0‿1‿0) ⍷ a
+
+It's also allowed for `𝕨` to have a smaller rank than `𝕩`; in this case leading axes of `𝕩` are mapped over so that axes of `𝕨` correspond to trailing axes of `𝕩`. This is a minor violation of the [leading axis](leading.md) principle, which would match axes of `𝕨` to leading axes of `𝕩` in order to make a function that's useful with the Rank operator, but such a function would be quite strange and hardly ever useful.
+
+    0‿1‿0‿1 ⍷ a
diff --git a/docs/doc/find.html b/docs/doc/find.html
new file mode 100644
index 00000000..bc883db2
--- /dev/null
+++ b/docs/doc/find.html
@@ -0,0 +1,80 @@
+
+
+
+ BQN: Find
+
+
+

Find

+

Find (⍷) searches for occurrences of an array 𝕨 within 𝕩. The result contains a boolean for each possible location, which is 1 if 𝕨 was found there and 0 if not.

+↗️
    "xx" ⍷ "xxbdxxxcx"
+⟨ 1 0 0 0 1 1 0 0 ⟩
+
+
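As a small follow-up sketch (not part of the original text), the boolean result can be turned into match positions with Indices (`/`) or counted with a sum fold (`+´`). Overlapping occurrences are each reported, which is why the run `xxx` produces two adjacent 1s.

```bqn
/ "xx" ⍷ "xxbdxxxcx"     # Positions of the matches: ⟨ 0 4 5 ⟩
+´ "xx" ⍷ "xxbdxxxcx"    # Number of matches: 3
```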

More precisely, 𝕨 needs to match a contiguous selection from 𝕩, which for strings means a substring. These subarrays of 𝕩 are also exactly the cells in the result of Windows. In fact we can use Windows to see all the arrays 𝕨 will be compared against.

+↗️
    2 ↕ "xxbdxxxcx"
+┌─    
+╵"xx  
+  xb  
+  bd  
+  dx  
+  xx  
+  xx  
+  xc  
+  cx" 
+     ┘
+
+    "xx"⊸≡˘ 2 ↕ "xxbdxxxcx"
+⟨ 1 0 0 0 1 1 0 0 ⟩
+
+

Like Windows, the result usually doesn't have the same dimensions as 𝕩. This is easier to see when 𝕨 is longer. It differs from APL's version, which includes trailing 0s in order to maintain the same length. Bringing the size up to that of 𝕩 is easy enough with Take (↑), while shortening a padded result would be harder.

+↗️
    "string" ⍷ "substring"
+⟨ 0 0 0 1 ⟩
+
+    "string" (≢∘⊢↑⍷) "substring"  # APL style
+⟨ 0 0 0 1 0 0 0 0 0 ⟩
+
+

If 𝕨 is larger than 𝕩, the result is empty, and there's no error even in cases where Windows would fail. One place this tends to come up is when applying First (⊑) to the result: ⊑⍷ tests whether 𝕨 appears in 𝕩 at the first position, that is, whether it's a prefix of 𝕩. If 𝕨 is longer than 𝕩 it shouldn't be a prefix, so 0 is appropriate.

+↗️
    "loooooong" ⍷ "short"
+⟨⟩
+
+    9 ↕ "short"
+ERROR
+
+    ⊑ "loooooong" ⍷ "short"
+0
+
+
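A minimal sketch of the prefix test on inputs where 𝕨 fits inside 𝕩. The name `IsPrefix` is hypothetical and introduced only for illustration; it is simply the train `⊑⍷` from the paragraph above given a name.

```bqn
IsPrefix ← ⊑⍷                 # Hypothetical name: 𝕨 IsPrefix 𝕩 is ⊑ 𝕨⍷𝕩
"sub" IsPrefix "substring"    # 1: "sub" is found at position 0
"str" IsPrefix "substring"    # 0: "str" occurs later, but not at position 0
```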

This pattern also works in the high-rank case discussed below, testing whether 𝕨 is a multi-dimensional prefix starting at the lowest-index corner of 𝕩.

+

Higher ranks

+

If 𝕨 and 𝕩 are two-dimensional then Find does a two-dimensional search. The cells used are also found in 𝕨≢⊸↕𝕩. For example, the bottom-right corner of 𝕩 below matches 𝕨, so there's a 1 in the bottom-right corner of the result.

+↗️
    ⊒ a ← 7 (4|β‹†Λœ)βŒœβ—‹β†• 9   # Array with patterns
+β”Œβ”€                   
+β•΅ 1 1 1 1 1 1 1 1 1  
+  0 1 2 3 0 1 2 3 0  
+  0 1 0 1 0 1 0 1 0  
+  0 1 0 3 0 1 0 3 0  
+  0 1 0 1 0 1 0 1 0  
+  0 1 0 3 0 1 0 3 0  
+  0 1 0 1 0 1 0 1 0  
+                    β”˜
+
+    (0β€Ώ3β€Ώ0≍0β€Ώ1β€Ώ0) ⍷ a
+β”Œβ”€               
+β•΅ 0 0 0 0 0 0 0  
+  0 0 0 0 0 0 0  
+  0 0 0 0 0 0 0  
+  0 0 1 0 0 0 1  
+  0 0 0 0 0 0 0  
+  0 0 1 0 0 0 1  
+                β”˜
+
+
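The correspondence with Windows in two dimensions can be checked directly. The following is a sketch, assuming the name `w` for the pattern (it does not appear in the text above): it compares `w` against every rank-2 cell of the Windows result using Rank (`⎉`) and confirms that this agrees with Find.

```bqn
a ← 7 (4|⋆˜)⌜○↕ 9            # Same array as in the example above
w ← 0‿3‿0≍0‿1‿0              # Hypothetical name for the 2×3 pattern
≢ w ≢⊸↕ a                    # Shape of the windows: ⟨ 6 7 2 3 ⟩
(w ⍷ a) ≡ w ≡⎉2 w ≢⊸↕ a     # 1: matching w against each 2×3 window gives Find
```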

It's also allowed for 𝕨 to have a smaller rank than 𝕩; in this case leading axes of 𝕩 are mapped over so that axes of 𝕨 correspond to trailing axes of 𝕩. This is a minor violation of the leading axis principle, which would match axes of 𝕨 to leading axes of 𝕩 in order to make a function that's useful with the Rank operator, but such a function would be quite strange and hardly ever useful.

+↗️
    0‿1‿0‿1 ⍷ a
+┌─             
+╵ 0 0 0 0 0 0  
+  0 0 0 0 0 0  
+  1 0 1 0 1 0  
+  0 0 0 0 0 0  
+  1 0 1 0 1 0  
+  0 0 0 0 0 0  
+  1 0 1 0 1 0  
+              ┘
+
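One way to read the mapping over leading axes, sketched here as an illustration and not taken from the original text: searching for a list in a rank-2 array gives the same result as applying Find separately to each row with Cells (`˘`).

```bqn
a ← 7 (4|⋆˜)⌜○↕ 9                  # Same array as in the example above
(0‿1‿0‿1 ⍷ a) ≡ 0‿1‿0‿1⊸⍷˘ a      # 1: row-by-row search agrees with Find
```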
-- cgit v1.2.3