Documentation for Indices and Replicate

author: Marshall Lochbaum <mwlochbaum@gmail.com> 2021-06-14 22:06:01 -0400
committer: Marshall Lochbaum <mwlochbaum@gmail.com> 2021-06-14 22:06:01 -0400
commit: 7b81224a5929c919815eb0a770cac93fa1325174 (patch)
tree: e2234d58df0c05771c48e90afe1dde27542e4ef3 /doc/replicate.md
parent: 4c1292ee8e27e59891a60c2f1bf3c326d4dba51f (diff)
1 files changed, 146 insertions, 0 deletions
diff --git a/doc/replicate.md b/doc/replicate.md
new file mode 100644
index 00000000..106a36e5
--- /dev/null
+++ b/doc/replicate.md
@@ -0,0 +1,146 @@
+*View this file with results and syntax highlighting [here](https://mlochbaum.github.io/BQN/doc/replicate.html).*
+
+# Indices and Replicate
+
+<!--GEN
+d ← 48‿22
+
+rc ← At "class=code|stroke-width=1.5|rx=12"
+Ge ← "g"⊸At⊸Enc
+g  ← "font-family=BQN,monospace|font-size=18px|text-anchor=middle"
+hg ← "class=bluegreen|stroke-width=0|opacity=0.2"
+cg ← "font-size=24px|text-anchor=end"
+lg ← "stroke-linecap=round|stroke=currentColor|opacity=0.7"
+
+wv ← 0‿1‿1‿0‿3‿2‿0‿0‿0
+xl ← ≠ xc ← ⊐ xt ← '''(Highlight∾∾⊣)¨"replicate"
+
+Text ← ("text" Attr "dy"‿"0.29em"∾(Pos d⊸×))⊸Enc
+Line ← "line" Elt ("xy"≍⌜"12")≍˘○⥊ ·FmtNum d×⊢
+Rp ← Pos⊸∾⟜("width"‿"height"≍˘FmtNum)○(d⊸×)
+
+tx ← ↕xl ⋄ y ← » yd ← +`0.6+1‿2‿1‿1.8
+dim ← ⟨2+xl,¯1⊑yd⟩ ⋄ sh ← ¯2.1‿¯1.3
+tp ← y ≍˜¨¨ 2 / ⟨tx,↕+´wv⟩
+hp ← 0.2‿¯0.7(+⟜(1‿0×sh)≍¯2⊸×⊸+)1‿0×dim
+Ll ← Line∘⍉ ≍ + (0≍0.05×-○⊑)≍˘0.45‿¯0.55˙
+
+((∾˜d)×((-∾+˜)0.7‿0.4)+sh∾dim) SVG g Ge ⟨
+  "rect" Elt rc ∾ sh Rp dim
+  hg Ge ("rect" Elt ·Rp˝ {𝕩⊸+⌾(1⊑⊏)hp})¨ 0‿2⊏y
+  cg Ge (¯0.7≍¨y) Text⟜Highlight¨ "𝕨"‿"𝕩"‿"𝕨/𝕩"‿"/𝕨"
+  tp Text¨○∾ Highlight∘•Repr¨¨⌾(1‿3⊸⊏) xt‿wv‿(wv/xt)‿(/wv)
+  lg Ge ⟨
+    "stroke-width=0.4" Ge   Ll¨ ´ (0=wv)⊸/¨  2⊸↑ tp
+    "stroke-width=1.5" Ge > Ll¨¨˝˘ 2↕ wv⊸/¨⌾(2⊸↑)tp
+  ⟩
+⟩
+-->
+
+The functions Indices and Replicate are used to copy or filter data. They might be described as transforming a [run-length encoding](https://en.wikipedia.org/wiki/Run-length_encoding) into unencoded form. On the other hand, Indices might be described as giving a sparse representation of `𝕩`, which is smaller if `𝕩` mostly consists of zeros.
+
+BQN doesn't have any of the various features used in APL to add fills to the result of Replicate, like negative numbers in `𝕨` or an Expand (`\`) primitive. An alternative to Expand is to use Replicate with structural Under (`⌾`) to insert values into an array of fills.
+
+## Replicate
+
+Given a list of natural numbers `𝕨`, Replicate repeats each major cell in `𝕩` the corresponding number of times. That is, `𝕨` and `𝕩` must have the same length, and the result includes `i⊑𝕨` copies of each cell `i⊏𝕩`, in order.
+
+        2‿1‿0‿2 / "abcd"
+
+        ⊢ a ← >"aa0"‿"bb1"‿"cc2"‿"dd3"
+
+        2‿1‿0‿2 / a
+
+It's also allowed for `𝕨` to be a single number (or enclosed number: it just needs to be a unit and not a list). In this case every cell of `𝕩` is repeated that number of times.
+
+        3 / "copy"
+
+When `𝕨` is a list of booleans, a cell is never repeated more than once, meaning that each cell of `𝕩` is either left out (0), or kept in (1). If `Fn` is a function with a boolean result, `Fn¨⊸/` filters elements of a list according to `Fn`.
+
+        1‿1‿0‿0‿1‿0 / "filter"
+
+        ≤⟜'i' "filter"
+
+        ≤⟜'i'⊸/ "filter"
+
+Here `≤⟜'i'` is a pervasive function, so there's no need to add `¨`. Similarly, to filter major cells of an array, `Fn˘⊸/` could be used, applying `Fn` to one major cell at a time.
+
+A similar pattern applies to Replicate as well. The function below tests which input characters are double quotes, but by adding one it changes the result to 1 for each non-quote character and 2 for quotes (but source code and display also double quotes here, so the input string has only two `"`s and the output has four).
+
+        {1+'"'=𝕩}⊸/ "for ""escaping"" quotes"
+
+### Compound Replicate
+
+If `𝕨` has [depth](depth.md) two, then its elements give the amounts to copy along each [leading axis](leading.md) of `𝕩`.
+
+        ⊢ b ← 2‿5 ⥊ ↕10
+
+        ⟨2‿0, 1‿0‿0‿1‿1⟩ / b
+
+        2‿0 / 1‿0‿0‿1‿1⊸/˘ b
+
+Here the `2‿0` indicates that the first row of `b` is copied twice and the second is ignored, while `1‿0‿0‿1‿1` picks out three entries from that row. `𝕩` can also have more axes than elements of `𝕨`, and the trailing ones aren't changed, just like the simpler case. However, `𝕨` has to have at least as many elements as `𝕩` has axes (so `(≠𝕨)≥=𝕩`), and each element has to have the same length as the corresponding axis in `𝕩`—or it can be a unit, as shown below.
+
+        ⟨<2,<3⟩ / b
+
+Above, both elements of `𝕨` are enclosed numbers. An individual element doesn't have to be enclosed, but I don't recommend this, since if *none* of them are enclosed, then `𝕨` will have depth 1 and it will be interpreted as replicating along the first axis only.
+
+        ⟨2,3⟩ / b
+
+The example above has a different result from the previous one! The function `<¨⊸/` could be used in place of `/` to replicate each axis by a different number.
+
+If `𝕨` is `⟨⟩`, then it has depth 1, but is handled with the multidimensional case anyway, giving a result of `𝕩`. The one-dimensional case could also only ever return `𝕩`, but it would be required to have length 0, so this convention still only extends the simple case.
+
+        b ≡ ⟨⟩ / b
+
+## Indices
+
+The monadic form of `/` is much simpler than the dyadic one, with no multidimensional case or mismatched argument ranks. `𝕩` must be a list of natural numbers, and `/𝕩` is the list `𝕩/↕≠𝕩`. Its elements are the indices for `𝕩`, with index `i` repeated `i⊑𝕩` times.
+
+        / 3‿0‿1‿2
+
+A unit argument isn't allowed, and isn't very useful: for example, `/6` might indicate an array of six zeros, but this can be written `/⥊6` or `6⥊0` with hardly any extra effort.
+
+When `𝕨` has rank 1, `𝕨/𝕩` is equivalent to `𝕨/⊸⊏𝕩`. Of course, this isn't the only use of Indices. It also gets along well with [Group](group.md): for example, `/⊸⊔` groups `𝕩` according to a list of lengths `𝕨`.
+
+        2‿5‿0‿1 /⊸⊔ "ABCDEFGH"
+
+This function will fail to include trailing empty arrays; the modification `(/∾⟜1)⊸⊔` fixes this and ensures the result always has as many elements as `𝕨`.
+
+If `𝕩` is boolean then `/𝕩` contains all the indices where a 1 appears in `𝕩`. Applying `-⟜»` to the result gives the distance from each 1 to the previous, or to the start of the list, another potentially useful function.
+
+        / 0‿1‿0‿1‿0‿0‿0‿0‿1‿0
+
+        -⟜» / 0‿1‿0‿1‿0‿0‿0‿0‿1‿0
+
+With more effort we can also use `/` to analyze groups of 1s in the argument (and of course all these methods can be applied to 0s instead, by first flipping the values with `¬`). First we highlight the start and end of each group by comparing the list with a shifted copy of itself. Or rather, we'll first place a 0 at the front and then at the end, in order to detect when a group starts at the beginning of the list or ends at the end (there's also a [shift](shift.md)-based version, `≠⟜«0∾𝕩`).
+
+        0 (∾≍∾˜) 0‿1‿1‿1‿0‿0‿1‿0‿1‿1‿0
+
+        0 (∾≠∾˜) 0‿1‿1‿1‿0‿0‿1‿0‿1‿1‿0
+
+        / 0(∾≠∾˜) 0‿1‿1‿1‿0‿0‿1‿0‿1‿1‿0
+
+So now we have the indices of each transition from 0 to 1 or 1 to 0, in an extended list with 0 added at the beginning and end. The first index has to be for a 0 to 1 transition, because we forced the first value to be a 0, and then the next can only be 1 to 0, then 0 to 1, and so on until the last, which must be 1 to 0 because the last value is also 0.
+
+        -˜`˘ ∘‿2⥊/ 0(∾≠∾˜) 0‿1‿1‿1‿0‿0‿1‿0‿1‿1‿0
+
+This means the transitions can be grouped exactly in pairs, the beginning and end of each group. Reshape with a [computed length](reshape.md#computed-lengths) `∘‿2` groups these pairs, and then a scan ``-˜`˘`` can be used to convert the start/end format to start/length if wanted.
+
+### Inverse
+
+The result of Indices `/n` is an ordered list of natural numbers, where the number `i` appears `i⊑n` times. Given an ordered list of natural numbers `k`, the *inverse* of indices returns a corresponding `n`: one where the value `i⊑n` is the number of times `i` appears in `k`.
+
+        / 3‿2‿1
+
+        /⁼ 0‿0‿0‿1‿1‿2
+
+Finding how many times each index appears in a list of indices is often a useful thing to do, and there are a few ways to do it:
+
+        +˝˘ (↕5) =⌜ 2‿2‿4‿1‿2‿0  # Inefficient
+
+        ≠¨⊔ 2‿2‿4‿1‿2‿0
+
+        /⁼∧ 2‿2‿4‿1‿2‿0
+
+For `/⁼` to work, the argument has to be sorted: otherwise it won't be a valid result of `/`. But sorting with `∧` is no problem, and `/⁼∧` will probably be faster than `≠¨⊔` in the absence of special handling for either combination.
author	Marshall Lochbaum <mwlochbaum@gmail.com>	2021-06-14 22:06:01 -0400
committer	Marshall Lochbaum <mwlochbaum@gmail.com>	2021-06-14 22:06:01 -0400
commit	7b81224a5929c919815eb0a770cac93fa1325174 (patch)
tree	e2234d58df0c05771c48e90afe1dde27542e4ef3 /doc/replicate.md
parent	4c1292ee8e27e59891a60c2f1bf3c326d4dba51f (diff)