From 7b81224a5929c919815eb0a770cac93fa1325174 Mon Sep 17 00:00:00 2001 From: Marshall Lochbaum Date: Mon, 14 Jun 2021 22:06:01 -0400 Subject: Documentation for Indices and Replicate --- docs/doc/replicate.html | 233 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 233 insertions(+) create mode 100644 docs/doc/replicate.html (limited to 'docs/doc/replicate.html') diff --git a/docs/doc/replicate.html b/docs/doc/replicate.html new file mode 100644 index 00000000..752fc7e1 --- /dev/null +++ b/docs/doc/replicate.html @@ -0,0 +1,233 @@ + + + + BQN: Indices and Replicate + + +

Indices and Replicate

+ + + + + + + + + 𝕨 + 𝕩 + 𝕨/𝕩 + /𝕨 + + 'r' + 'e' + 'p' + 'l' + 'i' + 'c' + 'a' + 't' + 'e' + 0 + 1 + 1 + 0 + 3 + 2 + 0 + 0 + 0 + 'e' + 'p' + 'i' + 'i' + 'i' + 'c' + 'c' + 1 + 2 + 4 + 4 + 4 + 5 + 5 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

The functions Indices and Replicate are used to copy or filter data. They might be described as transforming a run-length encoding into unencoded form. On the other hand, Indices might be described as giving a sparse representation of 𝕩, which is smaller if 𝕩 mostly consists of zeros.

+

BQN doesn't have any of the various features used in APL to add fills to the result of Replicate, like negative numbers in 𝕨 or an Expand (\) primitive. An alternative to Expand is to use Replicate with structural Under () to insert values into an array of fills.

+

Replicate

+

Given a list of natural numbers 𝕨, Replicate repeats each major cell in 𝕩 the corresponding number of times. That is, 𝕨 and 𝕩 must have the same length, and the result includes i𝕨 copies of each cell i𝕩, in order.

+↗️
    2102 / "abcd"
+"aabdd"
+
+     a  >"aa0""bb1""cc2""dd3"
+┌─     
+╵"aa0  
+  bb1  
+  cc2  
+  dd3" 
+      ┘
+
+    2102 / a
+┌─     
+╵"aa0  
+  aa0  
+  bb1  
+  dd3  
+  dd3" 
+      ┘
+
+

It's also allowed for 𝕨 to be a single number (or enclosed number: it just needs to be a unit and not a list). In this case every cell of 𝕩 is repeated that number of times.

+↗️
    3 / "copy"
+"cccooopppyyy"
+
+

When 𝕨 is a list of booleans, a cell is never repeated more than once, meaning that each cell of 𝕩 is either left out (0), or kept in (1). If Fn is a function with a boolean result, Fn¨/ filters elements of a list according to Fn.

+↗️
    110010 / "filter"
+"fie"
+
+    'i' "filter"
+⟨ 1 1 0 0 1 0 ⟩
+
+    'i'/ "filter"
+"fie"
+
+

Here 'i' is a pervasive function, so there's no need to add ¨. Similarly, to filter major cells of an array, Fn˘/ could be used, applying Fn to one major cell at a time.

+

A similar pattern applies to Replicate as well. The function below tests which input characters are double quotes, but by adding one it changes the result to 1 for each non-quote character and 2 for quotes (but source code and display also double quotes here, so the input string has only two "s and the output has four).

+↗️
    {1+'"'=𝕩}/ "for ""escaping"" quotes"
+"for """"escaping"""" quotes"
+
+

Compound Replicate

+

If 𝕨 has depth two, then its elements give the amounts to copy along each leading axis of 𝕩.

+↗️
     b  25  10
+┌─           
+╵ 0 1 2 3 4  
+  5 6 7 8 9  
+            ┘
+
+    20, 10011 / b
+┌─       
+╵ 0 3 4  
+  0 3 4  
+        ┘
+
+    20 / 10011/˘ b
+┌─       
+╵ 0 3 4  
+  0 3 4  
+        ┘
+
+

Here the 20 indicates that the first row of b is copied twice and the second is ignored, while 10011 picks out three entries from that row. 𝕩 can also have more axes than elements of 𝕨, and the trailing ones aren't changed, just like the simpler case. However, 𝕨 has to have at least as many elements as 𝕩 has axes (so (𝕨)≥=𝕩), and each element has to have the same length as the corresponding axis in 𝕩—or it can be a unit, as shown below.

+↗️
    <2,<3 / b
+┌─                               
+╵ 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4  
+  0 0 0 1 1 1 2 2 2 3 3 3 4 4 4  
+  5 5 5 6 6 6 7 7 7 8 8 8 9 9 9  
+  5 5 5 6 6 6 7 7 7 8 8 8 9 9 9  
+                                ┘
+
+

Above, both elements of 𝕨 are enclosed numbers. An individual element doesn't have to be enclosed, but I don't recommend this, since if none of them are enclosed, then 𝕨 will have depth 1 and it will be interpreted as replicating along the first axis only.

+↗️
    2,3 / b
+┌─           
+╵ 0 1 2 3 4  
+  0 1 2 3 4  
+  5 6 7 8 9  
+  5 6 7 8 9  
+  5 6 7 8 9  
+            ┘
+
+

The example above has a different result from the previous one! The function <¨/ could be used in place of / to replicate each axis by a different number.

+

If 𝕨 is ⟨⟩, then it has depth 1, but is handled with the multidimensional case anyway, giving a result of 𝕩. The one-dimensional case could also only ever return 𝕩, but it would be required to have length 0, so this convention still only extends the simple case.

+↗️
    b  ⟨⟩ / b
+1
+
+

Indices

+

The monadic form of / is much simpler than the dyadic one, with no multidimensional case or mismatched argument ranks. 𝕩 must be a list of natural numbers, and /𝕩 is the list 𝕩/↕≠𝕩. Its elements are the indices for 𝕩, with index i repeated i𝕩 times.

+↗️
    / 3012
+⟨ 0 0 0 2 3 3 ⟩
+
+

A unit argument isn't allowed, and isn't very useful: for example, /6 might indicate an array of six zeros, but this can be written /⥊6 or 60 with hardly any extra effort.

+

When 𝕨 has rank 1, 𝕨/𝕩 is equivalent to 𝕨/𝕩. Of course, this isn't the only use of Indices. It also gets along well with Group: for example, / groups 𝕩 according to a list of lengths 𝕨.

+↗️
    2501 / "ABCDEFGH"
+⟨ "AB" "CDEFG" ⟨⟩ "H" ⟩
+
+

This function will fail to include trailing empty arrays; the modification (/∾1) fixes this and ensures the result always has as many elements as 𝕨.

+

If 𝕩 is boolean then /𝕩 contains all the indices where a 1 appears in 𝕩. Applying -» to the result gives the distance from each 1 to the previous, or to the start of the list, another potentially useful function.

+↗️
    / 0101000010
+⟨ 1 3 8 ⟩
+
+    -» / 0101000010
+⟨ 1 2 5 ⟩
+
+

With more effort we can also use / to analyze groups of 1s in the argument (and of course all these methods can be applied to 0s instead, by first flipping the values with ¬). First we highlight the start and end of each group by comparing the list with a shifted copy of itself. Or rather, we'll first place a 0 at the front and then at the end, in order to detect when a group starts at the beginning of the list or ends at the end (there's also a shift-based version, «0𝕩).

+↗️
    0 (∾≍∾˜) 01110010110
+┌─                         
+╵ 0 0 1 1 1 0 0 1 0 1 1 0  
+  0 1 1 1 0 0 1 0 1 1 0 0  
+                          ┘
+
+    0 (∾≠∾˜) 01110010110
+⟨ 0 1 0 0 1 0 1 1 1 0 1 0 ⟩
+
+    / 0(∾≠∾˜) 01110010110
+⟨ 1 4 6 7 8 10 ⟩
+
+

So now we have the indices of each transition from 0 to 1 or 1 to 0, in an extended list with 0 added at the beginning and end. The first index has to be for a 0 to 1 transition, because we forced the first value to be a 0, and then the next can only be 1 to 0, then 0 to 1, and so on until the last, which must be 1 to 0 because the last value is also 0.

+↗️
    -˜`˘ 2⥊/ 0(∾≠∾˜) 01110010110
+┌─     
+╵ 1 3  
+  6 1  
+  8 2  
+      ┘
+
+

This means the transitions can be grouped exactly in pairs, the beginning and end of each group. Reshape with a computed length 2 groups these pairs, and then a scan -˜`˘ can be used to convert the start/end format to start/length if wanted.

+

Inverse

+

The result of Indices /n is an ordered list of natural numbers, where the number i appears in times. Given an ordered list of natural numbers k, the inverse of indices returns a corresponding n: one where the value in is the number of times i appears in k.

+↗️
    / 321
+⟨ 0 0 0 1 1 2 ⟩
+
+    / 000112
+⟨ 3 2 1 ⟩
+
+

Finding how many times each index appears in a list of indices is often a useful thing to do, and there are a few ways to do it:

+↗️
    +˝˘ (5) = 224120  # Inefficient
+⟨ 1 1 3 0 1 ⟩
+
+    ¨ 224120
+⟨ 1 1 3 0 1 ⟩
+
+    / 224120
+⟨ 1 1 3 0 1 ⟩
+
+

For / to work, the argument has to be sorted: otherwise it won't be a valid result of /. But sorting with is no problem, and / will probably be faster than ¨ in the absence of special handling for either combination.

-- cgit v1.2.3