From 471f4537dd92dbd09ee009e86df1e2771ce07c4d Mon Sep 17 00:00:00 2001
From: Marshall Lochbaum
Date: Sun, 13 Sep 2020 19:30:33 -0400
Subject: Update Group document for Classify and shifts

---
 doc/group.md        | 14 ++++++++------
 docs/doc/group.html | 17 ++++++++++-------
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/doc/group.md b/doc/group.md
index 81e16f94..592b868d 100644
--- a/doc/group.md
+++ b/doc/group.md
@@ -63,11 +63,11 @@ Group can even be implemented with the same techniques as a bucket sort, which c
 
 ## Applications
 
-The obvious application of Group is to group some values according to a known or computed property. If this property isn't an integer, it can be turned into one using Unique and Index Of (the combination `⍷⊸⊐` has been called "self-classify").
+The obvious application of Group is to group some values according to a known or computed property. If this property isn't an integer, it can be turned into one using Classify (monadic `⊐`, identical to `⍷⊸⊐`). Classify numbers the unique values in its argument by first occurrence.
 
     ln ← "Phelps"‿"Latynina"‿"Bjørgen"‿"Andrianov"‿"Bjørndalen"
     co ← "US"    ‿"SU"      ‿"NO"     ‿"SU"       ‿"NO"
-    ⥊˘ co ⍷⊸⊐⊸⊔ ln
+    ⥊˘ co ⊐⊸⊔ ln
 
 If we would like a particular index to key correspondence, we can use a fixed left argument to Index Of.
 
@@ -95,8 +95,10 @@ In other cases, we might want to split on spaces, so that words are separated by
 
     ' '((⊢-˜¬×+`)∘=⊔⊢)"  string with  spaces   "
 
-However, trailing spaces are ignored because Group never produces trailing empty groups (to get them back we would use a dummy final character in the string). To avoid empty words, we should increase the word index only once per group of spaces. We can do this by taking the prefix sum of a list that is 1 only for a space with no space before it. To make such a list, we can use the [Windows](windows.md) function. We will extend our list with an initial 1 so that leading spaces will be ignored. Then we take windows of the same length as the original list: the first includes the dummy argument followed by a shifted copy of the list, and the second is the original list. These represent whether the previous and current characters are spaces; we want positions where the previous wasn't a space and the current is.
+However, trailing spaces are ignored because Group never produces trailing empty groups (to get them back we would use a dummy final character in the string). To avoid empty words, we should increase the word index only once per group of spaces. We can do this by taking the prefix sum of a list that is 1 only for a space with no space before it. To make such a list, we can use the [Shift Before](shift.md) function, giving a list of previous elements. To treat the first element as if it's before a space (so that leading spaces have no effect rather than creating an initial empty group), we shift in a 1.
 
-    ≍⟜(<˝≠↕1∾⊢) ' '="  string with  spaces   "  # All, then filtered, spaces
-    ≍⟜(⊢-˜¬×+`∘(<˝≠↕1∾⊢))' '="  string with  spaces   "  # More processing
-    ' '((⊢-˜¬×+`∘(<˝≠↕1∾⊢))∘=⊔⊢)"  string with  spaces   "  # Final result
+    (⊢≍1⊸»<⊢) ' '="  string with  spaces   "  # All, then filtered, spaces
+    ≍⟜(⊢-˜¬×·+`1⊸»<⊢)' '="  string with  spaces   "  # More processing
+    ' '((⊢-˜¬×·+`1⊸»<⊢)∘=⊔⊢)"  string with  spaces   "  # Final result
+
+    ' '((¬-˜⊢×·+`»⊸>)∘≠⊔⊢)"  string with  spaces   "  # Slightly shorter
diff --git a/docs/doc/group.html b/docs/doc/group.html
index 25bc3748..be2192b2 100644
--- a/docs/doc/group.html
+++ b/docs/doc/group.html
@@ -90,10 +90,10 @@
 Group can even be implemented with the same techniques as a bucket sort, which can be branchless and fast.
 Applications
-The obvious application of Group is to group some values according to a known or computed property. If this property isn't an integer, it can be turned into one using Unique and Index Of (the combination ⍷⊸⊐ has been called "self-classify").
-    ln ← "Phelps"‿"Latynina"‿"Bjørgen"‿"Andrianov"‿"Bjørndalen"
+The obvious application of Group is to group some values according to a known or computed property. If this property isn't an integer, it can be turned into one using Classify (monadic ⊐, identical to ⍷⊸⊐). Classify numbers the unique values in its argument by first occurrence.
+    ln ← "Phelps"‿"Latynina"‿"Bjørgen"‿"Andrianov"‿"Bjørndalen"
     co ← "US"    ‿"SU"      ‿"NO"     ‿"SU"       ‿"NO"
-    ⥊˘ co ⍷⊸⊐⊸⊔ ln
+    ⥊˘ co ⊐⊸⊔ ln
 ┌─                            
 ╵ ⟨ "Phelps" ⟩                
   ⟨ "Latynina" "Andrianov" ⟩  
@@ -137,18 +137,21 @@
     ' '((⊢-˜¬×+`)∘=⊔⊢)"  string with  spaces   "
 ⟨ ⟨⟩ ⟨⟩ "string" "with" ⟨⟩ "spaces" ⟩
 
-However, trailing spaces are ignored because Group never produces trailing empty groups (to get them back we would use a dummy final character in the string). To avoid empty words, we should increase the word index only once per group of spaces. We can do this by taking the prefix sum of a list that is 1 only for a space with no space before it. To make such a list, we can use the Windows function. We will extend our list with an initial 1 so that leading spaces will be ignored. Then we take windows of the same length as the original list: the first includes the dummy argument followed by a shifted copy of the list, and the second is the original list. These represent whether the previous and current characters are spaces; we want positions where the previous wasn't a space and the current is.
-    ≍⟜(<˝≠↕1∾⊢) ' '="  string with  spaces   "  # All, then filtered, spaces
+However, trailing spaces are ignored because Group never produces trailing empty groups (to get them back we would use a dummy final character in the string). To avoid empty words, we should increase the word index only once per group of spaces. We can do this by taking the prefix sum of a list that is 1 only for a space with no space before it. To make such a list, we can use the Shift Before function, giving a list of previous elements. To treat the first element as if it's before a space (so that leading spaces have no effect rather than creating an initial empty group), we shift in a 1.
+    (⊢≍1⊸»<⊢) ' '="  string with  spaces   "  # All, then filtered, spaces
 ┌─                                                 
 ╵ 1 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1  
   0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0  
                                                   ┘
-    ≍⟜(⊢-˜¬×+`∘(<˝≠↕1∾⊢))' '="  string with  spaces   "  # More processing
+    ≍⟜(⊢-˜¬×·+`1⊸»<⊢)' '="  string with  spaces   "  # More processing
 ┌─                                                         
 ╵  1  1 0 0 0 0 0 0  1 0 0 0 0  1  1 0 0 0 0 0 0  1  1  1  
   ¯1 ¯1 0 0 0 0 0 0 ¯1 1 1 1 1 ¯1 ¯1 2 2 2 2 2 2 ¯1 ¯1 ¯1  
                                                           ┘
-    ' '((⊢-˜¬×+`∘(<˝≠↕1∾⊢))∘=⊔⊢)"  string with  spaces   "  # Final result
+    ' '((⊢-˜¬×·+`1⊸»<⊢)∘=⊔⊢)"  string with  spaces   "  # Final result
+⟨ "string" "with" "spaces" ⟩
+
+    ' '((¬-˜⊢×·+`»⊸>)∘≠⊔⊢)"  string with  spaces   "  # Slightly shorter
 ⟨ "string" "with" "spaces" ⟩
 
-- cgit v1.2.3
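
The Classify change in the first hunk can be checked directly. A minimal BQN sketch, assuming an implementation in which monadic `⊐` is Classify as this commit describes, and reusing the `ln` and `co` lists from the document:

```bqn
co ← "US"‿"SU"‿"NO"‿"SU"‿"NO"
ln ← "Phelps"‿"Latynina"‿"Bjørgen"‿"Andrianov"‿"Bjørndalen"

⊐ co               # Classify, numbering values by first occurrence: ⟨ 0 1 2 1 2 ⟩
(⍷⊸⊐ co) ≡ ⊐ co    # the older Unique-then-Index-Of spelling agrees: 1
⍷ co               # unique values, in the same order as the group indices: ⟨ "US" "SU" "NO" ⟩
(⊐ co) ⊔ ln        # names grouped by country, one group per unique value
```

Because Classify assigns indices in order of first occurrence, `⍷ co` lists the key belonging to each group that `(⊐ co) ⊔ ln` produces.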
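
The tacit word-splitting function in the second hunk can be unrolled into named steps. This is a sketch for illustration only: `str`, `s`, and `w` are hypothetical names, and the short string `"  ab  c"` stands in for the document's example so that each intermediate list stays readable. Each step mirrors one part of the function used in the "Final result" line.

```bqn
# str, s, and w are illustrative names, not taken from the document
str ← "  ab  c"
s ← ' ' = str           # space mask: ⟨ 1 1 0 0 1 1 0 ⟩
1 » s                   # previous element of each position, 1 shifted in: ⟨ 1 1 1 0 0 1 1 ⟩
(1 » s) < s             # 1 only at a space not preceded by a space: ⟨ 0 0 0 0 1 0 0 ⟩
w ← +` (1 » s) < s      # running word index: ⟨ 0 0 0 0 1 1 1 ⟩
((¬s) × w) - s          # spaces become ¯1, which Group drops: ⟨ ¯1 ¯1 0 0 ¯1 ¯1 1 ⟩
(((¬s) × w) - s) ⊔ str  # ⟨ "ab" "c" ⟩
```

The "Slightly shorter" line works on the non-space mask from `≠` instead, and uses monadic `»`, which shifts in a 0; a 0 there means "the previous character was a space", so the first position is handled the same way.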