Difference between revisions of "List function suggestions"
JaredUpdike (talk  contribs) (reworked split to mirror ByteString) 
(findSublistIndex) 

(21 intermediate revisions by 13 users not shown)  
Line 1:  Line 1:  
−  = Let's fix this = 

+  This page lists proposed extensions to the Haskell list functions, whether in the [http://www.haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html Prelude] or [http://www.haskell.org/ghc/docs/latest/html/libraries/base/DataList.html Data.List]. 

+  Please discuss the proposals on the Talk Page or the libraries list, and use this page to record the results of discussions. 

+  However, since the advent of [[Hackage]]DB and [[CabalInstall]] it is preferred to provide such functionality in specialised packages, rather than extending the already large base library. 

−  We need these useful functions in Data.List; I'll call them 'split' (and variants) and 'replace'. These are easily implemented but everyone always reinvents them. The goal is clarity/uniformity (everyone uses them widely and recognizes them) and portability (I don't have to keep reimplementing these or copying that one file UsefulMissingFunctions.hs). 

+  == Splitting on a separator, etc == 

−  Use this page to record consensus as reached on the Talk Page. (Use four tildes to sign your post automatically with your name/timestamp.) Diverging opinions welcome! Note: a lot of good points (diverging opinions!) are covered in the mailing lists, but if we include all these various cases, split* will have 9 variants! I'm working on trying to organize all this into something meaningful. 

+  We need these useful functions in Data.List; I'll call them 'split' (and variants) and 'replace'. These are easily implemented but everyone always reinvents them. Various versions have been proposed, but there was no consensus on which was best, e.g. 

−  == Summary == 

+  * [http://www.haskell.org/pipermail/haskellcafe/2006July/thread.html#16559 haskellcafe thread July 2006] 

+  * [http://www.haskell.org/pipermail/libraries/2004July/thread.html#2342 libraries thread July 2004] 

−  <i> 

+  Note: a lot of good points (diverging opinions!) are covered in the mailing lists, but if we include all these various cases, split* will have 9 variants! The goal is to reach some kind of reasonable consensus, specifically on naming and semantics. Even if we need pairs of functions to satisfy various usage and algebraic needs. Failing to accommodate every possible use of these functions should not be a sufficient reason to abandon the whole project. 

−  Hacking up your own custom split (or a tokens/splitOnGlue) must be one 

−  of the most common questions from beginners on the irc channel. 

−  Anyone rememeber what the result of the "let's get split into the base 

+  The goal is clarity/uniformity (everyone uses them widely and recognizes them) and portability (I don't have to keep reimplementing these or copying that one file UsefulMissingFunctions.hs). 

−  library" movement's work was? 

−  
−  ISTR there wasn't a concensus, so nothing happened. Which is silly, 

−  really  I agree we should definitely have a Data.List.split. 

−  </i> 

−  A thread July 2006 

+  Note: I (Jared Updike) am working with the belief that efficiency should not be a valid argument to bar these otherwise universally useful functions from the libraries; regexes are overkill for 'split' and 'replace' for common simple situations. Let's assume people will know (or learn) when they need heavier machinery (regexes, ByteString) and will use it when efficiency is important. We can try to facilitate this by reusing any names from ByteString, etc. 

−  http://www.haskell.org/pipermail/haskellcafe/2006July/thread.html#16559 

+  === split (working name) === 

−  A thread July 2004 

+  First of all: Check out whether the [http://hackage.haskell.org/cgibin/hackagescripts/package/split split] package provides, what you need. 

−  
−  http://www.haskell.org/pipermail/libraries/2004July/thread.html#2342 

−  
−  == Goal == 

−  
−  The goal is to reach some kind of reasonable consensus, specifically on naming and semantics. Even if we need pairs of functions to satisfy various usage and algebraic needs. Failing to accomodate every possible use of these functions should not be a sufficient reason to abandon the whole project. 

−  
−  Note: I (Jared Updike) am working with the belief that efficiency should not be a valid argument to bar these otherwise universally useful functions from the libraries; regexes are overkill for 'split' and 'replace' for common simple situations. Let's assume people will know (or learn) when they need heavier machinery (regexes, FPS/ByteString) and will use it when efficiency is important. We can try to facilitate this by reusing any names from FastPackedString and/or ByteString, etc. 

−  
−  = The Data.List functions = 

−  
−  === split (working name) === 

We need a few of these: 
We need a few of these: 

Line 47:  Line 31:  
<haskell> 
<haskell> 

−  join sep 
+  join sep . split sep = id 
</haskell> 
</haskell> 

Line 63:  Line 47:  
<haskell> 
<haskell> 

−  split' sep 
+  split' sep = filter (not . null) . split sep 
</haskell> 
</haskell> 

Line 71:  Line 55:  
tokensws = tokens' (`elem` " \f\v\t\n\r\b") 
tokensws = tokens' (`elem` " \f\v\t\n\r\b") 

−  tokensws "Hello there\n \n Haskellers! " == 
+  tokensws "Hello there\n \n Haskellers! " == 
["Hello", "there", "Haskellers!"] 
["Hello", "there", "Haskellers!"] 

</haskell> 
</haskell> 

Line 98:  Line 82:  
'''TODO: list names and reasons for/against''' 
'''TODO: list names and reasons for/against''' 

+  
+  Implemented for instance in [http://hackage.haskell.org/packages/archive/utilityht/0.0.1/doc/html/DataListHT.html#v%3Areplace utilityht]. 

=== join (working name) === 
=== join (working name) === 

Line 108:  Line 94:  
join sep = concat . intersperse sep 
join sep = concat . intersperse sep 

</haskell> 
</haskell> 

+  
+  Note: this function has been implemented as 'intercalate' in Data.List. 

'''TODO: copypaste things from threads mentioned above''' 
'''TODO: copypaste things from threads mentioned above''' 

Line 113:  Line 101:  
'''TODO: list names and reasons for/against''' 
'''TODO: list names and reasons for/against''' 

−  +  == Sorted lists == 

−  Such as endsWith, beginsWith, etc. 

+  The following are versions of standard prelude functions, but intended for sorted lists. The advantage is that they frequently reduce execution time by an O(n). The disadvantage is that the elements have to be members of Ord, and the lists have to be already sorted. 

+  
+  <haskell> 

+   Eliminates duplicate entries from the list, where duplication is defined 

+   by the 'eq' function. The last value is kept. 

+  sortedNubBy :: (a > a > Bool) > [a] > [a] 

+  sortedNubBy eq (x1 : xs@(x2 : _)) = 

+  if eq x1 x2 then sortedNubBy eq xs else x1 : sortedNubBy eq xs 

+  sortedNubBy _ xs = xs 

+  
+  sortedNub :: (Eq a) => [a] > [a] 

+  sortedNub = sortedNubBy (==) 

+  
+   Merge two sorted lists into a new sorted list. Where elements are equal 

+   the element from the first list is taken first. 

+  mergeBy :: (a > a > Ordering) > [a] > [a] > [a] 

+  mergeBy cmp xs@(x1:xs1) ys@(y1:ys1) = 

+  if cmp x1 y1 == GT 

+  then y1 : mergeBy cmp xs ys1 

+  else x1 : mergeBy cmp xs1 ys 

+  mergeBy _ [] ys = ys 

+  mergeBy _ xs [] = xs 

+  
+  merge :: (Ord a) => [a] > [a] > [a] 

+  merge = mergeBy compare 

+  </haskell> 

+  
+  <hask>mergeBy</hask> is implemented in [http://hackage.haskell.org/packages/archive/utilityht/0.0.1/doc/html/DataListHT.html#v%3AmergeBy utilityht]. 

+  
+  == Generalize groupBy and friends == 

+  
+  In the Haskell 98 List library, <hask>groupBy</hask> assumes that its argument function defines an equivalence, and the reference definition returns sublists where each element is equivalent to the first. The following definition, comparing adjacent elements, does the same thing on equivalence relations: 

+  <haskell> 

+  groupBy :: (a > a > Bool) > [a] > [[a]] 

+  groupBy rel [] = [] 

+  groupBy rel (x:xs) = (x:ys) : groupBy rel zs 

+  where (ys,zs) = groupByAux x xs 

+  groupByAux x0 (x:xs)  rel x0 x = (x:ys, zs) 

+  where (ys,zs) = groupByAux x xs 

+  groupByAux y xs = ([], xs) 

+  </haskell> 

+  However it is also useful on other relations, e.g. 

+  * Picking out maximal ascending sublists (runs): 

+  <haskell> 

+  > groupBy (<=) [7,3,5,9,6,8,3,5,4] 

+  [[7],[3,5,9],[6,8],[3,5],[4]] 

+  </haskell> 

+  * Picking out contiguous sublists from an ascending sequence: 

+  <haskell> 

+  > groupBy (\a b > a+1 == b) [1,2,3,4,6] 

+  [[1,2,3,4],[6]] 

+  </haskell> 

+  * Splitting a line at the start of each word: 

+  <haskell> 

+  > groupBy (\ c1 c2 > isLetter c1  not (isLetter c2)) "This is a line" 

+  ["This ","is ","a ","line"] 

+  </haskell> 

+  Since this more useful definition agrees with the Haskell 98 one on its specified domain, it should be a backwardscompatible replacement. 

+  
+  The same applies to <hask>nubBy</hask>, and possibly <hask>deleteBy</hask>, <hask>deleteFirstsBy</hask> and <hask>intersectBy</hask> (which could have more general types to make this clear). 

+  
+  See: 

+  * Libraries list on [http://www.haskell.org/pipermail/libraries/2007August/008028.html Data.List.groupBy with nontransitive equality predicate] 

+  * Implementation in [http://hackage.haskell.org/packages/archive/utilityht/0.0.1/doc/html/DataListHT.html#v%3AgroupBy utilityht] 

+  
+  == groupOn and sortOn == 

+  
+  Almost all uses of <hask>groupBy</hask> and <hask>sortBy</hask> actually use a specific compare function. This can (using a recent version of base) as 

+  <hask>sortBy (comparing fst)</hask> 

+  or 

+  <hask>sortBy (compare `on` fst)</hask>. 

+  Since this use is so common, it might be worthwhile to add separate functions for this: 

+  <haskell> 

+  sortOn :: Ord b => (a > b) > [a] > [a] 

+  sortOn = sortBy . comparing 

+  </haskell> 

+  The same goes for <hask>groupBy</hask> 

+  <haskell> 

+  groupOn :: Eq b => (a > b) > [a] > [[a]] 

+  groupOn f = groupBy ((==) `on` f) 

+  </haskell> 

+  That is, 

+  <haskell> 

+  groupOn' :: Eq b => (a > b) > [a] > [[a]] 

+  groupOn' f = groupBy (\x y > f x == f y) 

+  </haskell> 

+  The names could be better, the idea behind 'on' comes from the 'on' function. 

+  
+  See [http://hackage.haskell.org/packages/archive/utilityht/0.0.1/doc/html/DataListKey.html utilityht] package. 

+  
+  == Indexing lists == 

+  
+  * Find the index of a sublist in a list 

+  <haskell> 

+  findSublistIndex :: Eq a => [a] > [a] > Maybe Int 

+  findSublistIndex xss xs = findIndex (isPrefixOf xss) $ tails xs 

+  
+   Examples 

+  findSublistIndex findSublistIndex [5,6] [1..] == Just 4 

+  findSublistIndex "abbc" "abcabbc" == Just 3 

+  findSublistIndex [2,1] [2,4..10] == Nothing 

+  </haskell> 

−  '''TODO: copypaste from threads mentioned above, or from your own code''' 

+  == See also == 

+  * [[Prelude extensions]] 

+  * [[Hackage]] 

[[Category:Proposals]] 
[[Category:Proposals]] 
Latest revision as of 11:14, 16 June 2012
This page lists proposed extensions to the Haskell list functions, whether in the Prelude or Data.List. Please discuss the proposals on the Talk Page or the libraries list, and use this page to record the results of discussions. However, since the advent of HackageDB and CabalInstall it is preferred to provide such functionality in specialised packages, rather than extending the already large base library.
Contents
Splitting on a separator, etc
We need these useful functions in Data.List; I'll call them 'split' (and variants) and 'replace'. These are easily implemented but everyone always reinvents them. Various versions have been proposed, but there was no consensus on which was best, e.g.
Note: a lot of good points (diverging opinions!) are covered in the mailing lists, but if we include all these various cases, split* will have 9 variants! The goal is to reach some kind of reasonable consensus, specifically on naming and semantics. Even if we need pairs of functions to satisfy various usage and algebraic needs. Failing to accommodate every possible use of these functions should not be a sufficient reason to abandon the whole project.
The goal is clarity/uniformity (everyone uses them widely and recognizes them) and portability (I don't have to keep reimplementing these or copying that one file UsefulMissingFunctions.hs).
Note: I (Jared Updike) am working with the belief that efficiency should not be a valid argument to bar these otherwise universally useful functions from the libraries; regexes are overkill for 'split' and 'replace' for common simple situations. Let's assume people will know (or learn) when they need heavier machinery (regexes, ByteString) and will use it when efficiency is important. We can try to facilitate this by reusing any names from ByteString, etc.
split (working name)
First of all: Check out whether the split package provides, what you need.
We need a few of these:
split :: Eq a => a > [a] > [[a]]
splitWith :: (a > Bool) > [a] > [[a]]
tokens :: (a > Bool) > [a] > [[a]]
That preserve:
join sep . split sep = id
See below for 'join'
And some that use above split but filter to remove empty elements (but do not preserve above property). Easy enough:
split' :: Eq a => a > [a] > [[a]]
splitWith' :: (a > Bool) > [a] > [[a]]
tokens' :: (a > Bool) > [a] > [[a]]
i.e.
split' sep = filter (not . null) . split sep
Usage would be:
tokensws = tokens' (`elem` " \f\v\t\n\r\b")
tokensws "Hello there\n \n Haskellers! " ==
["Hello", "there", "Haskellers!"]
TODO: add version like python with multielement separator
TODO: give code, copypaste from threads mentioned above
TODO: list names and reasons for/against
replace (working name)
replace :: [a] > [a] > [a] > [a]
like Python replace:
replace "the" "a" "the quick brown fox jumped over the lazy black dog"
===>
"a quick brown fox jumped over a lazy black dog"
TODO: give code, copypaste from threads mentioned above
TODO: list names and reasons for/against
Implemented for instance in utilityht.
join (working name)
join :: [a] > [[a]] > [a]
join sep = concat . intersperse sep
Note: this function has been implemented as 'intercalate' in Data.List.
TODO: copypaste things from threads mentioned above
TODO: list names and reasons for/against
Sorted lists
The following are versions of standard prelude functions, but intended for sorted lists. The advantage is that they frequently reduce execution time by an O(n). The disadvantage is that the elements have to be members of Ord, and the lists have to be already sorted.
 Eliminates duplicate entries from the list, where duplication is defined
 by the 'eq' function. The last value is kept.
sortedNubBy :: (a > a > Bool) > [a] > [a]
sortedNubBy eq (x1 : xs@(x2 : _)) =
if eq x1 x2 then sortedNubBy eq xs else x1 : sortedNubBy eq xs
sortedNubBy _ xs = xs
sortedNub :: (Eq a) => [a] > [a]
sortedNub = sortedNubBy (==)
 Merge two sorted lists into a new sorted list. Where elements are equal
 the element from the first list is taken first.
mergeBy :: (a > a > Ordering) > [a] > [a] > [a]
mergeBy cmp xs@(x1:xs1) ys@(y1:ys1) =
if cmp x1 y1 == GT
then y1 : mergeBy cmp xs ys1
else x1 : mergeBy cmp xs1 ys
mergeBy _ [] ys = ys
mergeBy _ xs [] = xs
merge :: (Ord a) => [a] > [a] > [a]
merge = mergeBy compare
mergeBy
is implemented in utilityht.
Generalize groupBy and friends
In the Haskell 98 List library, groupBy
assumes that its argument function defines an equivalence, and the reference definition returns sublists where each element is equivalent to the first. The following definition, comparing adjacent elements, does the same thing on equivalence relations:
groupBy :: (a > a > Bool) > [a] > [[a]]
groupBy rel [] = []
groupBy rel (x:xs) = (x:ys) : groupBy rel zs
where (ys,zs) = groupByAux x xs
groupByAux x0 (x:xs)  rel x0 x = (x:ys, zs)
where (ys,zs) = groupByAux x xs
groupByAux y xs = ([], xs)
However it is also useful on other relations, e.g.
 Picking out maximal ascending sublists (runs):
> groupBy (<=) [7,3,5,9,6,8,3,5,4]
[[7],[3,5,9],[6,8],[3,5],[4]]
 Picking out contiguous sublists from an ascending sequence:
> groupBy (\a b > a+1 == b) [1,2,3,4,6]
[[1,2,3,4],[6]]
 Splitting a line at the start of each word:
> groupBy (\ c1 c2 > isLetter c1  not (isLetter c2)) "This is a line"
["This ","is ","a ","line"]
Since this more useful definition agrees with the Haskell 98 one on its specified domain, it should be a backwardscompatible replacement.
The same applies to nubBy
, and possibly deleteBy
, deleteFirstsBy
and intersectBy
(which could have more general types to make this clear).
See:
 Libraries list on Data.List.groupBy with nontransitive equality predicate
 Implementation in utilityht
groupOn and sortOn
Almost all uses of groupBy
and sortBy
actually use a specific compare function. This can (using a recent version of base) as
sortBy (comparing fst)
or
sortBy (compare `on` fst)
.
Since this use is so common, it might be worthwhile to add separate functions for this:
sortOn :: Ord b => (a > b) > [a] > [a]
sortOn = sortBy . comparing
The same goes for groupBy
groupOn :: Eq b => (a > b) > [a] > [[a]]
groupOn f = groupBy ((==) `on` f)
That is,
groupOn' :: Eq b => (a > b) > [a] > [[a]]
groupOn' f = groupBy (\x y > f x == f y)
The names could be better, the idea behind 'on' comes from the 'on' function.
See utilityht package.
Indexing lists
 Find the index of a sublist in a list
findSublistIndex :: Eq a => [a] > [a] > Maybe Int
findSublistIndex xss xs = findIndex (isPrefixOf xss) $ tails xs
 Examples
findSublistIndex findSublistIndex [5,6] [1..] == Just 4
findSublistIndex "abbc" "abcabbc" == Just 3
findSublistIndex [2,1] [2,4..10] == Nothing