Sequence splitting

Scott David Daniels Scott.Daniels at Acm.Org
Fri Jul 3 13:47:15 EDT 2009


Steven D'Aprano wrote:
> I've never needed such a split function, and I don't like the name, and 
> the functionality isn't general enough. I'd prefer something which splits 
> the input sequence into as many sublists as necessary, according to the 
> output of the key function. Something like itertools.groupby(), except it 
> runs through the entire sequence and collates all the elements with 
> identical keys.
> 
> splitby(range(10), lambda n: n%3)
> => [ (0, [0, 3, 6, 9]),
>      (1, [1, 4, 7]), 
>      (2, [2, 5, 8]) ]
> 
> Your split() would be nearly equivalent to this with a key function that 
> returns a Boolean.

Well, here is my go at doing the original with iterators:

import itertools

def splitter(source, test=bool):
    # Tag each item with its test result once, then filter two tee'd copies.
    a, b = itertools.tee((x, test(x)) for x in source)
    return ((data for data, decision in a if decision),
            (data for data, decision in b if not decision))
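
As a rough illustration of the laziness, here is a sketch that feeds the function an unbounded source. The definition is repeated so the snippet runs on its own; the parity test and the names first_evens and first_odds are only for illustration.

```python
import itertools

def splitter(source, test=bool):
    # Tag each item with its test result once, then filter two tee'd copies.
    a, b = itertools.tee((x, test(x)) for x in source)
    return ((data for data, decision in a if decision),
            (data for data, decision in b if not decision))

# itertools.count() never ends, so only a finite slice of each half is taken.
evens, odds = splitter(itertools.count(), lambda n: n % 2 == 0)
first_evens = list(itertools.islice(evens, 5))
first_odds = list(itertools.islice(odds, 5))
print(first_evens)  # [0, 2, 4, 6, 8]
print(first_odds)   # [1, 3, 5, 7, 9]
```

One caveat: tee() buffers every item that one half has consumed and the other has not, so draining only one half of an endless source makes that buffer grow without bound.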

This has the advantage that it can operate on infinite iterables, since
both halves are produced lazily.  For something like splitby for
grouping, though, I seem to need to know the cases up front:

def _make_gen(particular, src):
    # A separate function so each generator binds its own case value.
    return (x for x, c in src if c == particular)

def splitby(source, cases, case):
    '''Produce a dict of generators for case(el) for el in source'''
    # One tee'd copy of the (element, case) pairs per expected case.
    decided = itertools.tee(((x, case(x)) for x in source), len(cases))
    return dict((c, _make_gen(c, src))
                for c, src in zip(cases, decided))

Example:

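def classify(n):
    '''Least prime factor of n among 2, 3, 5, 7, or 0 if none divides it'''
    for prime in [2, 3, 5, 7]:
        if n % prime == 0:
            return prime
    return 0

# sorted() gives a stable key order; plain dict order varies by Python version.
for k, g in sorted(splitby(range(50), (2, 3, 5, 7, 0), classify).items()):
    print('%s: %s' % (k, list(g)))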

0: [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
2: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
     26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
3: [3, 9, 15, 21, 27, 33, 39, 45]
5: [5, 25, 35]
7: [7, 49]
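
For comparison, when the source is finite and the cases are not known up front, an eager version along the lines Steven describes might look like the sketch below. The name collate is hypothetical (chosen to avoid clashing with splitby above), and the sketch assumes the keys are hashable and sortable.

```python
from collections import defaultdict

def collate(source, key):
    """Hypothetical helper: group every element of a finite source by key(el)."""
    groups = defaultdict(list)
    for el in source:
        # Elements keep their original relative order within each group.
        groups[key(el)].append(el)
    # Return (key, elements) pairs in key order, as in the quoted example.
    return sorted(groups.items())

result = collate(range(10), lambda n: n % 3)
print(result)  # [(0, [0, 3, 6, 9]), (1, [1, 4, 7]), (2, [2, 5, 8])]
```

Unlike the generator version, this reads the whole input before returning anything, so it cannot handle an infinite source.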

--Scott David Daniels
Scott.Daniels at Acm.Org


