[Python-ideas] Batching/grouping function for itertools

Terry Reedy tjreedy at udel.edu
Sun Dec 8 23:09:14 CET 2013


On 12/8/2013 7:16 AM, Steven D'Aprano wrote:
> On Sun, Dec 08, 2013 at 01:30:56PM +0200, Serhiy Storchaka wrote:

>> There is also a question about result's type. Sometimes you need an
>> iterator of subsequences (i.e. split string on equal string chunks),
>> sometimes an iterator of iterators is enough.
>
> None of the other itertools functions treat strings specially. Why
> should this one? If you want to re-join them into strings, you can do so
> with a trivial wrapper:
>
> (''.join(elements) for elements in group("some string", 3, pad=' '))

A large fraction, perhaps over half, of the many requests for a 
chunker or grouper function are for sequences, not general iterables, as 
input, with the desired output type being the input type. For this, an 
iterator of *slices* is *far* more efficient. The same function could 
easily handle overlaps. (There are still several possible ways to handle 
a short final slice.) *Untested*:

def window(seq, size, advance=None, extra='skip'):
   '''Yield successive slices of len size of sequence seq.

   Move window advance items (default = size).
   Extra determines the handling of extra items.
   The options are 'skip' (default), 'keep', and 'raise'.
   '''
   if advance is None: advance = size
   i, j, n = 0, size, len(seq)
   while j <= n:
     yield seq[i:j]
     i += advance
     j += advance
   if j < n + advance:
     if extra == 'keep':
       yield seq[i:j]
     elif extra == 'raise':
       raise ValueError('extra items')
     elif extra != 'skip':
       raise ValueError('bad extra')

Having gotten this far, it would be possible to treat the above as a 
fast path for sequences: wrap it in try/except and, if len() or slicing 
fails, fall back to a general iterator version. The result could be a 
builtin rather than an itertools function.
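
A minimal sketch of that fast-path/fallback idea, under stated 
assumptions: the name chunks is hypothetical, overlaps are left out, a 
short final chunk is always kept, and the fallback yields tuples (so the 
output type matches the input type only on the sequence path).

```python
from itertools import islice

def chunks(iterable, size):
    '''Yield successive chunks of up to size items (hypothetical helper).

    Sequences take the slicing fast path; anything else falls back
    to a generic iterator version that yields tuples.
    '''
    if size < 1:
        raise ValueError('size must be positive')
    try:
        n = len(iterable)   # TypeError for iterators and generators
        iterable[0:0]       # probe: fails for sets and dicts
        sliceable = True
    except (TypeError, KeyError):
        sliceable = False
    if sliceable:
        # Fast path: each chunk is a slice, so it has the input's type.
        for i in range(0, n, size):
            yield iterable[i:i + size]
    else:
        # General path: consume the iterator size items at a time.
        it = iter(iterable)
        while True:
            chunk = tuple(islice(it, size))
            if not chunk:
                return
            yield chunk
```

With this sketch, list(chunks('abcdefg', 3)) gives strings, while 
list(chunks(iter(range(5)), 2)) gives tuples.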

-- 
Terry Jan Reedy


