[Python-ideas] Batching/grouping function for itertools
Terry Reedy
tjreedy at udel.edu
Sun Dec 8 23:09:14 CET 2013
On 12/8/2013 7:16 AM, Steven D'Aprano wrote:
> On Sun, Dec 08, 2013 at 01:30:56PM +0200, Serhiy Storchaka wrote:
>> There is also a question about result's type. Sometimes you need an
>> iterator of subsequences (i.e. split string on equal string chunks),
>> sometimes an iterator of iterators is enough.
>
> None of the other itertools functions treat strings specially. Why
> should this one? If you want to re-join them into strings, you can do so
> with a trivial wrapper:
>
> (''.join(elements) for elements in group("some string", 3, pad=' '))
A large fraction, perhaps over half, of the multiple requests for a
chunker or grouper function are for sequences, not general iterables, as
input, with the desired output type being the input type. For this, an
iterator of *slices* is *far* more efficient. The same function could
easily handle overlaps. (That still leaves the several possible ways of
handling a short final slice.) *Untested*:
def window(seq, size, advance=None, extra='skip'):
    '''Yield successive slices of len size of sequence seq.

    Move window advance items (default = size).
    Extra determines the handling of extra items.
    The options are 'skip' (default), 'keep', and 'raise'.
    '''
    if advance is None: advance = size
    i, j, n = 0, size, len(seq)
    while j <= n:
        yield seq[i:j]
        i += advance
        j += advance
    if j < n + advance:  # last full slice stopped short of the end of seq
        if extra == 'keep':
            yield seq[i:j]
        elif extra == 'raise':
            raise ValueError('extra items')
        elif extra != 'skip':
            raise ValueError('bad extra')
Having gotten this far, it would be possible to treat the above as a
fast path for sequences and wrap it in try/except: if len() or slicing
fails, fall back to a general iterator version. The result could be a
builtin rather than an itertools function.
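
A rough sketch of that fast-path-plus-fallback idea, for illustration only.
The name chunked and the choice to yield lists in the fallback are my
assumptions, not part of any proposal in this thread, and the sketch keeps
a short final chunk rather than offering the 'skip'/'keep'/'raise' options:

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive chunks of `size` items; the last one may be short.

    Hypothetical helper: slices when the input supports len() and slicing,
    otherwise falls back to consuming a plain iterator.
    """
    if size < 1:
        raise ValueError('size must be positive')
    try:
        n = len(iterable)
        iterable[0:0]  # probe: does the object accept slices?
    except (TypeError, KeyError):
        # No len() or no slicing (iterators, sets, dicts): general version.
        # KeyError covers dicts on Pythons where slice objects are hashable.
        it = iter(iterable)
        while True:
            chunk = list(islice(it, size))
            if not chunk:
                return
            yield chunk
    else:
        # Fast path: each chunk is a slice, so it has the input's own type.
        for i in range(0, n, size):
            yield iterable[i:i + size]
```

With this, list(chunked("abcdefg", 3)) gives strings back, while
list(chunked(iter(range(5)), 2)) gives lists, since an iterator has no
type worth preserving.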
--
Terry Jan Reedy