[Python-ideas] itertools.chunks()

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Tue Apr 9 18:46:31 CEST 2013


>Also, here's a version of the same from my own code (modified a
>little) that uses islice instead of zip_longest. I haven't timed it but it
>was intended to be fast for large chunk sizes and I'd be interested to know
>how it compares:
>
>
>from itertools import islice
>
>def chunked(iterable, size, **kwargs):
>    '''Breaks an iterable into chunks
>
>    Usage:
>        >>> list(chunked('qwertyuiop', 3))
>        [['q', 'w', 'e'], ['r', 't', 'y'], ['u', 'i', 'o'], ['p']]
>
>        >>> list(chunked('qwertyuiop', 3, fillvalue=None))
>        [['q', 'w', 'e'], ['r', 't', 'y'], ['u', 'i', 'o'], ['p', None, None]]
>
>        >>> list(chunked('qwertyuiop', 3, strict=True))
>        Traceback (most recent call last):
>            ...
>        ValueError: Invalid chunk size
>    '''
>    list_, islice_ = list, islice
>    iterator = iter(iterable)
>
>    chunk = list_(islice_(iterator, size))
>    while len(chunk) == size:
>        yield chunk
>        chunk = list_(islice_(iterator, size))
>
>    if not chunk:
>        return
>    elif kwargs.get('strict', False):
>        raise ValueError('Invalid chunk size')
>    elif 'fillvalue' in kwargs:
>        yield chunk + (size - len(chunk)) * [kwargs['fillvalue']]
>    else:
>        yield chunk


Hi there,
I have now compared your chunked code (thanks a lot for sharing it!!) with
Peter's strict_grouper using timeit.
As a reminder, here's the strict_grouper code again:

from itertools import zip_longest

def strict_grouper(items, size, strict):
    fillvalue = object()
    args = [iter(items)]*size
    chunks = zip_longest(*args, fillvalue=fillvalue)
    prev = next(chunks)

    for chunk in chunks:
        yield prev
        prev = chunk

    if prev[-1] is fillvalue:
        if strict:
            raise ValueError
        else:
            while prev[-1] is fillvalue:
                prev = prev[:-1]

    yield prev
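
As a quick illustration of its behaviour (my own example, not from Peter's
post), note that it yields tuples rather than lists, and that it trims the
padding from the last group when strict=False:

>>> list(strict_grouper('qwertyuiop', 3, strict=False))
[('q', 'w', 'e'), ('r', 't', 'y'), ('u', 'i', 'o'), ('p',)]
>>> list(strict_grouper('qwertyuiop', 3, strict=True))
Traceback (most recent call last):
    ...
ValueError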

and here's the code I used for timing:

from timeit import timeit
results = []
results2 = []
# specify a range of test conditions: (number of items, chunk size)
conds = ((100,1), (100,10), (100,80), (1000,1), (1000,10), (1000,100), (1000,800))


# run chunked under the different conditions
for cond in conds:
    r = timeit(stmt='d = [i for i in chunked(range(cond[0]), cond[1])]',
               setup='from __main__ import chunked, cond', number=10000)
    results.append((cond, r))

# same for strict_grouper
for cond in conds:
    r = timeit(stmt='d = [i for i in strict_grouper(range(cond[0]), cond[1], strict=False)]',
               setup='from __main__ import strict_grouper, cond', number=10000)
    results2.append((cond, r))


The results I got were:

# the chunked results:
[((100, 1), 2.197788960464095), ((100, 10), 0.27306091885475325),
 ((100, 80), 0.1232851640888839), ((1000, 1), 21.86202648707149),
 ((1000, 10), 2.47093215096902), ((1000, 100), 0.9069762837680173),
 ((1000, 800), 0.6114090097580629)]

# the strict_grouper results:
[((100, 1), 0.31356012737705896), ((100, 10), 0.10581013815499318),
 ((100, 80), 0.45853288974103634), ((1000, 1), 2.5020897878439428),
 ((1000, 10), 0.6703603850128275), ((1000, 100), 0.5088070259098458),
 ((1000, 800), 19.14092429336597)]

Two things are obvious from this:
1) Peter's solution is usually faster, sometimes by a lot, but
2) it performs very poorly when it has to yield a truncated last group of
items, as in the range(1000), 800 case. I guess this is true only for the
strict=False case (with strict=True it would raise the error immediately),
because I kept Peter's cautious while loop to trim off the fillvalues, and
that loop copies the tuple once per trimmed fillvalue. If this were fixed,
then I guess strict_grouper would be preferable under pretty much any
condition.
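
For what it's worth, here is a rough, untested sketch of how that trimming
could be done differently (my own variation, not Peter's code, and
strict_grouper2 is just a placeholder name): walk an index back over the
padding and cut it off with a single slice, so no intermediate tuples are
built:

from itertools import zip_longest

def strict_grouper2(items, size, strict):
    # same approach as strict_grouper, but the padding of the last
    # chunk is removed with one slice instead of one tuple copy per
    # fillvalue
    fillvalue = object()
    args = [iter(items)] * size
    chunks = zip_longest(*args, fillvalue=fillvalue)
    prev = next(chunks)

    for chunk in chunks:
        yield prev
        prev = chunk

    if prev[-1] is fillvalue:
        if strict:
            raise ValueError
        # scan back over the padding once, then slice once
        end = size
        while end and prev[end - 1] is fillvalue:
            end -= 1
        prev = prev[:end]

    yield prev

The index scan is still proportional to the amount of padding, but it avoids
rebuilding the tuple on every step, which is what makes the original loop
quadratic in the range(1000), 800 case. I haven't timed this variant, though.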

Best,
Wolfgang




