[Python-ideas] itertools.chunks()
Wolfgang Maier
wolfgang.maier at biologie.uni-freiburg.de
Tue Apr 9 18:46:31 CEST 2013
>Also, here's a version of the same from my own code (modified a
>little) that uses islice instead of zip_longest. I haven't timed it but it
>was intended to be fast for large chunk sizes and I'd be interested to know
>how it compares:
>
>
>from itertools import islice
>
>def chunked(iterable, size, **kwargs):
>    '''Breaks an iterable into chunks
>
>    Usage:
>    >>> list(chunked('qwertyuiop', 3))
>    [['q', 'w', 'e'], ['r', 't', 'y'], ['u', 'i', 'o'], ['p']]
>
>    >>> list(chunked('qwertyuiop', 3, fillvalue=None))
>    [['q', 'w', 'e'], ['r', 't', 'y'], ['u', 'i', 'o'], ['p', None, None]]
>
>    >>> list(chunked('qwertyuiop', 3, strict=True))
>    Traceback (most recent call last):
>    ...
>    ValueError: Invalid chunk size
>    '''
>    list_, islice_ = list, islice
>    iterator = iter(iterable)
>
>    chunk = list_(islice_(iterator, size))
>    while len(chunk) == size:
>        yield chunk
>        chunk = list_(islice_(iterator, size))
>
>    if not chunk:
>        return
>    elif kwargs.get('strict', False):
>        raise ValueError('Invalid chunk size')
>    elif 'fillvalue' in kwargs:
>        yield chunk + (size - len(chunk)) * [kwargs['fillvalue']]
>    else:
>        yield chunk
Hi there,
I have now compared your chunked code (thanks a lot for sharing it!)
with Peter's strict_grouper using timeit.
As a reminder, here is the strict_grouper code again:
from itertools import zip_longest

def strict_grouper(items, size, strict):
    fillvalue = object()
    args = [iter(items)] * size
    chunks = zip_longest(*args, fillvalue=fillvalue)
    prev = next(chunks)
    for chunk in chunks:
        yield prev
        prev = chunk
    if prev[-1] is fillvalue:
        if strict:
            raise ValueError
        else:
            while prev[-1] is fillvalue:
                prev = prev[:-1]
    yield prev
and here's the code I used for timing:
from timeit import timeit

results = []
results2 = []
# specify a range of test conditions
conds = ((100, 1), (100, 10), (100, 80),
         (1000, 1), (1000, 10), (1000, 100), (1000, 800))
# run chunked under the different conditions
for cond in conds:
    r = timeit(stmt='d=[i for i in chunked(range(cond[0]),cond[1])]',
               setup='from __main__ import chunked, cond', number=10000)
    results.append((cond, r))
# same for strict_grouper
for cond in conds:
    r = timeit(stmt='d=[i for i in strict_grouper(range(cond[0]),cond[1], strict=False)]',
               setup='from __main__ import strict_grouper, cond', number=10000)
    results2.append((cond, r))
the results I got were:
# the chunked results:
[((100, 1), 2.197788960464095),
 ((100, 10), 0.27306091885475325),
 ((100, 80), 0.1232851640888839),
 ((1000, 1), 21.86202648707149),
 ((1000, 10), 2.47093215096902),
 ((1000, 100), 0.9069762837680173),
 ((1000, 800), 0.6114090097580629)]
# the strict_grouper results:
[((100, 1), 0.31356012737705896),
 ((100, 10), 0.10581013815499318),
 ((100, 80), 0.45853288974103634),
 ((1000, 1), 2.5020897878439428),
 ((1000, 10), 0.6703603850128275),
 ((1000, 100), 0.5088070259098458),
 ((1000, 800), 19.14092429336597)]
Two things are obvious from this:
1) Peter's solution is usually faster, sometimes by a lot, but
2) it performs very poorly when it has to yield a truncated last group of
items, as in the range(1000), 800 case. I suspect this affects only the
strict=False case (with strict=True it would raise the error immediately),
because I kept Peter's cautious while loop that trims off the fillvalues
one element at a time. If that were fixed, strict_grouper would probably be
preferable under pretty much any condition.
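For what it's worth, here is one way the trimming could be fixed (my own
sketch, not Peter's code; everything except the last few lines is unchanged):
since fillvalue is a sentinel that cannot occur in the input, its first
occurrence in the final tuple can be located with tuple.index and cut off
with a single slice, instead of dropping one element per pass of the while
loop:

```python
from itertools import zip_longest

def strict_grouper(items, size, strict):
    fillvalue = object()  # unique sentinel, cannot appear in items
    args = [iter(items)] * size
    chunks = zip_longest(*args, fillvalue=fillvalue)
    prev = next(chunks)
    for chunk in chunks:
        yield prev
        prev = chunk
    if prev[-1] is fillvalue:
        if strict:
            raise ValueError
        # locate the first fillvalue once and cut with a single slice,
        # rather than trimming one element at a time
        prev = prev[:prev.index(fillvalue)]
    yield prev
```

The slice is O(size) once, so the cost of a truncated last chunk no longer
grows with the number of fillvalues removed one by one.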
Best,
Wolfgang