Consume an iterable

Muhammad Alkarouri malkarouri at gmail.com
Sat Jan 23 08:14:34 EST 2010


On 23 Jan, 12:45, Peter Otten <__pete... at web.de> wrote:
> Muhammad Alkarouri wrote:
> > Thanks everyone, but not on my machine (Python 2.6.1, OS X 10.6) it's
> > not:
>
> > In [1]: from itertools import count, islice
>
> > In [2]: from collections import deque
>
> > In [3]: i1=count()
>
> > In [4]: def consume1(iterator, n):
> >    ...:     deque(islice(iterator, n), maxlen=0)
> >    ...:
> >    ...:
>
> > In [5]: i2=count()
>
> > In [6]: def consume2(iterator, n):
> >    ...:     for _ in islice(iterator, n): pass
> >    ...:
> >    ...:
>
> > In [7]: timeit consume1(i1, 10)
> > 1000000 loops, best of 3: 1.63 us per loop
>
> > In [8]: timeit consume2(i2, 10)
> > 1000000 loops, best of 3: 846 ns per loop
>
> > Can somebody please test whether it is only my machine or is this
> > reproducible?
>
> I can reproduce it. The deque-based approach has a bigger constant overhead
> but better per-item performance. Its asymptotic behaviour is therefore
> better.
>
> $ python consume_timeit.py
> consume_deque
>     10: 1.77500414848
>    100: 3.73333001137
>   1000: 24.7235469818
>
> consume_forloop
>     10: 1.22008490562
>    100: 5.86271500587
>   1000: 52.2449371815
>
> consume_islice
>     10: 0.897439956665
>    100: 1.51542806625
>   1000: 7.70061397552
>
> $ cat consume_timeit.py
> from collections import deque
> from itertools import islice, repeat
>
> def consume_deque(n, items):
>     deque(islice(items, n), maxlen=0)
>
> def consume_forloop(n, items):
>     for _ in islice(items, n):
>         pass
>
> def consume_islice(n, items):
>     next(islice(items, n-1, None), None)
>
> def check(fs):
>     for consume in fs:
>         items = iter(range(10))
>         consume(3, items)
>         rest = list(items)
>         assert rest == range(3, 10), consume.__name__
>
> if __name__ == "__main__":
>     fs = consume_deque, consume_forloop, consume_islice
>     check(fs)
>
>     items = repeat(None)
>
>     from timeit import Timer
>     for consume in fs:
>         print consume.__name__
>         for n in (10, 100, 1000):
>             print "%6d:" % n,
>             print Timer("consume(%s, items)" % n,
>                         "from __main__ import consume, items").timeit()
>         print
> $
>
> With next(islice(...), None) I seem to have found a variant that beats both
> competitors.
>
> Peter

Thanks Peter, I got more or less the same result on my machine (Python
2.6.1, x86_64, OS X 10.6):

~/tmp> python consume_timeit.py
consume_deque
    10: 1.3138859272
   100: 3.54495286942
  1000: 24.9603481293

consume_forloop
    10: 0.658113002777
   100: 2.85697007179
  1000: 24.6610429287

consume_islice
    10: 0.637741088867
   100: 1.09042882919
  1000: 5.44473600388

The next(islice(...)) variant performs much better. It is also more direct
for the purposes of consume and easier to understand (at least for
me), as it doesn't require a specialized data structure that is
never actually used as one.
I am thus inclined to report it as a Python documentation enhancement
(bug) request. Any comments?
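
For reference, a minimal sketch of what a next-based consume could look
like. This is an illustration, not the code timed above: it assumes
Python 3, takes the arguments in (iterator, n) order, and uses
islice(iterator, n, n) rather than islice(items, n-1, None) so that
n == 0 is valid too (a start of -1 makes islice raise ValueError). The
n=None default, which falls back to the deque approach to exhaust the
iterator, is also an assumption of the sketch.

```python
from collections import deque
from itertools import islice


def consume(iterator, n=None):
    """Advance `iterator` by n items; if n is None, exhaust it entirely."""
    if n is None:
        # A zero-length deque discards every item it is fed.
        deque(iterator, maxlen=0)
    else:
        # islice(iterator, n, n) yields nothing, but asking it for an item
        # makes it skip ahead n items in the underlying iterator first.
        next(islice(iterator, n, n), None)


items = iter(range(10))
consume(items, 3)
rest = list(items)   # the items left after skipping the first three

untouched = iter(range(5))
consume(untouched, 0)   # n == 0 advances nothing
after_zero = list(untouched)

drained = iter(range(5))
consume(drained)        # n is None: exhaust the iterator
after_all = list(drained)
```

If the iterator holds fewer than n items, this simply exhausts it
without raising.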

Cheers,

Muhammad


