Consume an iterable
Peter Otten
__peter__ at web.de
Sun Jan 24 11:05:58 EST 2010
Jan Kaliszewski wrote:
> Dnia 23-01-2010 o 15:19:56 Peter Otten <__peter__ at web.de> napisał(a):
>
>>>> def consume_islice(n, items):
>>>>     next(islice(items, n, n), None)
>>
>> One problem: the above function doesn't consume the entire iterator like
>> the original example does for n=None. Passing sys.maxint instead is not
>> pretty.
>
> Not very pretty, but noticeably (though not dramatically) faster for
> n=None. Consider a modified version of the script from
> http://bugs.python.org/issue7764:
>
> import collections, sys
> from itertools import islice, repeat
>
> def consume0(iterator, n): # the old one
>     collections.deque(islice(iterator, n), maxlen=0)
>
> def consume1(iterator, n): # similar to the primary proposal
>     if n is None:
>         collections.deque(iterator, maxlen=0)
>     elif n != 0:
>         next(islice(iterator, n-1, None), None)
>
> def consume2(iterator, n): # the approved proposal (see #7764)
>     if n is None:
>         collections.deque(iterator, maxlen=0)
>     else:
>         next(islice(iterator, n, n), None)
>
> def consume3(iterator, n): # with sys.maxint
>     if n is None:
>         n = sys.maxint # (maybe should be sys.maxsize instead?)
>     next(islice(iterator, n, n), None)
>
> def test(fs):
>     for consume in fs:
>         iterator = iter(range(10))
>         consume(iterator, 3)
>         rest = list(iterator)
>         assert rest == range(3, 10), consume.__name__
>
>         iterator = iter(range(10))
>         consume(iterator, 0)
>         rest = list(iterator)
>         assert rest == range(10), consume.__name__
>
>         iterator = iter(range(10))
>         consume(iterator, None)
>         rest = list(iterator)
>         assert rest == [], consume.__name__
>
> if __name__ == "__main__":
>     from timeit import Timer
>
>     fs = (consume0, consume1,
>           consume2, consume3)
>     test(fs)
>
>     iterator = repeat(None, 1000)
>     for consume in fs:
>         print consume.__name__
>         for n in (10, 100, 1000, None):
>             print "%6s:" % n,
>             print Timer("consume(iterator, %s)" % n,
>                         "import collections, sys\n"
>                         "from __main__ import consume, iterator").timeit()
>         print
>
>
> Results [Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3]
> on linux2 pentium4 2.4 GHz]:
>
> consume0
> 10: 2.94313001633
> 100: 2.91833305359
> 1000: 2.93242096901
> None: 2.90090417862
>
> consume1
> 10: 1.80793309212
> 100: 1.7936270237
> 1000: 1.83439803123
> None: 2.37652015686
>
> consume2
> 10: 1.58784389496
> 100: 1.5890610218
> 1000: 1.58557391167
> None: 2.37005710602
>
> consume3
> 10: 1.6071870327
> 100: 1.61109304428
> 1000: 1.60717701912
> None: 1.81885385513
>
>
> Regards,
> *j
>

Don't the results look suspicious to you? Try measuring with

    iterator = iter([])

I'm sure you'll get the same result. An "easy" fix which introduces some
constant overhead but keeps the results comparable:

    for consume in fs:
        print consume.__name__
        for n in (10, 100, 1000, None):
            print "%6s:" % n,
            print Timer("consume(repeat(None, 1000), %s)" % n,
                        "import collections, sys\n"
                        "from __main__ import consume, repeat").timeit()
        print
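
The near-constant timings across n are the giveaway: repeat(None, 1000) is
created once and shared by every timed call, so it is exhausted early and
almost all of timeit's default 1,000,000 runs operate on an empty iterator.
A minimal sketch of that effect, assuming Python 3; the helper name consume
is hypothetical and simply mirrors the islice branch of consume2:

```python
from itertools import islice, repeat

def consume(iterator, n):
    # Hypothetical helper: advance the iterator n steps (islice branch of consume2).
    next(islice(iterator, n, n), None)

# One iterator shared by all calls, as in the original benchmark.
shared = repeat(None, 1000)
for _ in range(100):          # 100 calls * 10 items = all 1000 items
    consume(shared, 10)
leftover = list(shared)       # nothing is left for any further call

# A fresh iterator per call, as in the fixed benchmark, does real work each time.
fresh = repeat(None, 1000)
consume(fresh, 10)
remaining_fresh = len(list(fresh))

print(leftover, remaining_fresh)
```

After the first hundred calls with n=10 the shared iterator has nothing left,
so every later call measures only the function-call overhead.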

Just for fun, here's a variant of consume3 for the paranoid:

    _sentinel = object()

    def consume4(iterator, n):
        if n is None:
            n = sys.maxint
        while next(islice(iterator, n, n), _sentinel) is not _sentinel:
            pass

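
For readers on Python 3, where sys.maxint no longer exists, the consume3 idea
can be sketched with sys.maxsize instead. This is an illustrative adaptation,
not code from the thread: the name consume and the default n=None are
assumptions, and the checks mirror the three cases in the quoted test().

```python
import sys
from itertools import islice

def consume(iterator, n=None):
    """Advance iterator n steps; if n is None, consume it entirely."""
    if n is None:
        n = sys.maxsize  # assumption: Python 3 stand-in for sys.maxint
    # islice(iterator, n, n) skips up to n items and then yields nothing,
    # so next() returns the default None after advancing the iterator.
    next(islice(iterator, n, n), None)

it = iter(range(10))
consume(it, 3)
rest_after_three = list(it)   # items 3 through 9 remain

it = iter(range(10))
consume(it, 0)
rest_after_zero = list(it)    # nothing was consumed

it = iter(range(10))
consume(it)
rest_after_all = list(it)     # the iterator is exhausted
```
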
Peter