Consume an iterable

Peter Otten __peter__ at web.de
Sun Jan 24 11:05:58 EST 2010


Jan Kaliszewski wrote:

> On 23-01-2010 at 15:19:56, Peter Otten <__peter__ at web.de> wrote:
> 
>>>> def consume_islice(n, items):
>>>>     next(islice(items, n, n), None)
>>
>> One problem: the above function doesn't consume the entire iterator like
>> the original example does for n=None. Passing sys.maxint instead is not
>> pretty.
> 
> Not very pretty, but noticeably (though not dramatically) faster for
> n=None. Consider a modified version of the script from
> http://bugs.python.org/issue7764:
> 
>      import collections, sys
>      from itertools import islice, repeat
> 
>      def consume0(iterator, n):  # the old one
>          collections.deque(islice(iterator, n), maxlen=0)
> 
>      def consume1(iterator, n):  # similar to the primary proposal
>          if n is None:
>              collections.deque(iterator, maxlen=0)
>          elif n != 0:
>              next(islice(iterator, n-1, None), None)
> 
>      def consume2(iterator, n):  # the approved proposal (see #7764)
>          if n is None:
>              collections.deque(iterator, maxlen=0)
>          else:
>              next(islice(iterator, n, n), None)
> 
>      def consume3(iterator, n):  # with sys.maxint
>          if n is None:
>              n = sys.maxint      # (maybe should be sys.maxsize instead?)
>          next(islice(iterator, n, n), None)
> 
>      def test(fs):
>          for consume in fs:
>              iterator = iter(range(10))
>              consume(iterator, 3)
>              rest = list(iterator)
>              assert rest == range(3, 10), consume.__name__
> 
>              iterator = iter(range(10))
>              consume(iterator, 0)
>              rest = list(iterator)
>              assert rest == range(10), consume.__name__
> 
>              iterator = iter(range(10))
>              consume(iterator, None)
>              rest = list(iterator)
>              assert rest == [], consume.__name__
> 
>      if __name__ == "__main__":
>          from timeit import Timer
> 
>          fs = (consume0, consume1,
>                consume2, consume3)
>          test(fs)
> 
>          iterator = repeat(None, 1000)
>          for consume in fs:
>              print consume.__name__
>              for n in (10, 100, 1000, None):
>                  print "%6s:" % n,
>                  print Timer("consume(iterator, %s)" % n,
>                              "import collections, sys\n"
>                              "from __main__ import consume, iterator").timeit()
>              print
> 
> 
> Results [Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3]
> on linux2 pentium4 2.4 GHz]:
> 
> consume0
>      10: 2.94313001633
>     100: 2.91833305359
>    1000: 2.93242096901
>    None: 2.90090417862
> 
> consume1
>      10: 1.80793309212
>     100: 1.7936270237
>    1000: 1.83439803123
>    None: 2.37652015686
> 
> consume2
>      10: 1.58784389496
>     100: 1.5890610218
>    1000: 1.58557391167
>    None: 2.37005710602
> 
> consume3
>      10: 1.6071870327
>     100: 1.61109304428
>    1000: 1.60717701912
>    None: 1.81885385513
> 
> 
> Regards,
> *j
> 

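Don't the results look suspicious to you? The timings barely depend on n
because all timed calls share one iterator. It is exhausted early in the
first timeit() run, so nearly every call measures the cost of consuming an
empty iterator. Try measuring with

iterator = iter([])

I'm sure you'll get the same result. An "easy" fix which introduces some
constant overhead but keeps the results comparable is to create a fresh
iterator for every call: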

    for consume in fs:
        print consume.__name__
        for n in (10, 100, 1000, None):
            print "%6s:" % n,
            print Timer("consume(repeat(None, 1000), %s)" % n,
                        "import collections, sys\n"
                        "from __main__ import consume, repeat").timeit()
        print
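
To see the effect without timing anything, here is a small sketch that counts
how many items are actually pulled. It is written for Python 3 (the thread
uses 2.6), and counted() is a helper made up for this illustration, not part
of the script above:

```python
from itertools import islice, repeat

def consume(iterator, n):
    # Same idea as consume2 above, restricted to integer n.
    next(islice(iterator, n, n), None)

def counted(iterable, counter):
    # Hypothetical helper: pass items through and record how many were pulled.
    for item in iterable:
        counter[0] += 1
        yield item

# One shared iterator, as in the original benchmark.
shared_count = [0]
shared = counted(repeat(None, 1000), shared_count)
for _ in range(5):
    consume(shared, 1000)

# A fresh iterator for every call, as in the fixed benchmark.
fresh_count = [0]
for _ in range(5):
    consume(counted(repeat(None, 1000), fresh_count), 1000)

print(shared_count[0])  # 1000: only the first call had anything to consume
print(fresh_count[0])   # 5000: every call consumed 1000 items
```

With the shared iterator only the first call does real work; the other four
return immediately.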

Just for fun, here's a variant of consume3 for the paranoid:

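_sentinel = object()
def consume4(iterator, n):
    if n is None:
        # islice(iterator, n, None) yields the item at index n if there is
        # one, so the loop repeats until the iterator is really exhausted.
        n = sys.maxint
        while next(islice(iterator, n, None), _sentinel) is not _sentinel:
            pass
    else:
        next(islice(iterator, n, n), None)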
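
The looping idea is hard to try out with sys.maxint, so here is a sketch
(Python 3) with a small, made-up chunk size, so that several passes are
really needed. The name consume_all_chunked and the chunk parameter are
assumptions for this illustration only. Note that it passes None as the stop
argument: islice(iterator, n, n) never yields an item, so next() would return
the sentinel on the first pass no matter how much is left.

```python
from itertools import islice

_sentinel = object()

def consume_all_chunked(iterator, chunk):
    """Exhaust iterator in passes; chunk stands in for sys.maxint."""
    passes = 0
    while True:
        passes += 1
        # Skip chunk items and ask for the next one. The sentinel comes
        # back only if the iterator ran dry during this pass.
        if next(islice(iterator, chunk, None), _sentinel) is _sentinel:
            return passes

it = iter(range(10))
passes = consume_all_chunked(it, 3)
print(passes)    # 3: two full passes of 4 items each, then a short one
print(list(it))  # []
```

Each pass removes up to chunk + 1 items, so ten items with chunk=3 take three
passes.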

Peter


