An exhaust() function for iterators

Currently, several strategies exist for exhausting an iterable when one does not care about what the iterable returns (such as when one merely wants a side effect of the iteration process). One can use an empty for loop:

    for x in side_effect_iterable:
        pass

a throwaway list comprehension:

    [x for x in side_effect_iterable]

a try/except around a while loop:

    it = iter(side_effect_iterable)
    try:
        while True:
            next(it)
    except StopIteration:
        pass

or a number of other methods. The question is: which one is the fastest, and which one is the most memory efficient? Though these are all obvious methods, none of them is both the fastest and the most memory efficient (though the for/pass method comes close). As it turns out, the fastest and most memory-efficient method available in the standard library is collections.deque's __init__ and extend methods:

    from collections import deque

    exhaust_iterable = deque(maxlen=0).extend
    exhaust_iterable(side_effect_iterable)

When a deque object is initialized with a maximum length of zero or less, a special function, consume_iterator, is used instead of the regular element-insertion calls. This function, found at http://hg.python.org/cpython/file/tip/Modules/_collectionsmodule.c#l278, merely steps through the iterator without allocating any space for the items in the deque's internal structure.

I would like to propose that this function, or one very similar to it, be added to the standard library, either in the itertools module or the builtin namespace. If nothing else, doing so would at least give a single *obvious* way to exhaust an iterator, instead of the several miscellaneous methods available.

--
"Evil begins when you begin to treat people as things." - Terry Pratchett
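For reference, the itertools documentation already packages this deque trick as its consume() recipe; a sketch along those lines (including the recipe's islice branch for consuming only the first n items):

    from collections import deque
    from itertools import islice

    def consume(iterator, n=None):
        "Advance the iterator n steps ahead; if n is None, consume it entirely."
        if n is None:
            # Feed the whole iterator into a zero-length deque, which
            # discards every item at C speed.
            deque(iterator, maxlen=0)
        else:
            # Advance to the empty slice starting at position n.
            next(islice(iterator, n, n), None)

With that helper in scope, exhausting an iterator becomes the single obvious call consume(side_effect_iterable).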

It is hard to imagine that doing this:

    for _ in side_effect_iter:
        pass

could EVER realistically spend a significant share of its time in the loop code. Side effects almost surely need to do something that vastly overpowers the cost of the loop itself (maybe some I/O, maybe some computation), or there's no point in using a side-effect iterator.

I know you *could* technically write:

    def side_effect_iter(N, obj):
        for n in range(N):
            obj.val = n
            yield True

and probably something else whose only side effect is changing some value that doesn't need real computation. But surely writing that and exhausting that iterator is NEVER the best way to code such a thing. On the other hand, a more realistic iterator like this:

    def side_effect_iter(N):
        for n in range(N):
            val = complex_computation(n)
            write_to_slow_disk(val)
            yield True

is going to take a long time in each iteration, and there's no reason to care whether the loop itself runs at absolutely optimal speed.

On Fri, Oct 11, 2013 at 11:29 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
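To put rough numbers behind this point, here is a hypothetical micro-benchmark (not from the original thread) contrasting the bare loop machinery with a loop whose only side effect is a single trivial system call per iteration; even that cheapest-possible real side effect dwarfs the loop overhead, and exact figures vary by machine:

    # Hypothetical micro-benchmark: loop overhead vs. one trivial
    # system call (writing a byte to the null device) per iteration.
    import os
    import timeit

    fd = os.open(os.devnull, os.O_WRONLY)

    bare = "for _ in range(10**4): pass"
    syscall = "for _ in range(10**4): os.write(fd, b'x')"

    print("bare loop:   ", min(timeit.repeat(bare, number=100)))
    print("with syscall:", min(timeit.repeat(syscall, globals=globals(),
                                             number=100)))

    os.close(fd)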

On Sep 28, 2013, at 9:06 PM, Clay Sweetser <clay.sweetser@gmail.com> wrote:
That technique is shown in the itertools docs in the consume() recipe. It is the fastest way in CPython (in PyPy, a straight for-loop will likely be the fastest).

I didn't immortalize it as a real itertool because I think most code is better off with a straight for-loop. The itertools were inspired by functional languages and intended to be used in a functional style, where iterators with side effects would be considered bad form.

A regular for-loop is only a little bit slower, but it has a number of virtues, including clarity, signal checking, and thread switching. In a real application, the speed difference of consume() vs. a for-loop is likely to be insignificant if the iterator is doing anything interesting at all.

Raymond
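For anyone who wants to measure that difference themselves, a minimal timing sketch (the statement strings and the 10**5 size are illustrative; absolute numbers vary by machine and interpreter):

    # Minimal sketch: time a bare for-loop against the deque-based
    # consume trick over the same number of items.
    import timeit

    for_loop = "for _ in range(10**5): pass"
    consume = "deque(range(10**5), maxlen=0)"

    print("for-loop:", min(timeit.repeat(for_loop, number=100)))
    print("consume: ", min(timeit.repeat(
        consume, setup="from collections import deque", number=100)))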

participants (6):
- Clay Sweetser
- David Mertz
- Georg Brandl
- Neil Girdhar
- Raymond Hettinger
- Serhiy Storchaka