An exhaust() function for iterators
Currently, several strategies exist for exhausting an iterable when one does not care about what the iterable returns (such as when one merely wants a side effect of the iteration process).

One can use an empty for loop:

    for x in side_effect_iterable:
        pass

a throwaway list comprehension:

    [x for x in side_effect_iterable]

a try/except and a while loop:

    next_item = iter(side_effect_iterable).__next__
    try:
        while True:
            next_item()
    except StopIteration:
        pass

or any number of other methods. The question is, which one is the fastest? Which one is the most memory efficient? Though these are all obvious methods, none of them is both the fastest and the most memory efficient (though the for/pass method comes close).

As it turns out, the fastest and most memory-efficient method available in the standard library is collections.deque's __init__ and extend methods:

    from collections import deque
    exhaust_iterable = deque(maxlen=0).extend
    exhaust_iterable(side_effect_iterable)

When a deque object is initialized with a maximum length of zero or less, a special function, consume_iterator, is used instead of the regular element insertion calls. This function, found at http://hg.python.org/cpython/file/tip/Modules/_collectionsmodule.c#l278, merely iterates through the iterator without doing any of the work of adding items to the deque's internal structure.

I would like to propose that this function, or one very similar to it, be added to the standard library, either in the itertools module or the standard namespace. If nothing else, doing so would at least give a single *obvious* way to exhaust an iterator, instead of the several miscellaneous methods available.

--
"Evil begins when you begin to treat people as things." - Terry Pratchett
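For anyone who wants to check the speed claim themselves, here is a minimal timeit sketch of the comparison the post is asking about. The iterable, repetition count, and output formatting are illustrative assumptions rather than anything from the original post, and the absolute numbers will vary with the interpreter and hardware.

    import timeit

    # Each statement builds and exhausts its own iterable, so the setup
    # cost is the same for every strategy being compared.
    setup = "from collections import deque\nN = 100_000"
    cases = {
        "for/pass":        "for x in range(N): pass",
        "list comp":       "[x for x in range(N)]",
        "deque(maxlen=0)": "deque(range(N), maxlen=0)",
    }

    for name, stmt in cases.items():
        t = timeit.timeit(stmt, setup=setup, number=100)
        print(f"{name:16s} {t:.3f} s")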
On 29.09.2013 06:06, Clay Sweetser wrote:
I would like to propose that this function, or one very similar to it, be added to the standard library, either in the itertools module, or the standard namespace. If nothing else, doing so would at least give a single *obvious* way to exhaust an iterator, instead of the several miscellaneous methods available.
YAGNI. This is not a very common operation.

On the point of obvious ways, the first one you gave,

    for _ in iterable:
        pass

is perfectly obvious and simple enough AFAICS.

cheers,
Georg
On 29.09.13 07:06, Clay Sweetser wrote:
I would like to propose that this function, or one very similar to it, be added to the standard library, either in the itertools module, or the standard namespace. If nothing else, doing so would at least give a single *obvious* way to exhaust an iterator, instead of the several miscellaneous methods available.
I would prefer to optimize the for loop so that it becomes the most efficient way (it is already the most obvious way).
This was also my thought. On Sunday, September 29, 2013 4:42:20 PM UTC-4, Serhiy Storchaka wrote:
On 29.09.13 07:06, Clay Sweetser wrote:
I would like to propose that this function, or one very similar to it, be added to the standard library, either in the itertools module, or the standard namespace. If nothing else, doing so would at least give a single *obvious* way to exhaust an iterator, instead of the several miscellaneous methods available.
I would prefer to optimize the for loop so that it becomes the most efficient way (it is already the most obvious way).
It is hard to imagine that doing this:

    for _ in side_effect_iter:
        pass

could EVER realistically spend a significant share of its time in the loop code. Side effects almost surely need to do something that vastly overpowers the cost of the loop itself (maybe some I/O, maybe some computation), or there's no point in using a side-effect iterator.

I know you *could* technically write:

    def side_effect_iter(N, obj):
        for n in range(N):
            obj.val = n
            yield True

and probably something else whose only side effect was changing some value that doesn't need real computation. But surely writing that and exhausting that iterator is NEVER the best way to code such a thing. On the other hand, a more realistic one like this:

    def side_effect_iter(N):
        for n in range(N):
            val = complex_computation(n)
            write_to_slow_disk(val)
            yield True

is going to take a long time in each iteration, and there's no reason to care that the loop isn't absolutely optimal speed.

On Fri, Oct 11, 2013 at 11:29 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
This was also my thought.
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
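To put a rough number on that argument, here is a small sketch; the sleep-based iterator, the delay value, and the helper names are illustrative assumptions rather than anything from the thread. Once each item does even a little real work, the two exhaustion strategies take essentially the same wall-clock time.

    import time
    import timeit
    from collections import deque

    def side_effect_iter(n, delay=0.0001):
        # Toy side-effect iterator: the sleep stands in for I/O or computation.
        for _ in range(n):
            time.sleep(delay)
            yield True

    def exhaust_with_for(it):
        for _ in it:
            pass

    def exhaust_with_deque(it):
        deque(it, maxlen=0)

    for name, fn in [("for/pass", exhaust_with_for), ("deque", exhaust_with_deque)]:
        t = timeit.timeit(lambda: fn(side_effect_iter(1000)), number=1)
        # Both results land near 1000 * delay; the loop overhead is lost in the noise.
        print(f"{name:10s} {t:.3f} s")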
On 11.10.13 21:51, David Mertz wrote:
It is hard to imagine that doing this:
    for _ in side_effect_iter:
        pass

could EVER realistically spend a significant share of its time in the loop code.
When I wrote a test for tee() (issue #13454) I needed a very fast way to exhaust an iterator. There were one or two other similar cases.
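A hypothetical sketch of the kind of situation Serhiy describes (this is not the actual test from issue #13454; the sizes and the assertion are invented for illustration): when tee() itself is the thing being exercised, the exhaustion step should be as close to free as possible.

    from collections import deque
    from itertools import tee

    a, b = tee(range(1_000_000))
    deque(a, maxlen=0)    # exhaust one branch at C speed
    assert next(b) == 0   # the other branch is untouched and starts from the beginning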
On Sep 28, 2013, at 9:06 PM, Clay Sweetser <clay.sweetser@gmail.com> wrote:
As it turns out, the fastest and most efficient method available in the standard library is collections.deque's __init__ and extend methods.
That technique is shown in the itertools docs in the consume() recipe. It is the fastest way in CPython (in PyPy, a straight for-loop will likely be the fastest).

I didn't immortalize it as a real itertool because I think most code is better off with a straight for-loop. The itertools were inspired by functional languages and intended to be used in a functional style, where iterators with side effects would be considered bad form.

A regular for-loop is only a little bit slower, but it has a number of virtues including clarity, signal checking, and thread switching. In a real application, the speed difference of consume() vs. a for-loop is likely to be insignificant if the iterator is doing anything interesting at all.

Raymond
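For reference, the consume() recipe Raymond mentions looks essentially like this in the recipes section of the itertools documentation (reproduced from memory; see the current docs for the canonical version):

    import collections
    from itertools import islice

    def consume(iterator, n=None):
        "Advance the iterator n-steps ahead. If n is None, consume entirely."
        # Use functions that consume iterators at C speed.
        if n is None:
            # feed the entire iterator into a zero-length deque
            collections.deque(iterator, maxlen=0)
        else:
            # advance to the empty slice starting at position n
            next(islice(iterator, n, n), None)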
participants (6)
- Clay Sweetser
- David Mertz
- Georg Brandl
- Neil Girdhar
- Raymond Hettinger
- Serhiy Storchaka