On Fri, Jul 26, 2019 at 10:06 PM Kyle Stanley firstname.lastname@example.org wrote:
From my understanding, consume() effectively provides the functionality the author was looking for. Also, between the options of `for _ in iter:` vs `collections.deque(it, maxlen=0)`, how significant is the performance difference?
I had assumed that the performance of `for _ in iter` would be significantly better, due to the overhead cost of creating and filling a double-ended queue, which is optimized for insertion at the beginning and end. Wouldn't a one-directional iterator provide better performance and have a lower memory cost if no modification is required?
collections.deque with an explicit maxlen of 0 doesn't actually populate the queue at all; it has a special case for maxlen 0 that just pulls items and immediately throws away the reference to each one without storing it in the deque. They split off that special case into its own function at the C layer, consume_iterator: https://github.com/python/cpython/blob/master/Modules/_collectionsmodule.c#L...
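For reference, here is a minimal sketch of the consume() idea being discussed, modeled on the itertools documentation recipe but cut down to the "exhaust everything" case. The names `it` and `d` are just for illustration. It also checks that the deque really does stay empty while the iterator is drained:

```python
import collections


def consume(iterable):
    """Exhaust `iterable`, discarding every item it yields."""
    # maxlen=0 means nothing is ever stored in the deque.
    collections.deque(iterable, maxlen=0)


# Illustration: the deque ends up empty, but the iterator is fully drained.
it = iter(range(5))
d = collections.deque(it, maxlen=0)
print(len(d))          # number of items retained by the deque
print(next(it, None))  # sentinel comes back because `it` is exhausted
```

Note that the function returns None; it is only useful for the side effects of advancing the iterator.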
It's basically impossible to beat that in CPython in the general case. By contrast, `for _ in iterable` would need to execute at least three bytecodes per item (advance iterator, store, jump), which is *way* more expensive per item. `collections.deque(maxlen=0)` can lose for small inputs, because it does have to call a constructor, create a deque, then throw it away (precreating a singleton for consume with `consumer = collections.deque(maxlen=0).extend` can save on some of that), but for any input of meaningful length, the reduced cost per item makes up for it.
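If you want to check this on your own machine, something like the following rough sketch should do. The helper names (`consume_deque`, `consume_loop`, `bench`) and the input sizes are my own choices for illustration, not anything from the stdlib, and the exact opcode names printed by `dis` vary between CPython versions. No numbers are shown here because they depend heavily on the interpreter version and hardware:

```python
import collections
import dis
import timeit

# Pre-bound extend of one shared zero-length deque, as suggested above.
consumer = collections.deque(maxlen=0).extend


def consume_deque(iterable):
    # Builds and discards a fresh deque on every call.
    collections.deque(iterable, maxlen=0)


def consume_loop(iterable):
    # Pure-Python loop: several bytecodes executed per item.
    for _ in iterable:
        pass


def bench(func, n, number=200, repeat=3):
    """Best-of-`repeat` time for `number` calls on a fresh n-item iterator."""
    timer = timeit.Timer(lambda: func(iter(range(n))))
    return min(timer.repeat(repeat=repeat, number=number))


# Opcodes that make up the loop version (names differ across versions).
opnames = [ins.opname for ins in dis.get_instructions(consume_loop)]
print(opnames)

for n in (0, 10, 10_000):
    print(
        f"n={n:>6}: "
        f"loop={bench(consume_loop, n):.6f}s  "
        f"deque={bench(consume_deque, n):.6f}s  "
        f"prebound={bench(consumer, n):.6f}s"
    )
```

The small sizes are there to show the fixed per-call overhead, and the large one to show the per-item cost.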