[Python-Dev] RE: cloning iterators again

Sun Oct 26 18:17:35 EST 2003

> The following is just so beautiful, I have to share it.

I have to say, it is a thing of beauty.

> I've been thinking about various implementations of Andrew Koenig's
> idea of "copyable iterator wrappers", which support a generalization
> of Raymond Hettinger's tee(). 

I've re-read some of the old email on the subject but didn't see what
this buys us that we don't already get with the current tee().

When I wrote tee(), I had considered implementing it as a multi-way
tee(it, n=2) so you could write a,b,c,d=tee(myiterable, 4).  Then, I
wracked my brain for use cases and found nothing that warranted:

* the additional memory consumption (the current implementation consumes
only one pointer per element and it stores them in contiguous memory); 

* the additional memory management utilization (the underlying list.pop
and list.append have already been optimized to avoid excessive
malloc/free calls); 

* or the impact on cache performance (using contiguous memory means that
consecutive pops are in the L1 cache at least 90% of the time and using
only one word per entry means that a long series of pops is less likely
to blow everything else out of the cache).

With only two iterators, I can imagine use cases where the two iterators
track each other fairly closely.  But with multiple iterators, one
iterator typically lags far behind (meaning that list(it) is the best
solution) or they track within a fixed number of elements of each other
(meaning that windowing is the best solution).

The itertools example section shows the pure python code for windowing.
AFAICT, that windowing code is unbeatable in terms of speed and memory
consumption (nearly all the time is spent forming the result tuple).

> class Link(object):
>     """Singly-linked list of (state, value) pairs.
. . .
>     __slots__ = ["state", "value", "next"]

One way to implement this is with a type which adds PyHEAD to the space
consumption for the three fields.  An alternate approach is to use PyMem
directly and request space for four fields (including a refcount field).

>         if state < 0:
>             self.link = None
>             raise value

Is it kosher to re-raise the exception long after something else make
have handled it and the execution context has long since disappeared?

> def test():
>     """A simple demonstration of the Wrapper class."""
>     import random
>     def gen():
>         for i in range(10):
>             yield i
>     it = gen()
>     a, b = tee(it)
>     b, c = tee(b)
>     c, d = tee(c)

This is very nice.  The current tee() increases memory consumption and
workload when nested like this.

Raymond Hettinger