[Python-Dev] RE: cloning iterators again

Guido van Rossum guido at python.org
Mon Oct 27 09:49:39 EST 2003


> My question was more directed toward non-performance issues.  Do we
> really have *any* need for more than two iterators running concurrently?
> After all, it's already difficult to come-up with good use cases for two
> that are not dominated by list() or window().
> 
>  
> > With the current tee(), I was thinking that if the two iterators stay
> > close, you end up moving the in basket to the out basket rather
> > frequently, and the overhead of that might beat the simplicity of the
> > linked lists. 
> 
> With current tee(), runtime is dominated by calls to Append and Pop
> (reverse is super-fast and moves each element only once).  Those are
> calls are more expensive than a link jump; however append() and pop()
> are optimized to avoid calls to the memory manager while every link
> would need steps for alloc/initialization/reference/dealloc.  Cache
> effects are also important because the current tee() uses much less
> memory and the two memory blocks are contiguous.
> 
> 
> 
> > Also, *if* you need a lot of clones, using multiple
> > tee() calls ends up creating several queues, again causing more
> > overhead.  (These queues end up together containing all the items from
> > the oldest to the newest iterator.)
> 
> *If* we want to support multiple clones, there is an alternate
> implementation of the current tee that only costs one extra word per
> iteration.  That would be in there already.  I really *wanted* a
> multi-way tee but couldn't find a single use case that warranted it.

All points well taken.


> > I also note that the current tee() doesn't let you use __copy__ easily
> > (it would be quite messy I think).
> 
> To __copy__ is to tee.  Both make two iterators from one.
> They are different names for the same thing.
> Right now, they don't seem comparable because the current tee is only a
> two way split and you think of copy as being a multi-way split for no
> extra cost.

Here I respectfully differ.  When you tee, you have to stop using the
underlying iterator, and replace it with one of the tee'ed copies.
When you __copy__, you can continue to use the original.  The
difference matters if you're tee'ing an iterator owned by another
piece of code.


> > Maybe Andrew has some use cases?
> 
> I hope so.  I can't think of anything that isn't dominated by list(),
> window(), or the current tee().
> 
> And, if needed, the current tee() can easily be made multi-way.  It
> doubles the unit memory cost from one word to two but that's nothing
> compared to the link method (two words for PyHead, another 3 (Linux) or
> 4 (Windows) words for GC, and another 3 for the data fields).

As you said in your first msg, you could do it with much less overhead
if the link cell wasn't made a PyObject.  Also, Andrew's suggestion of
using a link cell containing an array of values could be explored.

But I'll happily back off until we find a use case that needs more
than a 2-way tee *and* we find it's a performance bottleneck for your
approach.  We may never find that.

> > The question is, how often does one need this?  Have you seen real use
> > cases for tee() that aren't better served with list()?
> 
> I'm sure they exist, but they are very few.  I was hoping that a simple,
> fast, memory efficient two-way tee() would have satisfied all the
> requests, but this thing appears to be taking on a life of its own with
> folks thinking they need multiple concurrent iterators created by a
> magic method (what for?).

Well, there was a separate thread about __copy__'ing iterators.

--Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-Dev mailing list