[Python-Dev] RE: cloning iterators again
Raymond Hettinger
python at rcn.com
Mon Oct 27 00:12:33 EST 2003
> > I've re-read some of the old email on the subject but didn't see
what
> > this buys us that we don't already get with the current tee().
>
> Performance-wise I don't know; we'd have to profile it I guess. :-(
My question was more directed toward non-performance issues. Do we
really have *any* need for more than two iterators running concurrently?
After all, it's already difficult to come-up with good use cases for two
that are not dominated by list() or window().
> With the current tee(), I was thinking that if the two iterators stay
> close, you end up moving the in basket to the out basket rather
> frequently, and the overhead of that might beat the simplicity of the
> linked lists.
With current tee(), runtime is dominated by calls to Append and Pop
(reverse is super-fast and moves each element only once). Those are
calls are more expensive than a link jump; however append() and pop()
are optimized to avoid calls to the memory manager while every link
would need steps for alloc/initialization/reference/dealloc. Cache
effects are also important because the current tee() uses much less
memory and the two memory blocks are contiguous.
> Also, *if* you need a lot of clones, using multiple
> tee() calls ends up creating several queues, again causing more
> overhead. (These queues end up together containing all the items from
> the oldest to the newest iterator.)
*If* we want to support multiple clones, there is an alternate
implementation of the current tee that only costs one extra word per
iteration. That would be in there already. I really *wanted* a
multi-way tee but couldn't find a single use case that warranted it.
> I also note that the current tee() doesn't let you use __copy__ easily
> (it would be quite messy I think).
To __copy__ is to tee. Both make two iterators from one.
They are different names for the same thing.
Right now, they don't seem comparable because the current tee is only a
two way split and you think of copy as being a multi-way split for no
extra cost.
> Maybe Andrew has some use cases?
I hope so. I can't think of anything that isn't dominated by list(),
window(), or the current tee().
And, if needed, the current tee() can easily be made multi-way. It
doubles the unit memory cost from one word to two but that's nothing
compared to the link method (two words for PyHead, another 3 (Linux) or
4 (Windows) words for GC, and another 3 for the data fields).
> The question is, how often does one need this? Have you seen real use
> cases for tee() that aren't better served with list()?
I'm sure they exist, but they are very few. I was hoping that a simple,
fast, memory efficient two-way tee() would have satisfied all the
requests, but this thing appears to be taking on a life of its own with
folks thinking they need multiple concurrent iterators created by a
magic method (what for?).
Raymond
More information about the Python-Dev
mailing list