[Python-Dev] RE: cloning iterators again
Raymond Hettinger
python at rcn.com
Mon Oct 27 06:24:57 EST 2003
> I also note that the current tee() doesn't let you use __copy__ easily
> (it would be quite messy I think). The linked-list version supports
> __copy__ trivially. This may be important if we execute (as I think
> we should) on the idea of making selected iterators __copy__-able
> (especially all the standard container iterators and xrange).
The current tee() was written to support only a two way split, but it
can easily be cast as a multi-way splitter with no problem.
The only real difference in the ideas presented so far are whether the
underlying queue should be implemented as a singly linked list or as a
double stack.
As a proof-of-concept, here is GvR's code re-cast with the queue changed
to a double stack implementation. The interface is completely
unchanged. The memory consumed is double that the current tee() but
much less than the linked list version. The speed is half that of the
current tee() and roughly comparable to or slightly better than the
linked list version.
Raymond Hettinger
------------------------------------------------------------------------
--
""" Guido's demo program re-cast with a different underlying data
structure
Replaces the linked list based queue with a two stack based queue.
Advantages:
The double stack method consumes only two pointers per data element
while the linked list method consumes space for a link object
(8 to 10 words).
The double stack method uses contiguous memory while the link
objects are more fragmented.
The stack method uses append() and pop() which are optimized to
minimize memory management calls. For the link method, every
link costs a malloc and free.
Todo:
Handle Wrappers that are GC'd before termination.
Add support for holding an exception.
"""
class TeeMaster(object):
"""Holder for information common to wrapped iterators
"""
def __init__(self, it):
self.inbasket = []
self.inrefcnt = []
self.outbasket = []
self.outrefcnt = []
self.numseen = 0
self.it = it
self.numsplits = 0
class Wrapper(object):
"""Copyable wrapper around an iterator.
Any number of Wrappers around the same iterator share the TeeMaster
object. The Wrapper that is most behind will drop the refcnt to
zero, which causes the reference to be popped off of the queue.
The newest Wrapper gets a brand new TeeMaster object. Later
wrappers share an existing TeeMaster object. Since they may
have arrived late in the game, they need to know how many objects
have already been seen by the wrapper. When they call next(),
they ask for the next numseen.
If a Wrapper is garbage-collected before it finishes, the refcount
floor needs to be raised. That has not yet been implemented.
"""
__slots__ = ["master", "numseen"]
def __init__(self, it, master=None):
"""Constructor. The master argument is used by __copy__
below."""
if master is None:
master = TeeMaster(it)
self.master = master
self.numseen = master.numseen
self.master.numsplits += 1
def __copy__(self):
"""Copy the iterator.
This returns a new iterator that will return the same series
of results as the original.
"""
return Wrapper(None, self.master)
def __iter__(self):
"""All iterators should support __iter__() returning self."""
return self
def next(self):
"""Get the next value of the iterator, or raise
StopIteration."""
master = self.master
inbasket, inrefcnt = master.inbasket, master.inrefcnt
if master.numseen == self.numseen:
# This is the lead dog so get a value through the iterator
value = master.it.next()
master.numseen += 1
# Save it for the other dogs
inbasket.append(value)
inrefcnt.append(master.numsplits-1)
self.numseen += 1
return value
# Not a lead dog -- the view never changes :-(
location = len(inbasket) - (master.numseen - self.numseen)
if location >= 0:
# Our food is in the inbasket
value = inbasket[location]
inrefcnt[location] -= 1
rc = inrefcnt[location]
else:
# Our food is in the outbasket
location = -(location + 1)
value = master.outbasket[location]
master.outrefcnt[location] -= 1
rc = master.outrefcnt[location]
# Purge doggie bowl when no food is left
if rc == 0:
if len(master.outbasket) == 0:
master.outbasket, master.inbasket = master.inbasket,
master.outbasket
master.outrefcnt, master.inrefcnt = master.inrefcnt,
master.outrefcnt
master.outbasket.reverse()
master.outrefcnt.reverse()
master.outbasket.pop()
master.outrefcnt.pop()
self.numseen += 1
return value
def tee(it):
"""Replacement for Raymond's tee(); see examples in itertools
docs."""
if not hasattr(it, "__copy__"):
it = Wrapper(it)
return (it, it.__copy__())
def test():
"""A simple demonstration of the Wrapper class."""
import random
def gen():
for i in range(10):
yield i
it = gen()
a, b = tee(it)
b, c = tee(b)
c, d = tee(c)
iterators = [a, b, c, d]
while iterators != [None, None, None, None]:
i = random.randrange(4)
it = iterators[i]
if it is None:
next = "----"
else:
try:
next = it.next()
except StopIteration:
next = "****"
iterators[i] = None
print "%4d%s%4s%s" % (i, " ."*i, next, " ."*(3-i))
if __name__ == "__main__":
test()
More information about the Python-Dev
mailing list