[Python-ideas] A cute Python implementation of itertools.tee
leewangzhong+python at gmail.com
Mon May 14 06:49:25 EDT 2018
The topic reminded me of Stephan Houben's streamtee.
It was an attempt at a memory-efficient tee, but it turned out tee was
efficient enough. It uses a thunk-like method and recursion to remove
the need for an explicit linked list.
In the spirit of the topic, I tried to strip it down to the bare essentials:
def tee(iterable, n):
stream = _streamiter(iter(iterable))
return tuple(_wrapper(stream) for _ in range(n))
result = (next(itr), _streamiter(itr))
del itr # Release unneeded reference.
result, stream = next(stream)
Unfortunately, this code, too, breaks with the new StopIteration
On Sun, Apr 15, 2018 at 1:05 AM, Tim Peters <tim.peters at gmail.com> wrote:
> Just for fun - no complaint, no suggestion, just sharing a bit of code
> that tickled me.
> The docs for `itertools.tee()` contain a Python work-alike, which is
> easy to follow. It gives each derived generator its own deque, and
> when a new value is obtained from the original iterator it pushes that
> value onto each of those deques.
> Of course it's possible for them to share a single deque, but the code
> gets more complicated. Is it possible to make it simpler instead?
> What it "really" needs is a shared singly-linked list of values,
> pointing from oldest value to newest. Then each derived generator can
> just follow the links, and yield its next result in time independent
> of the number of derived generators. But how do you know when a new
> value needs to be obtained from the original iterator, and how do you
> know when space for an older value can be recycled (because all of the
> derived generators have yielded it)?
> I ended up with almost a full page of code to do that, storing with
> each value and link a count of the number of derived generators that
> had yet to yield the value, effectively coding my own reference-count
> scheme by hand, along with "head" and "tail" pointers to the ends of
> the linked list that proved remarkably tricky to keep correct in all
> Then I thought "this is stupid! Python already does reference
> counting." Voila! Vast swaths of tedious code vanished, giving this
> remarkably simple implementation:
> def mytee(xs, n):
> last = [None, None]
> def gen(it, mylast):
> nonlocal last
> while True:
> mylast = mylast
> if not mylast:
> mylast = last = last = [next(it), None]
> yield mylast
> it = iter(xs)
> return tuple(gen(it, last) for _ in range(n))
> There's no need to keep a pointer to the start of the shared list at
> all - we only need a pointer to the end of the list ("last"), and each
> derived generator only needs a pointer to its own current position in
> the list ("mylast").
> What I find kind of hilarious is that it's no help at all as a
> prototype for a C implementation: Python recycles stale `[next(it),
> None]` pairs all by itself, when their internal refcounts fall to 0.
> That's the hardest part.
> BTW, I certainly don't suggest adding this to the itertools docs
> either. While it's short and elegant, it's too subtle to grasp easily
> - if you think "it's obvious", you haven't yet thought hard enough
> about the problem ;-)
> Python-ideas mailing list
> Python-ideas at python.org
> Code of Conduct: http://python.org/psf/codeofconduct/
More information about the Python-ideas