[Python-ideas] A cute Python implementation of itertools.tee

Tim Peters tim.peters at gmail.com
Sun Apr 15 01:05:58 EDT 2018


Just for fun - no complaint, no suggestion, just sharing a bit of code
that tickled me.

The docs for `itertools.tee()` contain a Python work-alike, which is
easy to follow.  It gives each derived generator its own deque, and
when a new value is obtained from the original iterator it pushes that
value onto each of those deques.

Of course it's possible for them to share a single deque, but the code
gets more complicated.  Is it possible to make it simpler instead?

What it "really" needs is a shared singly-linked list of values,
pointing from oldest value to newest.  Then each derived generator can
just follow the links, and yield its next result in time independent
of the number of derived generators.  But how do you know when a new
value needs to be obtained from the original iterator, and how do you
know when space for an older value can be recycled (because all of the
derived generators have yielded it)?

I ended up with almost a full page of code to do that, storing with
each value and link a count of the number of derived generators that
had yet to yield the value, effectively coding my own reference-count
scheme by hand, along with "head" and "tail" pointers to the ends of
the linked list that proved remarkably tricky to keep correct in all
cases.

Then I thought "this is stupid!  Python already does reference
counting."  Voila!  Vast swaths of tedious code vanished, giving this
remarkably simple implementation:

    def mytee(xs, n):
        last = [None, None]

        def gen(it, mylast):
            nonlocal last
            while True:
                mylast = mylast[1]
                if not mylast:
                    mylast = last[1] = last = [next(it), None]
                yield mylast[0]

        it = iter(xs)
        return tuple(gen(it, last) for _ in range(n))

There's no need to keep a pointer to the start of the shared list at
all - we only need a pointer to the end of the list ("last"), and each
derived generator only needs a pointer to its own current position in
the list ("mylast").

What I find kind of hilarious is that it's no help at all as a
prototype for a C implementation:  Python recycles stale `[next(it),
None]` pairs all by itself, when their internal refcounts fall to 0.
That's the hardest part.

BTW, I certainly don't suggest adding this to the itertools docs
either.  While it's short and elegant, it's too subtle to grasp easily
- if you think "it's obvious", you haven't yet thought hard enough
about the problem ;-)


More information about the Python-ideas mailing list