[Python-ideas] A cute Python implementation of itertools.tee

Franklin? Lee leewangzhong+python at gmail.com
Mon May 14 06:49:25 EDT 2018


The topic reminded me of Stephan Houben's streamtee.
https://github.com/stephanh42/streamtee/blob/master/streamtee.py

It was an attempt at a memory-efficient tee, but it turned out tee was
efficient enough. It uses a thunk-like method and recursion to remove
the need for an explicit linked list.

In the spirit of the topic, I tried to strip it down to the bare essentials:

    def tee(iterable, n):
        stream = _streamiter(iter(iterable))
        return tuple(_wrapper(stream) for _ in range(n))

    def _streamiter(itr):
        result = (next(itr), _streamiter(itr))
        del itr # Release unneeded reference.
        while True:
            yield result

    def _wrapper(stream):
        while True:
            result, stream = next(stream)
            yield result

Unfortunately, this code, too, breaks with the new StopIteration
propagation rules.

On Sun, Apr 15, 2018 at 1:05 AM, Tim Peters <tim.peters at gmail.com> wrote:
> Just for fun - no complaint, no suggestion, just sharing a bit of code
> that tickled me.
>
> The docs for `itertools.tee()` contain a Python work-alike, which is
> easy to follow.  It gives each derived generator its own deque, and
> when a new value is obtained from the original iterator it pushes that
> value onto each of those deques.
>
> Of course it's possible for them to share a single deque, but the code
> gets more complicated.  Is it possible to make it simpler instead?
>
> What it "really" needs is a shared singly-linked list of values,
> pointing from oldest value to newest.  Then each derived generator can
> just follow the links, and yield its next result in time independent
> of the number of derived generators.  But how do you know when a new
> value needs to be obtained from the original iterator, and how do you
> know when space for an older value can be recycled (because all of the
> derived generators have yielded it)?
>
> I ended up with almost a full page of code to do that, storing with
> each value and link a count of the number of derived generators that
> had yet to yield the value, effectively coding my own reference-count
> scheme by hand, along with "head" and "tail" pointers to the ends of
> the linked list that proved remarkably tricky to keep correct in all
> cases.
>
> Then I thought "this is stupid!  Python already does reference
> counting."  Voila!  Vast swaths of tedious code vanished, giving this
> remarkably simple implementation:
>
>     def mytee(xs, n):
>         last = [None, None]
>
>         def gen(it, mylast):
>             nonlocal last
>             while True:
>                 mylast = mylast[1]
>                 if not mylast:
>                     mylast = last[1] = last = [next(it), None]
>                 yield mylast[0]
>
>         it = iter(xs)
>         return tuple(gen(it, last) for _ in range(n))
>
> There's no need to keep a pointer to the start of the shared list at
> all - we only need a pointer to the end of the list ("last"), and each
> derived generator only needs a pointer to its own current position in
> the list ("mylast").
>
> What I find kind of hilarious is that it's no help at all as a
> prototype for a C implementation:  Python recycles stale `[next(it),
> None]` pairs all by itself, when their internal refcounts fall to 0.
> That's the hardest part.
>
> BTW, I certainly don't suggest adding this to the itertools docs
> either.  While it's short and elegant, it's too subtle to grasp easily
> - if you think "it's obvious", you haven't yet thought hard enough
> about the problem ;-)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/


More information about the Python-ideas mailing list