[Python-ideas] A cute Python implementation of itertools.tee

Antoine Pitrou solipsis at pitrou.net
Sun Apr 15 05:55:46 EDT 2018


On Sun, 15 Apr 2018 00:05:58 -0500
Tim Peters <tim.peters at gmail.com> wrote:

> Just for fun - no complaint, no suggestion, just sharing a bit of code
> that tickled me.
> 
> The docs for `itertools.tee()` contain a Python work-alike, which is
> easy to follow.  It gives each derived generator its own deque, and
> when a new value is obtained from the original iterator it pushes that
> value onto each of those deques.
> 
> Of course it's possible for them to share a single deque, but the code
> gets more complicated.  Is it possible to make it simpler instead?
> 
> What it "really" needs is a shared singly-linked list of values,
> pointing from oldest value to newest.  Then each derived generator can
> just follow the links, and yield its next result in time independent
> of the number of derived generators.  But how do you know when a new
> value needs to be obtained from the original iterator, and how do you
> know when space for an older value can be recycled (because all of the
> derived generators have yielded it)?
> 
> I ended up with almost a full page of code to do that, storing with
> each value and link a count of the number of derived generators that
> had yet to yield the value, effectively coding my own reference-count
> scheme by hand, along with "head" and "tail" pointers to the ends of
> the linked list that proved remarkably tricky to keep correct in all
> cases.
> 
> Then I thought "this is stupid!  Python already does reference
> counting."  Voila!  Vast swaths of tedious code vanished, giving this
> remarkably simple implementation:

This implementation doesn't work with Python 3.7 or 3.8.
I've tried it here:
https://gist.github.com/pitrou/b3991f638300edb6d06b5be23a4c66d6

and get:
Traceback (most recent call last):
  File "mytee.py", line 14, in gen
    mylast = last[1] = last = [next(it), None]
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "mytee.py", line 47, in <module>
    run(mytee1)
  File "mytee.py", line 36, in run
    lists[i].append(next(iters[i]))
RuntimeError: generator raised StopIteration

(Yuck!)

In short, you want the following instead:

                try:
                    mylast = last[1] = last = [next(it), None]
                except StopIteration:
                    return

>     def mytee(xs, n):
>         last = [None, None]
> 
>         def gen(it, mylast):
>             nonlocal last
>             while True:
>                 mylast = mylast[1]
>                 if not mylast:
>                     mylast = last[1] = last = [next(it), None]

That's smart and obscure :-o
The way it works is that the `last` assignment changes the `last` value
seen by all derived generators, while the `last[1]` assignment updates
the bindings made in the other generators' `mylast` lists...  It's
difficult to find the words to explain it.

The chained assignment makes it more difficult to parse as well (when I
read this I don't know if `last[i]` or `last` gets assigned first;
apparently the answer is `last[i]`, otherwise the recipe wouldn't work
correctly).  Perhaps like this:

        while True:
            mylast = mylast[1]
            if not mylast:
                try:
                    # Create new list link
                    mylast = [next(it), None]
                except StopIteration:
                    return
                else:
                    # Append to other generators `mylast` linked lists
                    last[1] = mylast
                    # Update shared list link
                    last = last[1]
            yield mylast[0]


Regards

Antoine.




More information about the Python-ideas mailing list