[Python-ideas] A cute Python implementation of itertools.tee
Antoine Pitrou
solipsis at pitrou.net
Sun Apr 15 05:55:46 EDT 2018
On Sun, 15 Apr 2018 00:05:58 -0500
Tim Peters <tim.peters at gmail.com> wrote:
> Just for fun - no complaint, no suggestion, just sharing a bit of code
> that tickled me.
>
> The docs for `itertools.tee()` contain a Python work-alike, which is
> easy to follow. It gives each derived generator its own deque, and
> when a new value is obtained from the original iterator it pushes that
> value onto each of those deques.
>
> Of course it's possible for them to share a single deque, but the code
> gets more complicated. Is it possible to make it simpler instead?
>
> What it "really" needs is a shared singly-linked list of values,
> pointing from oldest value to newest. Then each derived generator can
> just follow the links, and yield its next result in time independent
> of the number of derived generators. But how do you know when a new
> value needs to be obtained from the original iterator, and how do you
> know when space for an older value can be recycled (because all of the
> derived generators have yielded it)?
>
> I ended up with almost a full page of code to do that, storing with
> each value and link a count of the number of derived generators that
> had yet to yield the value, effectively coding my own reference-count
> scheme by hand, along with "head" and "tail" pointers to the ends of
> the linked list that proved remarkably tricky to keep correct in all
> cases.
>
> Then I thought "this is stupid! Python already does reference
> counting." Voila! Vast swaths of tedious code vanished, giving this
> remarkably simple implementation:
This implementation doesn't work with Python 3.7 or 3.8.
I've tried it here:
https://gist.github.com/pitrou/b3991f638300edb6d06b5be23a4c66d6
and get:
Traceback (most recent call last):
File "mytee.py", line 14, in gen
mylast = last[1] = last = [next(it), None]
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "mytee.py", line 47, in <module>
run(mytee1)
File "mytee.py", line 36, in run
lists[i].append(next(iters[i]))
RuntimeError: generator raised StopIteration
(Yuck!)
In short, you want the following instead:
try:
mylast = last[1] = last = [next(it), None]
except StopIteration:
return
> def mytee(xs, n):
> last = [None, None]
>
> def gen(it, mylast):
> nonlocal last
> while True:
> mylast = mylast[1]
> if not mylast:
> mylast = last[1] = last = [next(it), None]
That's smart and obscure :-o
The way it works is that the `last` assignment changes the `last` value
seen by all derived generators, while the `last[1]` assignment updates
the bindings made in the other generators' `mylast` lists... It's
difficult to find the words to explain it.
The chained assignment makes it more difficult to parse as well (when I
read this I don't know if `last[i]` or `last` gets assigned first;
apparently the answer is `last[i]`, otherwise the recipe wouldn't work
correctly). Perhaps like this:
while True:
mylast = mylast[1]
if not mylast:
try:
# Create new list link
mylast = [next(it), None]
except StopIteration:
return
else:
# Append to other generators `mylast` linked lists
last[1] = mylast
# Update shared list link
last = last[1]
yield mylast[0]
Regards
Antoine.
More information about the Python-ideas
mailing list