The topic reminded me of Stephan Houben's streamtee. https://github.com/stephanh42/streamtee/blob/master/streamtee.py It was an attempt at a memory-efficient tee, but it turned out tee was efficient enough. It uses a thunk-like method and recursion to remove the need for an explicit linked list. In the spirit of the topic, I tried to strip it down to the bare essentials: def tee(iterable, n): stream = _streamiter(iter(iterable)) return tuple(_wrapper(stream) for _ in range(n)) def _streamiter(itr): result = (next(itr), _streamiter(itr)) del itr # Release unneeded reference. while True: yield result def _wrapper(stream): while True: result, stream = next(stream) yield result Unfortunately, this code, too, breaks with the new StopIteration propagation rules. On Sun, Apr 15, 2018 at 1:05 AM, Tim Peters <tim.peters@gmail.com> wrote:
Just for fun - no complaint, no suggestion, just sharing a bit of code that tickled me.
The docs for `itertools.tee()` contain a Python work-alike, which is easy to follow. It gives each derived generator its own deque, and when a new value is obtained from the original iterator it pushes that value onto each of those deques.
Of course it's possible for them to share a single deque, but the code gets more complicated. Is it possible to make it simpler instead?
What it "really" needs is a shared singly-linked list of values, pointing from oldest value to newest. Then each derived generator can just follow the links, and yield its next result in time independent of the number of derived generators. But how do you know when a new value needs to be obtained from the original iterator, and how do you know when space for an older value can be recycled (because all of the derived generators have yielded it)?
I ended up with almost a full page of code to do that, storing with each value and link a count of the number of derived generators that had yet to yield the value, effectively coding my own reference-count scheme by hand, along with "head" and "tail" pointers to the ends of the linked list that proved remarkably tricky to keep correct in all cases.
Then I thought "this is stupid! Python already does reference counting." Voila! Vast swaths of tedious code vanished, giving this remarkably simple implementation:
def mytee(xs, n): last = [None, None]
def gen(it, mylast): nonlocal last while True: mylast = mylast[1] if not mylast: mylast = last[1] = last = [next(it), None] yield mylast[0]
it = iter(xs) return tuple(gen(it, last) for _ in range(n))
There's no need to keep a pointer to the start of the shared list at all - we only need a pointer to the end of the list ("last"), and each derived generator only needs a pointer to its own current position in the list ("mylast").
What I find kind of hilarious is that it's no help at all as a prototype for a C implementation: Python recycles stale `[next(it), None]` pairs all by itself, when their internal refcounts fall to 0. That's the hardest part.
BTW, I certainly don't suggest adding this to the itertools docs either. While it's short and elegant, it's too subtle to grasp easily - if you think "it's obvious", you haven't yet thought hard enough about the problem ;-) _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/