[Python-ideas] a memory-efficient variant of itertools.tee
leewangzhong+python at gmail.com
Sat May 27 12:53:26 EDT 2017
On Fri, May 26, 2017 at 2:45 PM, Stephan Houben <stephanh42 at gmail.com> wrote:
> Hi all,
> The itertools.tee function can hold on to objects "unnecessarily".
> In particular, if you do
> iter2 = itertools.tee(iter1, 2)
> i.e. you "leak" one of the returned iterators, then all returned
> objects are not collected until also iter2 is collected.
> I propose a different implementation, namely the one in:
> streamtee.tee is a drop-in alternative for itertools.tee but
> as you can see from the test in the repo, it will not hold
> on to the generated objects as long.
> I propose this as an improved implementation of itertools.tee.
For convenience, the implementation itself is here (33 lines including
I like this. It uses an infinite generator as a thunk. Though I think
it belongs on bugs.python.org rather than -ideas, because the
interface should be the same and the memory/time use is asymptotically
The current tee implementation in C (lines 378 to 852):
What bothers me is, the current implementation looks very similar to
what I imagine the C implementation of this looks like. Instead of
thunks with a single element, it has thunks with up to LINKCELLS
elements. The comment from the commit
> Formerly, underlying queue was implemented in terms of two lists. The
> new queue is a series of singly-linked fixed length lists.
> The new implementation runs much faster, supports multi-way tees, and
> allows tees of tees without additional memory costs.
The delay in deletion, then, seems to be a feature for efficiency, and
can be implemented with `#define LINKCELLS 1`.
P.S.: I couldn't find a pure Python implementation of tee in the CPython repo.
More information about the Python-ideas