Split iterator into multiple streams

Arnaud Delobelle arnodel at gmail.com
Sat Nov 6 11:45:18 EDT 2010


Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:

> Suppose I have an iterator that yields tuples of N items (a, b, ... n).
>
> I want to split this into N independent iterators:
>
> iter1 -> a, a2, a3, ...
> iter2 -> b, b2, b3, ...
> ...
> iterN -> n, n2, n3, ...
>
> The iterator may be infinite, or at least too big to collect in a list.
>
> My first attempt was this:
>
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         iterators.append((t[i] for t in iterator))
>     return tuple(iterators)
>
> But it doesn't work, as all the iterators see the same values:
>
> >>> data = [(1,2,3), (4,5,6), (7,8,9)]
> >>> a, b, c = split(data, 3)
> >>> list(a), list(b), list(c)
> ([3, 6, 9], [3, 6, 9], [3, 6, 9])
>
>
> I tried changing the t[i] to use operator.itemgetter instead, but no
> luck. Finally I got this:
>
> def split(iterable, n):
>     iterators = []
>     for i, iterator in enumerate(itertools.tee(iterable, n)):
>         f = lambda it, i=i: (t[i] for t in it)
>         iterators.append(f(iterator))
>     return tuple(iterators)
>
> which seems to work:
>
> >>> data = [(1,2,3), (4,5,6), (7,8,9)]
> >>> a, b, c = split(data, 3)
> >>> list(a), list(b), list(c)
> ([1, 4, 7], [2, 5, 8], [3, 6, 9])
>
>
>
>
> Is this the right approach, or have I missed something obvious?
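
The first attempt quoted above most likely fails because of late binding:
a generator expression evaluates its outermost iterable immediately, but
looks up "i" only when it is consumed, by which time the loop has finished
and i == n - 1.  The lambda with the i=i default fixes this by binding the
index early.  A minimal Python 3 sketch of the effect follows; the names
split_late and split_bound are hypothetical, and using map() with
operator.itemgetter is just one possible way to bind the index, not
necessarily what the original poster tried:

```python
import itertools
from operator import itemgetter

def split_late(iterable, n):
    # Same shape as the first attempt: each generator expression looks up
    # `i` only when it is consumed, after the loop has already finished.
    iterators = []
    for i, branch in enumerate(itertools.tee(iterable, n)):
        iterators.append((t[i] for t in branch))
    return tuple(iterators)

def split_bound(iterable, n):
    # itemgetter(i) is built while the loop is still running, so each
    # stream keeps its own index; map() is lazy in Python 3.
    return tuple(
        map(itemgetter(i), branch)
        for i, branch in enumerate(itertools.tee(iterable, n))
    )

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
late = [list(s) for s in split_late(data, 3)]    # every stream sees column 2
bound = [list(s) for s in split_bound(data, 3)]  # one column per stream
```

Both versions still rely on itertools.tee, which buffers items internally
for whichever stream is lagging behind.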

It is quite straightforward to implement your "split" function without
itertools.tee: 

from collections import deque

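def split(iterable):
    it = iter(iterable)
    # One queue per column, seeded from the first tuple.
    q = [deque([x]) for x in it.next()]
    def proj(qi):
        while True:
            if not qi:
                # This column has run dry: pull the next tuple and
                # distribute its items across all the queues.
                for qj, xj in zip(q, it.next()):
                    qj.append(xj)
            yield qi.popleft()
    for qi in q:
        yield proj(qi)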

>>> data = [(1,2,3), (4,5,6), (7,8,9)]
>>> a, b, c = split(data)
>>> print list(a), list(b), list(c)
[1, 4, 7] [2, 5, 8] [3, 6, 9]

Interestingly, given "split" it is very easy to implement "tee":

def tee(iterable, n=2):
    return split(([x]*n for x in iterable))

>>> a, b = tee(range(10), 2)
>>> a.next(), a.next(), b.next()
(0, 1, 0)
>>> a.next(), a.next(), b.next()
(2, 3, 1)
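
The definitions above are written for Python 2 (it.next(), the print
statement).  Under Python 3, and in particular since PEP 479, a
StopIteration escaping from inside a generator is turned into a
RuntimeError, so a direct port needs to catch it explicitly.  The
following is a sketch of such a port, not the code as posted; the
try/except blocks, the handling of empty input and the names "first",
"queues" and "row" are assumptions added for illustration:

```python
from collections import deque
from itertools import count

def split(iterable):
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return  # empty input: there are no columns to yield
    # One queue per column, seeded from the first tuple.
    queues = [deque([x]) for x in first]

    def proj(qi):
        while True:
            if not qi:
                try:
                    row = next(it)
                except StopIteration:
                    return  # end this stream cleanly (PEP 479)
                # Distribute the new tuple across all the queues.
                for qj, xj in zip(queues, row):
                    qj.append(xj)
            yield qi.popleft()

    for qi in queues:
        yield proj(qi)

def tee(iterable, n=2):
    # Each item is repeated n times, then split into n streams.
    return split([x] * n for x in iterable)

a, b, c = split([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
columns = (list(a), list(b), list(c))

x, y = tee(range(10), 2)
first_round = (next(x), next(x), next(y))
second_round = (next(x), next(x), next(y))

# The source may be infinite; only the lagging queues grow.
evens, odds = split((2 * k, 2 * k + 1) for k in count())
head = [next(odds) for _ in range(3)]
```

Note that a stream which is never consumed keeps accumulating items in its
queue for as long as the other streams advance.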

In fact, split(x) is the same as zip(*x) when x is finite.  The
difference is that with split(x), x is allowed to be infinite and with
zip(*x), each term of x is allowed to be infinite.  It may be good to
have a function unifying the two.
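
The zip(*x) half of this comparison can be checked with a short Python 3
sketch, where zip is lazy.  In Python 2 the built-in zip returns a list, so
itertools.izip would be needed for terms that are infinite.  The variable
names below are purely illustrative:

```python
from itertools import count, islice

# Finite x: zip(*x) yields the same columns as split(x), as tuples.
data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
columns = list(zip(*data))

# x itself is finite (two terms), but each term is infinite.
x = [count(0), count(100)]
pairs = zip(*x)                # lazy: nothing is consumed yet
head = list(islice(pairs, 3))  # take only the first three tuples
```

The star in zip(*x) unpacks x eagerly, which is why x itself has to be
finite in this case.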

-- 
Arnaud


