Rewriting the "roundrobin" recipe in the itertools documentation

For taking values alternately from a series of iterables, there are two primary functions: the builtin zip, and itertools.zip_longest. zip of course stops when the shortest iterable ends. zip_longest is generally a useful substitute for when you don't want the zip behavior, but it fills in the blanks with extra values rather than just ignoring a finished iterator and moving on with the rest. This latter use case is at least somewhat common, judging by this[1] StackOverflow question (and its duplicates), in addition to the existence of the `roundrobin` recipe[2] in the itertools docs. The recipe satisfies this use case, and its code is repeated in the StackOverflow answer. However, it is remarkably unpythonic, in my opinion, which is one thing when such is necessary to achieve a goal, but for this functionality it is most definitely *not* necessary. I'll paste the code here for quick reference:

    def roundrobin(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        pending = len(iterables)
        nexts = cycle(iter(it).__next__ for it in iterables)
        while pending:
            try:
                for next in nexts:
                    yield next()
            except StopIteration:
                pending -= 1
                nexts = cycle(islice(nexts, pending))

Things that strike me as unpythonic:

1) requiring the total number of input iterables,
2) making gratuitous use of `next`,
3) using a while loop in code dealing with iterables,
4) combining loops, exceptions, and composed itertools functions in non-obvious ways that make control flow difficult to determine.

Now, I get it: looking at the "roughly equivalent to" code for zip_longest in the docs, there doesn't seem to be much way around it for generally similar goals, and as I said above, unpythonic is fine when necessary (practicality beats purity). But for a "recipe" in the itertools docs, it should *make use of* the zip_longest which already does all the unpythonic stuff for you (though honestly I'm not convinced either that the zip_longest code in the docs is as pythonic as it could be).
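For concreteness, a minimal sketch of the two behaviors contrasted above:

```python
from itertools import zip_longest

# zip stops as soon as the shortest input is exhausted:
print(list(zip('ABC', 'D', 'EF')))
# [('A', 'D', 'E')]

# zip_longest keeps going, padding exhausted inputs with fillvalue
# rather than skipping them:
print(list(zip_longest('ABC', 'D', 'EF', fillvalue='-')))
# [('A', 'D', 'E'), ('B', '-', 'F'), ('C', '-', '-')]
```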
pythonic-ness). Instead, the following recipe (which I also submitted to the StackOverflow question, and which is generally similar to several other later answers, all remarking that they believe it's more pythonic) is much cleaner and more suited to demonstrating the power of itertools to new developers than the mess of a "recipe" pasted above. def roundrobin(*iters): "roundrobin('ABC', 'D', 'EF') --> A D E B F C" # Perhaps "flat_zip_nofill" is a better name, or something similar sentinel = object() for tup in it.zip_longest(*iters, fillvalue=sentinel): yield from (x for x in tup if x is not sentinel) In particular, this is just an extremely thin wrapper around zip_longest, whose primary purpose is to eliminate the otherwise-mandatory "fillvalues" that zip_longest requires to produce uniform-length tuples. It's also an excellent example of how to make best pythonic use of iterables in general, and itertools in particular, and as such a much better implementation to be demonstrated in documentation. I would thus advocate that the former recipe is replaced with the latter recipe, being much more pythonic, understandable, and useful for helping new developers acquire the style of python. (Using the common linguistics analogy: a dictionary and grammar for a spoken language may be enough to communicate, but we rely on a large body of literature -- fiction, research, poetry, etc -- as children to get that special flavor and most expressive taste to the language. The stdlib is no Shakespeare, but it and its docs still form an important part of the formative literature of the Python language.) I realize at the end of the day this is a pretty trivial and ultimately meaningless nit to pick, but I've never contributed before and have a variety of similar minor pain points in the docs/stdlib, and I'm trying to gauge 1) how well this sort of minor QoL improvement is wanted, and 2) even if it is wanted, am I going about it the right way. 
If the answers to both of these questions are positive regarding this particular case, then I'll look into making a BPO issue and pull request on GitHub, which IIUC is the standard path for contributions. Thank you for your consideration.

~~~~
[1]: https://stackoverflow.com/questions/3678869/pythonic-way-to-combine-two-lists-in-an-alternating-fashion/
[2]: https://docs.python.org/3/library/itertools.html#itertools-recipes

I agree this is a much better recipe as presented. Have you benchmarked the two on more realistically long iterators? E.g. a hundred iterators of millions of items, where many terminate much earlier than others. I doubt the repeated 'is not' comparison makes much difference, but it would be good to see. On Nov 16, 2017 5:57 AM, "bunslow" <bunslow@gmail.com> wrote:

I think the idea behind the original recipe is that when one of the inner lists has been iterated through, it is removed and never looked at again. Imagine the following scenario: L is a list which contains one million empty lists and also a list containing one million numbers. Then the original recipe will iterate over two million(ish) items: each inner list must get visited, and each item from the long inner list must get visited. However, your use of zip_longest must visit one trillion items, which will likely not finish in a reasonable amount of time.

I'm not saying that this is likely to be the case, but this is probably why the original recipe is what it is. It would be great to see a recipe that is more pythonic but that maintains the efficiencies of the first recipe, but I could not come up with one.

-Brent

On Thu, Nov 16, 2017 at 10:08 AM, David Mertz <mertz@gnosis.cx> wrote:
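For illustration, a scaled-down sketch of this scenario (my own numbers, not from the thread): with a thousand empty lists plus one thousand-item list, the zip_longest approach inspects roughly a thousand tuple slots for every real item it yields; at a million of each, that ratio gives the trillion mentioned above.

```python
from itertools import chain, zip_longest

sentinel = object()
# 1000 empty inputs plus one input of 1000 items
iterables = [[] for _ in range(1000)] + [list(range(1000))]

# the zip_longest approach builds rows as wide as the number of inputs,
# so count every tuple slot it has to produce and filter:
cells = sum(len(tup) for tup in zip_longest(*iterables, fillvalue=sentinel))
items = len(list(chain.from_iterable(iterables)))

print(cells)  # 1001000 slots inspected ...
print(items)  # ... to yield just 1000 real items
```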

On 11/16/2017 8:56 AM, bunslow wrote:
These bunch together the nth items of each iterable, while itertools.cycle does not. ...
Things that strike me as unpythonic: 1) requiring the total number of input iterables, 2) making gratuitous use of `next`,
I disagree that 1 and 2 are problems.
3) using a while loop in code dealing with iterables,
I agree that this is not necessary, and give a replacement below.
4) combining loops, exceptions, and composed itertools functions in non-obvious ways that make control flow difficult to determine
I agree that the correctness of the last statement is slightly opaque. But this nicely demonstrates a non-trivial use of cycle.
This adds and then deletes grouping and fill values that are not wanted. To me, this is an 'algorithm smell'. One of the principles of algorithm design is to avoid unnecessary calculations. For an edge case such as roundrobin(1000000*'a', ''), the above mostly does unnecessary work. The following combines 3 statements into one for statement.

    def roundrobin(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        nexts = cycle(iter(it).__next__ for it in iterables)
        for reduced_len in reversed(range(1, len(iterables))):
            try:
                for next in nexts:
                    yield next()
            except StopIteration:
                nexts = cycle(islice(nexts, reduced_len))
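Restated runnably for reference (note Terry corrects an off-by-one in this version in his follow-up below): it handles the docstring example, but drops items once the range runs out, as the empty-string edge case shows.

```python
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # note: `next` here shadows the builtin, as in the original recipe
    nexts = cycle(iter(it).__next__ for it in iterables)
    for reduced_len in reversed(range(1, len(iterables))):
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            nexts = cycle(islice(nexts, reduced_len))

print(list(roundrobin('ABC', 'D', 'EF')))  # ['A', 'D', 'E', 'B', 'F', 'C']
print(list(roundrobin('ABC', '')))         # ['A'] -- should be ['A', 'B', 'C']
```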
But we do not want tuples or fill values.
I disagree. [I have mostly stopped using 'pythonic' because there is too much disagreement on particulars, and its use seems to inhibit as much as facilitate insight.] [snip more]
We constantly improve the docs.
2) even if it is wanted, am I going about it the right way.
Typos and trivial grammar issues can be filed as a PR with no issue required. Clarifications usually require an issue and perhaps discussion. Since this is more about philosophy of algorithm design, python-ideas was a good place to start.
Since I have a competing 'improvement', I would hold off on a PR until Raymond Hettinger, the itertools author, comments. -- Terry Jan Reedy

On 11/16/2017 2:56 PM, Terry Reedy wrote: Correct off-by-one error. I should have tested with an edge case such as print(list(roundrobin('ABC', '')))
Make that 0 rather than 1 for start value.
A slightly clearer, slightly less efficient alternative would be

    def roundrobin(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        nexts = cycle(iter(it).__next__ for it in iterables)
        for current_len in reversed(range(1, len(iterables) + 1)):
            try:
                for next in nexts:
                    yield next()
            except StopIteration:
                nexts = cycle(islice(nexts, current_len - 1))

-- Terry Jan Reedy
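Restated as a self-contained script, this corrected version handles both the docstring example and the empty-string edge case:

```python
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # note: `next` here shadows the builtin, as in the original recipe
    nexts = cycle(iter(it).__next__ for it in iterables)
    for current_len in reversed(range(1, len(iterables) + 1)):
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            # one input just finished; rebuild the cycle from the survivors
            nexts = cycle(islice(nexts, current_len - 1))

print(list(roundrobin('ABC', 'D', 'EF')))  # ['A', 'D', 'E', 'B', 'F', 'C']
print(list(roundrobin('ABC', '')))         # ['A', 'B', 'C']
```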

On 11/16/2017 5:57 PM, Terry Reedy wrote:
I submitted the 'current_len' version as https://bugs.python.org/issue32099 "Use range in itertools roundrobin recipe" -- Terry Jan Reedy

On Thu, Nov 16, 2017 at 02:56:29PM -0500, Terry Reedy wrote:
3) using a while loop in code dealing with iterables,
I agree that this is not necessary, and give a replacement below.
The OP isn't just saying that it's unnecessary in this case, but that it's unPythonic to ever use a while loop in code dealing with iterables. I disagree with that stronger statement.
It's a recipe, not a tuned and optimized piece of production code. And if you're going to criticise code on the basis of efficiency, then I would hope you've actually profiled the code first. Because it isn't clear to me at all that what you call "unnecessary work" is more expensive than re-writing the recipe using a more complex algorithm with calls to cycle and islice. But I'm not here to nit-pick your recipe over the earlier ones. [...]
Since I have a competing 'improvement', I would hold off on a PR until Raymond Hettinger, the itertools author, comments.
Raise a doc issue on the tracker, and take the discussion there. I think that this is too minor an issue to need long argument on the list. Besides, it's not really on-topic as such -- it isn't about a change to the language. It's about an implementation change to a recipe in the docs. -- Steve

On 11/16/2017 05:56 AM, bunslow wrote:
I realize at the end of the day this is a pretty trivial and ultimately meaningless nit to pick,
Not at all -- documentation is extremely important.
I found your example much easier to understand. Just include a note that the "roughly similar" Python code is not performant in extreme situations; after all, it's there to aid understanding, not to be used. -- ~Ethan~

On Thu, Nov 16, 2017 at 07:56:21AM -0600, bunslow wrote about the roundrobin recipe in the itertools docs:
I don't find *any* of those things unPythonic in the least, nor the recipe difficult to follow.
The roundrobin recipe was introduced to the docs some time during Python 2.5. zip_longest (or izip_longest as it was originally called) wasn't introduced until 2.6. [...]
A "mess" of a quote-unquote "recipe". That's an insulting way of describing it. I'd like to see how you would have solved this problem back in Python 2.5. Your post is a very long-winded way of saying: "I have an improved, more modern implementation for the itertools roundrobin recipe, should I raise a documentation issue on the tracker to replace it?" to which I would say, "Sure, why not?".
1) is fine. 2) is not. You don't need to put down existing code to make your point, nor do you have to sing your code's praises. Let the code speak for itself. And while it's one thing to write detailed and pedantic posts (I have a tendency to do likewise), know when it's necessary and when it isn't. Sometimes we need to discuss a lot of background material to explain something, but this wasn't one of those times. Thank you, -- Steve

I find this refactored code intriguing. While I would not suggest changes to the itertools recipes, it is a pleasant exercise to think of alternative recipes in an itertools way. There are a few third-party libraries dedicated to itertools recipes. For example, more-itertools has a recipe called `more_itertools.interleave_longest`. It turns out it behaves like `round_robin`, and the implementation happens to be similar to your suggestion:

    _marker = object()

    def interleave_longest(*iterables):
        """Return a new iterable yielding from each iterable in turn,
        skipping any that are exhausted.

        >>> list(interleave_longest([1, 2, 3], [4, 5], [6, 7, 8]))
        [1, 4, 6, 2, 5, 7, 3, 8]

        """
        i = chain.from_iterable(zip_longest(*iterables, fillvalue=_marker))
        return filter(lambda x: x is not _marker, i)

For comparison, your suggestion:

    def roundrobin(*iters):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        sentinel = object()
        for tup in it.zip_longest(*iters, fillvalue=sentinel):
            yield from (x for x in tup if x is not sentinel)
In summary, both versions zip iterables and filter sentinels. I believe the `yield from` in your suggestion restricts your recipe to Python 3 (which can be resolved with a for loop). As an aside, given some performance gain, such third-party libraries may be better suited for your particular contribution. On Thu, Nov 16, 2017 at 8:56 AM, bunslow <bunslow@gmail.com> wrote:

The roundrobin() implementation in the recipes has quadratic time for a large number of iterables, as do all the other proposed implementations. This is a problem if you use it with hundreds or thousands of iterables. For example:

    list(roundrobin(*([[1]]*1000)))
    next(roundrobin(*([[]]*1000 + [[1]])))

The following implementation has linear time.

    def roundrobin(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        nexts = [iter(it).__next__ for it in iterables]
        while nexts:
            next_nexts = []
            append = next_nexts.append
            for next in nexts:
                try:
                    yield next()
                except StopIteration:
                    pass
                else:
                    append(next)
            nexts = next_nexts

Actually it expands cycle() in Python, and this makes it slower for a smaller number of iterations.
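Restated as a self-contained script, this version passes the docstring example and returns its first item immediately even with a thousand empty inputs ahead of the non-empty one:

```python
def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # note: `next` here shadows the builtin, as in the original recipe
    nexts = [iter(it).__next__ for it in iterables]
    while nexts:
        # collect only the survivors of this round; exhausted
        # iterators are dropped and never visited again
        next_nexts = []
        append = next_nexts.append
        for next in nexts:
            try:
                yield next()
            except StopIteration:
                pass
            else:
                append(next)
        nexts = next_nexts

print(list(roundrobin('ABC', 'D', 'EF')))        # ['A', 'D', 'E', 'B', 'F', 'C']
print(next(roundrobin(*([[]] * 1000 + [[1]]))))  # 1
```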

21.11.17 11:44, Serhiy Storchaka пише:
Yet another natural implementation with linear time is:

    from collections import deque

    def roundrobin(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        nexts = deque(iter(it).__next__ for it in iterables)
        popleft = nexts.popleft
        append = nexts.append
        while nexts:
            next = popleft()
            try:
                yield next()
            except StopIteration:
                pass
            else:
                append(next)

Like the previous example, it doesn't have any relation to itertools.
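A quick check of the deque version, restated runnably: each live iterator is popped from the left, advanced, and re-appended on the right, so exhausted iterators simply fall out of the rotation.

```python
from collections import deque

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # note: `next` here shadows the builtin, as in the original recipe
    nexts = deque(iter(it).__next__ for it in iterables)
    popleft = nexts.popleft
    append = nexts.append
    while nexts:
        next = popleft()
        try:
            yield next()
        except StopIteration:
            pass  # exhausted: not re-appended, so it leaves the rotation
        else:
            append(next)

print(list(roundrobin('ABC', 'D', 'EF')))  # ['A', 'D', 'E', 'B', 'F', 'C']
print(list(roundrobin('ABC', '')))         # ['A', 'B', 'C']
```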

participants (9)
- brent bejot
- bunslow
- David Mertz
- Ethan Furman
- Neil Girdhar
- pylang
- Serhiy Storchaka
- Steven D'Aprano
- Terry Reedy