[issue32099] Use range in itertools roundrobin recipe
New submission from Terry J. Reedy <tjreedy@udel.edu>:

The itertools roundrobin recipe has an outer loop executed a preset number of times.  It is currently implemented with two assignments and a while loop.
https://docs.python.org/3/library/itertools.html#itertools-recipes

These can be replaced with a for loop using a reversed range:

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    nexts = cycle(iter(it).__next__ for it in iterables)
    for current_len in reversed(range(1, len(iterables)+1)):
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            nexts = cycle(islice(nexts, current_len - 1))

I think this is easier to understand.  So do some other participants in the current python-ideas thread 'Rewriting the "roundrobin" recipe in the itertools documentation'.

I changed 'pending' to 'current_len' because, to me, 'pending' should be the set of iter_nexts and not their number.  I originally avoided the '-1' in the islice call by decrementing both range arguments by 1 and calling the loop variable 'reduced_len'.  But having the loop variable be the size of the nexts iterable in the next outer iteration seemed confusing and not worth the trivial efficiency gain.

An independent change would be to replace 'next' with 'iter' on the basis that reusing the builtin name is not good and because 'nexts' is awkward to pronounce.

I will produce a PR if any version is preferred to the current one.

---

The OP proposed, and some participants like, an accept-reject algorithm based on zip_longest:

def roundrobin(*iters):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Perhaps "flat_zip_nofill" is a better name, or something similar
    sentinel = object()
    for tup in it.zip_longest(*iters, fillvalue=sentinel):
        yield from (x for x in tup if x is not sentinel)

I dislike creating tuples we don't want, with values we don't want, with an arbitrarily small acceptance ratio.  I also note that zip_longest is properly used in grouper, whereas roundrobin is the only recipe using cycle.

----------
assignee: docs@python
components: Documentation
messages: 306605
nosy: docs@python, rhettinger, terry.reedy
priority: normal
severity: normal
stage: needs patch
status: open
title: Use range in itertools roundrobin recipe
type: enhancement
versions: Python 3.7
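For reference, the recipe currently in the docs (George Sakkis's version with the two assignments and the while loop mentioned above) is roughly the following; it is reproduced here from memory, so it may differ from the docs in minor details:

from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)                               # first assignment
    nexts = cycle(iter(it).__next__ for it in iterables)   # second assignment
    while pending:                                         # outer loop, once per iterable
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            # Drop the exhausted iterator from the cycle.
            pending -= 1
            nexts = cycle(islice(nexts, pending))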
Dubslow <bunslow@gmail.com> added the comment:

Note that this follows a bit of discussion on python-ideas, in two threads:
https://mail.python.org/pipermail/python-ideas/2017-November/047920.html
https://mail.python.org/pipermail/python-ideas/2017-November/047989.html

I agree the zip_longest-derived solution is both easier to read/understand and also not really accurate, especially in extreme cases.  But, and see the second thread, I think it might just be better to add this functionality itself to itertools, under the name zip_flat -- per the second thread, there's a lot of confusion around the topic beyond just the suitability of the current recipe (including such things as lack of a clear name, legibility, efficiency, and brevity -- a fair number of people are looking for one or two liners, even if that doesn't really exist).

(One alternative to zip_flat would be to add a second keyword argument to zip_longest that *doesn't* use a fillvalue, producing variable-length tuples when the supplied iters run out.  Then the recipe and the entire conversation/topic could be replaced by "yield from zip_longest(*iters, usefill=False)".)

Despite that opinion, this suggested change is better than nothing.

----------
nosy: +Dubslow
Dubslow <bunslow@gmail.com> added the comment:

Regarding the current bug more specifically, I think a few comments would go a long way to helping readers understand the code.

Also, I do think the (1, +1, -1) is less readable, simply because it doesn't follow the most common usage patterns of range, where your first version using (0, 0, 0) (with implicit zeroes of course) is cleaner.  It's much more apparent how long the loop is at first glance ("loop once per iter" vs "loop... from what to what? oh, once per iter").  Perhaps setting step=-1 instead of using reversed() would be a better fit for the off-by-1 variant.

Such a version with both proposed changes (which are independent and can be considered or accepted separately) looks like this:

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # This recipe is also known by other names such as "alternate",
    # "interleave", or "merge".  "zip_flat" would also be an accurate name.
    # Algorithm: cycle around the __next__ methods of each iterable.  When an
    # iter runs out, remove its __next__ from the cycle.
    nexts = cycle(iter(it).__next__ for it in iterables)
    for reduced_len in reversed(range(len(iterables))):
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            nexts = cycle(islice(nexts, reduced_len))
            # This skips one round of the cycle, starting the next round
            # without the current iter.

The last comment is probably the least valuable, though it does point out the slight quirk of how the last line works.  I suppose this is an argument for having the loop variable be named the actual length, but I think most Python users are accustomed to the last element of a list having index length-1.  At any rate, I think the readability gain in the for statement is more than the readability loss in the islice().
Dubslow <bunslow@gmail.com> added the comment:

Perhaps the loop variable could be renamed to "len_minus_1" or some such thing that is more semantic to the algorithm.
Dubslow <bunslow@gmail.com> added the comment:

Er, in my first message, make that "(yield from tup for tup in zip_longest(*iters, usefill=False))".
Change by Raymond Hettinger <raymond.hettinger@gmail.com>:

----------
assignee: docs@python -> rhettinger
Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

IMO, George Sakkis's original version is better than TJR's proposed revision, which has three separate adjustments by one and adds two extraneous functions which aren't central to the example.  Dubslow's alternative at least avoids the three offsets by one; however, it has the disadvantage that the meaning of the loop induction variable is somewhat indirect and creates its own source of confusion.

I agree that some algorithmic comment should be added, probably just for the last line.

Let's not add any of the listed alternative names to the comments.  Each one is either confusing or wrong.  The word "merge" already has an established, different meaning as in "sort/merge".  For example, heapq.merge('ABC', 'D', 'EF') gives ['A', 'B', 'C', 'D', 'E', 'F'].  The word "alternate" tends to mean "toggle back-and-forth" in common usage.  The term "zip_flat" has no meaning to me, it has no hits on Google, and the closest match is a recipe on StackOverflow that does something different.  And "interleave" is not suggestive of looping back over the iterables until each is exhausted.  Also, we may yet use interleave() to mean something else.

In contrast, "round robin" has a well established meaning that matches what this recipe does.  Until now, not a single reader has ever claimed to not know what it means.
https://en.wikipedia.org/wiki/Round-robin_scheduling

FWIW, the recipe has several benefits:
1) Give a way to implement round robin iteration without re-inventing the wheel.
2) Demonstrate ways to use cycle() and islice().
3) Demonstrate the useful optimization technique of factoring the try/except out of the for-loop.
4) Demonstrate the useful optimization technique of calling bound methods.
5) Be concise (I've left longer or more complex recipes for the ASPN cookbook or StackOverflow).

Ideally, I would prefer to not add two extra builtin lookups (the recipe is sometimes used on short inputs which would be affected by the extra overhead).  Also, I prefer the visual weight to be on the central message of tight inner loops that exploit itertools rather than having the visual weight shift to the for-loop which is the least important part.

Can you suggest a concise single-line comment that would make the last line clearer about what it is doing?  Also, I'm open to changing the name of the "pending" variable but think "current_len" and "reduced_len" are not improvements.
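To make points 3 and 4 above concrete, here is a minimal sketch of the two optimization techniques in isolation (illustrative only; the names "results" and "drain" are made up for this example and are not part of any recipe):

# Bound-method optimization: resolve the attribute lookup once, outside the loop.
results = []
append = results.append              # bound method, looked up a single time
for ch in 'ABCDEF':
    append(ch)                       # avoids repeating results.append in the hot loop

# Factoring try/except out of the inner loop: the handler wraps the whole loop
# instead of wrapping each individual next() call.
def drain(it):
    try:
        while True:
            yield next(it)           # tight inner loop with no per-item try/except
    except StopIteration:
        pass

print(results)                       # ['A', 'B', 'C', 'D', 'E', 'F']
print(list(drain(iter('xyz'))))      # ['x', 'y', 'z']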
Tim Peters <tim@python.org> added the comment:

I agree the current recipe strikes a very nice balance among competing interests, and is educational on several counts.

s/pending/numactive/

    # Remove the iterator we just exhausted from the cycle.
    numactive -= 1
    nexts = cycle(islice(nexts, numactive))

----------
nosy: +tim.peters
Change by Raymond Hettinger <raymond.hettinger@gmail.com>:

----------
keywords: +patch
pull_requests: +4424
stage: needs patch -> patch review
Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

roundrobin() has quadratic computational complexity.  For example, list(roundrobin(*([[1]]*N))).  Is there a way to make it linear?

----------
nosy: +serhiy.storchaka
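A rough way to see this empirically (a sketch only; it assumes the itertools-recipe roundrobin from the opening message is already defined in the same session):

import timeit

for N in (1000, 2000, 4000):
    t = timeit.timeit(lambda: list(roundrobin(*([[1]] * N))), number=1)
    # Doubling N roughly quadruples the time, consistent with quadratic behavior.
    print(N, t)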
Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

Serhiy, I think you're focusing on an irrelevant edge case and reading too much into the recipe.  You could create an underlying binary tree with O(n) iteration and O(log n) deletion, but that completely misses the point of the itertools recipes, would likely be pointless in the real world, and would likely perform worse than the current recipe in common cases unless you wrote a C extension for it.
Change by Raymond Hettinger <raymond.hettinger@gmail.com>:

----------
Removed message: https://bugs.python.org/msg306629
Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

I agree that using a binary tree would be excessive here.  I was just wondering if there is a simple way to get rid of the quadratic complexity.  In any case this does not matter much until you work with many hundreds or thousands of iterables.
Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

Actually I tried to test whether this implementation can cause a crash due to using deeply nested iterators (like in issue14010), for example in next(roundrobin(*([[]]*N + [[1]]))).  But even if there were such a problem here, roundrobin() already becomes unusable due to quadratic time at smaller N (tens of thousands).
Change by Roundup Robot <devnull@psf.upfronthosting.co.za>:

----------
pull_requests: +4425
Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

While we're on the topic, I had some thought of also adding a similar recipe to https://docs.python.org/3/library/collections.html#deque-recipes .  The alternative recipe is slower in common cases but doesn't degrade when there are a huge number of iterables:

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    nexts = deque(iter(it).__next__ for it in iterables)
    while nexts:
        try:
            while True:
                yield nexts[0]()
                nexts.rotate(-1)
        except StopIteration:
            nexts.popleft()
Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

Today I published a similar recipe on Python-Ideas.  It uses popleft/append instead of __getitem__/rotate.

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    nexts = deque(iter(it).__next__ for it in iterables)
    popleft = nexts.popleft
    append = nexts.append
    while nexts:
        next = popleft()
        try:
            yield next()
        except StopIteration:
            pass
        else:
            append(next)

It is faster (10-25%) in all microbenchmarks that I ran (Steven's benchmarks for a small number of iterables and my examples for a large number of iterables).
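A quick sanity check that either deque-based variant produces the documented ordering (a sketch; it assumes one of the roundrobin definitions above, plus "from collections import deque", is already in scope):

print(list(roundrobin('ABC', 'D', 'EF')))   # ['A', 'D', 'E', 'B', 'F', 'C']
print(list(roundrobin()))                   # []
print(list(roundrobin('', 'AB', '')))       # ['A', 'B'] -- empty iterables are skipped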
Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

Really, I don't give a whit about the micro-benchmarks -- that completely misses the point.  The recipes are primarily intended to have educational value on ways to use itertools, deques, and whatnot.

For itertools, I'm satisfied with the new variable name and the additional comment.  For collections, there is an open question about whether adding an extra example would make users better off.

Beyond that, I have little interest in exploring all the little variants or wasting further time on this (that is what ASPN, StackOverflow, and blog posts are for).
Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

But why use less efficient code?  Is my code worse?
Change by Raymond Hettinger <raymond.hettinger@gmail.com>:

----------
pull_requests: +4434
Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

New changeset 0858495a50e19defde786a4ec050ec182e920f46 by Raymond Hettinger in branch 'master':
bpo-32099 Add deque variant of roundrobin() recipe (#4497)
https://github.com/python/cpython/commit/0858495a50e19defde786a4ec050ec182e9...
Change by Raymond Hettinger <raymond.hettinger@gmail.com>:

----------
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
participants (6)
- Dubslow
- Raymond Hettinger
- Roundup Robot
- Serhiy Storchaka
- Terry J. Reedy
- Tim Peters