[Python-ideas] Rewriting the "roundrobin" recipe in the itertools documentation

Thu Nov 16 14:56:29 EST 2017

On 11/16/2017 8:56 AM, bunslow wrote:
> For taking values alternately from a series of iterables, there are two 
> primary functions:
> 
> builtin.zip
> itertools.zip_longest

These bunch together the nth items of each iterable, while 
itertools.cycle does not.
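Here is a quick illustration of the difference, using the recipe's docstring inputs. It is a sketch run on Python 3. zip stops at the shortest input, and zip_longest pads the missing slots with a fill value (None by default). roundrobin, by contrast, should yield the flat stream A D E B F C with no grouping or padding.

```python
from itertools import zip_longest

# zip groups the nth items and stops as soon as the shortest input ('D') runs out.
grouped = list(zip('ABC', 'D', 'EF'))
print(grouped)  # [('A', 'D', 'E')]

# zip_longest groups the nth items and pads missing ones with the fill value.
padded = list(zip_longest('ABC', 'D', 'EF'))
print(padded)  # [('A', 'D', 'E'), ('B', None, 'F'), ('C', None, None)]
```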

...
> def roundrobin(*iterables):
>      "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
>      pending = len(iterables)
>      nexts = cycle(iter(it).__next__ for it in iterables)
>      while pending:
>          try:
>              for next in nexts:
>                  yield next()
>          except StopIteration:
>              pending -= 1
>              nexts = cycle(islice(nexts, pending))

> Things that strike me as unpythonic:
> 1) requiring the total number of input iterables,
> 2) making gratuitous use of `next`,

I disagree that 1 and 2 are problems.

> 3) using a while loop in code dealing with iterables,

I agree that this is not necessary, and give a replacement below.

> 4) combining loops, exceptions, and composed itertools functions in
> non-obvious ways that make control flow difficult to determine

I agree that the correctness of the last statement is slightly opaque. 
But this nicely demonstrates a non-trivial use of cycle.
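It may be easier to see why the last statement is correct by stepping through it. The sketch below drives the same cycle by hand; the variable names are illustrative and not taken from the recipe. When StopIteration escapes, the cycle is positioned just past the exhausted iterator's __next__. Taking the next `pending` entries with islice therefore picks up exactly the iterators that are still live, in round-robin order.

```python
from itertools import cycle, islice

# Three iterators; the one over 'D' runs out first.
its = [iter('ABC'), iter('D'), iter('EF')]
nexts = cycle(it.__next__ for it in its)

out = []
try:
    for get_next in nexts:
        out.append(get_next())
except StopIteration:
    # Raised by iter('D').__next__, the entry the cycle yielded last.
    pass

print(out)  # ['A', 'D', 'E', 'B']

# The cycle resumes just after the exhausted entry, so the next
# pending == 2 entries are the live iterators: 'EF' first, then 'ABC'.
pending = 2
survivors = list(islice(nexts, pending))
remaining = [get_next() for get_next in survivors]
print(remaining)  # ['F', 'C']
```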

> Now, I get it, looking at the "roughly equivalent to" code for 
> zip_longest in the docs, there doesn't seem to be much way around it for 
> generally similar goals, and as I said above, unpythonic is fine when 
> necessary (practicality beats purity), but in this case, for being a 
> "recipe" in the itertools docs, it should *make use* of the zip_longest 
> which already does all the unpythonic stuff for you (though honestly I'm 
> not convinced either that the zip_longest code in the docs is the most 
> possible pythonic-ness). Instead, the following recipe (which I also 
> submitted to the StackOverflow question, and which is generally similar 
> to several other later answers, all remarking that they believe it's 
> more pythonic) is much cleaner and more suited to demonstrating the 
> power of itertools to new developers than the mess of a "recipe" pasted 
> above.
> 
> def roundrobin(*iters):
>      "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
>      # Perhaps "flat_zip_nofill" is a better name, or something similar
>      sentinel = object()
>      for tup in it.zip_longest(*iters, fillvalue=sentinel):
>          yield from (x for x in tup if x is not sentinel)

This adds and then deletes grouping and fill values that are not wanted.
To me, this is an 'algorithm smell'.  One of the principles of
algorithm design is to avoid unnecessary calculations.  For an edge case
such as roundrobin(1000000*'a', ''), the above mostly does unnecessary
work: it builds a million 2-tuples, each holding a fill value that is
then thrown away.
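To put a rough number on the unnecessary work, the sketch below counts the tuples and fill values that zip_longest creates for that edge case. count_zip_longest_waste is a hypothetical helper written only for this illustration. Every fill value it counts is built only to be filtered out again.

```python
from itertools import zip_longest

def count_zip_longest_waste(*iterables):
    """Hypothetical helper: count tuples and fill values zip_longest produces."""
    sentinel = object()
    tuples = fills = 0
    for tup in zip_longest(*iterables, fillvalue=sentinel):
        tuples += 1
        # Each sentinel here would be discarded by the flattening step.
        fills += sum(1 for x in tup if x is sentinel)
    return tuples, fills

tuples, fills = count_zip_longest_waste(1000000 * 'a', '')
print(tuples, fills)  # 1000000 1000000
```

For this input the zip_longest version allocates one tuple per item it yields and discards half of each tuple. The cycle-based recipe calls the exhausted '' iterator only once, drops it, and then cycles over the single live iterator.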

The following folds three statements of the recipe (`pending = len(iterables)`,
`while pending:`, and `pending -= 1`) into a single for statement.

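from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    nexts = cycle(iter(it).__next__ for it in iterables)
    # One pass per input iterable; reduced_len is the number still live
    # after each exhaustion (n-1, ..., 0), matching the old `pending`.
    for reduced_len in reversed(range(len(iterables))):
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            nexts = cycle(islice(nexts, reduced_len))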
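A rewrite of loop bounds like this is easy to get subtly wrong, so it is worth comparing it against the original while-loop recipe on a few inputs. Those inputs should include a case where the longest iterable still has items after all the others are exhausted. In the sketch below, roundrobin_while and roundrobin_for are hypothetical names for the two versions defined side by side. The outer for loop has to run once per input iterable, so it iterates over reversed(range(len(iterables))). With reversed(range(1, len(iterables))) it would stop one exhaustion early, and roundrobin('ABCD', 'E') would lose 'C' and 'D'.

```python
from itertools import cycle, islice

def roundrobin_while(*iterables):
    "Original recipe: roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for get_next in nexts:  # get_next avoids shadowing builtin next
                yield get_next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))

def roundrobin_for(*iterables):
    "for-loop variant: one pass of the outer loop per input iterable."
    nexts = cycle(iter(it).__next__ for it in iterables)
    for reduced_len in reversed(range(len(iterables))):
        try:
            for get_next in nexts:
                yield get_next()
        except StopIteration:
            nexts = cycle(islice(nexts, reduced_len))

# Include a case where the first (longest) iterable outlives the others.
cases = [('ABC', 'D', 'EF'), ('ABCD', 'E'), ('', 'XY'), ('A',), ()]
for args in cases:
    assert list(roundrobin_for(*args)) == list(roundrobin_while(*args)), args
print(''.join(roundrobin_for('ABCD', 'E')))  # AEBCD
```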

> In particular, this is just an extremely thin wrapper around 
> zip_longest, whose primary purpose is to eliminate the 
> otherwise-mandatory "fillvalues" that zip_longest requires to produce 
> uniform-length tuples.

But we do not want tuples or fill values.

> It's also an excellent example of how to make 
> best pythonic use of iterables in general, and itertools in particular, 
> and as such a much better implementation to be demonstrated in 
> documentation.

I disagree.  [I have mostly stopped using 'pythonic' because there is 
too much disagreement on particulars, and its use seems to inhibit as 
much as facilitate insight.]

[snip more]

> I realize at the end of the day this is a pretty trivial and ultimately 
> meaningless nit to pick, but I've never contributed before and have a 
> variety of similar minor pain points in the docs/stdlib, and I'm trying 
> to gauge 1) how well this sort of minor QoL improvement is wanted,

We constantly improve the docs.

> 2) even if it is wanted, am I going about it the right way.

Typos and trivial grammar issues can be filed as a PR with no issue 
required.  Clarifications usually require an issue and perhaps 
discussion.  Since this is more about philosophy of algorithm design, 
python-ideas was a good place to start.

> If the 
> answers to both of these questions are positive regarding this 
> particular case, then I'll look into making a BPO issue and pull request 
> on GitHub, which IIUC is the standard path for contributions.

Since I have a competing 'improvement', I would hold off on a PR until 
Raymond Hettinger, the itertools author, comments.

-- 
Terry Jan Reedy