[Python-ideas] Rewriting the "roundrobin" recipe in the itertools documentation

Thu Nov 16 14:42:41 EST 2017

I think the idea behind the original recipe is that when one of the inner
lists has been iterated through, it is removed and never looked at again.
Imagine the following scenario:

L is a list that contains one million empty lists and also one list
containing one million numbers.

Then the original recipe will iterate over two million(ish) items: each
inner list must get visited once, and each item from the long inner list
must get visited.  However, your use of zip_longest builds one million
tuples of 1,000,001 slots each, so it must visit about one trillion items,
which will likely not finish in a reasonable amount of time.
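
To put rough numbers on that, here is a scaled-down sketch of the same
scenario (one thousand instead of one million; the names roundrobin_zip,
n_empty, n_items, slots_examined and real_items are just mine for
illustration).  It only counts tuple slots and does not time anything:

```python
from itertools import zip_longest


def roundrobin_zip(*iterables):
    """The zip_longest-based recipe proposed in the quoted message."""
    sentinel = object()
    for tup in zip_longest(*iterables, fillvalue=sentinel):
        yield from (x for x in tup if x is not sentinel)


# Scaled-down stand-ins for "one million" (assumed sizes, for illustration).
n_empty = 1_000
n_items = 1_000
L = [[] for _ in range(n_empty)] + [list(range(n_items))]

# zip_longest yields one tuple per item of the longest input, and every
# tuple has one slot per input, whether or not that input is exhausted.
slots_examined = sum(len(tup) for tup in zip_longest(*L))

# Only the slots holding real values survive the sentinel filter.
real_items = sum(1 for _ in roundrobin_zip(*L))

print(slots_examined)  # n_items * (n_empty + 1)
print(real_items)      # n_items
```

The slot count grows as the product of the two sizes, which is where the
trillion comes from at full scale.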

I'm not saying that input like this is likely, but it is probably why
the original recipe is what it is.  It would be great to see a recipe that
is more pythonic but that keeps the efficiency of the first recipe;
I could not come up with one.
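
The closest I can get is the rough sketch below.  It keeps the
drop-when-exhausted behaviour by rotating a collections.deque of
iterators (the name roundrobin_deque and the approach are my own
assumption, not anything from the docs, and I have not benchmarked it),
but it still leans on a while loop and next(), so I would not claim it is
any more pythonic:

```python
from collections import deque


def roundrobin_deque(*iterables):
    """roundrobin_deque('ABC', 'D', 'EF') --> A D E B F C"""
    queue = deque(iter(it) for it in iterables)
    while queue:
        iterator = queue.popleft()
        try:
            item = next(iterator)
        except StopIteration:
            # Exhausted: do not re-queue it, so it is never visited again.
            continue
        yield item
        # Still live: send it to the back of the rotation.
        queue.append(iterator)
```

Each exhausted iterator costs one failed next() call and is then gone,
so the work is proportional to the number of inputs plus the number of
items yielded.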

-Brent

On Thu, Nov 16, 2017 at 10:08 AM, David Mertz <mertz at gnosis.cx> wrote:

> I agree this is a much better recipe presented.
>
> Have you benchmarked the two on more realistically long iterators?  E.g. a
> hundred iterators of millions of items where many terminate much earlier
> than others. I doubt the repeated 'is not' comparison makes much
> difference, but it would be good to see.
>
> On Nov 16, 2017 5:57 AM, "bunslow" <bunslow at gmail.com> wrote:
>
>> For taking values alternately from a series of iterables, there are two
>> primary functions:
>>
>> builtins.zip
>> itertools.zip_longest
>>
>> zip of course stops when the shortest iterable ends. zip_longest is
>> generally a useful substitute for when you don't want the zip behavior, but
>> it fills extra values in the blanks rather than just ignoring a finished
>> iterator and moving on with the rest.
>>
>> This last use case is at least somewhat common, according to
>> this[1] StackOverflow question (and other duplicates), in addition to the
>> existence of the `roundrobin` recipe[2] in the itertools docs. The recipe
>> satisfies this use case, and its code is repeated in the StackOverflow
>> answer.
>>
>> However, it is remarkably unpythonic, in my opinion. That is one thing
>> when it is necessary to achieve a goal, but for this functionality it is
>> most definitely *not* necessary.  I'll paste the code here for quick
>> reference:
>>
>>
>> from itertools import cycle, islice
>>
>>
>> def roundrobin(*iterables):
>>     "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
>>     pending = len(iterables)
>>     nexts = cycle(iter(it).__next__ for it in iterables)
>>     while pending:
>>         try:
>>             for next in nexts:
>>                 yield next()
>>         except StopIteration:
>>             pending -= 1
>>             nexts = cycle(islice(nexts, pending))
>>
>>
>> Things that strike me as unpythonic: 1) requiring the total number of
>> input iterables, 2) making gratuitous use of `next`, 3) using a while loop
>> in code dealing with iterables, and 4) combining loops, exceptions, and
>> composed itertools functions in non-obvious ways that make control flow
>> difficult to follow.
>>
>> Now, I get it: looking at the "roughly equivalent to" code for
>> zip_longest in the docs, there doesn't seem to be much of a way around it
>> for generally similar goals, and as I said above, unpythonic is fine when
>> necessary (practicality beats purity). But in this case, being a
>> "recipe" in the itertools docs, it should *make use* of zip_longest,
>> which already does all the unpythonic stuff for you (though honestly I'm
>> not convinced that the zip_longest code in the docs is as pythonic as it
>> could be either). Instead, the following recipe (which I also
>> submitted to the StackOverflow question, and which is generally similar to
>> several other later answers, all remarking that they believe it's more
>> pythonic) is much cleaner and more suited to demonstrating the power of
>> itertools to new developers than the mess of a "recipe" pasted above.
>>
>>
>> import itertools as it
>>
>>
>> def roundrobin(*iters):
>>     "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
>>     # Perhaps "flat_zip_nofill" is a better name, or something similar
>>     sentinel = object()
>>     for tup in it.zip_longest(*iters, fillvalue=sentinel):
>>         yield from (x for x in tup if x is not sentinel)
>>
>>
>> In particular, this is just an extremely thin wrapper around zip_longest,
>> whose primary purpose is to eliminate the otherwise-mandatory "fillvalues"
>> that zip_longest requires to produce uniform-length tuples. It's also an
>> excellent example of how to make best pythonic use of iterables in general,
>> and itertools in particular, and as such a much better implementation to be
>> demonstrated in documentation.
>>
>> I would thus advocate that the former recipe be replaced with the latter
>> recipe, being much more pythonic, understandable, and useful for helping
>> new developers acquire the style of python. (Using the common linguistics
>> analogy: a dictionary and grammar for a spoken language may be enough to
>> communicate, but we rely on a large body of literature -- fiction,
>> research, poetry, etc -- as children to get that special flavor and most
>> expressive taste to the language. The stdlib is no Shakespeare, but it and
>> its docs still form an important part of the formative literature of the
>> Python language.)
>>
>> I realize at the end of the day this is a pretty trivial and ultimately
>> meaningless nit to pick, but I've never contributed before and have a
>> variety of similar minor pain points in the docs/stdlib, and I'm trying to
>> gauge 1) how well this sort of minor QoL improvement is wanted, and 2) even
>> if it is wanted, am I going about it the right way. If the answers to both
>> of these questions are positive regarding this particular case, then I'll
>> look into making a BPO issue and pull request on GitHub, which IIUC is the
>> standard path for contributions.
>>
>> Thank you for your consideration.
>>
>> ~~~~
>>
>> [1]: https://stackoverflow.com/questions/3678869/pythonic-way-to-combine-two-lists-in-an-alternating-fashion/
>>
>> [2]: https://docs.python.org/3/library/itertools.html#itertools-recipes
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>>