Mailman 3 Adding a new function "zip_flat" to itertools (Re: Rewriting the "roundrobin" recipe in the itertools documentation) - Python-ideas

Nov. 20, 2017

Dear all: thank you for your replies and thoughts, most especially Steve
and Terry. I am more-or-less new to contributing to Python, so I wasn't
sure that the bug tracker was the best way to start -- I was looking for a
sanity check and received exactly what I wanted :) Thanks to the feedback
here, the bug tracker issue will be much cleaner from the start.

Regarding the meta discussion about my OP: yes it was long-winded,
detailed, pedantic and (to a certain extent) bombastic, but I was imagining
the many possible responses to the suggestion (and suggested replacement),
so I felt I should explain where I was coming from. Even though there was
a lot of disagreement about how I described the current recipe as not very
Pythonic, I'm very glad for all the perspectives and lessons I got from
reading the ensuing discussions.

Now, having had some time to think, I've come up with further thoughts on
the topic. In particular, I was going to create a new bug, and wrote
several paragraphs on the topic summarizing this thread, and my thoughts.
I'll paste those paragraphs here:

"""
To summarize, I found the current implementation of the "roundrobin"
function difficult to understand, and proposed a much simpler solution
that, while producing correct results, isn't quite "correct" in the sense
that it glosses over a detail before "manually" correcting the detail at
the end, and as such is prone to severe inefficiency in extreme cases.
There was a smattering of comments from several people indicating that
they liked the simpler recipe better, despite its performance drawbacks.

Terry Reedy proposed a slightly rewritten version of the current recipe,
which implements the correct algorithm (without glossing over and manually
correcting the details). Although I have since changed my perceptions of
the original, now understanding how it works, the rewritten version from
Terry was clearer enough that I was able to understand *it* where I could
not previously understand the original. (My newfound understanding of the
original is largely derived from making sense of the rewritten version,
which properly clued me in to what cycle and islice actually do.)
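
For readers less familiar with cycle and islice, the approach can be sketched roughly as follows. This is written in the spirit of the docs recipe, not copied from it or from Terry's rewrite, so the variable names and comments are assumptions made for illustration.

```python
from itertools import cycle, islice


def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    num_active = len(iterables)
    # cycle() endlessly repeats the bound __next__ methods, one per input.
    nexts = cycle(iter(iterable).__next__ for iterable in iterables)
    while num_active:
        try:
            for next_item in nexts:
                yield next_item()
        except StopIteration:
            # The input that just ran out was the last entry cycle() produced.
            # Taking the next (num_active - 1) entries with islice() keeps
            # every other input, in order, and drops the exhausted one.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))
```

The key trick is that rebuilding the cycle from a slice of itself removes exactly one exhausted input per StopIteration, without ever materializing the inputs.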

Either way, the current recipe can certainly be improved. Although I now
find the original and its rewrite to be algorithmically clean and correct
(even if the code and inline comments can be improved), the StackOverflow
question (
https://stackoverflow.com/questions/3678869/pythonic-way-to-combine-two-list...)
that originally got me thinking about the problem demonstrates that the
algorithmically clean way is *not* obvious at all to people who aren't very
familiar with the itertools module -- which is the large majority of people
who use Python (even if that's a very small fraction of the people reading
this bug). The second-from-top answer is the one that references the
recipe in the docs, but as my own first post to python-ideas demonstrates,
the (large?) majority of Python users aren't familiar enough with the
itertools module to be able to understand that recipe, and I also believe
many people were looking for one- or two-liners to use in their own code
without a function call. Adding to the confusion on the overall topic is
the lack of a clear name -- "roundrobin", "alternate", "interleave",
"merge", and variations and others.
"""

Having completed those, I found a roughly duplicate StackOverflow question
to the one from my OP:
https://stackoverflow.com/questions/243865/how-do-i-merge-two-python-iterato...

Besides emphasizing my points about not having even a clear name for the
topic, a desire for one-liners, mass confusion around the issue (especially
regarding flattening zip [which terminates on the shortest input, a hidden
gotcha] and zip_longest [someone else found the same solution as me and
others in this thread]), and all-around failure to generate anything even
resembling a consensus on the topic, I also found this answer:

https://stackoverflow.com/questions/243865/how-do-i-merge-two-python-iterato...

which proposes a solution that is both more correct and efficient than the
zip_longest-with-sentinels, and also noticeably more readable than either
the original doc recipe or even Terry's cleaned up replacement of it.
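
To make the gotchas mentioned above concrete, here is a small sketch of the three "flattening" one-liners people tend to reach for. The inputs and variable names are made up for illustration; this is not the code from the linked answer.

```python
from itertools import chain, zip_longest

a, b = "ABC", "D"

# Flattening plain zip() silently stops at the shortest input.
flat_zip = list(chain.from_iterable(zip(a, b)))
# flat_zip == ['A', 'D']

# zip_longest() keeps going, but its default fill value (None)
# leaks into the flattened output.
flat_longest = list(chain.from_iterable(zip_longest(a, b)))
# flat_longest == ['A', 'D', 'B', None, 'C', None]

# A private sentinel can be filtered out safely, even when the
# inputs themselves contain None. This is the "zip_longest with
# sentinels" variant: correct, but it builds and discards fill values.
sentinel = object()
flat_sentinel = [
    x
    for tup in zip_longest(a, b, fillvalue=sentinel)
    for x in tup
    if x is not sentinel
]
# flat_sentinel == ['A', 'D', 'B', 'C']
```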

Given this variety of problems with the issue, I now think that -- while
updating the itertools recipe is certainly better than nothing -- the
better thing to do might be to just add a new function to itertools called
"zip_flat" which solves this problem. In addition to answering the
StackOverflow questions with their ongoing debate about efficiency,
correctness, and pythonicity (pythonicness?), it would also greatly clarify
the naming issue. (Sidenote: whoever came up with "zip" as the name for
the builtin was quite creative. It's remarkably short and descriptive.)
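
To give a feel for the intended behavior, here is a minimal pure-Python sketch of what such a function might do. Note that zip_flat does not exist in itertools; the name comes from this proposal, and the implementation below is only one hypothetical way to write it, not the linked StackOverflow answer and not a suggestion for how a C version would work.

```python
def zip_flat(*iterables):
    """zip_flat('ABC', 'D', 'EF') --> A D E B F C

    Hypothetical sketch: yield one item from each input in turn,
    dropping inputs as they run out, with no fill values.
    """
    iterators = [iter(iterable) for iterable in iterables]
    while iterators:
        still_active = []
        for iterator in iterators:
            try:
                item = next(iterator)
            except StopIteration:
                # This input is exhausted; leave it out of the next round.
                continue
            still_active.append(iterator)
            yield item
        iterators = still_active
```

Unlike the zip_longest-with-sentinel approach, an exhausted input costs nothing after the round in which it runs out, and the function stays lazy, so it also works when some inputs are infinite.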

What are the sentiments of readers here? If positive, I'll create an issue
on BPO about zip_flat (rather than just improving the docs recipe).

(Sorry Steve for bringing this back to -ideas! At least this time I'm
proposing an addition to the language itself! :)

Thanks for your consideration,
Bill

On Thu, Nov 16, 2017 at 5:06 PM, Steven D'Aprano <steve@pearwood.info>
wrote:
...
On Thu, Nov 16, 2017 at 02:56:29PM -0500, Terry Reedy wrote:
...
...
3) using a while loop in code dealing with iterables,
I agree that this is not necessary, and give a replacement below.
The OP isn't just saying that it's unnecessary in this case, but that it's
unPythonic to ever use a while loop in code dealing with iterables. I
disagree with that stronger statement.
...
...
import itertools as it  # the recipe refers to itertools as "it"

def roundrobin(*iters):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Perhaps "flat_zip_nofill" is a better name, or something similar
    sentinel = object()
    for tup in it.zip_longest(*iters, fillvalue=sentinel):
        yield from (x for x in tup if x is not sentinel)
This adds and then deletes grouping and fill values that are not wanted.
To me, this is an 'algorithm smell'. One of the principles of
algorithm design is to avoid unnecessary calculations. For an edge case
such as roundrobin(1000000*'a', ''), the above mostly does unnecessary
work.

It's a recipe, not a tuned and optimized piece of production code.
And if you're going to criticise code on the basis of efficiency, then I
would hope you've actually profiled the code first. Because it isn't
clear to me at all that what you call "unnecessary work" is more
expensive than re-writing the recipe using a more complex algorithm with
calls to cycle and islice.
But I'm not here to nit-pick your recipe over the earlier ones.
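
One way to settle the efficiency question empirically is a small timeit harness along these lines. This is only a sketch: the function names, the input size, and the cycle/islice variant (written in the spirit of the docs recipe, not copied from it) are assumptions, and no timings are claimed here.

```python
import itertools as it
import timeit


def roundrobin_sentinel(*iters):
    # The zip_longest-with-sentinel recipe quoted above.
    sentinel = object()
    for tup in it.zip_longest(*iters, fillvalue=sentinel):
        yield from (x for x in tup if x is not sentinel)


def roundrobin_cycle(*iterables):
    # A cycle/islice variant, for comparison.
    num_active = len(iterables)
    nexts = it.cycle(iter(iterable).__next__ for iterable in iterables)
    while num_active:
        try:
            for next_item in nexts:
                yield next_item()
        except StopIteration:
            num_active -= 1
            nexts = it.cycle(it.islice(nexts, num_active))


def time_both(n=100_000, repeat=3):
    """Return the best wall-clock time per function for the edge case
    of one long input and one empty input."""
    args = ("a" * n, "")
    results = {}
    for func in (roundrobin_sentinel, roundrobin_cycle):
        timings = timeit.repeat(
            lambda: sum(1 for _ in func(*args)), number=1, repeat=repeat
        )
        results[func.__name__] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in time_both().items():
        print(f"{name}: {seconds:.4f} s")
```

Results will vary by machine and Python version, so the numbers are worth measuring rather than assuming either way.
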
[...]
...
Since I have a competing 'improvement', I would hold off on a PR until
Raymond Hettinger, the itertools author, comments.
Raise a doc issue on the tracker, and take the discussion there. I think
that this is too minor an issue to need long argument on the list.
Besides, it's not really on-topic as such -- it isn't about a change to
the language. It's about an implementation change to a recipe in the
docs.
--
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
