[Python-ideas] Adding a new function "zip_flat" to itertools (Re: Rewriting the "roundrobin" recipe in the itertools documentation)

David Mertz mertz at gnosis.cx
Mon Nov 20 13:04:31 EST 2017


After correcting Terry's off-by-one error, I get this on Python 3.6 (and
bumping NUMBER to 1000).  In speed, Terry's is either very slightly faster
or very slightly slower than the recipe, depending on the data.  I think
Terry's is more readable that roundrobin(), but still requires a little
thought compared to Bunslow's.  However, using the builtin name 'next' as a
variable name seems like a bad choice to me (it doesn't break in this code,
but seems like bad practice, i'd choose some other variable name).

507-code % python rr_tester.py
testing with equal_data
  roundrobin      0.187816
  bunslow         0.393162
  terry           0.183851
  stackoverflow   0.675543
**************************************************
testing with unequal_data
  roundrobin      0.209959
  bunslow         0.555731
  terry           0.233015
  stackoverflow   0.746880
**************************************************
testing with extreme_data
  roundrobin      0.053149
  bunslow         0.273607
  terry           0.051963
  stackoverflow   0.158515

508-code % python --version
Python 3.6.2 :: Anaconda custom (x86_64)


On Mon, Nov 20, 2017 at 9:15 AM, Terry Reedy <tjreedy at udel.edu> wrote:

> On 11/20/2017 11:08 AM, Steven D'Aprano wrote:
>
> Please don't make claims about correctness and efficiency without
>> testing the code first. The second suggestion given there, using deque,
>> is *not* correct as provided, as it fails to work with iterables. It
>> requires the caller to pass only iterators, unlike the existing
>> roundrobin recipe which accepts any iterable.
>>
>> Nor is it more efficient, at least on my machine -- in fact the
>> opposite, it is the worst performing of the four recipes I've tried:
>>
>> - the current recipe from the itertools docs;
>> - your re-write, using zip_longest;
>> - Terry's version;
>> - and the one from stackoverflow.
>>
>> I've attached my test code, in case you want to play around with it.
>> Apologies in advance for any bugs in the test code (its 2 in the
>> morning here and I've had a long day).
>>
>> According to my testing, on my computer using Python 3.5, Terry's code
>> is by far the fastest in all three separate test cases, but that
>> probably shouldn't count since it's buggy (it truncates the results and
>> bails out early under some circumstances). Out of the implementations
>> that don't truncate, the existing recipe is by far the fastest.
>>
>> Terry, if you're reading this, try:
>>
>> list(roundrobin('A', 'B', 'CDE'))
>>
>
> Your version truncates the results to A B C instead of A B C D E as the
>> itertools recipe gives.
>>
>
> This is due to an off-by-1 error which I corrected 3 hours later in a
> follow-up post, repeated below.
> ---
>
> Correct off-by-one error.  I should have tested with an edge case such as
> print(list(roundrobin('ABC', '')))
>
> > The following combines 3 statements into one for statement.
> >
> > def roundrobin(*iterables):
> >      "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
> >      nexts = cycle(iter(it).__next__ for it in iterables)
> >      for reduced_len in reversed(range(1, len(iterables))):
>
> Make that 0 rather than 1 for start value.
>
> >          try:
> >              for next in nexts:
> >                  yield next()
> >          except StopIteration:
> >              nexts = cycle(islice(nexts, reduced_len))
>
> A slightly clearer, slightly less efficient alternative would be
>
> def roundrobin(*iterables):
>     "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
>     nexts = cycle(iter(it).__next__ for it in iterables)
>     for current_len in reversed(range(1, len(iterables)+1)):
>         try:
>             for next in nexts:
>                 yield next()
>         except StopIteration:
>             nexts = cycle(islice(nexts, current_len - 1))
>
> --
> Terry Jan Reedy
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20171120/755fa2a0/attachment.html>


More information about the Python-ideas mailing list