[Python-ideas] Adding a new function "zip_flat" to itertools (Re: Rewriting the "roundrobin" recipe in the itertools documentation)
Steven D'Aprano
steve at pearwood.info
Mon Nov 20 11:08:02 EST 2017
Hi Bill,
I don't have time to go through your email in detail and respond to
every point you raise, but I'd like to respond to one point you made.
On Mon, Nov 20, 2017 at 02:03:13AM -0600, bunslow wrote:
> I also found this answer:
>
> https://stackoverflow.com/questions/243865/how-do-i-merge-two-python-iterators/40498526#40498526
>
> which proposes a solution that is both more correct and efficient than the
> zip_longest-with-sentinels, and also noticeably more readable than either
> the original doc recipe or even Terry's cleaned up replacement of it.
Please don't make claims about correctness and efficiency without
testing the code first. The second suggestion given there, using deque,
is *not* correct as provided, as it fails to work with iterables. It
requires the caller to pass only iterators, unlike the existing
roundrobin recipe which accepts any iterable.
Nor is it more efficient, at least on my machine -- in fact the
opposite, it is the worst performing of the four recipes I've tried:
- the current recipe from the itertools docs;
- your re-write, using zip_longest;
- Terry's version;
- and the one from stackoverflow.
I've attached my test code, in case you want to play around with it.
Apologies in advance for any bugs in the test code (its 2 in the
morning here and I've had a long day).
According to my testing, on my computer using Python 3.5, Terry's code
is by far the fastest in all three separate test cases, but that
probably shouldn't count since it's buggy (it truncates the results and
bails out early under some circumstances). Out of the implementations
that don't truncate, the existing recipe is by far the fastest.
Terry, if you're reading this, try:
list(roundrobin('A', 'B', 'CDE'))
Your version truncates the results to A B C instead of A B C D E as the
itertools recipe gives.
But I digress... the point is, if you're going to make claims about the
correctness and efficiency of code, you really ought to actually check
the correctness and efficiency first.
I may or may not get around to responding to some of your other points
over the next few days.
--
Steve
-------------- next part --------------
import itertools
from itertools import *
from timeit import Timer
# From the itertools recipe
def roundrobin(*iterables):
"roundrobin('ABC', 'D', 'EF') --> A D E B F C"
pending = len(iterables)
nexts = cycle(iter(it).__next__ for it in iterables)
while pending:
try:
for next in nexts:
yield next()
except StopIteration:
pending -= 1
nexts = cycle(islice(nexts, pending))
def bunslow(*iters):
"roundrobin('ABC', 'D', 'EF') --> A D E B F C"
sentinel = object()
for tup in itertools.zip_longest(*iters, fillvalue=sentinel):
yield from (x for x in tup if x is not sentinel)
def terry(*iterables):
"roundrobin('ABC', 'D', 'EF') --> A D E B F C"
nexts = cycle(iter(it).__next__ for it in iterables)
for reduced_len in reversed(range(1, len(iterables))):
try:
for next in nexts:
yield next()
except StopIteration:
nexts = cycle(islice(nexts, reduced_len))
from collections import deque
def stackoverflow(*iterators):
queue = deque(iter(it) for it in iterators)
while len(queue) > 0:
iterator = queue.popleft()
try:
yield next(iterator)
queue.append(iterator)
except StopIteration:
pass
FUNCTIONS = (roundrobin, bunslow, terry, stackoverflow)
equal_data = [range(500)]*4
unequal_data = [range(500), range(600), range(700), range(400)]
extreme_data = [range(1), range(1), range(1), range(500)]
def trial(data_name, number, repeat):
print("testing with %s" % data_name)
template = 'from __main__ import %s as rr, %s as data'
data = globals()[data_name]
assert len(data) == 4 and all(type(x) is range for x in data)
expected = list(roundrobin(*data))
failed = []
for func in FUNCTIONS:
actual = list(func(*data))
if actual != expected:
failed.append(func)
setup = template % (func.__name__, data_name)
T = Timer('list(rr(*data))', setup=setup)
t = min(T.repeat(number=number, repeat=repeat))
print(' %-15s %f' % (func.__name__, t))
for f in failed:
print(' *** %s returned incorrect results ***' % f.__name__)
NUMBER = 30
REPEAT = 5
trial('equal_data', NUMBER, REPEAT)
print("*"*50)
trial('unequal_data', NUMBER, REPEAT)
print("*"*50)
trial('extreme_data', NUMBER, REPEAT)
More information about the Python-ideas
mailing list