itertools.izip brokeness
Tom Anderson
twic at urchin.earth.li
Tue Jan 3 10:18:03 EST 2006
On Tue, 3 Jan 2006, it was written:
> rurpy at yahoo.com writes:
>
>> The problem is that sometimes, depending on which file is the shorter,
>> a line ends up missing, appearing neither in the izip() output, or in
>> the subsequent direct file iteration. I would guess that it was in
>> izip's buffer when izip terminates due to the exception on the other
>> file.
>
> A different possible long term fix: change StopIteration so that it
> takes an optional arg that the program can use to figure out what
> happened. Then change izip so that when one of its iterator args runs
> out, it wraps up the remaining ones in a new tuple and passes that
> to the StopIteration it raises.
+1
I think you also want to send back the items you read out of the iterators
which are still alive, which otherwise would be lost. Here's a somewhat
minimalist (but tested!) implementation:
def izip(*iters):
    while True:
        z = []
        try:
            for i in iters:
                z.append(i.next())
            yield tuple(z)
        except StopIteration:
            raise StopIteration, z
The argument you get back with the exception is z, the list of items read
before the first empty iterator was encountered; if you still have your
array iters hanging about, you can find the iterator which stopped with
iters[len(z)], the ones which are still going with iters[:len(z)], and the
ones which are in an uncertain state, since they were never tried, with
iters[(len(z) + 1):]. This code could easily be extended to return more
information explicitly, of course, but simple, sparse, etc.
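For anyone trying this on a current Python 3, the code above will not run as written: `raise StopIteration, z` is no longer valid syntax, and a StopIteration raised inside a generator is turned into a RuntimeError (PEP 479). A rough equivalent, offered only as a sketch, relies on the fact that a generator's return value ends up in StopIteration.value. The names `izip_with_leftovers` and `drain` are made up for this illustration and are not part of itertools.

```python
def izip_with_leftovers(*iterables):
    """Zip iterables; on exhaustion, hand back the items already read."""
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return []  # nothing to zip; avoid yielding () forever
    while True:
        z = []
        for it in iterators:
            try:
                z.append(next(it))
            except StopIteration:
                # A generator's return value becomes StopIteration.value.
                return z
        yield tuple(z)


def drain(gen):
    """Collect yielded tuples and the leftovers carried by StopIteration."""
    results = []
    while True:
        try:
            results.append(next(gen))
        except StopIteration as stop:
            return results, stop.value
```

As in the original, len(z) is the index of the iterator that ran dry, so the caller can still work out which iterators are alive and which were never tried.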
> You would want some kind of extended for-loop syntax (maybe involving
> the new "with" statement) with a clean way to capture the exception
> info.
How about for ... except?
for z in izip(a, b):
    lovingly_fondle(z)
except StopIteration, leftovers:
    angrily_discard(leftovers)
This has the advantage of not giving entirely new meaning to an existing
keyword. It does, however, afford the somewhat dubious use:
for z in izip(a, b):
    lovingly_fondle(z)
except ValueError, leftovers:
    pass # execution should almost certainly never get here
Perhaps that form should be taken as meaning:
try:
    for z in izip(a, b):
        lovingly_fondle(z)
except ValueError, leftovers:
    pass # execution could well get here if the fondling goes wrong
Although i think it would be more strictly correct if, more generally, it
made:
for LOOP_VARIABLE in ITERATOR:
    SUITE
except EXCEPTION:
    HANDLER
work like:
try:
    while True:
        try:
            LOOP_VARIABLE = ITERATOR.next()
        except EXCEPTION:
            raise __StopIteration__, sys.exc_info()
        except StopIteration:
            break
        SUITE
except __StopIteration__, exc_info:
    somehow_set_sys_exc_info(exc_info)
    HANDLER
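The expansion above can be approximated with an ordinary function in present-day Python 3, where `ITERATOR.next()` is spelled `next(ITERATOR)`. This is only a sketch of the proposed semantics: `for_except`, `body` and `handler` are hypothetical names, and the handler is simply passed the exception object instead of having sys.exc_info restored.

```python
def for_except(iterable, body, exc_type, handler):
    """Call body(item) for each item; call handler(exc) only if *fetching*
    the next item raises exc_type. Exceptions from body propagate."""
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except exc_type as exc:
            # Raised by the iterator: this is what for ... except would trap.
            handler(exc)
            return
        except StopIteration:
            # Normal exhaustion when exc_type is something else.
            return
        # The body runs outside the try, so its exceptions are not trapped.
        body(item)
```

Because the `exc_type` clause comes first, passing StopIteration as `exc_type` means the handler runs at normal exhaustion, which matches the izip leftovers use case.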
As it stands, throwing a StopIteration in the suite inside a for loop
doesn't terminate the loop - the exception escapes; by analogy, the
for-except construct shouldn't trap exceptions from the loop body, only
those raised by the iterator.
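The existing behaviour is easy to check. The following sketch (Python 3 syntax; the function name is invented for the example) raises StopIteration from the loop body of a plain, non-generator function and shows that the for loop does not swallow it.

```python
def loop_with_stop_in_body(items):
    """Raise StopIteration from the loop body, not from the iterator."""
    seen = []
    for item in items:
        seen.append(item)
        if item == 2:
            raise StopIteration("raised by the body")
    return seen


try:
    loop_with_stop_in_body([1, 2, 3])
    escaped = False
except StopIteration:
    # The for loop only handles StopIteration coming from the iterator.
    escaped = True
```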
tom
--
Chance? Or sinister scientific conspiracy?