itertools.izip brokenness

rurpy at yahoo.com rurpy at yahoo.com
Wed Jan 4 17:28:13 EST 2006


"Raymond Hettinger" <python at rcn.com> wrote in message
news:1136362998.895356.66640 at g44g2000cwa.googlegroups.com...
> rurpy at yahoo.com wrote:
> > izip's uses can be partitioned two ways:
> > 1. All iterables have equal lengths
> > 2. Iterables have different lengths.
> >
> > Case 1 is no problem obviously.
> > In Case 2 there are two sub-cases:
> >
> > 2a. You don't care what values occur in the other iterators
> >   after the end of the shortest.
> > 2b. You do care.
> >
> > In my experience 1 and 2b are the cases I encounter the most.
> > Seldom do I need case 2a.  That is, when I can have iterators
> > of unequal length, usually I want to do *something* with the
> > extra items in the longer iterators.  Seldom do I want to just
> > ignore them.
>
> That is a reasonable use case that is not supported by zip() or izip()
> as currently implemented.

I haven't thought a lot about zip because I haven't needed to.
I would phrase this as "...not supported by the itertools module...".
If it makes sense to extend izip() to provide end-of-longest
iteration, fine.  If not, then add an izip_longest() to itertools
(and perhaps a corresponding imap and whatever else shares
the terminate-at-shortest behavior).

> > The whole point of using izip is to make the code shorter,
> > more concise, and easier to write and understand.
>
> That should be the point of using anything in Python.  The specific
> goal for izip() was for an iterator version of zip().  Unfortunately,
> neither tool fits your problem.  At the root of it is the iterator
> protocol not having an unget() method for pushing back unused elements
> of the data stream.

I don't understand this.  Why do you need lookahead?  (I
mean that literally; I am not disagreeing in a veiled way.)

This is my (mis?)understanding of how izip works:
- izip is a class
- when instantiated, it returns another iterator object, call it "x".
- the x object (being an iterator) has a next method that
  returns a tuple of the next values returned by all the iterators
  given when x was created.

So why can't izip's next method collect the results of
its set of argument iterators, as I presume it does now,
except that when one of them starts raising StopIteration
exceptions, an alternate value is placed in the result in its
place?  When all the iterators are raising exceptions, izip
itself raises a StopIteration to signal that all the iterators
have reached exhaustion.  This is what the code I posted
in a message last night does.  Why is something like that
not acceptable?
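
To make the behavior described above concrete, here is a minimal sketch.
It is not the code from my earlier message and not anything in itertools;
the name zip_fill and its fillvalue parameter are made up for illustration,
and it is written as a generator for a current Python 3 interpreter rather
than as a class with a next method.

```python
def zip_fill(*iterables, fillvalue=None):
    """Hypothetical helper: like izip, but runs to the end of the longest input."""
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return
    # Remember which inputs have finished so they are never asked again.
    exhausted = [False] * len(iterators)
    while True:
        row = []
        for i, it in enumerate(iterators):
            if exhausted[i]:
                row.append(fillvalue)
                continue
            try:
                row.append(next(it))
            except StopIteration:
                # This input just ran out: substitute the alternate value.
                exhausted[i] = True
                row.append(fillvalue)
        if all(exhausted):
            # Every input has run out, so the combined iterator ends too.
            return
        yield tuple(row)


for row in zip_fill("ab", "wxyz", fillvalue="-"):
    print(row)
```

One design choice in this sketch: an input is marked as finished after its
first StopIteration and is not called again, instead of relying on it to
keep raising.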

All this talk of pushbacks and returning shorter lists of
unexhausted iterators makes me think I am misunderstanding
something.
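
One possible reading of the pushback remark, offered as a guess rather than
as what was meant: with terminate-at-shortest behavior, an item has already
been taken from an earlier iterator by the time a later one turns out to be
exhausted, and it cannot be put back.  The sketch below assumes a current
Python 3 interpreter, where the built-in zip() is lazy and stands in for
izip(); the variable names are mine.

```python
# Built-in zip() stands in for izip() here (assumption: Python 3).
longer = iter([1, 2, 3])
shorter = iter(["a"])

# zip asks `longer` for an item first, then `shorter`.  On the second round
# it takes 2 from `longer`, then finds `shorter` exhausted and stops.
pairs = list(zip(longer, shorter))

# Whatever the caller can still read from `longer` afterwards.
leftover = list(longer)

print(pairs)
print(leftover)
```

The pairs come out as expected, but the 2 is gone: it was consumed and then
discarded, so only the 3 is left for anyone who wants the "extra" items.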

> > This should be pointed out in the docs,
>
> I'll add a note to the docs.
>
> > However, it would be better if izip could be made useful
> > for case 2b situations.  Or maybe, an izip2 (or something)
> > added.
>
> Feel free to submit a feature request to the SF tracker (surprisingly,
> this behavior has not been previously reported, nor have there been any
> related feature requests, nor was the use case contemplated in the PEP
> discussions: http://www.python.org/peps/pep-0201 ).

Yes, this is interesting.  In the "print multiple columns"
example I presented, I felt the use of izip() met the
"one obvious way" test.  The resulting code was simple
and clear.  The real-world case where I ran into the
problem was comparing two files until two different
lines were found.  Again, izip was the "one obvious
way".
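
A rough sketch of the file-comparison case, to show where truncation bites.
It assumes a current Python 3 interpreter, where the standard library
provides itertools.zip_longest (which did not exist when this thread
started).  The function name first_difference and the sample data are
invented for the example, and in-memory files stand in for real ones.

```python
import io
from itertools import zip_longest


def first_difference(lines_a, lines_b):
    """Return (line_number, a_line, b_line) for the first mismatch, or None."""
    # zip_longest pads the shorter input with None, so a file that merely
    # ends early still shows up as a difference.
    for lineno, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            return lineno, a, b
    return None


old_text = "alpha\nbeta\n"
new_text = "alpha\nbeta\ngamma\n"

# Terminate-at-shortest pairing never sees the extra line.
truncated = [(a, b) for a, b in zip(io.StringIO(old_text), io.StringIO(new_text)) if a != b]

# End-of-longest pairing reports it.
result = first_difference(io.StringIO(old_text), io.StringIO(new_text))

print(truncated)
print(result)
```

With the truncating version the two files look identical; with the padded
version the extra third line is reported as the first difference.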

So yes, it is surprising and disturbing that these use
cases were not identified.  I wonder what other features
that "should" be in Python were similarly missed?
And more importantly, what needs to change to fix
the problem?



