[Python-ideas] zip() problem.

Michael Selik mike at selik.org
Fri Feb 12 18:50:23 EST 2016


On Fri, Feb 12, 2016 at 6:39 PM Erik <python at lucidity.plus.com> wrote:

> Hi.
>
> In writing my previous email, I noticed something about zip() that I'd
> not seen before (but is obvious, I guess) - when it reaches the shortest
> sequence and terminates, any iterators already processed in that pass
> will have generated one more value than the others. Those additional
> values are discarded.
>
> For example:
>
> h = iter("Hello")
> w = iter("World")
> s = iter("Spam")
> e = iter("Eggs")
>
> for i in zip(h, w, s, e):
>    print(i)
>
> for i in (h, w, s, e):
>    print(list(i))
>
> ---> All iterators are exhausted.
>
> h = iter("Hello")
> w = iter("World")
> s = iter("Spam")
> e = iter("Eggs")
>
> for i in zip(h, s, e, w):
>    print(i)
>
> for i in (h, w, s, e):
>    print(list(i))
>
>
> ---> "w" still has the trailing 'd' character.
>
>
> So, if you're using zip() rather than itertools.zip_longest(), you have to
> be careful about the order of your arguments and try to put the
> probably-shortest one first if this would otherwise cause problems.
>
>
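For comparison, here is a minimal sketch that uses itertools.zip_longest() on the same four strings, with the argument order from the second example. It keeps pulling until every input is exhausted and pads the shorter ones with a fill value (None by default), so nothing is silently discarded. The trade-off is that the caller now has to handle the padding values.

```python
from itertools import zip_longest

h, w, s, e = map(iter, ("Hello", "World", "Spam", "Eggs"))

# zip_longest() stops only when *all* inputs are exhausted, padding the
# shorter ones with fillvalue (None by default) instead of dropping values.
for row in zip_longest(h, s, e, w):
    print(row)  # the last row is ('o', None, None, 'd')

for it in (h, w, s, e):
    print(list(it))  # [] each time: all four iterators have been drained
```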
> The reason I'm posting to 'ideas' is: what should/could be done about it?
>
> 1) A simple warning in the docstring for zip()?
>

I wouldn't want to clutter the docstring, but a note in the long-form
documentation could be useful.
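As a possible basis for such a note, the following is a rough pure-Python model of zip(), written in the spirit of the pure-Python equivalent shown in the zip() documentation. The name zip_sketch is hypothetical, and CPython's real zip() is implemented in C. The model makes the loss visible: any values fetched earlier in the final, incomplete pass are never yielded.

```python
def zip_sketch(*iterables):
    """Hypothetical, rough pure-Python model of zip() (not CPython's code)."""
    if not iterables:
        return
    iterators = [iter(it) for it in iterables]
    sentinel = object()
    while True:
        row = []
        for it in iterators:
            value = next(it, sentinel)
            if value is sentinel:
                # Values already appended to `row` in this pass were
                # consumed from their iterators but are never yielded.
                return
            row.append(value)
        yield tuple(row)


h, w, s, e = map(iter, ("Hello", "World", "Spam", "Eggs"))
print(list(zip_sketch(h, s, e, w)))
print(list(h), list(w))  # h lost its 'o' inside zip_sketch; w still holds 'd'
```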


> 2) Something to prevent it (for example a keyword argument to zip() to
> switch on some behaviour where the iterators are first asked whether they
> have more items to generate before any values are consumed)?
>

How can you query whether an iterator has another value without consuming
that value?
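To make that concrete, here is a minimal sketch of one reading of option 2: a wrapper that answers "is there another value?" by fetching that value early and holding it in a buffer. The names Peekable, has_next and careful_zip are hypothetical and not part of the standard library. The sketch shows that the underlying iterator still gives up the value. The value survives only because the wrapper buffers it, so the caller has to wrap the iterators before zipping and keep using the wrappers afterwards.

```python
_EMPTY = object()  # marks "nothing buffered yet"


class Peekable:
    """Hypothetical wrapper that can answer "is there another value?"
    by pulling that value early and holding it in a one-item buffer."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffered = _EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._buffered is not _EMPTY:
            value, self._buffered = self._buffered, _EMPTY
            return value
        return next(self._it)

    def has_next(self):
        # The "query" still calls next() on the underlying iterator;
        # the value is merely kept here instead of being thrown away.
        if self._buffered is _EMPTY:
            try:
                self._buffered = next(self._it)
            except StopIteration:
                return False
        return True


def careful_zip(*peekables):
    """Hypothetical zip() variant: check every input before taking from any."""
    if not peekables:
        return
    while all(p.has_next() for p in peekables):
        yield tuple([next(p) for p in peekables])


h, w, s, e = (Peekable(x) for x in ("Hello", "World", "Spam", "Eggs"))
print(list(careful_zip(h, s, e, w)))
print(list(h), list(w))  # ['o'] ['d']: leftovers survive, but only in the wrappers
```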


> 3) Nothing. There are bigger things to worry about ;)
>
> WRT (2), I thought that perhaps __len__ was part of the iterator
> protocol, but it's not (just __iter__ and __next__), hence:
>
>  >>> len(range(5, 40))
> 35
>  >>> len(iter(range(5, 40)))
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> TypeError: object of type 'range_iterator' has no len()
>  >>> len(iter("FooBar"))
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> TypeError: object of type 'str_iterator' has no len()
>
> ... though would that also be something to consider (I guess all
> iterators would have to keep some state regarding the number of values
> previously generated and then apply that offset to the result of len()
> on the underlying object)? Perhaps that would just be too heavyweight
> for what is a relatively minor wart.
>
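Something close to this already exists, but only as an optional estimate: PEP 424 (Python 3.4) added operator.length_hint(), which tries len() first, then the object's __length_hint__() method, and otherwise returns a default. The result is explicitly only a hint. Built-in iterators such as the range iterator report how many values remain, while a plain generator offers no hint at all. The generator numbers() below is a hypothetical example.

```python
import operator

# length_hint() tries len(obj), then obj.__length_hint__(), then the default.
r = iter(range(5, 40))
print(operator.length_hint(r))  # 35: the range iterator tracks what is left
next(r)
print(operator.length_hint(r))  # 34: one value has been consumed


def numbers():
    """Hypothetical unbounded generator: nothing can say how many remain."""
    n = 0
    while True:
        yield n
        n += 1


print(operator.length_hint(numbers(), -1))  # -1: no hint, so the default
```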

How would you handle the length of an infinite iterator? Or one that
*might* be infinite, depending on current state of the program?

A more realistic example: if I'm looking up N records from a distributed
database, I might do that in parallel and get the results back unordered,
as an iterator. If M of the queries time out, I might choose to ignore those
records and exclude them from the resulting iterator. So, when I kick off
the queries, the length of that iterator might be N. When the timeouts are
finished, the length is N-M. Further, if I've consumed 2 records, is the
length still N-M or N-M-2?
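A minimal sketch of that scenario, assuming a thread pool stands in for the distributed database. The functions fetch and fetch_all are hypothetical, and every third id simulates a timed-out query. N queries go out and results arrive unordered through an iterator. Until that iterator has been consumed, nothing in it can say how many records (N-M) will actually come back.

```python
import operator
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(record_id):
    """Hypothetical stand-in for a remote lookup; every third id
    simulates a query that timed out."""
    if record_id % 3 == 0:
        raise TimeoutError(f"query for record {record_id} timed out")
    return f"record-{record_id}"


def fetch_all(record_ids):
    """Hypothetical: yield results in completion order, skipping timeouts."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fetch, rid) for rid in record_ids]
        for future in as_completed(futures):
            try:
                yield future.result()
            except TimeoutError:
                continue  # drop the record, so fewer than N values arrive


records = fetch_all(range(10))  # N = 10 queries
print(operator.length_hint(records, -1))  # -1: the generator offers no hint
print(sorted(records))  # only the 6 ids not divisible by 3 come back
```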