[Python-ideas] zip() problem.
Michael Selik
mike at selik.org
Fri Feb 12 18:51:34 EST 2016
BTW, from the documentation
(<https://docs.python.org/3/library/functions.html#zip>):
"zip() should only be used with unequal length inputs when you don’t care
about trailing, unmatched values from the longer iterables. If those values
are important, use itertools.zip_longest() instead."
On Fri, Feb 12, 2016 at 6:50 PM Michael Selik <mike at selik.org> wrote:
> On Fri, Feb 12, 2016 at 6:39 PM Erik <python at lucidity.plus.com> wrote:
>> Hi.
>> In writing my previous email, I noticed something about zip() that I'd
>> not seen before (but is obvious, I guess) - when it reaches the shortest
>> sequence and terminates, any iterators already processed in that pass
>> will have generated one more value than the others. Those additional
>> values are discarded.
>> For example:
>> h = iter("Hello")
>> w = iter("World")
>> s = iter("Spam")
>> e = iter("Eggs")
>> for i in zip(h, w, s, e):
>>     print(i)
>> for i in (h, w, s, e):
>>     print(list(i))
>> ---> All iterators are exhausted.
>> h = iter("Hello")
>> w = iter("World")
>> s = iter("Spam")
>> e = iter("Eggs")
>> for i in zip(h, s, e, w):
>>     print(i)
>> for i in (h, w, s, e):
>>     print(list(i))
>> ---> "w" still has the trailing 'd' character.
>> So, if you're using zip() over itertools.zip_longest() then you have to
>> be careful of the order of your arguments and try to put the
>> probably-shortest one first if this would otherwise cause problems.
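The ordering effect described above can be reproduced in one place with a small helper. This is only a sketch: `leftovers` is a made-up name for illustration, and it simply reruns the same four-word experiment with the arguments in whatever order is given.

```python
def leftovers(order):
    """Zip fresh iterators over the four words in the given order and
    return whatever is still unconsumed in each iterator afterwards.

    `order` is a string of the names h, w, s, e (hypothetical helper).
    """
    words = {"h": "Hello", "w": "World", "s": "Spam", "e": "Eggs"}
    iters = {name: iter(word) for name, word in words.items()}
    # Run zip() to completion; the tuples themselves are not needed here.
    for _ in zip(*(iters[name] for name in order)):
        pass
    # Drain each iterator to see which characters zip() never consumed.
    return {name: "".join(it) for name, it in iters.items()}


print(leftovers("hwse"))  # longer inputs first
print(leftovers("hsew"))  # a longer input after the shorter ones
print(leftovers("sehw"))  # shortest inputs first
```

With the shorter inputs first, zip() stops before pulling a fifth value from either of the longer ones, so nothing is silently discarded.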
>> The reason I'm posting to 'ideas' is: what should/could be done about it?
>> 1) A simple warning in the docstring for zip()?
> I wouldn't want to clutter the docstring, but a note in the long-form
> documentation could be useful.
>> 2) Something to prevent it (for example a keyword argument to zip() to
>> switch on some behaviour where the iterators are first queried to check
>> that they have more items before any values start being consumed)?
> How can you query whether an iterator has another value without consuming
> that value?
>> 3) Nothing. There are bigger things to worry about ;)
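For option (2), one way to approximate "query before consuming" is to buffer a single value in a wrapper. The sketch below is an assumption, not anything zip() offers: `Peekable` and `careful_zip` are hypothetical names. Note that the peeked value is still pulled from the underlying iterator; it is just held by the wrapper instead of being thrown away.

```python
class Peekable:
    """Iterator wrapper that buffers one value so callers can ask
    whether another item exists before taking it (hypothetical helper)."""

    _EMPTY = object()  # sentinel meaning "nothing buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = self._EMPTY

    def __iter__(self):
        return self

    def has_next(self):
        # Pull one value from the source if nothing is buffered yet.
        if self._buffer is self._EMPTY:
            try:
                self._buffer = next(self._it)
            except StopIteration:
                return False
        return True

    def __next__(self):
        if not self.has_next():
            raise StopIteration
        value, self._buffer = self._buffer, self._EMPTY
        return value


def careful_zip(*peekables):
    """Like zip(), but only takes a round of values once every input
    has confirmed that it can supply one."""
    while peekables and all(p.has_next() for p in peekables):
        yield tuple(next(p) for p in peekables)


h, w, s, e = (Peekable(x) for x in ("Hello", "World", "Spam", "Eggs"))
pairs = list(careful_zip(h, s, e, w))
rest = ["".join(p) for p in (h, w, s, e)]
print(pairs)
print(rest)
```

The leftover values remain reachable only through the wrappers, so callers have to keep using the `Peekable` objects rather than the original iterators.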
>> WRT (2), I thought that perhaps __len__ was part of the iterator
>> protocol, but it's not (just __iter__ and __next__), hence:
>> >>> len(range(5, 40))
>> 35
>> >>> len(iter(range(5, 40)))
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: object of type 'range_iterator' has no len()
>> >>> len(iter("FooBar"))
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: object of type 'str_iterator' has no len()
>> ... though would that also be something to consider (I guess all
>> iterators would have to keep some state regarding the number of values
>> previously generated and then apply that offset to the result of len()
>> on the underlying object)? Perhaps that would just be too heavyweight
>> for what is a relatively minor wart.
> How would you handle the length of an infinite iterator? Or one that
> *might* be infinite, depending on current state of the program?
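As a side note on the len() question, the standard library does have operator.length_hint() (PEP 424), which returns a best-effort estimate rather than a guaranteed length. The sketch below shows how it behaves on CPython for a range iterator versus a generator. `countdown` is a made-up generator for illustration, and the -1 default is just a marker chosen here for "unknown".

```python
import operator


def countdown(n):
    """A generator: its length cannot be known without running it."""
    while n > 0:
        yield n
        n -= 1


r = iter(range(5, 40))
hint_before = operator.length_hint(r)  # estimate before consuming anything
next(r)
next(r)
hint_after = operator.length_hint(r)   # estimate after consuming two items

# Generators provide neither __len__ nor __length_hint__, so the
# caller-supplied default comes back instead of a real estimate.
gen_hint = operator.length_hint(countdown(3), -1)

print(hint_before, hint_after, gen_hint)
```

Because the hint is optional and may be wrong or missing, it could not serve as the pre-check in option (2) for arbitrary iterators.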
> A more realistic example: if I'm looking up N records from a distributed
> database, I might do that in parallel and get the results back unordered,
> as an iterator. If M of the queries timeout, I might choose to ignore those
> records and exclude them from the resulting iterator. So, when I kick off
> the queries, the length of that iterator might be N. When the timeouts are
> finished, the length is N-M. Further, if I've consumed 2 records, is the
> length still N-M or N-M-2?
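A toy version of that scenario, with no real database involved: `fetch_records` and its `timed_out` argument are hypothetical stand-ins, and the timeouts are simulated by a fixed set of keys.

```python
def fetch_records(keys, timed_out):
    """Yield a record for each key whose (simulated) query succeeded.

    Both parameters are hypothetical stand-ins for a real database
    client; keys in `timed_out` are silently dropped.
    """
    for key in keys:
        if key in timed_out:
            continue  # ignore records whose query timed out
        yield {"key": key}


# N = 10 queries, M = 2 of which time out.
results = fetch_records(range(10), timed_out={3, 7})

first_two = [next(results), next(results)]  # consume 2 records
remaining = sum(1 for _ in results)         # only known by exhausting it

print(first_two, remaining)
```

The only way to learn how many records are left here is to consume them, which is the same problem zip() has.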