[ python-Bugs-1121416 ] zip incorrectly and incompletely documented

Thu Feb 17 02:10:54 CET 2005

Bugs item #1121416, was opened at 2005-02-12 12:18
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1121416&group_id=5470

Category: Documentation
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Alan (aisaac0)
Assigned to: Raymond Hettinger (rhettinger)
Summary: zip incorrectly and incompletely documented

Initial Comment:
See the zip documentation:
http://www.python.org/dev/doc/devel/lib/built-in-funcs.html

i. documentation refers to sequences not to iterables

ii. The other problem is easier to explain by example.
Let it=iter([1,2,3,4]).
What is the result of zip(*[it]*2)?
The current answer is: [(1,2),(3,4)],
but it is impossible to determine this from the docs,
which would allow [(1,3),(2,4)] instead (or indeed
other possibilities).

The example expresses the solution to an actual need,
so the behavior should be documented or warned against,
I believe.
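
The behaviour described in (ii) can be reproduced with a short sketch. It assumes Python 3, where zip() returns an iterator and is wrapped in list() here; the names `args` and `pairs` are illustrative only. The key point is that `[it]*2` holds the same iterator object twice, so each output tuple draws two consecutive items from one shared source.

```python
it = iter([1, 2, 3, 4])

# [it] * 2 repeats a reference to the SAME iterator; nothing is copied.
args = [it] * 2
same_object = args[0] is args[1]

# Each tuple is built by pulling from the shared iterator once per argument.
pairs = list(zip(*args))
print(same_object, pairs)
```

The pairing observed depends on the order in which zip() pulls from its arguments, which is exactly the detail the report says the documentation leaves open.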

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2005-02-16 20:10

Message:
Logged In: YES 
user_id=80475

The first sentence becomes even less clear with the "in the
same order" wording.  

The note about truncating to the shortest sequence length is
essential and should not have been dropped.  

The py2.4 change note is in a standard form
(\versionchanged{} following the explanation of current
behavior) and should not have been altered.

The part that addresses the OP's concern is too specific to
his one example and is unclear unless you know about
that example.  The wording is discomforting, doesn't add new
information, and is not obvious in its meaning.

I suggest simply changing "sequence" to "iterable".

There is no sense in stating that the order of combination
is undefined.  It doesn't help with the OP's original desire
to be able to predict the outcome of the example.  However,
it does have the negative effect of making a person question
whether they've understood the preceding description of what
zip() actually does.

zip() is about lockstep iteration and the docs should serve
those users as straight-forwardly as possible.  The OP's
issue on the other hand only comes up when trying funky
iterator magic -- adding a sentence about undefined ordering
doesn't help one bit.
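
As an illustration of the lockstep use this comment has in mind, here is a minimal sketch assuming Python 3 (zip() is wrapped in list() to materialize the result). The names `names`, `squares`, and `paired` are made up for the example. It also shows the two points under discussion: a non-sequence iterable is accepted, and the result stops at the shortest input.

```python
names = ["a", "b", "c"]

# A generator expression is an iterable but not a sequence.
squares = (n * n for n in range(10))

# Lockstep iteration: the i-th tuple holds the i-th item of each input,
# and the result is truncated to the shortest input (three items here).
paired = list(zip(names, squares))
print(paired)
```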

There is a lesson in all this.  These tools were borrowed
from the world of functional programming which is all about
programming that is free of side-effects.  The OP's problem
should be left as a code smell indicating a misuse of
functionals.

----------------------------------------------------------------------

Comment By: Terry J. Reedy (tjreedy)
Date: 2005-02-16 19:03

Message:
Logged In: YES 
user_id=593130

I agree that the zip doc needs improvement.  Confusion will 
continue until it is.  Here is my suggested rewrite:
-------------------------------------------------------------------
zip([iterable1, ...]) 

Return a list of tuples, where the i-th tuple contains the i-th 
element from each input in the same order as the inputs.  
With no arguments, return an empty list (before 2.4, a 
TypeError was raised instead.)  With a single input, return a 
list of 1-tuples.  With multiple inputs, the output length is 
that of the shortest input.  When multiple input lengths are 
equal, zip(i1, ...) is similar to map(None, i1, ...), but zip() 
does not pad when the lengths differ.  The result of zipping a 
volatile iterable with itself is undefined.  New in 2.0. 
-------------------------------------------------------------------

There you have it.  More information in about 15% fewer 
words.  The reduction came from greatly condensing the 
overly wordy sentence about obsolete behavior into a 
parenthetical comment.  For comparison, here is the current 
version.
-------------------------------------------------------------------
zip( [seq1, ...]) 

This function returns a list of tuples, where the i-th tuple 
contains the i-th element from each of the argument 
sequences. The returned list is truncated in length to the 
length of the shortest argument sequence. When there are 
multiple argument sequences which are all of the same 
length, zip() is similar to map() with an initial argument of 
None. With a single sequence argument, it returns a list of 1-
tuples. With no arguments, it returns an empty list. New in 
version 2.0. 
Changed in version 2.4: Formerly, zip() required at least one 
argument and zip() raised a TypeError instead of returning 
an empty list. 
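
The cases that both versions enumerate can be checked with a short sketch. It assumes Python 3, so zip() is wrapped in list(), and it uses itertools.zip_longest as a stand-in for the padding behaviour of Python 2's map(None, ...), which no longer exists in Python 3. The variable names are illustrative.

```python
from itertools import zip_longest

# No arguments: an empty result rather than a TypeError.
no_args = list(zip())

# A single input: a list of 1-tuples.
one_input = list(zip("ab"))

# Unequal lengths: zip() truncates to the shortest input.
truncated = list(zip([1, 2, 3], "xy"))

# The padding counterpart fills the missing items with None.
padded = list(zip_longest([1, 2, 3], "xy"))

print(no_args, one_input, truncated, padded)
```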

----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2005-02-12 21:25

Message:
Logged In: YES 
user_id=1038590

The generator in the previous comment was incorrect (tuple
swallows the StopIteration, so it never terminates). Try
this instead:

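from itertools import islice

def partition(iterable, part_len):
    itr = iter(iterable)
    while 1:
        # Take up to part_len items; an exhausted iterator gives ().
        item = tuple(islice(itr, part_len))
        if len(item) < part_len:
            # Drop any short trailing group and end the generator.
            return
        yield item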
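
A usage sketch of this generator, assuming Python 3. The generator here ends with a plain return, because since Python 3.7 a StopIteration raised inside a generator body is converted to RuntimeError. The first lines show why the length check is needed: islice() on an exhausted iterator simply produces nothing, so tuple() returns () instead of signalling the end.

```python
from itertools import islice

# An exhausted iterator yields no items, so tuple(islice(...)) is just ().
empty_chunk = tuple(islice(iter([]), 2))

def partition(iterable, part_len):
    itr = iter(iterable)
    while True:
        item = tuple(islice(itr, part_len))
        if len(item) < part_len:
            return  # ends the generator; a short trailing group is dropped
        yield item

even = list(partition([1, 2, 3, 4], 2))
odd = list(partition(range(5), 2))  # the leftover 4 is discarded
print(empty_chunk, even, odd)
```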

----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2005-02-12 20:30

Message:
Logged In: YES 
user_id=1038590

Raymond's point about opaqueness is well-taken, since the
given partitioning behaviour in the example was actually
what was intended (I was part of the relevant c.l.p discussion).

For future reference, the reliable approach is to use a
generator function instead:

from itertools import islice
def partition(iterable, part_len):
    itr = iter(iterable)
    while 1:
        yield tuple(islice(itr, part_len))

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2005-02-12 14:50

Message:
Logged In: YES 
user_id=80475

The problem with your example does not lie with zip(). 
Instead, there is a misunderstanding of iter() and how
iterators are consumed.  Instead of iter(), the correct
function is itertools.tee():
>>> from itertools import tee
>>> zip(*tee([1,2,3,4]))
[(1, 1), (2, 2), (3, 3), (4, 4)]
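
To make the difference concrete: tee() hands back independent iterators over the same data, so advancing one does not advance the other. A small sketch, assuming Python 3 (zip() wrapped in list()); the variable names are illustrative.

```python
from itertools import tee

# tee() returns two iterators by default, each with its own position.
a, b = tee([1, 2, 3, 4])

first_from_a = next(a)  # advances a only
first_from_b = next(b)  # b still starts at the first item

# Both iterators now continue from the second item, in step.
rest = list(zip(a, b))
print(first_from_a, first_from_b, rest)
```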

Also, stylistically, the zip(*func) approach is too opaque.
It is almost always better (at least for other readers and
possibly for yourself) to write something more obvious in
its intent and operation.  List comprehensions and generator
expressions are often clearer and easier to write correctly:
>>> [(x,x) for x in [1,2,3,4]]
[(1, 1), (2, 2), (3, 3), (4, 4)]

I do agree that the word sequence should be dropped because
it implies that non-sequence iterables are not acceptable as
arguments.  That's too bad because the word "sequence" seems
to help people understand what zip is doing.

You're correct that the zip docs do not describe its
implementation in such detail as to be able to predict the
[(1,2),(3,4)] result.  However, that would be an
over-specification.  That particular result is an
implementation specific detail that is subject to change. 
It probably won't change, but we don't want to encourage
people to write code that relies on the specific order of
operations within zip().  If someone wants to do something
tricky, such as [(1,2),(3,4)], then they are better off
writing an explicit loop so that the order of operation is
clear both to themselves and to code reviewers.
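
One possible shape for such an explicit loop is sketched below, assuming Python 3. The function name `pairs` is hypothetical and not taken from the thread; dropping an unpaired trailing item is a choice made for this example.

```python
def pairs(iterable):
    """Group consecutive items into 2-tuples, dropping an odd trailing item."""
    it = iter(iterable)
    result = []
    for first in it:
        try:
            # Explicitly take the next item from the same iterator.
            second = next(it)
        except StopIteration:
            break  # odd number of items: discard the unpaired one
        result.append((first, second))
    return result

even = pairs([1, 2, 3, 4])
odd = pairs([1, 2, 3])
print(even, odd)
```

Here the order in which items are consumed is visible in the code itself, rather than depending on how zip() happens to pull from its arguments.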

----------------------------------------------------------------------
