[Python-Dev] zip() in Py2.4

Guido van Rossum guido@python.net
Wed, 30 Jul 2003 13:26:39 -0400

> > What are typical use situations for zip(...) these days?
> The main use is still lockstep iteration in for-loops.
> If someone mistakenly omits a zip argument here, the
> mistake will still surface immediately with a [] return value:
>       for timestamp, user, event in zip():
>            pass # never gets here because the tuple won't unpack

Hm...  I worry about this case specifically, because it's easily
imaginable that the intended list args were empty, and the code
behaves the same way in that case.  So if the code is written
to cope with empty input, only careful testing will reveal the bug.
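
To make the concern concrete, here is a minimal sketch. It is written
for Python 3, where zip() returns an iterator rather than a list, and
the helper name count_events is made up for illustration. The forgotten
arguments and the legitimately empty inputs give the same result:

```python
def count_events(rows):
    """Count the (timestamp, user, event) triples in rows."""
    count = 0
    for timestamp, user, event in rows:
        count += 1
    return count

# Arguments omitted by mistake: the loop body never runs, no error is raised.
forgotten_args = count_events(zip())

# Arguments supplied but empty: exactly the same observable behavior.
empty_inputs = count_events(zip([], [], []))
```

Both calls return 0, so a test that only exercises empty input cannot
tell the two situations apart.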

> Another use is for joining together keys/indices and values
> and then undoing the join (like a Schwartzian transform):
> >>> k = ('pear', 'apple', 'banana', 'coconut')
> >>> i = range(len(k))
> >>> temp = zip(k, i)
> >>> temp
> [('pear', 0), ('apple', 1), ('banana', 2), ('coconut', 3)]
> >>> temp.sort()
> >>> fruits, indices = zip(*temp)
> >>> fruits
> ('apple', 'banana', 'coconut', 'pear')
> >>> indices
> (1, 2, 3, 0)

That looks cute, but sounds inefficient when the list is 1000s of
items long -- the zip(*temp) abuses the argument passing machinery.
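
For comparison, one way to undo the join without passing every item as
a separate argument is a pair of list comprehensions. This is only a
sketch of an alternative (Python 3 syntax, using sorted() instead of an
in-place sort), not something proposed in this thread:

```python
k = ('pear', 'apple', 'banana', 'coconut')

# Decorate each key with its original index, then sort the pairs.
temp = sorted(zip(k, range(len(k))))

# Undecorate with comprehensions instead of zip(*temp).
fruits = [fruit for fruit, _ in temp]
indices = [index for _, index in temp]
```

This yields lists rather than tuples, and walks the sorted list twice.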

> Another use is a fast row/column switch in
> rectangular tables organized as lists of tuples.

But somehow that doesn't strike me as a very likely table
organization.  It can be a list of lists, too, but the transform
changes it to a list of tuples.
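
A small sketch of that type change, assuming Python 3 (hence the
explicit list() around zip()); the table contents are invented for
illustration:

```python
table = [['a', 1], ['b', 2]]          # rows stored as lists

# The row/column switch hands back tuples, not lists.
switched = list(zip(*table))

# Converting back to a list of lists takes an extra step.
switched_lists = [list(column) for column in zip(*table)]
```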

> > Can you elaborate the case for zip() without arguments?
> Yes, it comes up whenever zip() is used with the * operator
> (as in the last two examples above).  When there are
> variable length argument lists, it is helpful to have a smooth
> transition to the null case -- for instance a zero matrix used with:
>     def transpose(mat):
>         return zip(*mat)

Fair enough.
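
The null case in question can be sketched like this, assuming Python 3
semantics (zip() with no arguments yields nothing, and list() is needed
to get a list back):

```python
def transpose(mat):
    """Swap rows and columns of a list of equal-length rows."""
    return list(zip(*mat))

regular = transpose([(1, 2, 3), (4, 5, 6)])

# An empty matrix expands to zip() with no arguments at all.
empty = transpose([])
```

With an argument-less zip() that returns an empty result, transpose()
needs no special case for the empty matrix.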

> Transposing is a common first step when reading CSV files
> so that the tuples (records) of non-homogeneous data can be
> re-grouped into homogeneous data from each column position (field):
>    dates, rainfall, hightemp, lowtemp = zip(*csv.reader(file("weather.csv")))
>    # csv.reader yields strings, so convert before summing
>    print 'Monthly total rainfall', sum(map(float, rainfall))
> Subsequent to the time the PEP was decided, I think the trend has 
> been for python to handle null cases the same as regular cases and
> not raise exceptions.  For instance, sum([]) returns 0 instead of 
> demanding at least one addend.

But max([]) raises an exception, and I think you'll agree that
it should.  It's not that I started recently paying attention
to empty lists (I always treated them as first-class citizens),
it's that I wasn't aware of zip(*xxx) as a common use case before.
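
The contrast between the two built-ins, as a short sketch:

```python
# sum() has a natural identity element, so the empty case is well defined.
empty_total = sum([])

# max() has no sensible answer for an empty sequence and raises ValueError.
try:
    max([])
    max_raised = False
except ValueError:
    max_raised = True
```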

> The star operator is also used in the unzip operation, described
> above as zip(*temp) in the Schwartzian transform example.
> Unzipping is a reasonably common way of breaking out data
> aggregates.
> IOW, zip(*something) is handy for re-organizing data and the
> empty dataset should not need special-case handling.  This is
> especially true when data is built up from the null case:
> mytable = []
> gen_report(mytable)             # don't have this raise an exception
> mytable.append(newrecord())
> gen_report(mytable)
> mytable.append(newrecord())
> gen_report(mytable)
> mytable.append(newrecord())
> That about covers the subject.

OK.  While I still have some misgivings about the first example,
I think I see that zip(*[]) should return [].  It's OK for 2.4.