[Python-Dev] zip() in Py2.4

Raymond Hettinger python@rcn.com
Wed, 30 Jul 2003 13:00:11 -0400


> > After Guido gets settled in, I'll see if the California air has
> > changed his thinking.  If it hasn't, I'll at least get the
> > rationale documented.
> 
> I don't know about the CA air (it's nice and cool here though)
> and I'm not quite settled yet (still living in a hotel room
> with my family at least until Friday) but the rationale
> is that I expect zip() to be called by mistake more likely
> than with an intention.  Now, the question is if that mistake
> will be hidden by returning [] or not.  IOW whether returning
> [] causes a broken program to limp along with bogus data,
> or whether it will cause a clear-enough error to happen next.
> 
> What are typical use situations for zip(...) these days?

The main use is still lockstep iteration in for-loops.
If someone mistakenly omits a zip argument here, the
mistake will still surface immediately with a [] return value:
 
      for timestamp, user, event in zip():
           pass  # never gets here: [] supplies no tuples to unpack
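
A minimal sketch of the lockstep case, assuming zip() with no arguments
returns an empty result (the behavior under discussion). The sample
lists and the names log and mistaken are made up for illustration:

```python
# Hypothetical sample data for illustration.
timestamps = [1001, 1002, 1003]
users = ['ann', 'bob', 'cy']
events = ['login', 'edit', 'logout']

# Intended call: three sequences walked in lockstep.
log = []
for timestamp, user, event in zip(timestamps, users, events):
    log.append((timestamp, user, event))

# Mistaken call with the arguments omitted: the loop body never runs,
# so nothing is collected.
mistaken = []
for timestamp, user, event in zip():
    mistaken.append((timestamp, user, event))
```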


Another use is for joining together keys/indices and values
and then undoing the join (like a Schwartzian transform):

>>> k = ('pear', 'apple', 'banana', 'coconut')
>>> i = range(len(k))
>>> temp = zip(k, i)
>>> temp
[('pear', 0), ('apple', 1), ('banana', 2), ('coconut', 3)]
>>> temp.sort()
>>> fruits, indices = zip(*temp)
>>> fruits
('apple', 'banana', 'coconut', 'pear')
>>> indices
(1, 2, 3, 0)

Another use is a fast row/column switch in
rectangular tables organized as lists of tuples.
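
A short sketch of the row/column switch, using a made-up two-row table.
The list() calls are only there so the result is a list even where
zip() returns an iterator:

```python
# Hypothetical 2x3 table stored as a list of row tuples.
table = [('a', 1, 2.5),
         ('b', 2, 3.5)]

# zip(*table) passes each row as a separate argument, so the i-th
# output tuple collects the i-th item of every row (one column).
columns = list(zip(*table))

# Applying the same switch again restores the original rows.
rows = list(zip(*columns))
```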




> Can you elaborate the case for zip() without arguments?

Yes, it comes up whenever zip() is used with the * operator
(as in the last two examples above).  When there are
variable-length argument lists, it is helpful to have a smooth
transition to the null case -- for instance an empty (zero-row)
matrix used with:

    def transpose(mat):
        return zip(*mat)
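
To make the null case concrete, here is a sketch of the same function
applied to a full and an empty matrix. It assumes zip() with no
arguments returns an empty result, which is the behavior being argued
for here; list() is added so the result is a list even where zip()
returns an iterator:

```python
def transpose(mat):
    # list() keeps the result a list where zip() returns an iterator.
    return list(zip(*mat))

full = transpose([(1, 2, 3), (4, 5, 6)])

# With no rows, zip(*mat) becomes zip() with no arguments.
empty = transpose([])
```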

Transposing is a common first step when reading CSV files
so that the tuples (records) of non-homogeneous data can be
re-grouped into homogeneous data from each column position (field):

   import csv
   dates, rainfall, hightemp, lowtemp = zip(*csv.reader(file("weather.csv")))
   print 'Monthly total rainfall', sum(map(float, rainfall))
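
A self-contained sketch of the same regrouping. The list of lines is a
hypothetical stand-in for the contents of weather.csv (csv.reader
accepts any iterable of strings), and the values are invented:

```python
import csv

# Hypothetical stand-in for the lines of weather.csv.
lines = ['2003-05,3.1,71,48',
         '2003-06,1.4,80,57',
         '2003-07,0.5,88,63']

# Each record becomes one argument to zip(), so each output tuple
# holds one field across all records.
dates, rainfall, hightemp, lowtemp = zip(*csv.reader(lines))

# csv.reader yields strings, so convert before doing arithmetic.
total_rainfall = sum(float(r) for r in rainfall)
```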

Subsequent to the time the PEP was decided, I think the trend has 
been for Python to handle null cases the same as regular cases and
not raise exceptions.  For instance, sum([]) returns 0 instead of 
demanding at least one addend.

The star operator is also used in the unzip operation, described
above as zip(*temp) in the Schwartzian transform example.
Unzipping is a reasonably common way of breaking out data
aggregates.

IOW, zip(*something) is handy for re-organizing data and the
empty dataset should not need special-case handling.  This is
especially true when data is built up from the null case:

mytable = []
gen_report(mytable)             # don't have this raise an exception
mytable.append(newrecord())
gen_report(mytable)
mytable.append(newrecord())
gen_report(mytable)
mytable.append(newrecord())
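
One way the build-up could look in practice. The body of gen_report
and the sample records are hypothetical, and the sketch assumes zip()
with no arguments returns an empty result rather than raising:

```python
def gen_report(table):
    # Hypothetical report: one output line per column of the table.
    # With an empty table, zip(*table) is zip() and yields no columns.
    return [' '.join(str(x) for x in column) for column in zip(*table)]

mytable = []
first = gen_report(mytable)          # empty table, empty report
mytable.append(('2003-07', 0.5))     # made-up records
mytable.append(('2003-08', 1.2))
second = gen_report(mytable)
```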


That about covers the subject.


Raymond Hettinger