[Python-Dev] zip() in Py2.4
Raymond Hettinger
python@rcn.com
Wed, 30 Jul 2003 13:00:11 -0400
> > After Guido gets settled in, I'll see if the California air has
> > changed his thinking. If it hasn't, I'll at least get the
> > rationale documented.
>
> I don't know about the CA air (it's nice and cool here though)
> and I'm not quite settled yet (still living in a hotel room
> with my family at least until Friday) but the rationale
> is that I expect zip() to be called by mistake more likely
> than with an intention. Now, the question is if that mistake
> will be hidden by returning [] or not. IOW whether returning
> [] causes a broken program to limp along with bogus data,
> or whether it will cause a clear-enough error to happen next.
>
> What are typical use situations for zip(...) these days?
The main use is still lockstep iteration in for-loops.
If someone mistakenly omits a zip argument here, the
mistake will still surface immediately with a [] return value:
    for timestamp, user, event in zip():
        pass # never gets here because the tuple won't unpack
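As a rough sketch of the lockstep case, written in present-day Python 3 syntax (an assumption; this thread predates it), the snippet below uses a hypothetical helper, log_lines, with made-up sample data. It shows that dropping one argument makes the tuple unpacking fail, while passing no arguments at all means the loop body is never entered.

```python
# Hypothetical sample data, for illustration only.
timestamps = ["09:00", "09:05"]
users = ["guido", "raymond"]
events = ["login", "post"]

def log_lines(*columns):
    """Walk the given columns in lockstep, expecting exactly three of them."""
    lines = []
    for timestamp, user, event in zip(*columns):
        lines.append("%s %s %s" % (timestamp, user, event))
    return lines

# All three columns supplied: one line per record.
complete = log_lines(timestamps, users, events)

# One column omitted: each tuple has two items, so unpacking fails.
try:
    log_lines(timestamps, users)
    unpack_failed = False
except ValueError:
    unpack_failed = True

# No columns at all: zip() yields nothing and the loop body is never reached.
nothing = log_lines()
```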
Another use is for joining together keys/indices and values
and then undoing the join (like a Schwartzian transform):
    >>> k = ('pear', 'apple', 'banana', 'coconut')
    >>> i = range(len(k))
    >>> temp = zip(k, i)
    >>> temp
    [('pear', 0), ('apple', 1), ('banana', 2), ('coconut', 3)]
    >>> temp.sort()
    >>> fruits, indices = zip(*temp)
    >>> fruits
    ('apple', 'banana', 'coconut', 'pear')
    >>> indices
    (1, 2, 3, 0)
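For readers on Python 3, where zip() returns an iterator with no sort() method, the same join/sort/unjoin session might look roughly like this. It is only a sketch of the idea above, with sorted() swapped in for the in-place list sort.

```python
k = ('pear', 'apple', 'banana', 'coconut')
i = range(len(k))

# Join each key with its original index, then sort the pairs by key.
temp = sorted(zip(k, i))

# Undo the join: zip(*temp) splits the pairs back into two tuples.
fruits, indices = zip(*temp)
```

Here indices records where each sorted fruit sat in the original tuple.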
Another use is a fast row/column switch in
rectangular tables organized as lists of tuples.
> Can you elaborate the case for zip() without arguments?
Yes, it comes up whenever zip() is used with the * operator
(as in the last two examples above). When there are
variable length argument lists, it is helpful to have a smooth
transition to the null case -- for instance an empty (zero-row) matrix used with:
    def transpose(mat):
        return zip(*mat)
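A small sketch of that smooth transition, in Python 3 syntax (an assumption; list() is added only because zip() returns an iterator there). The sample matrix is made up for illustration.

```python
def transpose(mat):
    """Swap rows and columns of a rectangular table of tuples."""
    return list(zip(*mat))

matrix = [(1, 2, 3), (4, 5, 6)]
flipped = transpose(matrix)      # columns become rows
restored = transpose(flipped)    # transposing twice gives the rows back

# The null case: an empty table transposes to an empty table, no exception.
empty = transpose([])
```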
Transposing is a common first step when reading CSV files
so that the tuples (records) of non-homogeneous data can be
re-grouped into homogeneous data from each column position (field):
    dates, rainfall, hightemp, lowtemp = zip(*csv.reader(file("weather.csv")))
    # csv.reader yields strings, so convert before summing
    print 'Monthly total rainfall', sum(map(float, rainfall))
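A self-contained variant of the same regrouping, sketched in Python 3 syntax (an assumption). To stay runnable without a weather.csv on disk, it reads from an in-memory buffer, and the three records in it are invented for illustration.

```python
import csv
import io

# Hypothetical stand-in for weather.csv: date, rainfall, high temp, low temp.
weather = io.StringIO(
    "2003-01,3.5,41,28\n"
    "2003-02,2.25,45,30\n"
    "2003-03,4.0,52,36\n"
)

# Each record mixes field types; zip(*...) regroups them by column.
dates, rainfall, hightemp, lowtemp = zip(*csv.reader(weather))

# csv.reader hands back strings, so convert before doing arithmetic.
total_rainfall = sum(float(r) for r in rainfall)
```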
Since the PEP was decided, I think the trend has
been for Python to handle null cases the same as regular cases and
not raise exceptions. For instance, sum([]) returns 0 instead of
demanding at least one addend.
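A few null cases that behave like the regular case, checked here under Python 3 (an assumption about the reader's interpreter). The zip() line reflects the behavior being argued for in this thread, which is what current Python does.

```python
empty_total = sum([])         # 0, the additive identity
empty_joined = "".join([])    # '', joining nothing gives an empty string
empty_zip = list(zip())       # [], zipping nothing gives no tuples
```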
The star operator is also used in the unzip operation, described
above as zip(*temp) in the Schwartzian transform example.
Unzipping is a reasonably common way of breaking out data
aggregates.
IOW, zip(*something) is handy for re-organizing data and the
empty dataset should not need special-case handling. This is
especially true when data is built up from the null case:
    mytable = []
    gen_report(mytable) # don't have this raise an exception
    mytable.append(newrecord())
    gen_report(mytable)
    mytable.append(newrecord())
    gen_report(mytable)
    mytable.append(newrecord())
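To make the build-up concrete, here is one way gen_report might be written. The body is purely hypothetical (the snippet above does not define it), the records are made up, and the syntax is Python 3. The point is only that a zip(*table) based report needs no special branch for the empty table.

```python
def gen_report(table):
    """Hypothetical report: one summary line per column of the table."""
    columns = zip(*table)   # no columns at all when the table is empty
    return ["column %d: %s" % (n, ", ".join(map(str, col)))
            for n, col in enumerate(columns)]

mytable = []
first = gen_report(mytable)     # empty table, empty report, no exception

mytable.append(("pear", 3))
mytable.append(("apple", 5))
later = gen_report(mytable)     # same code path once records arrive
```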
That about covers the subject.
Raymond Hettinger