PEP 201, zip() builtin
Barry A. Warsaw
bwarsaw at beopen.com
Thu Jul 27 15:24:38 EDT 2000
Folks,
Here is the latest PEP 201, updated based on conversations with Guido
yesterday. Notice that there are no more Open Issues. See instead
the BDFL Pronouncements section. Plan on seeing a C-based
implementation checked into the CVS devel tree soon.
If you want to comment on this PEP, you must include "PEP 201" in the
Subject: header.
Enjoy,
-Barry
-------------------- snip snip --------------------
PEP: 201
Title: Parallel Iteration
Version: $Revision: 1.10 $
Author: bwarsaw at beopen.com (Barry A. Warsaw)
Python-Version: 2.0
Status: Draft
Created: 13-Jul-2000
Post-History: 27-Jul-2000
Introduction
This PEP describes the `parallel iteration' proposal for Python
2.0, previously known as `parallel for loops'. This PEP tracks
the status and ownership of this feature, slated for introduction
in Python 2.0. It contains a description of the feature and
outlines changes necessary to support the feature. This PEP
summarizes discussions held in mailing list forums, and provides
URLs for further information, where appropriate. The CVS revision
history of this file contains the definitive historical record.
Motivation
Standard for-loops in Python iterate over every element in a
sequence until the sequence is exhausted[1]. However, for-loops
iterate over only a single sequence, and it is often desirable to
loop over more than one sequence, in a lock-step, "Chinese Menu"
type of way.
The common idioms used to accomplish this are unintuitive and
inflexible. This PEP proposes a standard way of performing such
iterations by introducing a new builtin function called `zip'.
Parallel For-Loops
Parallel for-loops are non-nested iterations over two or more
sequences, such that at each pass through the loop, one element
from each sequence is taken to compose the target. This behavior
can already be accomplished in Python through the use of the map()
built-in function:
>>> a = (1, 2, 3)
>>> b = (4, 5, 6)
>>> for i in map(None, a, b): print i
...
(1, 4)
(2, 5)
(3, 6)
>>> map(None, a, b)
[(1, 4), (2, 5), (3, 6)]
The for-loop simply iterates over this list as normal.
While the map() idiom is a common one in Python, it has several
disadvantages:
- It is non-obvious to programmers without a functional
programming background.
- The use of the magic `None' first argument is non-obvious.
- It has arbitrary, often unintended, and inflexible semantics
when the lists are not of the same length: the shorter sequences
are padded with `None'.
>>> c = (4, 5, 6, 7)
>>> map(None, a, c)
[(1, 4), (2, 5), (3, 6), (None, 7)]
For these reasons, several proposals were floated in the Python
2.0 beta time frame for providing a better spelling of parallel
for-loops. The initial proposals centered around syntactic
changes to the for statement, but conflicts and problems with the
syntax were unresolvable, especially when parallel for-loops were
combined with another proposed feature called `list
comprehensions' (see pep-0202.txt).
The Proposed Solution
The proposed solution is to introduce a new built-in sequence
generator function, available in the __builtin__ module. This
function is to be called `zip' and has the following signature:
zip(seqa, [seqb, [...]])
zip() takes one or more sequences and weaves their elements
together, just as map(None, ...) does with sequences of equal
length. The weaving stops when the shortest sequence is
exhausted.
Return Value
zip() returns a real Python list, the same way map() does.
Examples
Here are some examples, based on the reference implementation
below.
>>> a = (1, 2, 3, 4)
>>> b = (5, 6, 7, 8)
>>> c = (9, 10, 11)
>>> d = (12, 13)
>>> zip(a, b)
[(1, 5), (2, 6), (3, 7), (4, 8)]
>>> zip(a, d)
[(1, 12), (2, 13)]
>>> zip(a, b, c, d)
[(1, 5, 9, 12), (2, 6, 10, 13)]
Note that when the sequences are of the same length, zip() is
reversible:
>>> a = (1, 2, 3)
>>> b = (4, 5, 6)
>>> x = zip(a, b)
>>> y = zip(*x) # alternatively, apply(zip, x)
>>> z = zip(*y) # alternatively, apply(zip, y)
>>> x
[(1, 4), (2, 5), (3, 6)]
>>> y
[(1, 2, 3), (4, 5, 6)]
>>> z
[(1, 4), (2, 5), (3, 6)]
>>> x == z
1
It is not possible to reverse zip this way when the sequences are
not all the same length.
Reference Implementation
Here is a reference implementation, in Python of the zip()
built-in function. These would ultimately be replaced by
equivalent C code.
def zip(*args):
if not args:
raise TypeError('zip() expects one or more sequence arguments')
ret = []
# find the length of the shortest sequence
shortest = min(*map(len, args))
for i in range(shortest):
item = []
for s in args:
item.append(s[i])
ret.append(tuple(item))
return ret
BDFL Pronouncements
Note: the BDFL refers to Guido van Rossum, Python's Benevolent
Dictator For Life.
- The function's name. An earlier version of this PEP included an
open issue listing 20+ proposed alternative names to zip(). In
the face of no overwhelmingly better choice, the BDFL strongly
prefers zip() due to it's Haskell[2] heritage. See version 1.7
of this PEP for the list of alteratives.
- zip() shall be a built-in function.
- Optional padding. An earlier version of this PEP proposed an
optional `pad' keyword argument, which would be used when the
argument sequences were not the same length. This is similar
behavior to the map(None, ...) semantics except that the user
would be able to specify pad object. This has been rejected by
the BDFL in favor of always truncating to the shortest sequence.
- Lazy evaluation. An earlier version of this PEP proposed that
zip() return a built-in object that performed lazy evaluation
using __getitem__() protocol. This has been strongly rejected
by the BDFL in favor of returning a real Python list. If lazy
evaluation is desired in the future, the BDFL suggests an xzip()
function be added.
- zip() with no arguments. the BDFL strongly prefers this raise a
TypeError exception.
- zip() with one argument. the BDFL strongly prefers that this
return a list of 1-tuples.
- Inner and outer container control. An earlier version of this
PEP contains a rather lengthy discussion on a feature that some
people wanted, namely the ability to control what the inner and
outer container types were (they are tuples and list
respectively in this version of the PEP). Given the simplified
API and implementation, this elaboration is rejected. For a
more detailed analysis, see version 1.7 of this PEP.
References
[1] http://www.python.org/doc/current/ref/for.html
[2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip
TBD: URL to python-dev archives
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:
More information about the Python-list
mailing list