
Here is a rough draft of the resurrected PEP.
Thanks -- that was quick!
I'm sure it contains many flaws and I welcome suggested amendments. In particular, the following needs attention:
* Precise specification of the syntax including the edge cases with commas where enclosing parentheses are required.
* Making sure the acknowledgements are correct and complete.
* Verifying my understanding of the issues surrounding late binding, modification of locals, and returning generator expressions.
* Clear articulation of the expected benefits. There are so many, it was difficult to keep it focused.
Raymond Hettinger
----------------------------------------------------------------------
PEP: 289
Title: Generator Expressions
Version: $Revision: 1.2 $
Last-Modified: $Date: 2003/08/30 23:57:36 $
Author: python@rcn.com (Raymond D. Hettinger)
Status: Active
Type: Standards Track
Created: 30-Jan-2002
Python-Version: 2.3
Post-History: 22-Oct-2003
Abstract
This PEP introduces generator expressions as a high performance, memory efficient generalization of list expressions and generators.
Um, please change "list expressions" back to "list comprehensions" everywhere. Global substitute gone awry? :-)
Rationale
Experience with list expressions has shown their widespread utility throughout Python. However, many of the use cases do not need to have a full list created in memory. Instead, they only need to iterate over the elements one at a time.
For instance, the following dictionary constructor code will build a full item list in memory, iterate over that item list, and, when the reference is no longer needed, delete the list:
d = dict([(k, func(v)) for k in keylist])
I'd prefer to use the example sum([x*x for x in range(10)])
Time, clarity, and memory are conserved by using a generator expression instead:
d = dict((k, func(v)) for k in keylist)
which becomes sum(x*x for x in range(10)) (I find the dict constructor example sub-optimal because it starts with two parentheses, and visually finding the match for the second of those is further complicated by the use of func(v) for the value.)
Similar benefits are conferred on the constructors for other container objects:
(Here you can use the dict constructor example.)
s = Set(word for line in page for word in line.split())
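The container-constructor pattern above can be sketched in modern Python, where the built-in set plays the role of the old Set class (the page of lines here is made-up sample data):

```python
# Feeding a generator expression straight into a container constructor:
# each word is yielded one at a time and consumed lazily, so no
# intermediate list is ever built.
page = ["the quick brown fox", "the lazy dog"]

s = set(word for line in page for word in line.split())
```

The single set of parentheses around the call doubles as the parentheses required by the generator expression.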
Having a syntax similar to list comprehensions makes it easy to switch to an iterator expression when scaling up an application.
^^^^^^^^ generator
Generator expressions are especially useful in functions that reduce an iterable input to a single value:
sum(len(line) for line.strip() in file if len(line)>5)
^^^^^^^^^^^^ That's not valid syntax; my example was something like sum(len(line) for line in file if line.strip())
Accordingly, generator expressions are expected to partially eliminate the need for reduce() which is notorious for its lack of clarity. And, there are additional speed and clarity benefits from writing expressions directly instead of using lambda.
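As a sketch of the reduce()-replacement claim above (using functools.reduce, where reduce lives in modern Python, and made-up sample lines):

```python
from functools import reduce  # reduce moved to functools after this PEP's era

lines = ["short\n", "a much longer line\n", "tiny\n"]

# The reduce()/lambda version: the accumulation logic is buried in a lambda.
total_reduce = reduce(lambda acc, line: acc + len(line), lines, 0)

# The generator-expression version: the intent is visible directly.
total_genexp = sum(len(line) for line in lines)

assert total_reduce == total_genexp
```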
List expressions greatly reduced the need for filter() and map(). Likewise, generator expressions are expected to minimize the need for itertools.ifilter() and itertools.imap(). In contrast, the utility of other itertools will be enhanced by generator expressions:
dotproduct = sum(x*y for x,y in itertools.izip(x_vector, y_vector))
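The dot-product example runs unchanged in modern Python once itertools.izip is replaced by the built-in zip, which absorbed its lazy pairing behaviour; a sketch with made-up vectors:

```python
# zip pairs elements lazily, and the generator expression multiplies them
# lazily, so sum() consumes one product at a time with no temporary list.
x_vector = [1, 2, 3]
y_vector = [4, 5, 6]

dotproduct = sum(x * y for x, y in zip(x_vector, y_vector))
```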
BDFL Pronouncements
The previous version of this PEP was REJECTED. The bracketed yield syntax left something to be desired; the performance gains had not been demonstrated; and the range of use cases had not been shown. After much discussion on the python-dev list, the PEP has been resurrected in its present form. The impetus for the discussion was an innovative proposal from Peter Norvig.
The Gory Details
1) In order to achieve a performance gain, generator expressions need to be run in the local stackframe; otherwise, the improvement in cache performance gets offset by the time spent switching stackframes. The upshot of this is that generator expressions need to be both created and consumed within the context of a single stackframe. Accordingly, the generator expression cannot be returned to another function:
return (k, func(v)) for k in keylist
Heh? Did you keep this from the old PEP? Performance tests show that a generator function is already faster than a list comprehension, and the semantics are now defined as equivalent to creating an anonymous generator function and calling it. (There's still discussion about whether that generator function should copy the current value of all free variables into default arguments.)

We need a Gory Detail item explaining the exact syntax. I propose that a generator expression always needs to be inside a set of parentheses and cannot have a comma on either side. Unfortunately this is different from list comprehensions; while [1, x for x in R] is illegal, [x for x in 1, 2, 3] is legal, meaning [x for x in (1, 2, 3)].

With reference to the file Grammar/Grammar in CVS, I think these changes are suitable:

(1) The rule

    atom: '(' [testlist] ')'

changes to

    atom: '(' [listmaker1] ')'

where listmaker1 is almost the same as listmaker, but only allows a single test after 'for' ... 'in'.

(2) The rule for arglist is similarly changed so that it can be either a bunch of arguments possibly followed by *xxx and/or **xxx, or a single generator expression. This is even hairier, so I'm not going to present the exact changes here; I'm confident that it can be done, though, using the same kind of breakdown as used for listmaker.

Yes, maybe the compiler may have to work a little harder to distinguish all the cases. :-)
2) The loop variable is not exposed to the surrounding function. This both facilitates the implementation and makes typical use cases more reliable. In some future version of Python, list comprehensions will also hide the induction variable from the surrounding code (and, in Py2.4, warnings will be issued for code accessing the induction variable).
3) Variable references in the generator expressions will exhibit late binding just like other Python code. In the following example, the iterator runs *after* the value of y is set to one:
    def h():
        y = 0
        l = [1, 2]
        def gen(S):
            for x in S:
                yield x + y
        it = gen(l)
        y = 1
        for v in it:
            print v
There is still discussion about this one.
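For what it's worth, in the Python that was eventually shipped, free variables in a generator expression are indeed late-bound (while the outermost iterable is evaluated eagerly); a minimal sketch of the behaviour under discussion, using an actual generator expression:

```python
def h():
    y = 0
    data = [1, 2]
    # The generator expression captures y as a free variable (late-bound) ...
    it = (x + y for x in data)
    y = 1  # ... so this rebinding is seen when the iterator finally runs.
    return list(it)

assert h() == [2, 3]  # each x was added to y == 1, not y == 0
```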
4) List comprehensions will remain unchanged. So, [x for x in S] is a list comprehension and [(x for x in S)] is a list containing one generator expression.
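A quick sketch of the distinction drawn in item 4, with the types checked in modern Python:

```python
import types

S = [10, 20]
a = [x for x in S]    # a plain list comprehension: a list of values
b = [(x for x in S)]  # a one-element list holding a generator object

assert a == [10, 20]
assert len(b) == 1 and isinstance(b[0], types.GeneratorType)
```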
5) It is prohibited to use locals() for other than read-only use in generator expressions. This simplifies the implementation and precludes a certain class of obfuscated code.
I wouldn't mention this. assigning into locals() has an undefined effect anyway.
Acknowledgements:
Peter Norvig resurrected the discussion with his proposal for "accumulation displays".
Can you do inline URLs in the final version? Maybe an opportunity to learn reST. :-) Or else at least add [3] to the text.
Alex Martelli provided critical measurements that proved the performance benefits of generator expressions.
And also argued with great force that this was a useful thing to have (as have several others).
Samuele Pedroni provided the example of late binding.
(But he wanted generator expressions *not* to use late binding!)
Guido van Rossum suggested the bracket free, yield free syntax.
I don't need credits, and I wouldn't be surprised if someone else had suggested it first.
Raymond Hettinger first proposed "generator comprehensions" in January 2002.
Phillip Eby suggested "iterator expressions" as the name and subsequently Tim Peters suggested "generator expressions".
References
[1] PEP 255 Simple Generators http://python.sourceforge.net/peps/pep-0255.html
[2] PEP 202 List Comprehensions http://python.sourceforge.net/peps/pep-0202.html
[3] Peter Norvig's Accumulation Display Proposal http://www.norvig.com/pyacc.html
I'd point to the thread in python-dev too. BTW I think the idea of having some iterators support __copy__ as a way to indicate they can be cloned is also PEPpable; we've pretty much reached closure on that one. PEP 1 explains how to get a PEP number. --Guido van Rossum (home page: http://www.python.org/~guido/)