[Python-Dev] accumulator display syntax
Guido van Rossum
guido at python.org
Wed Oct 22 02:02:28 EDT 2003
> Here is a rough draft on the resurrected PEP.
Thanks -- that was quick!
> I'm sure it contains many flaws and I welcome suggested amendments.
> In particular, the following needs attention:
>
> * Precise specification of the syntax including the edge cases
> with commas where enclosing parentheses are required.
>
> * Making sure the acknowledgements are correct and complete.
>
> * Verifying my understanding of the issues surrounding late binding,
> modification of locals, and returning generator expressions.
>
> * Clear articulation of the expected benefits. There are so many,
> it was difficult to keep it focused.
>
>
> Raymond Hettinger
>
> ----------------------------------------------------------------------
>
> PEP: 289
> Title: Generator Expressions
> Version: $Revision: 1.2 $
> Last-Modified: $Date: 2003/08/30 23:57:36 $
> Author: python at rcn.com (Raymond D. Hettinger)
> Status: Active
> Type: Standards Track
> Created: 30-Jan-2002
> Python-Version: 2.3
> Post-History: 22-Oct-2003
>
>
> Abstract
>
> This PEP introduces generator expressions as a high performance,
> memory efficient generalization of list expressions and
> generators.
Um, please change "list expressions" back to "list comprehensions"
everywhere. Global substitute gone awry? :-)
> Rationale
>
> Experience with list expressions has shown their wide-spread
> utility throughout Python. However, many of the use cases do
> not need to have a full list created in memory. Instead, they
> only need to iterate over the elements one at a time.
>
> For instance, the following dictionary constructor code will
> build a full item list in memory, iterate over that item list,
> and, when the reference is no longer needed, delete the list:
>
> d = dict([(k, func(v)) for k in keylist])
I'd prefer to use the example
sum([x*x for x in range(10)])
> Time, clarity, and memory are conserved by using a generator
> expression instead:
>
> d = dict((k, func(v)) for k in keylist)
which becomes
sum(x*x for x in range(10))
(I find the dict constructor example sub-optimal because it starts
with two parentheses, and visually finding the match for the second of
those is further complicated by the use of func(v) for the value.)
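To make the time/memory claim concrete, here's a rough sketch of the
difference (the numbers are only illustrative):

    total = sum([x*x for x in range(10)])   # builds a 10-element list first, then sums it
    total = sum(x*x for x in range(10))     # feeds sum() one value at a time; same result, 285

For range(10) it hardly matters, but replace it with a file or a
million-element sequence and the intermediate list starts to hurt.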
> Similar benefits are conferred on the constructors for other
> container objects:
(Here you can use the dict constructor example.)
> s = Set(word for line in page for word in line.split())
>
> Having a syntax similar to list comprehensions makes it easy to
> switch to an iterator expression when scaling up application.
^^^^^^^^
generator
> Generator expressions are especially useful in functions that reduce
> an iterable input to a single value:
>
> sum(len(line) for line.strip() in file if len(line)>5)
^^^^^^^^^^^^
That's not valid syntax; my example was something like
sum(len(line) for line in file if line.strip())
> Accordingly, generator expressions are expected to partially
> eliminate the need for reduce() which is notorious for its lack
> of clarity. And, there are additional speed and clarity benefits
> from writing expressions directly instead of using lambda.
>
> List expressions greatly reduced the need for filter() and
> map(). Likewise, generator expressions are expected to minimize
> the need for itertools.ifilter() and itertools.imap(). In
> contrast, the utility of other itertools will be enhanced by
> generator expressions:
>
> dotproduct = sum(x*y for x,y in itertools.izip(x_vector, y_vector))
>
>
> BDFL Pronouncements
>
> The previous version of this PEP was REJECTED. The bracketed
> yield syntax left something to be desired; the performance gains
> had not been demonstrated; and the range of use cases had not
> been shown. After much discussion on the python-dev list, the
> PEP has been resurrected in its present form. The impetus for the
> discussion was an innovative proposal from Peter Norvig.
>
>
> The Gory Details
>
> 1) In order to achieve a performance gain, generator expressions need
> to be run in the local stackframe; otherwise, the improvement in
> cache performance gets offset by the time spent switching
> stackframes. The upshot of this is that generator expressions
> need to be both created and consumed within the context of a
> single stackframe. Accordingly, the generator expression cannot
> be returned to another function:
>
> return (k, func(v)) for k in keylist
Heh? Did you keep this from the old PEP? Performance tests show that
a generator function is already faster than a list comprehension, and
the semantics are now defined as equivalent to creating an anonymous
generator function and calling it. (There's still discussion about
whether that generator function should copy the current value of all
free variables into default arguments.)
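To spell out that equivalence, a rough sketch (the name __gen is purely
illustrative; the exact expansion is still under discussion):

    # sum(x*x for x in range(10)) would behave roughly like creating an
    # anonymous generator function and calling it on the spot:
    def __gen(exp):
        for x in exp:
            yield x*x
    result = sum(__gen(iter(range(10))))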
We need a Gory Detail item explaining the exact syntax. I propose
that a generator expression always needs to be inside a set of
parentheses and cannot have a comma on either side. Unfortunately
this is different from list comprehensions; while [1, x for x in R] is
illegal, [x for x in 1, 2, 3] is legal, meaning [x for x in (1,2,3)].
With reference to the file Grammar/Grammar in CVS, I think these
changes are suitable:
(1) The rule
atom: '(' [testlist] ')'
changes to
atom: '(' [listmaker1] ')'
where listmaker1 is almost the same as listmaker, but only allows a
single test after 'for' ... 'in'.
(2) The rule for arglist is similarly changed so that it can be either
a bunch of arguments possibly followed by *xxx and/or **xxx, or a
single generator expression. This is even hairier, so I'm not
going to present the exact changes here; I'm confident that it can
be done though using the same kind of breakdown as used for
listmaker. Yes, maybe the compiler may have to work a little
harder to distinguish all the cases. :-)
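Concretely, I'd expect something like this under those rules (a sketch,
not gospel):

    sum(x*x for x in range(10))            # OK: sole argument, no extra parens needed
    sum((x*x for x in range(10)), 0)       # OK: parens required once there's another argument
    # sum(x*x for x in range(10), 0)       --> SyntaxError: comma next to a bare genexp
    g = (x*x for x in range(10))           # OK: standalone use always needs the parens
    # g = x*x for x in range(10)           --> SyntaxError: no enclosing parentheses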
> 2) The loop variable is not exposed to the surrounding function.
> This both facilitates the implementation and makes typical use
> cases more reliable. In some future version of Python, list
> comprehensions will also hide the induction variable from the
> surrounding code (and, in Py2.4, warnings will be issued for
> code accessing the induction variable).
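Right. For the record, a quick sketch of the difference, assuming 2.3
behavior for the list comprehension:

    [x for x in range(3)]
    print x          # prints 2: the list comprehension leaks its loop variable

    (y for y in range(3))
    print y          # NameError: the generator expression keeps y to itself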
>
> 3) Variable references in generator expressions will
> exhibit late binding just like other Python code. In the
> following example, the iterator runs *after* the value of y is
> set to one:
>
> def h():
>     y = 0
>     l = [1,2]
>     def gen(S):
>         for x in S:
>             yield x+y
>     it = gen(l)
>     y = 1
>     for v in it:
>         print v
There is still discussion about this one.
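For contrast, here's a sketch of the capture-at-creation-time alternative
being argued for, using the default-argument trick (purely illustrative):

    def h():
        y = 0
        l = [1,2]
        def gen(S, y=y):     # y is frozen at 0 when gen is defined
            for x in S:
                yield x+y
        it = gen(l)
        y = 1
        for v in it:
            print v          # prints 1, 2 -- the later y = 1 is ignored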
> 4) List comprehensions will remain unchanged.
> So, [x for x in S] is a list comprehension and
> [(x for x in S)] is a list containing one generator expression.
>
> 5) It is prohibited to use locals() for other than read-only use
> in generator expressions. This simplifies the implementation and
> precludes a certain class of obfuscated code.
I wouldn't mention this. Assigning into locals() has an undefined
effect anyway.
> Acknowledgements:
>
> Peter Norvig resurrected the discussion in his proposal for
> "accumulation displays".
Can you do inline URLs in the final version? Maybe an opportunity to
learn reST. :-) Or else at least add [3] to the text.
> Alex Martelli provided critical measurements that proved the
> performance benefits of generator expressions.
And also argued with great force that this was a useful thing to have
(as have several others).
> Samuele Pedroni provided the example of late binding.
(But he wanted generator expressions *not* to use late binding!)
> Guido van Rossum suggested the bracket free, yield free syntax.
I don't need credits, and I wouldn't be surprised if someone else
had suggested it first.
> Raymond Hettinger first proposed "generator comprehensions" in
> January 2002.
Phillip Eby suggested "iterator expressions" as the name and
subsequently Tim Peters suggested "generator expressions".
> References
>
> [1] PEP 255 Simple Generators
> http://python.sourceforge.net/peps/pep-0255.html
>
> [2] PEP 202 List Comprehensions
> http://python.sourceforge.net/peps/pep-0202.html
>
> [3] Peter Norvig's Accumulation Display Proposal
> http://www.norvig.com/pyacc.html
I'd point to the thread in python-dev too.
BTW I think the idea of having some iterators support __copy__ as a
way to indicate they can be cloned is also PEPpable; we've pretty much
reached closure on that one. PEP 1 explains how to get a PEP number.
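A minimal sketch of what I have in mind there (names and details are
purely illustrative; the PEP would have to pin them down):

    import copy

    class count_from:
        # an iterator that advertises clonability via __copy__
        def __init__(self, n=0):
            self.n = n
        def __iter__(self):
            return self
        def next(self):
            n = self.n
            self.n += 1
            return n
        def __copy__(self):
            return count_from(self.n)    # the clone resumes from the current position

    it = count_from()
    it.next(); it.next()
    clone = copy.copy(it)
    print it.next(), clone.next()        # both print 2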
--Guido van Rossum (home page: http://www.python.org/~guido/)