PEP 289 - Generator Expressions - Let's Move Forward
I'd like to get generator expressions checked into CVS. Regarding the early-or-late binding issue, here's what I'd like to see happen: I'd like the late-binding (i.e. non-capture) version checked in and released with Python 2.4a1 and a2. If we find there are real problems with these semantics, we can switch to early-binding semantics in 2.4b1; this would be almost completely backwards compatible. But if it works in the alpha release, we'll stick with late binding.

My reasons for (still!) preferring late binding are:

(a) I find it hard to defend automatic variable capture given Python's general late-binding semantics

(b) I believe that use cases needing early binding are uncommon and strained: they all involve creating a list of generator expressions, which IMO is a pretty unusual thing to do

I cannot volunteer to do the final code review and checkin, and I believe Raymond is still on vacation (he's asked me to relieve him from the responsibility of shepherding this patch), so we need someone to volunteer. The patch is here:

http://sf.net/tracker/?func=detail&aid=872326&group_id=5470&atid=305470

--Guido van Rossum (home page: http://www.python.org/~guido/)
Hello Guido, I did a quick review of the stdlib, including the tests, to see which list comprehensions could be replaced with generator expressions. Though I admit I am biased towards early binding, I ended up looking for cases with the following properties:

* doesn't use the result immediately
* contains free variables

This is less than 10% of the cases. However, for each of them, I had to check whether the free variables had a chance of being modified after the genexpr. This is pretty rare, admittedly, but there is an example in test/test_random.py (test_avg_std). There are a number of other examples that just don't happen to modify the free variables, by chance; e.g. in optparse.py:

    metavar = option.metavar or option.dest.upper()
    short_opts = [sopt + metavar for sopt in option._short_opts]
    long_opts = [lopt + "=" + metavar for lopt in option._long_opts]

If we find out later that long_opts actually needs a slightly different value for metavar, it would be tempting to do:

    metavar = option.metavar or option.dest.upper()
    short_opts = [sopt + metavar for sopt in option._short_opts]
    metavar = option.metavar or option.dest.capitalize()
    long_opts = [lopt + "=" + metavar for lopt in option._long_opts]

Replace these with genexprs and it doesn't work any more: the 2nd metavar is unexpectedly used in the first genexpr as well. In general I find it strange to have to check whether a given variable could be modified much later in the same function to be sure of what a genexpr really means. Early binding is closer to the idea that turning a listcomp into a genexpr should just work if you only iterate once on the result. A bientôt, Armin.
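Armin's optparse scenario reduces to a small runnable sketch (with hypothetical stand-in values for `option.metavar` etc.). Under the late-binding semantics PEP 289 ultimately adopted, the outermost `for` iterable of a genexp is evaluated eagerly, but other free variables such as `metavar` are looked up only when the generator is consumed:

```python
def build_opts():
    metavar = "FILE"  # stand-in for option.metavar or option.dest.upper()
    # genexp version of short_opts: metavar is a free variable, read lazily
    short_opts = (sopt + metavar for sopt in ["-f", "-o"])
    # rebinding metavar before short_opts is consumed...
    metavar = "File"  # stand-in for option.dest.capitalize()
    long_opts = (lopt + "=" + metavar for lopt in ["--file"])
    # ...means the *second* metavar leaks into the first genexp
    return list(short_opts), list(long_opts)

short, long_ = build_opts()
print(short)  # ['-fFile', '-oFile'] -- not the ['-fFILE', '-oFILE'] a listcomp gives
print(long_)  # ['--file=File']
```

This is exactly the silent change of meaning Armin is warning about: the listcomp version would have produced `['-fFILE', '-oFILE']`.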
I did a quick review of the stdlib, including the tests, to see which list comprehensions could be replaced with generator expressions. Though I admit I am biased towards early binding, I ended up looking for cases with the following properties:
* doesn't use the result immediately
* contains free variables
This is less than 10% of the cases. However, for each of them, I had to check whether the free variables had a chance of being modified after the genexpr. This is pretty rare, admittedly, but there is an example in test/test_random.py (test_avg_std). There are a number of other examples that just don't happen to modify the free variables, by chance; e.g. in optparse.py:
    metavar = option.metavar or option.dest.upper()
    short_opts = [sopt + metavar for sopt in option._short_opts]
    long_opts = [lopt + "=" + metavar for lopt in option._long_opts]
If we find out later that long_opts actually needs a slightly different value for metavar, it would be tempting to do:
    metavar = option.metavar or option.dest.upper()
    short_opts = [sopt + metavar for sopt in option._short_opts]
    metavar = option.metavar or option.dest.capitalize()
    long_opts = [lopt + "=" + metavar for lopt in option._long_opts]
Replace these with genexprs and it doesn't work any more: the 2nd metavar is unexpectedly used in the first genexpr as well. In general I find it strange to have to check whether a given variable could be modified much later in the same function to be sure of what a genexpr really means. Early binding is closer to the idea that turning a listcomp into a genexpr should just work if you only iterate once on the result.
Thanks for the analysis. My comment on this: why on earth would you want to replace the perfectly sensible list comprehension with a generator comprehension in this particular case? (Note that the remainder of that function proceeds to do list concatenation of the two lists.) A more general question would be: have you found any examples of list comprehensions that weren't immediately used where there would be a compelling reason to use a generator expression instead? In this example, I can't see any of the touted advantages -- the set of command line options will never be long enough to have to worry about the memory consumption. How about the other examples you found? --Guido van Rossum (home page: http://www.python.org/~guido/)
At 05:16 30.04.2004 -0700, Guido van Rossum wrote:
A more general question would be: have you found any examples of list comprehensions that weren't immediately used where there would be a compelling reason to use a generator expression instead? In this example, I can't see any of the touted advantages -- the set of command line options will never be long enough to have to worry about the memory consumption. How about the other examples you found?
this assumes that people will choose either generator expressions or list comprehensions on a case-by-case basis. I'm not sure it's a reasonable assumption for the whole range of users - starting with newbies. Probably documentation and what is advocated on c.l.p will influence this. OTOH both are probably good enough default choices: list comprehensions because if one gets insufficient performance she will learn to switch, generator expressions because for small numbers of elements things are a wash (unless someone is creating a lot of small sequences :)). So the more they are interchangeable in all other respects the better. "There should be one-- and preferably only one --obvious way to do it." Of course this is the case for generator expressions and list comprehensions because they are separated by performance characteristics and somewhat by the fact that one directly produces a list. I suspect that we have to learn to live with the fact that they are distinguished by (just) that, both a nuisance (sometimes people don't want to bother to choose) and a feature. regards.
In article <5.2.1.1.0.20040430142620.030304e8@pop.bluewin.ch>, Samuele Pedroni <pedronis@bluewin.ch> wrote:
... this assumes that people will choose either generator expressions or list comprehensions on a case-by-case basis. I'm not sure it's a reasonable assumption for the whole range of users - starting with newbies. Probably documentation and what is advocated on c.l.p will influence this.
I completely agree. Generator expressions and list comprehensions are similar enough that it seems a wart to have to carefully differentiate between the two in normal use. Personally, I'd only like to see generator expressions added to Python if they can allow list comprehensions to be deprecated -- to avoid language clutter. One question, though: is "if" part of generator expressions or not? I read the current version of the PEP and I'm confused. It offers the example: max(len(line) for line in file if line.strip()) But that is the only instance in the PEP that I noticed. The section "The Details" says almost nothing about "if". The only thing I found that might be relevant appeared at the end of "2. The syntax requires...": "...where listmaker1 is almost the same as listmaker, but only allows a single test after 'for' ... 'in'." I suppose that might be saying that a single "if" condition is allowed. If so, great! -- Russell
[Russell E. Owen]
I completely agree. Generator expressions and list comprehensions are similar enough that it seems a wart to have to carefully differentiate between two in normal use.
In normal use you won't have to distinguish, where "normal use" is pretty much defined by "cases where it doesn't matter" <wink>.
Personally, I'd only like to see generator expressions added to Python if they can allow list comprehensions to be deprecated -- to avoid language clutter.
Listcomps can be deprecated, although not all existing uses of listcomps will translate directly into genexps. I think this has become a "practicality beats purity" feature. When Python had only 3 scopes, it was fine to ignore "explicit is better than implicit": the scope model was so bare that explicitness would have been overkill. When multiple nested scopes got introduced, and function bodies start appearing in places other than lambda and def (which is the case for genexps), scope gets much more mysterious (and so also potentially surprising) in the absence of explicit scope declarations. But simple genexps will work fine anyway almost all the time, so it doesn't appear to matter much that devious uses will have nightmarish semantics.
One question, though: is "if" part of generator expressions or not?
Yes. If the PEP isn't clear about this, it's a flaw in the PEP. I don't think the semantics of genexps are clear in the PEP either (unless that's changed recently).
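For the record, the `if` clause did make it into the final genexp syntax. A minimal runnable variant of the PEP's own example, with an in-memory list standing in for the file:

```python
lines = ["hello\n", "   \n", "hi there\n", "\n"]

# the trailing "if" filters out blank lines before max() sees them
longest = max(len(line) for line in lines if line.strip())
print(longest)  # 9, i.e. len("hi there\n"); the blank lines never reach max()
```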
In article <LNBBLJKPBEHFEDALKOLCAEPOKJAB.tim.one@comcast.net>, "Tim Peters" <tim.one@comcast.net> wrote:
Personally, I'd only like to see generator expressions added to Python if they can allow list comprehensions to be deprecated -- to avoid language clutter.
Listcomps can be deprecated, although not all existing uses of listcomps will translate directly into genexps.
Not all? When would [listcomp] ever not be replaceable by list(gencomp)? -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science
On Fri, 2004-04-30 at 18:09, David Eppstein wrote:
Not all? When would [listcomp] ever not be replaceable by list(gencomp)?
I dunno. I happen to like [listcomp] syntax over list(genexpr). TOOWTDI-be-damned-ly y'rs, -Barry
On 4/30/04 6:11 PM -0400 Barry Warsaw <barry@python.org> wrote:
Not all? When would [listcomp] ever not be replaceable by list(gencomp)?
I dunno. I happen to like [listcomp] syntax over list(genexpr).
Oh, me too (and I'm getting a little irritated with writing dict([...]) so often instead of having a proper dictcomp syntax) but semantically I think they are the same. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science
On Fri, 2004-04-30 at 18:28, David Eppstein wrote:
Oh, me too (and I'm getting a little irritated with writing dict([...]) so often instead of having a proper dictcomp syntax) but semantically I think they are the same.
I'm easily convinced. pep-274-ly y'rs, -Barry
Listcomps can be deprecated, although not all existing uses of listcomps will translate directly into genexps.
[David Eppstein]
Not all? When would [listcomp] ever not be replaceable by list(gencomp)?
The point of Armin's original msg in this thread was that direct replacement can have different semantics. Applying a transformation (via list()) is a different story, although I expect someone obsessed enough could concoct an example that worked differently, because genexp guts truly live in a different scope than listcomp guts (e.g., play games with locals()).
"Russell E. Owen" <rowen@cesmail.net>:
Personally, I'd only like to see generator expressions added to Python if they can allow list comprehensions to be deprecated -- to avoid language clutter.
That's possible in the sense that a list comprehension can be written as list(generator_expression). But it will still be necessary to decide whether to wrap a list() around it. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
Samuele Pedroni <pedronis@bluewin.ch>:
"There should be one-- and preferably only one --obvious way to do it." Of course this is the case for generator expressions and list comprehensions because they are separated by performance characteristics and somewhat by the fact that one directly produces a list. I suspect that we have to learn to live with the fact that they are distinguished by (just) that, both a nuisance (sometimes people don't want to bother to choose) and a feature.
I just had a thought. What if generator expressions were allowed *only* as a parameter to a function call? Would that help avoid leading people into the trap of using them where they're not appropriate? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
What if generator expressions were allowed *only* as a parameter to a function call?
Would that help avoid leading people into the trap of using them where they're not appropriate?
Nah, people would just use an identity function to work around the restriction. --Guido van Rossum (home page: http://www.python.org/~guido/)
Hello, On Fri, Apr 30, 2004 at 05:16:32AM -0700, Guido van Rossum wrote:
Thanks for the analysis. My comment on this: why on earth would you want to replace the perfectly sensible list comprehension with a generator comprehension in this particular case?
I was using this example more generally to show that if you have a late-binding genexpr, you have to worry about not changing the local variables it uses, which makes all uses of genexprs with free variables look like subtle bugs waiting to appear whenever someone modifies the function anywhere after the genexpr. The same argument applies to the other stdlib cases I mentioned (which make up somewhat less than 10% of all uses). The alternative -- actively using the late-binding aspect -- would lead to code in which you have to worry about which values the local variables have at the moment the next element is fetched from the iterator. For example:

    x_square = (x*x for i in itertools.repeat(None)).next
    ...
    x = 5
    print x_square()   # 25
    x = 6
    print x_square()   # 36

This looks (a) cool and (b) a complete hack that nobody should be allowed to do without messing with sys._getframe(). A bientôt, Armin.
On Fri, 2004-04-30 at 09:07, Armin Rigo wrote:
This looks (a) cool and (b) a complete hack that nobody should be allowed to do without messing with sys._getframe().
It reminds me a little bit of the dynamic binding in Emacs lisp. As incredibly useful as that is sometimes, it's a disgusting hack. :) -Barry
This looks (a) cool and (b) a complete hack that nobody should be allowed to do without messing with sys._getframe().
It reminds me a little bit of the dynamic binding in Emacs lisp. As incredibly useful as that is sometimes, it's a disgusting hack. :)
And, if I understand correctly, something from which the Lisp community is moving away--preferring, instead, the notion of "lexical scoping" such as in Scheme.
On Fri, 2004-04-30 at 13:59, Andrew Koenig wrote:
This looks (a) cool and (b) a complete hack that nobody should be allowed to do without messing with sys._getframe().
It reminds me a little bit of the dynamic binding in Emacs lisp. As incredibly useful as that is sometimes, it's a disgusting hack. :)
And, if I understand correctly, something from which the Lisp community is moving away--preferring, instead, the notion of "lexical scoping" such as in Scheme.
That's my understanding too. Back when I followed such things, there was a lot of discussion in the XEmacs lists about implementing lexical scoping and getting rid of this ugly hack (eventually -- no doubt there's tons of old elisp that would break as a result). tim-would-have-fixed-python-mode-ly y'rs, -Barry
Andrew Koenig wrote:
This looks (a) cool and (b) a complete hack that nobody should be allowed to do without messing with sys._getframe().
It reminds me a little bit of the dynamic binding in Emacs lisp. As incredibly useful as that is sometimes, it's a disgusting hack. :)
And, if I understand correctly, something from which the Lisp community is moving away--preferring, instead, the notion of "lexical scoping" such as in Scheme. Well, global variables in Common Lisp are still dynamically-scoped, which makes perfect sense IMO.
Also, I think there is an SRFI (SRFI ~ PEP) that almost every Scheme implements which adds dynamic bindings. Cheers, Michael
On Fri, 2004-04-30 at 09:30, Barry Warsaw wrote:
On Fri, 2004-04-30 at 09:07, Armin Rigo wrote:
This looks (a) cool and (b) a complete hack that nobody should be allowed to do without messing with sys._getframe().
It reminds me a little bit of the dynamic binding in Emacs lisp. As incredibly useful as that is sometimes, it's a disgusting hack. :)
The funny thing is that it's the result of a static scoping discipline rather than dynamic scoping. What's funny about it has more to do with side-effects than scoping rules. If the x_square function was returned out of its defining block, there would be no way to rebind x, and it would not be possible to define a variable x local to the caller that would affect it. Put another way, it's possible to reason statically about what binding x will use when x_square() is called. It's the same technique you would use to write an accumulator generator (that is, a function that returns accumulator functions). See the appendix of Paul Graham's essay: http://www.paulgraham.com/icad.html Jeremy PS I do owe everyone a PEP on the subject of re-binding.
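The accumulator generator Jeremy refers to (Graham's canonical higher-order example) looks like this in today's Python; note that `nonlocal` did not exist in 2004, so a contemporary Python 2.x version would have needed a mutable container or an attribute instead:

```python
def make_accumulator(start=0):
    total = start
    def accumulate(n):
        nonlocal total  # rebind the binding in the enclosing scope (Python 3)
        total += n
        return total
    return accumulate

acc = make_accumulator()
print(acc(10))  # 10
print(acc(5))   # 15: the closure's total persists across calls
```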
I was using this example more generally to show that if you have a late-binding genexpr, you have to worry about not changing the local variables it uses, which makes all uses of genexprs with free variables look like subtle bugs waiting to appear whenever someone modifies the function anywhere after the genexpr. The same argument applies to the other stdlib cases I mentioned (which make up somewhat less than 10% of all uses).
That's not new, we all know that that's the downside of late binding. My counter to that is that *any* use of genexps where the consumer doesn't consume (or discard) the iterator before the next line is reached is extremely advanced use. --Guido van Rossum (home page: http://www.python.org/~guido/)
Hello Guido, On Fri, Apr 30, 2004 at 04:12:59PM -0700, Guido van Rossum wrote:
That's not new, we all know that that's the downside of late binding.
My counter to that is that *any* use of genexps where the consumer doesn't consume (or discard) the iterator before the next line is reached is extremely advanced use.
My argument was that late binding makes it even more mysterious, not less surprising, because in the minority of cases where a listcomp is not used immediately but still needs to become a genexpr for performance, you have to think carefully. Well, I insist a bit because I cannot help thinking that the arguments for early binding are "that's useful in a very small number of cases", and the arguments for late binding are "you don't want to do that anyway, so let's pick the semantics of free vars in functions". I don't immediately think of genexprs as similar to functions, rather as closer to listcomps, even if I know that they are functions under the hood. Armin
Armin Rigo <arigo@tunes.org>:
Early binding is closer to the idea that turning a listcomp into a genexprs should just work if you only iterate once on the result.
I'm not sure that's an idea we should be promulgating in the first place. The original motivation for generator expressions was things like

    total = sum(x**2 for x in stuff)

where not only will the sequence be used just once, but it will be used *immediately* before doing anything else. Those are the use cases we need to advertise this as targeting, I think, not "any listcomp that's only used once". Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
Armin Rigo <arigo@tunes.org>:
Early binding is closer to the idea that turning a listcomp into a genexprs should just work if you only iterate once on the result.
Greg Ewing:
I'm not sure that's an idea we should be promulgating in the first place. The original motivation for generator expressions was things like
total = sum(x**2 for x in stuff)
where not only will the sequence be used just once, but it will be used *immediately* before doing anything else.
Those are the use cases we need to advertise this as targeting, I think, not "any listcomp that's only used once".
Right -- thanks for saying it so clearly! Early binding seems to be trying to solve a problem with a use for genexprs that's questionable at best. --Guido van Rossum (home page: http://www.python.org/~guido/)
At 21:08 02.05.2004 -0700, Guido van Rossum wrote:
Armin Rigo <arigo@tunes.org>:
Early binding is closer to the idea that turning a listcomp into a genexprs should just work if you only iterate once on the result.
Greg Ewing:
I'm not sure that's an idea we should be promulgating in the first place. The original motivation for generator expressions was things like
total = sum(x**2 for x in stuff)
where not only will the sequence be used just once, but it will be used *immediately* before doing anything else.
yes, but the question is how easy it is to grasp what "used *immediately*" means for a genexp, or IOW how much education is needed. This looks innocent enough:

    l = list_of_my_strings()
    for pfx in discard_prefixes:
        l = ( x for x in l if not x.startswith(pfx) )
    # this is not using l *immediately*
    use_strings(l)
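Samuele's pipeline does indeed misbehave under the semantics that shipped: the outermost iterable (`l`) of each genexp is captured eagerly, but the free variable `pfx` is looked up only when the chain is finally consumed, so every layer of filtering sees the loop variable's final value. A concrete run with hypothetical data:

```python
strings = ["apple", "banana", "cherry"]
result = strings
for pfx in ["a", "b"]:
    # the iterable 'result' is bound now; pfx is read later, at consumption time
    result = (x for x in result if not x.startswith(pfx))

# by the time the chain runs, pfx == "b" in *both* filter layers
print(list(result))  # ['apple', 'cherry'] -- 'apple' sneaks through the 'a' filter
```

The early-binding reading Samuele's code visually suggests would have produced `['cherry']`.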
Those are the use cases we need to advertise this as targeting, I think, not "any listcomp that's only used once".
Right -- thanks for saying it so clearly! Early binding seems to be trying to solve a problem with a use for genexprs that's questionable at best.
Hello Greg, On Mon, May 03, 2004 at 03:03:11PM +1200, Greg Ewing wrote:
I'm not sure that's an idea we should be promulgating in the first place. The original motivation for generator expressions was things like
total = sum(x**2 for x in stuff)
where not only will the sequence be used just once, but it will be used *immediately* before doing anything else.
I understand perfectly well that anything else is advanced stuff, and that people shouldn't do advanced stuff if they are not aware of all the issues. I also know that any kind of laziness together with mutating objects creates delicate problems. What I am looking for is a genexpr implementation that would, as a bonus, be useful in advanced cases too instead of being just unusable beyond the 95% of cases where the binding model doesn't matter. The problem I have with late-bound names is that small, semi-advanced, "innocent enough" examples like Samuele's keep popping up that would just give wrong answers (not just crash). Personally when I look at all these examples I think they are elegant ways to express things that I could want to do, and I could easily read and write this kind of code -- but it just doesn't do what I expect it to. The real answer is much more convoluted. The only advantage I see to late binding is that it prevents people from doing any advanced stuff with genexprs, and thus beginners never have to read code containing advanced uses of genexprs (even clean-looking code like Samuele's). Armin
The problem I have with late-bound names is that small, semi-advanced, "innocent enough" examples like Samuele's keep popping up that would just give wrong answers (not just crash).
Hm. I don't think I've seen a single example that wasn't specifically concocted by early-binding advocates (yours included -- you took random listcomps and mused on what would happen if they'd be genexprs instead).
Personally when I look at all these examples I think they are elegant ways to express things that I could want to do, and I could easily read and write this kind of code -- but it just doesn't do that I expect it to. The real answer is much more convoluted.
Remember that your brain is several times larger than that of most Python users.
The only advantage I see to late binding is that they prevent people from doing any advanced stuff with genexprs, and thus beginners never have to read code containing advanced uses of genexprs (even clean-looking one like Samuele's).
My problem with early binding is that it introduces a new set of behind-the-scenes semantics that you cannot always ignore -- while the examples where it doesn't DWYM are even more obscure than the examples that it is intended to save, it means that you still have to think about the new semantics when doing advanced stuff -- and these new semantics are unlike anything seen before in Python: transformations that require a definition of "free variables" are not seen anywhere else (nested scopes use a different concept). --Guido van Rossum (home page: http://www.python.org/~guido/)
Hello Guido, On Mon, May 03, 2004 at 10:14:04AM -0700, Guido van Rossum wrote:
(...) these new semantics are unlike anything seen before in Python: transformations that require a definition of "free variables" are not seen anywhere else (nested scopes use a different concept).
I think I see your point, although I'm not sure I see how nested scopes work if not with free variables ( = variables used but not bound in the current scope). Armin
[Guido]
Hm. I don't think I've seen a single example that wasn't specifically concocted by early-binding advocates (yours included -- you took random listcomps and mused on what would happen if they'd be genexprs instead).
That's a bit of a revisionist stretch: the examples I trotted out last October *made* me an early-binding advocate at the time. For example, http://mail.python.org/pipermail/python-dev/2003-October/039323.html That example doesn't have lists of generators either, which is another thing you don't remember ever seeing <wink>. It's still the case that I haven't seen a plausible example where late-binding semantics both make a difference and are desirable, or where early-binding semantics both make a difference and are undesirable -- although the magical implicitness of both schemes leaves me queasy. ...
My problem with early binding is that it intruduces a new set of behind-the-scenes semantics that you cannot always ignore -- while the examples where it doesn't DWYM are even more obscure than the examples that it is intended to save, it means that you still have to think about the new semantics when doing advanced stuff -- and these new semantics are unlike anything seen before in Python: transformations that require a definition of "free variables" are not seen anywhere else (nested scopes use a different concept).
Late binding in genexps can't be ignored either when doing advanced stuff, and is also unlike anything seen before in Python: transforming what looks like an expression into a delayed function body is also unprecedented (in the absence of a lambda). The fact that it "looks like" an expression is exactly what's so surprising about the fact that "the current values" of p and pipe don't get used in the above example's

    # add a filter over the current pipe, and call that the new pipe
    pipe = (e for e in pipe if p(e))

It didn't strike me as an advanced use at the time, just as an example of something that seemed like it should be natural. It sounds like, today, "the current value" of pipe will be used after all, but the evaluation of p will get delayed? If so, that's not notably easy to explain either (or so it seems to me).
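Tim's reading matches what was eventually implemented: the `pipe` in `for e in pipe` is evaluated when the genexp is created, while `p(e)` is evaluated once per element. A sketch with hypothetical predicates makes the split visible:

```python
def is_even(n):
    return n % 2 == 0

pipe = iter([1, 2, 3, 4])
p = is_even
# 'pipe' on the right-hand side is captured now; 'p' will be looked up later
pipe = (e for e in pipe if p(e))
p = lambda n: n > 2  # rebinding p *after* the genexp exists

print(list(pipe))  # [3, 4]: the rebound p did the filtering, not is_even
```

Rebinding the name `pipe` itself is harmless (the genexp holds the original iterator), but rebinding `p` changes the result.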
At 11:26 23.04.2004 -0700, Guido van Rossum wrote:
My reasons for (still!) preferring late binding are:
(a) I find it hard to defend automatic variable capture given Python's general late-binding semantics
I have been thinking about reason (a); the point is that a genexp is not simply a new way to introduce a closure, it is not simply sugar over lambda or def. A genexp is sugar for defining on the fly a generator closure and, second, instantiating it! Now there are two parts to that. It is true that closures in Python have late-binding semantics, but when they are deployed people in most cases use idioms that circumvent and neutralize late binding:

    def make_show_action(msg):
        def action(evt):
            show(msg)
        return action

used as

    for act, msg in msgs:
        pnl.add(Button(act, action=make_show_action(msg)))

or

    for act, msg in msgs:
        pnl.add(Button(act, lambda msg=msg: show(msg)))

in fact here lambda has late-binding semantics and we eschew it by not having any free variable at all. Of course here enters (b): mixing for and genexp is advanced use. That's really a gut-feeling call at this point. For most practical cases, for closures, late-binding semantics seems something to keep under control; it's an implementation detail to allow recursive and mutually recursive nested functions. So to repeat: genexps have no simple direct correspondence to a lambda or a def. They are sugar for some more involved code, so I think it is more a call about what is the general idiomatic form for that kind of code, also maybe considering that the genexp is hiding the details of what is really going on (kind of supplying sheep's clothing to a wolf). Maybe the sane thing is late binding, maybe early. But I don't think, as suggested by (a), that there is some deep design logic internal to Python at stake here. At most the argument could be that late binding makes things easier to explain because genexps are kind of like lambdas. Maybe, maybe not. Then there is (b).
(b) I believe that use cases needing early binding are uncommon and strained: they all involve creating a list of generator expressions, which IMO is a pretty unusual thing to do
regards
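The default-argument idiom Samuele describes is still the standard way to neutralize late binding in a loop; a minimal side-by-side check of both behaviours:

```python
# late binding: every lambda closes over the same i and reads it at call time,
# so all of them see the loop variable's final value
funcs_late = [lambda: i for i in range(3)]

# early capture: the default argument i=i is evaluated at definition time
funcs_early = [lambda i=i: i for i in range(3)]

print([f() for f in funcs_late])   # [2, 2, 2]
print([f() for f in funcs_early])  # [0, 1, 2]
```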
participants (11)

- Andrew Koenig
- Armin Rigo
- Barry Warsaw
- David Eppstein
- Greg Ewing
- Guido van Rossum
- Jeremy Hylton
- Michael Walter
- Russell E. Owen
- Samuele Pedroni
- Tim Peters