RE: [Python-Dev] PEP 289 - Generator Expressions - Let's Move Forward
I'd like to get generator expressions checked into CVS.
Regarding the early-or-late binding issue, here's what I'd like to see happen: I'd like the late-binding (i.e. non-capture) version checked in and released with Python 2.4a1 and a2. If we find there are real problems with these semantics, we can switch to early-binding semantics in 2.4b1.
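A minimal sketch of what the late-binding (non-capture) choice means in practice, written for a current Python 3 interpreter rather than the 2.4 alphas under discussion. The names `offset`, `gen`, and `capture` are hypothetical and used only for illustration. In Python as eventually released, free variables in the body of a generator expression are looked up when the generator runs, while the outermost iterable is evaluated immediately:

```python
# Hypothetical names throughout; this only illustrates the binding question.
offset = 10

# The body refers to the free variable `offset`; nothing is computed yet.
gen = (offset + i for i in range(3))

# Rebinding `offset` before the generator is consumed changes the result,
# because the name is looked up each time an element is produced.
offset = 20
late_result = list(gen)

# A capture-like effect can be emulated by passing the value through a
# function argument, which fixes it at call time.
def capture(offset_now):
    return (offset_now + i for i in range(3))

offset = 10
captured = capture(offset)
offset = 20
early_result = list(captured)

print(late_result)   # [20, 21, 22]
print(early_result)  # [10, 11, 12]
```

The second half is only an emulation of capture semantics, not the early-binding design that was proposed for the language itself.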
Sounds like a good plan, but I have one question: given the extent to which alpha releases are tested (and subtracting out those who use them only to check old code for compatibility and don't try out the new features), are we going to be able to tell if there are problems with the semantics?

My best idea is that those who currently favor the early-binding approach (and I'm sure you've heard from plenty of them) should be encouraged to try it out in the alpha and report any issues, along with actual use cases. Someone collects these responses, then you (Guido) look over the results (with a grain of salt, since it's mostly being tested by folks who object) and decide if it's a problem.

Will that work? Anyone have a better idea? How do we go about asking people to try it out (is a mention by Guido on comp.lang.python sufficient)?

-- Michael Chermside
On Fri, 2004-04-23 at 15:02, Michael Chermside wrote:
I'd like to get generator expressions checked into CVS.
Regarding the early-or-late binding issue, here's what I'd like to see happen: I'd like the late-binding (i.e. non-capture) version checked in and released with Python 2.4a1 and a2. If we find there are real problems with these semantics, we can switch to early-binding semantics in 2.4b1.
Sounds like a good plan, but I have one question: Given the extent to which alpha releases are tested (and subtracting out those who use them only to check old code for compatibility and don't try out the new features) are we going to be able to tell if there are problems with the semantics?
We need reports from people writing real code with generator expressions. It's hard to guess if we'll get enough substantial feedback in the alpha releases. The kind of issues we're dealing with usually manifest themselves in real programs rather than toy examples; not sure that an alpha will get that kind of use.

On the other hand, a generator expression is nearly equivalent to a list comprehension. So it should be easy for people to experiment with generator expressions, because they can swap them with list comprehensions in many cases.

That leads me to wonder what exactly the rationale for generator expressions is. The PEP says that "time, clarity, and memory are conserved by using an generator expression" but I can only see how memory is conserved. That is, I don't find them any easier to read than list comprehensions and I don't understand the performance implications very well. It's not obvious to me that they're faster.
My best idea is that those who currently favor the early-binding approach (and I'm sure you've heard from plenty of them) should be encouraged to try it out in the alpha and report any issues, along
I'm not sure what you mean by "it." If you mean try the alpha with late-binding semantics and see how it goes, I think that's a good plan.
with actual use cases. Someone collects these responses, then you (Guido) look over the results (with a grain of salt since it's mostly being tested by folks who object) and decide if it's a problem.
Jeremy
At 03:41 PM 4/23/04 -0400, Jeremy Hylton wrote:
That leads me to wonder what exactly the rationale for generator expressions is. The PEP says that "time, clarity, and memory are conserved by using an generator expression" but I can only see how memory is conserved. That is, I don't find them any easier to read than list comprehensions and I don't understand the performance implications very well. It's not obvious to me that they're faster.
Alex Martelli previously posted some tests that showed them to be quite a bit faster for large lists.
On Fri, 2004-04-23 at 15:59, Phillip J. Eby wrote:
At 03:41 PM 4/23/04 -0400, Jeremy Hylton wrote:
That leads me to wonder what exactly the rationale for generator expressions is. The PEP says that "time, clarity, and memory are conserved by using an generator expression" but I can only see how memory is conserved. That is, I don't find them any easier to read than list comprehensions and I don't understand the performance implications very well. It's not obvious to me that they're faster.
Alex Martelli previously posted some tests that showed them to be quite a bit faster for large lists.
Anyone know where to find these numbers? I've done some searching, but I can't find them. It would be good to include something concrete in the PEP.

Jeremy
[Jeremy]
... That leads me to wonder what exactly the rationale for generator expressions is. The PEP says that "time, clarity, and memory are conserved by using an generator expression" but I can only see how memory is conserved. That is, I don't find them any easier to read than list comprehensions
They're not, although they can be more clear than code that defines helper generating functions to get some of the same memory benefits.
and I don't understand the performance implications very well. It's not obvious to me that they're faster.
When you've got an iterator producing a billion elements, it becomes obvious at once <wink>. Really, *when* they're faster than listcomps, it's mostly a consequence of not creating in whole, then crawling over, a giant memory object. For short sequences, I expect listcomps are faster (and earlier timings have shown that). genexps require an additional frame suspend/resume per element, and while cheap (esp. compared to a function call) it's not free. For long sequences, avoiding the creation of a giant list becomes an overwhelming advantage.
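The qualitative point above can be checked on any given machine with a rough measurement like the following. This is a sketch for a current Python 3 interpreter, not the benchmark from the earlier thread; the function names `total_listcomp`, `total_genexp`, and `compare` are hypothetical, and the sizes are arbitrary. No expected timings are shown because they depend on platform and interpreter version.

```python
import timeit

# Hypothetical helpers for illustration; not taken from the thread.
def total_listcomp(n):
    # Builds the whole list first, then sums it.
    return sum([i * i for i in range(n)])

def total_genexp(n):
    # Feeds sum() one element at a time; no intermediate list is built.
    return sum(i * i for i in range(n))

def compare(n, repeat=5):
    """Return (listcomp_seconds, genexp_seconds), best of `repeat` runs."""
    t_list = min(timeit.repeat(lambda: total_listcomp(n), number=1, repeat=repeat))
    t_gen = min(timeit.repeat(lambda: total_genexp(n), number=1, repeat=repeat))
    return t_list, t_gen

for n in (10, 1_000, 100_000):
    t_list, t_gen = compare(n)
    print(f"n={n:>7}: listcomp {t_list:.6f}s  genexp {t_gen:.6f}s")
```

Timing alone does not capture the memory side of the argument, which only starts to dominate once the intermediate list is large relative to available memory.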
Anyone know where to find these numbers? I've done some searching, but I can't find them. It would be good to include something concrete in the PEP.
This varies so wildly across platforms, timing procedure, and test case, that I expect concrete numbers would do more harm than good. The qualitative argument is easy to grasp. BTW, the most recent round of this was in the "genexps slow?" thread, started last month and spilling into April. Here's the start of it: http://mail.python.org/pipermail/python-dev/2004-March/043777.html
Jeremy Hylton wrote:
...
That leads me to wonder what exactly the rationale for generator expressions is. The PEP says that "time, clarity, and memory are conserved by using an generator expression" but I can only see how memory is conserved. That is, I don't find them any easier to read than list comprehensions and I don't understand the performance implications very well. It's not obvious to me that they're faster.
I think that there is a robustness argument to be made as well. It is very common to run into programs that work really well with small data sets and then completely run out of steam with large ones. Sometimes the algorithm is at fault. But it could also be that some Python programs fail to scale primarily because of the wastefulness of listcomps. Whether or not this is true in practice, it is certainly true that when you write programs that are meant to scale, you must consider every listcomp and think whether the dataset is going to get large or not. If so, you must switch to some more obfuscated syntax. Paul Prescod
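One way to see the scaling concern is to compare the size of the container a list comprehension builds with the size of a generator object. The sketch below assumes a current Python 3 interpreter; the names `squares_list` and `squares_gen` and the size `n` are hypothetical. `sys.getsizeof` counts only the container itself, not the integers it refers to, so it understates the list's real footprint.

```python
import sys

n = 100_000  # arbitrary size chosen for illustration

squares_list = [i * i for i in range(n)]   # all n results exist at once
squares_gen = (i * i for i in range(n))    # results are produced on demand

# Size of the container objects only; the list grows with n, while the
# generator object stays small regardless of n.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

print("list object:     ", list_bytes, "bytes")
print("generator object:", gen_bytes, "bytes")

# Both spell the same computation.
same_total = sum(squares_list) == sum(squares_gen)
print("same total:", same_total)
```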
We need reports from people writing real code with generator expressions. It's hard to guess if we'll get enough substantial feedback in the alpha releases. The kind of issues we're dealing with usually manifest themselves in real programs rather than toy examples; not sure that an alpha will get that kind of use.
An interesting idea might be to hack the standard library replacing every occurrence of list comprehensions with generator expressions, and check the respective test cases and/or applications using them. -- Gustavo Niemeyer http://niemeyer.net
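A blanket replacement of this kind would probably surface cases where the two forms are not interchangeable. The sketch below, for a current Python 3 interpreter and with hypothetical names, shows two such differences: a generator expression can be traversed only once, and it does not support `len()`.

```python
data = [3, 1, 2]  # hypothetical input, for illustration only

as_list = [x * 2 for x in data]
as_gen = (x * 2 for x in data)

# A list can be traversed any number of times.
first_pass_list = sorted(as_list)
second_pass_list = sorted(as_list)

# A generator expression is exhausted after one traversal.
first_pass_gen = sorted(as_gen)
second_pass_gen = sorted(as_gen)   # empty: nothing left to yield

# len() works on the list but raises TypeError for a generator.
try:
    len(x * 2 for x in data)
    has_len = True
except TypeError:
    has_len = False

print(first_pass_list, second_pass_list)   # [2, 4, 6] [2, 4, 6]
print(first_pass_gen, second_pass_gen)     # [2, 4, 6] []
print("generator supports len():", has_len)
```

Code that only iterates once over the result is unaffected, which is the case where the swap is safe.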
On Fri, Apr 23, 2004, Jeremy Hylton wrote:
That leads me to wonder what exactly the rationale for generator expressions is. The PEP says that "time, clarity, and memory are conserved by using an generator expression" but I can only see how memory is conserved. That is, I don't find them any easier to read than list comprehensions and I don't understand the performance implications very well. It's not obvious to me that they're faster.
I've been skimming due to being out of town and catching up, but I haven't seen a direct response to Jeremy's question about the rationale. Jeremy, do you still want an answer? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "I used to have a .sig but I found it impossible to please everyone..." --SFJ
That leads me to wonder what exactly the rationale for generator expressions is. The PEP says that "time, clarity, and memory are conserved by using an generator expression" but I can only see how memory is conserved. That is, I don't find them any easier to read than list comprehensions and I don't understand the performance implications very well. It's not obvious to me that they're faster.
I've been skimming due to being out of town and catching up, but I haven't seen a direct response to Jeremy's question about the rationale. Jeremy, do you still want an answer?
I can see two potentially important cases where generator expressions win big over list comprehensions:

1) Where the code that is consuming the sequence yielded by the generator expression terminates before consuming the entire sequence;

2) Where the code that is consuming the sequence is an online algorithm, and there is a potential delay between generating elements of the sequence.

It is easier to find an example of the second case than of the first:

    foo(bar(line) for line in sys.stdin)

If foo expects a generator as its input:

    def foo(x):
        for line in x:
            print line

then using a generator expression instead of a list comprehension will cause the program to print each line of output after reading the corresponding line of input, rather than consuming all the input and then printing all the output.
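The first case, a consumer that stops before the sequence is exhausted, could be sketched along these lines. This is an illustration for a current Python 3 interpreter, not an example from the thread; `expensive`, `first_over`, and the `calls` log are hypothetical names introduced only to count how much work each form performs.

```python
calls = []  # hypothetical log of which inputs were actually processed

def expensive(n):
    # Stand-in for costly per-element work; records each invocation.
    calls.append(n)
    return n * n

def first_over(limit, values):
    # Consumer that returns as soon as it finds a value above `limit`.
    for v in values:
        if v > limit:
            return v
    return None

# Generator expression: elements are computed only until the consumer stops.
found_gen = first_over(50, (expensive(n) for n in range(1000)))
calls_with_genexp = len(calls)

# List comprehension: every element is computed before the consumer starts.
calls.clear()
found_list = first_over(50, [expensive(n) for n in range(1000)])
calls_with_listcomp = len(calls)

print(found_gen, calls_with_genexp)      # 64 9
print(found_list, calls_with_listcomp)   # 64 1000
```

Both forms return the same answer; the difference is only in how many elements were computed to get it.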
participants (8)
- Aahz
- Andrew Koenig
- Gustavo Niemeyer
- Jeremy Hylton
- Michael Chermside
- Paul Prescod
- Phillip J. Eby
- Tim Peters