Re: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
Yes, but it only works for generator expressions and not comprehensions. My opinion of that workaround is that it’s also a step backward in terms of readability. I suspect if i < 50 else stop() would probably also work, since it throws an exception. That’s better, IMHO. On Jun 28, 2013, at 6:38 PM, Andrew Carter <acarter@g.hmc.edu> wrote:
Digging through the archives (with a quick google search) http://mail.python.org/pipermail/python-ideas/2013-January/019051.html, if you really want an expression it seems you can just do
    def stop(): raise StopIteration
    list(i for i in range(100) if i < 50 or stop())
it seems to me that this would provide syntax that doesn't require lambdas.
On Fri, Jun 28, 2013 at 4:50 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Fri, Jun 28, 2013 at 7:38 PM, Shane Green <shane@umbrellacode.com> wrote:
..
    [x until condition for x in l ...] or
    [x for x in l until condition]
Just to throw in one more variation:
[expr for item in iterable break if condition]
(inversion of "if" and "break" reinforces the idea that we are dealing with an expression rather than a statement - compare with "a if cond else b")
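[Editorial aside: the closest non-syntax spelling of the proposed clause today is itertools.takewhile, which needs exactly the lambda these proposals are trying to avoid. A minimal sketch, with an illustrative example input, not taken from the thread:]

    # Roughly what "[expr for item in iterable break if condition]" would do,
    # spelled with today's tools. The predicate lambda is the part the
    # proposed syntax is meant to eliminate.
    from itertools import takewhile

    result = [i * i for i in takewhile(lambda i: i < 50, range(100))]
    # result == [0, 1, 4, ..., 49 * 49]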
Those things said, it is a workaround that works and was the culmination of a long discussion, so you’re absolutely right that it belongs in, and maybe finishes, this discussion.
On Jun 28, 2013, at 18:50, Shane Green <shane@umbrellacode.com> wrote:
Yes, but it only works for generator expressions and not comprehensions.
This is the point of options #1 and #2: make StopIteration work in comps either (1) by redefining comprehensions in terms of genexps or (2) by fiat.

After some research, it turns out that these are equivalent. Replacing any [comprehension] with list(comprehension) is guaranteed by the language (and the CPython implementation) to give you exactly the same value unless (a) something in the comp raises StopIteration, or (b) something in the comp relies on reflective properties (e.g., sys._getframe().f_code.co_flags) that aren't guaranteed anyway.

So, other than being 4 characters more verbose and 40% slower, there's already an answer for comprehensions.

And if either of those problems is unacceptable, a patch for #1 or #2 is actually pretty easy.

I've got two different proofs of concept: one actually implements the comp as passing the genexp to list, the other just wraps everything after the BUILD_LIST and before the RETURN_VALUE in the equivalent of try: ... except StopIteration: pass. I need to add some error handling to the C code, and for #2 write sufficient tests that verify that it really does work exactly like #1, but I should have working patches to play with in a couple days.
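[Editorial sketch of what the two proof-of-concept strategies amount to at the Python level; this is not Andrew's actual patch, and f, p, and seq are hypothetical placeholders:]

    # Strategy #1: implement [f(x) for x in seq if p(x)] by handing the
    # equivalent genexp to list(), which stops at the first StopIteration.
    result = list(f(x) for x in seq if p(x))

    # Strategy #2: keep the inline loop, but wrap it so a StopIteration
    # raised anywhere inside simply ends the comprehension early.
    result = []
    try:
        for x in seq:
            if p(x):
                result.append(f(x))
    except StopIteration:
        pass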
Thanks Andrew. My knee jerk reaction was to strongly prefer option two, which sounds like–if I understood correctly, and I’m not sure I do–it keeps both comprehensions and expressions. Rereading your points again, I must admit I didn’t see much to justify the knee jerk reaction. I do commonly use list comprehensions precisely *because* of the performance impact, and can think of a few places the 40% would be problematic. Was there a measurable performance difference with approach 2?
From: Shane Green <shane@umbrellacode.com> Sent: Saturday, June 29, 2013 5:10 AM
Thanks Andrew. My knee jerk reaction was to strongly prefer option two, which sounds like–if I understood correctly, and I’m not sure I do–it keeps both comprehensions and expressions. Rereading your points again, I must admit I didn’t see much to justify the knee jerk reaction.
Sorry, it's my fault for conflating two independent choices. Let me refactor things:

Nobody's talking about getting rid of comprehensions. Today, you have the choice to write [comp] instead of list(comp), and that won't change. However, today these differ in that the latter lets you exit early with StopIteration, while the former doesn't. That's what's proposed to change.

Choice 1 is about the language definition. With this change, there is no remaining difference between comprehensions and genexps except for returning a list vs. an iterator. That means we could simplify things by defining the two concepts together (or defining one in terms of the other). I don't see any downside to doing that.

Choice 2 is about the CPython implementation. We can reimplement comprehensions as wrappers around genexps, or we can just add a try…except into comprehensions. The former would simplify the compiler and the interpreter, but at a cost of up to 40% for comprehensions. The latter would leave things no simpler than they are today, but also no slower.

Once put this way, I think the choices are obvious: Simplify the language, don't simplify the implementation.
I do commonly use list comprehensions precisely *because* of the performance impact, and can think of a few places the 40% would be problematic.
Usually that's a premature optimization. For anything simple enough that the iteration cost isn't swamped by your real work, the performance usually doesn't matter anyway. But "usually" isn't always, and there definitely are real-world cases where it would hurt.
Was there a measurable performance difference with approach 2?
Once I realized that the right place to put the try is just outside the loop, it became obvious that there is no per-iteration cost, only a constant cost. If you don't raise an exception through a listcomp, that cost is basically running one more opcode and loading a few more bytes into memory. It adds less than 1% for even a trivial comp that loops 10 times, or for a realistic but still simple comp that loops 3 times. I'll post actual numbers for local tests and for benchmarks once I get things finished (hopefully Monday).
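[A sketch of how one could sanity-check these figures locally; results vary by interpreter version and machine, and the 40% and <1% numbers above are Andrew's measurements, not something this snippet guarantees:]

    # Compare a trivial list comprehension against the list(genexp) spelling.
    import timeit

    comp = timeit.timeit("[x * x for x in range(10)]", number=1000000)
    genexp = timeit.timeit("list(x * x for x in range(10))", number=1000000)
    print(comp, genexp, genexp / comp)   # ratio shows the genexp overhead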
On 30 June 2013 14:53, Andrew Barnert <abarnert@yahoo.com> wrote:
If you don't raise an exception through a listcomp, that cost is basically running one more opcode and loading a few more bytes into memory. It adds less than 1% for even a trivial comp that loops 10 times, or for a realistic but still simple comp that loops 3 times.
Comprehensions translate to inline loops. The fact that the "stop()" hack works for generator expressions is just a quirky result of the line between "return" and "raise StopIteration" in a generator function being extraordinarily blurry - it has nothing to do with breaking out of a loop. That's why the stop() hack terminates a nested genexp completely, rather than just breaking out of the innermost loop:
>>> def stop():
...     raise StopIteration
...
>>> list((x, y) for x in range(10) for y in range(10) if y < 3 or stop())
[(0, 0), (0, 1), (0, 2)]

>>> def greturn():
...     for x in range(10):
...         for y in range(10):
...             if y >= 3: return
...             yield x, y
...
>>> list(greturn())
[(0, 0), (0, 1), (0, 2)]

>>> def graise():
...     for x in range(10):
...         for y in range(10):
...             if y >= 3: raise StopIteration
...             yield x, y
...
>>> list(graise())
[(0, 0), (0, 1), (0, 2)]
Note how all three produce the same output, and how both loops terminate immediately when the return/raise is encountered. You can't get the same semantics for other comprehensions without introducing the yield/return distinction, which is astonishingly slow by comparison (as you need to suspend and resume the generator frame on each iteration). The overhead of that is only worth it when the cost of having the entire result in memory at the same time is prohibitive. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 06/30/2013 12:45 AM, Nick Coghlan wrote:
You can't get the same semantics for other comprehensions without introducing the yield/return distinction, which is astonishingly slow by comparison (as you need to suspend and resume the generator frame on each iteration). The overhead of that is only worth it when the cost of having the entire result in memory at the same time is prohibitive.
It seems to me that if we were to rewrite comprehensions as generators yielding into the container, the generator would be private. It can't be taken out and reused, never needs to be suspended, and always yields to the same container. Because it is private, it can be altered to make it more efficient by removing the unneeded parts and/or substituting alternative byte code that does the same thing in a faster way. Wouldn't they then be equivalent to the current comprehensions with the generator exception handling added? Cheers, Ron
From: Ron Adam <ron3200@gmail.com> Sent: Sunday, June 30, 2013 12:01 AM
On 06/30/2013 12:45 AM, Nick Coghlan wrote:
You can't get the same semantics for other comprehensions without introducing the yield/return distinction, which is astonishingly slow by comparison (as you need to suspend and resume the generator frame on each iteration). The overhead of that is only worth it when the cost of having the entire result in memory at the same time is prohibitive.
It seems to me that if we were to rewrite comprehensions as generators yielding into the container, the generator would be private. It can't be taken out and reused, never needs to be suspended, and always yields to the same container.
Because it is private, it can be altered to make it more efficient by removing the unneeded parts and/or substituting alternative byte code that does the same thing in a faster way.
I tried to come up with an efficient way to do that, but failed—and learned a lot from my failure. See http://stupidpythonideas.blogspot.com/2013/06/can-you-optimize-listgenexp.ht... for the gory details, but I'll summarize here.

Ultimately, it's the generator feeding a separate FOR_ITER that makes the StopIteration work. (Raising StopIteration in a generator basically feeds a value to the calling FOR_ITER, just like returning or yielding.) But that's exactly the part that makes it slow. Optimizing out the other stuff (mainly the list call) doesn't do much good; you need some way to inline the generator into the outer FOR_ITER so you can just jump back and forth instead of suspending and resuming a generator. Which means adding new FAST_FOR_ITER, FAST_YIELD, and FAST_RETURN opcodes.

Making that work in general is very hard (you need some way to let FAST_FOR_ITER know whether the inlined generator did a yield, a return, or an exception), but in the special case where you don't care about the return value and the generator doesn't do anything useful after yielding and doesn't return anything useful (which is the case with an inlined genexpr), it's doable.

So, you can directly use the genexpr compiler code for comps and scrap the existing comp-specific code… but you have to make the genexpr code more complicated and do if (genexpr) else in multiple places to make it inlinable, add in a comp-specific wrapper that's longer and more complicated than the code you scrapped, and add 3 new opcodes. Definitely not a win for simplicity.

And if you trace through what it's doing, it ends up as just a tangled-up version of the exception-handling listcomp function—exactly equivalent, but with a whole bunch of extra jumping around to slow things down. You can optimize it by straightening everything out, but then you're not sharing existing code anymore—and in fact, when you're done, you've just added a SETUP_EXCEPT to the listcomp code.

It's possible that I missed something stupid that would offer a better way to do this, but I think trying to directly optimize list(genexpr) by inlining a "private generator" ends up being just a much more complicated way to add StopIteration handling to the listcomp code. So, you might as well do the easier one.
From: Nick Coghlan <ncoghlan@gmail.com> Sent: Saturday, June 29, 2013 10:45 PM
On 30 June 2013 14:53, Andrew Barnert <abarnert@yahoo.com> wrote:
If you don't raise an exception through a listcomp, that cost is basically running one more opcode and loading a few more bytes into memory. It adds less than 1% for even a trivial comp that loops 10 times, or for a realistic but still simple comp that loops 3 times.
Comprehensions translate to inline loops. The fact that the "stop()" hack works for generator expressions is just a quirky result of the line between "return" and "raise StopIteration" in a generator function being extraordinarily blurry - it has nothing to do with breaking out of a loop.
I think it's a pretty obvious consequence of the fact that a genexp builds an iterator, and raising StopIteration stops any iterator. And that's why the stop() hack terminates a nested genexp completely rather than just one loop: because it's the entire genexp that's an iterator, not each individual nested clause.
You can't get the same semantics for other comprehensions without
introducing the yield/return distinction, which is astonishingly slow by comparison (as you need to suspend and resume the generator frame on each iteration).
Sure you can. All you have to do is put a try/except around the outer loop. Taking your example:

    [(x, y) for x in range(10) for y in range(10) if y < 3 or stop()]

This would now map to:

    a = []
    try:
        for x in range(10):
            for y in range(10):
                if y < 3 or stop():
                    a.append((x, y))
    except StopIteration:
        pass

And that change is enough to give it _exactly_ the same semantics as

    list((x, y) for x in range(10) for y in range(10) if y < 3 or stop())

… but without the 40% performance hit from the yields. The only overhead added is a single SETUP_EXCEPT.

But really, despite the origin of the idea, my reason for liking it has little to do with iteration stopping, and much more to do with making the language simpler. The key is that this change would mean that [comp] becomes semantically and behaviorally identical to list(genexp), except for being 40% faster.

Put another way: I think the language would be better if, instead of documenting comprehensions and generator expressions largely independently and in parallel, we simply said that [comp] has the same effect as list(genexp) except that if you raise StopIteration with a comp it passes through. (The fact that this is true today is by no means obvious, but that's part of the problem I want to solve, not an argument against solving it.)

If we removed that one difference, it would be even simpler. I don't think the difference buys us anything, and the cost of eliminating it is a relatively simple patch with minimal performance impact. (As a minor side benefit, that would also mean you could use the StopIteration hack in comps, but I still don't think we'd want to recommend doing that.)
On 30 June 2013 17:26, Andrew Barnert <abarnert@yahoo.com> wrote:
If we removed that one difference, it would be even simpler. I don't think the difference buys us anything, and the cost of eliminating it is a relatively simple patch with minimal performance impact. (As a minor side benefit, that would also mean you could use the StopIteration hack in comps, but I still don't think we'd want to recommend doing that.)
An interesting rationale, especially along with your reply to Ron about how much simpler that approach is than attempting to optimize list(genexp), while still yielding a semantically equivalent end result. It still raises my "behavioural change without adequate justification" hackles, but I'm only -0 now, whereas I was definitely -1 earlier. It's definitely a much smaller change than the scoping one we introduced in the Python 3 migration, which had the dual motivation of moving the iteration variable into a private scope, and better aligning the other semantics of "[x for x in y]" vs "list(x for x in y)". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 06/30/2013 06:32 AM, Nick Coghlan wrote:
An interesting rationale, especially along with your reply to Ron about how much simpler that approach is than attempting to optimize list(genexp), while still yielding a semantically equivalent end result.

It still raises my "behavioural change without adequate justification" hackles, but I'm only -0 now, whereas I was definitely -1 earlier. It's definitely a much smaller change than the scoping one we introduced in the Python 3 migration, which had the dual motivation of moving the iteration variable into a private scope, and better aligning the other semantics of "[x for x in y]" vs "list(x for x in y)".
Yes, that was also my point. It's the same as inlining the generator parts into the iterator that is driving it. We don't need to do that because we already have an optimised version of that. It just needs to catch the StopIteration to be the same. I'm +1 for this. I still would like to see actual time comparisons, and have it pass Python's test suite. I don't think there would be any issues with either of those. I think that it's not uncommon for people to think this is how list comps work. And I think it is surprising for them that the StopIteration isn't caught. Cheers, Ron
I apologize, this thread was too long for me to follow. Is the issue the following?
>>> def stopif(x):
...     if x: raise StopIteration
...     return True
...
>>> [i for i in range(10) if stopif(i==3)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 2, in stopif
StopIteration
>>> list(i for i in range(10) if stopif(i==3))
[0, 1, 2]
I.e. the difference between list(<genexp>) and [<genexp>] is that if <genexp> raises StopIteration, list(...) returns the elements up to that point but [...] passes the exception out?

That seems a bug to me inherited from the Python 2 implementation of list comprehensions and I'm fine with fixing it in 3.4. The intention of the changes to comprehensions in Python 3 was that these two forms would be completely equivalent. The difficulty has always been that CPython comprehensions were traditionally faster than generator expressions and we're reluctant to give that up. But it's still a bug. -- --Guido van Rossum (python.org/~guido)
On 30 June 2013 21:52, Guido van Rossum <guido@python.org> wrote:
I.e. the difference between list(<genexp>) and [<genexp>] is that if <genexp> raises StopIteration, list(...) returns the elements up to that point but [...] passes the exception out?
That seems a bug to me inherited from the Python 2 implementation of list comprehensions and I'm fine with fixing it in 3.4. The intention of the changes to comprehensions in Python 3 was that these two forms would be completely equivalent. The difficulty has always been that CPython comprehensions were traditionally faster than generator expressions and we're reluctant to give that up. But it's still a bug.
But which way is the bug? Personally, I think the list comprehension has it right. I'd prefer it if (raise_stopiteration() for _ in [0]) actually had the StopIteration fall through.
On 1 Jul 2013 07:01, "Guido van Rossum" <guido@python.org> wrote:
I.e. the difference between list(<genexp>) and [<genexp>] is that if <genexp> raises StopIteration, list(...) returns the elements up to that point but [...] passes the exception out?
That seems a bug to me inherited from the Python 2 implementation of list comprehensions and I'm fine with fixing it in 3.4. The intention of the changes to comprehensions in Python 3 was that these two forms would be completely equivalent. The difficulty has always been that CPython comprehensions were traditionally faster than generator expressions and we're reluctant to give that up. But it's still a bug.
Yep, and Andrew pointed out the overhead of fixing it is actually quite low - we just have to tweak comprehensions to wrap the entire loop in a try/except that ignores the StopIteration exception. That brings them into line with the generator form where it doesn't matter if the exception comes directly from the generator code or is raised by the interpreter due to the frame terminating, the loop implicit in the list call will treat it as indicating the end of the generator. Cheers, Nick.
On Sun, Jun 30, 2013 at 3:28 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Yep, and Andrew pointed out the overhead of fixing it is actually quite low - we just have to tweak comprehensions to wrap the entire loop in a try/except that ignores the StopIteration exception. That brings them into line with the generator form where it doesn't matter if the exception comes directly from the generator code or is raised by the interpreter due to the frame terminating, the loop implicit in the list call will treat it as indicating the end of the generator.
Thanks, sounds good. -- --Guido van Rossum (python.org/~guido)
Ron Adam wrote:
It's the same as in-lineing the generator parts into the iterator that is driving it.
We don't need to do that because we already have an optimised version of that. It just needs to catch the StopIteration to be the same.
I think that it's not uncommon for people to think this is how list comps work. And I think it is surprising for them that the StopIteration isn't caught.
I tend to feel that the fact that raising StopIteration in a generator has the same effect as returning from the generator is a quirk of the implementation that shouldn't be relied on. I'm not sure we should be giving it official status by going out of our way to make listcomps behave the same. -- Greg
On Jun 30, 2013, at 15:08, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Ron Adam wrote:
It's the same as in-lineing the generator parts into the iterator that is driving it. We don't need to do that because we already have an optimised version of that. It just needs to catch the StopIteration to be the same. I think that it's not uncommon for people to think this is how list comps work. And I think it is surprising for them that the StopIteration isn't caught.
I tend to feel that the fact that raising StopIteration in a generator has the same effect as returning from the generator is a quirk of the implementation that shouldn't be relied on. I'm not sure we should be giving it official status by going out of our way to make listcomps behave the same.
What other effect could it possibly have? The genexp doesn't do anything special with a StopIteration--it passes it through like any other exception. And this is the same as for an explicit generator function. The calling code--whether it's calling next(), using a for loop, or iterating in C--sees StopIteration for a generator return by definition, and sees StopIteration if explicitly raised because that's how exceptions and generators work. Unless you add an extra new rule that says raising StopIteration inside a generator is illegal and will raise a TypeError or something, there's no other way a valid implementation of Python could possibly work.

I think a lot of people are looking at this wrong. In list(genexp), it's not the genexp part that's doing anything here; it's the list function. There's no way that it could distinguish between an explicit StopIteration and an implicit one, so it treats them the same way. If a comprehension is supposed to be the same as list(genexp) it should act like list--or like any other iteration mechanism you can write in Python or even in C--rather than acting magically as it currently does.
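[A minimal pure-Python stand-in for what list() does with an arbitrary iterable, as an editorial sketch (not CPython's implementation), makes the point concrete: from the caller's side, an explicit raise StopIteration inside a generator and a plain return are indistinguishable.]

    def as_list(iterable):
        result = []
        it = iter(iterable)
        while True:
            try:
                x = next(it)
            except StopIteration:
                break          # no way to tell *why* the iterator stopped
            result.append(x)
        return result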
On 30 June 2013 23:24, Andrew Barnert <abarnert@yahoo.com> wrote:
I think a lot of people are looking at this wrong. In list(genexp), it's not the genexp part that's doing anything here; it's the list function. There's no way that it could distinguish between an explicit StopIteration and an implicit one, so it treats them the same way. If a comprehension is supposed to be the same as list(genexp) it should act like list--or like any other iteration mechanism you can write in Python or even in C--rather than acting magically as it currently does.
Damnit, you're so obviously right. Yeah, fine. Whatever.
On 01/07/13 08:08, Greg Ewing wrote:
Ron Adam wrote:
It's the same as in-lineing the generator parts into the iterator that is driving it.
We don't need to do that because we already have an optimised version of that. It just needs to catch the StopIteration to be the same.
I think that it's not uncommon for people to think this is how list comps work. And I think it is surprising for them that the StopIteration isn't caught.
I tend to feel that the fact that raising StopIteration in a generator has the same effect as returning from the generator is a quirk of the implementation that shouldn't be relied on. I'm not sure we should be giving it official status by going out of our way to make listcomps behave the same.
Raising StopIteration has been one of the two official ways to halt a generator, and has been since they were introduced. I nearly wrote "has been documented as..." except it seems to me that it has never been explicitly stated in the docs or the PEP. The closest I can find is this part of the PEP that *implies*, without actually coming right out and saying so, that raising StopIteration is the official way to halt a generator:

[quote]
Q. Why allow "return" at all? Why not force termination to be spelled "raise StopIteration"?

A. The mechanics of StopIteration are low-level details, much like the mechanics of IndexError in Python 2.1: the implementation needs to do *something* well-defined under the covers, and Python exposes these mechanisms for advanced users. That's not an argument for forcing everyone to work at that level, though. "return" means "I'm done" in any kind of function, and that's easy to explain and to use. Note that "return" isn't always equivalent to "raise StopIteration" in try/except construct, either (see the "Specification: Return" section).
[end quote]

http://www.python.org/dev/peps/pep-0255/

Generally the PEP talks about how generators which have already been stopped (due to some unhandled exception, or a return) will raise StopIteration, but otherwise emphasizes using return rather than StopIteration. The What's New for 2.2 does explicitly say you can raise StopIteration to halt a generator, but almost as an afterthought. In contrast, StopIteration is explicitly stated to be the way to signal that an iterator is halted.

http://docs.python.org/3.4/whatsnew/2.2.html#pep-234-iterators

So I think the missing piece is that generators are actually iterators. Since raising StopIteration is the official way to halt an iterator, it's also the (or at least, an) official way to halt a generator, and not a quirk of the implementation. -- Steven
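[An editorial sketch, not from the thread, illustrating Steven's point: for the iterator protocol, raising StopIteration is simply how an iterator says it is done, and a generator's return produces exactly the same signal.]

    class CountTo:
        def __init__(self, n):
            self.i, self.n = 0, n
        def __iter__(self):
            return self
        def __next__(self):
            if self.i >= self.n:
                raise StopIteration        # the official halt signal for iterators
            self.i += 1
            return self.i

    def count_to(n):                       # generator equivalent; return ends the frame
        i = 0
        while i < n:
            i += 1
            yield i

    assert list(CountTo(3)) == list(count_to(3)) == [1, 2, 3]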
On 1 July 2013 10:59, Steven D'Aprano <steve@pearwood.info> wrote:
So I think the missing piece is that generators are actually iterators. Since raising StopIteration is the official way to halt an iterator, it's also the (or at least, an) official way to halt a generator, and not a quirk of the implementation.
Yeah, the reason Andrew's proposed fix to the comprehension semantics makes sense is the fact that exactly *where* StopIteration gets raised during a "__next__" invocation is supposed to be completely opaque from the point of view of the iterator protocol:

    while True:
        try:
            x = next(itr)
        except StopIteration:
            break
        # Process x...

The caught "StopIteration" could come from:

- a generator iterator frame terminating
- a generator iterator explicitly raising StopIteration
- a sequence iterator triggering IndexError
- a sentinel iterator noticing the sentinel value
- any other __next__ method raising StopIteration

When I did the conversion to "make [x for x in y] merely an optimised version of list(x for x in y)" change for Python 3, I know I missed the fact that part of that change involved moving the evaluation of all of the subexpressions inside the implicit try/except that is part of the iterator protocol, and I don't recall anyone else bringing it up either. Even if it did come up, we must have dismissed it as introducing too much overhead to set up the almost-certainly-unnecessary try/except for each iteration.

Fortunately, Andrew is right that we can avoid that overhead and use a single try/except to cover the whole comprehension, which is a nice and cheap change. Cheers, Nick.
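[Two of the StopIteration sources Nick lists above, spelled out as an editorial sketch; the names are illustrative:]

    # A sentinel iterator noticing the sentinel value: iter(callable, sentinel)
    # calls the callable until it returns the sentinel.
    values = iter([1, 2, 0, 3])
    assert list(iter(lambda: next(values), 0)) == [1, 2]

    # A sequence-protocol object whose IndexError is translated into
    # StopIteration by the iterator machinery.
    class Seq:
        def __getitem__(self, index):
            if index >= 3:
                raise IndexError
            return index * 10

    assert list(Seq()) == [0, 10, 20]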
On Jun 29, 2013, at 9:53 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
From: Shane Green <shane@umbrellacode.com> Sent: Saturday, June 29, 2013 5:10 AM
Thanks Andrew. My knee jerk reaction was to strongly prefer option two, which sounds like–if I understood correctly, and I’m not sure I do–it keeps both comprehensions and expressions. Rereading your points again, I must admit I didn’t see much to justify the knee jerk reaction.
Sorry, it's my fault for conflating two independent choices. Let me refactor things:
Nobody's talking about getting rid of comprehensions. Today, you have the choice to write [comp] instead of list(comp), and that won't change. However, today these differ in that the latter lets you exit early with StopIteration, while the former doesn't. That's what's proposed to change.
Choice 1 is about the language definition. With this change, there is no remaining difference between comprehensions and genexps except for returning a list vs. an iterator. That means we could simplify things by defining the two concepts together (or defining one in terms of the other). I don't see any downside to doing that.
Choice 2 is about the CPython implementation. We can reimplement comprehensions as wrappers around genexps, or we can just add a try…except into comprehensions. The former would simplify the compiler and the interpreter, but at a cost of up to 40% for comprehensions. The latter would leave things no simpler than they are today, but also no slower.
Once put this way, I think the choices are obvious: Simplify the language, don't simplify the implementation.
I do commonly use list comprehensions precisely *because* of the performance impact, and can think of a few places the 40% would be problematic.
Usually that's a premature optimization.
This kind of implies that it's likely my use of comprehensions was premature, and therefore detracts from the validity of my usage. I've been releasing building management frameworks built using Python since 1.5.2. When you implement things like the BACnet stack, caches, schedulers, property-level access control, etc. (and many things we would not have needed to develop if we'd needed them now) in Python to run on embedded devices, you learn exactly where your bottlenecks are. Exhibiting performance benefits similar to map and filter, a well-placed loop replacement could sometimes be the difference between needing to migrate some implementation to C or not. In our processor-bound system it is likely the gains were often greater than 40%, but the bottom line is that comprehensions are sometimes introduced during performance-enhancing refactors, and those examples would be particularly hard hit by a performance loss, just as they benefited from the enhancement. So it's not a decision that should be made lightly.
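[The kind of well-placed loop replacement Shane describes, as a micro-benchmark sketch; this is an editorial illustration, and absolute numbers depend heavily on the interpreter, workload, and hardware:]

    import timeit

    setup = "data = list(range(1000))"
    loop = timeit.timeit(
        "out = []\nfor x in data:\n    out.append(x * 2)", setup, number=10000)
    comp = timeit.timeit("[x * 2 for x in data]", setup, number=10000)
    mapped = timeit.timeit("list(map(lambda x: x * 2, data))", setup, number=10000)
    print(loop, comp, mapped)   # explicit loop vs. comprehension vs. map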
For anything simple enough that the iteration cost isn't swamped by your real work, the performance usually doesn't matter anyway.
But "usually" isn't always, and there definitely are real-world cases where it would hurt.
Was there a measurable performance difference with approach 2?
Once I realized that the right place to put the try is just outside the loop… that makes it obvious that there is no per-iteration cost, only a constant cost.
Oh yeah, don’t want to set up the try/catch on every iteration.
If you don't raise an exception through a listcomp, that cost is basically running one more opcode and loading a few more bytes into memory. It adds less than 1% for even a trivial comp that loops 10 times, or for a realistic but still simple comp that loops 3 times.
That’s excellent.
I'll post actual numbers for local tests and for benchmarks once I get things finished (hopefully Monday).
On 06/28/2013 10:16 PM, Andrew Barnert wrote:
On Jun 28, 2013, at 18:50, Shane Green <shane@umbrellacode.com> wrote:
Yes, but it only works for generator expressions and not comprehensions.
This is the point if options #1 and 2: make StopIteration work in comps either (1) by redefining comprehensions in terms of genexps or (2) by fiat.
After some research, it turns out that these are equivalent. Replacing any [comprehension] with list(comprehension) is guaranteed by the language (and the CPython implementation) to give you exactly the same value unless (a) something in the comp raises StopIteration, or (b) something in the comp relies on reflective properties (e.g., sys._getframe().f_code.co_flags) that aren't guaranteed anyway.
So, other than being 4 characters more verbose and 40% slower, there's already an answer for comprehensions.
Right.. Any solution also must not slow down the existing simpler cases.

For those who haven't looked at the C code yet, there is this comment there.

    /* List and set comprehensions and generator expressions work by creating a
       nested function to perform the actual iteration. This means that the
       iteration variables don't leak into the current scope. The defined
       function is called immediately following its definition, with the result
       of that call being the result of the expression.

       The LC/SC version returns the populated container, while the GE version
       is flagged in symtable.c as a generator, so it returns the generator
       object when the function is called.

       This code *knows* that the loop cannot contain break, continue, or
       return, so it cheats and skips the SETUP_LOOP/POP_BLOCK steps used in
       normal loops.

       Possible cleanups:
        - iterate over the generator sequence instead of using recursion
    */

I don't know how much the SETUP_LOOP/POP_BLOCK costs in time. It probably only makes a big difference in nested cases.
And if either of those problems is unacceptable, a patch for #1 or #2 is actually pretty easy.
I've got two different proofs of concept: one actually implements the comp as passing the genexp to list, the other just wraps everything after the BUILD_LIST and before the RETURN_VALUE in the equivalent of try: ... except StopIteration: pass. I need to add some error handling to the C code, and for #2 write sufficient tests that verify that it really does work exactly like #1, but I should have working patches to play with in a couple days.
My opinion of that workaround is that it’s also a step backward in terms of readability. I suspect.
if i < 50 else stop() would probably also work, since it throws an exception. That’s better, IMHO.
Once a function is added that is called on every iteration, then a regular for loop with a break (without the function call) will run quicker. I think what matters is that it's fast and is easy to explain.

The first two examples here are the existing variations. The third case would be the added break case. (The exact spelling of the expression may be different.)

    # [x for x in seq]
    for x in iter:
        append x           # LIST_APPEND byte code, not a method call

    # [x for x in seq if expr]
    for x in iter:
        if expr:
            append x

    # [x for x in seq if expr break]
    for x in iter:
        if expr:
            break
        append x

The generator comps have YIELD_VALUE in place of LIST_APPEND. This last case is the simplest variation for an early exit. It only differs from the second case by having a BREAK_LOOP after the POP_JUMP_IF_FALSE instruction, along with SETUP_LOOP and POP_BLOCK before and after the loops.

I am curious about how many places in the library adding break to these would make a difference. If there isn't any, or only a few, then it's probably not needed. But then again, maybe it's worth a good look before dismissing it.

Cheers, Ron

(* dis.dis seems to be adding some extra unneeded lines: a second, dead JUMP_ABSOLUTE to the top of the loop for case 2 above, and a "JUMP_FORWARD 0" in the third case. Seems odd, but these don't affect what we are talking about here.)
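[A quick way to look at the bytecode Ron is describing, as an editorial sketch; the exact opcodes differ between CPython versions, and on older versions you may need to disassemble the nested code object found in co_consts by hand:]

    import dis

    # The listcomp and genexp bodies are compiled into nested code objects;
    # dis shows the outer code and (on newer versions) recurses into them.
    dis.dis(compile("[x for x in seq]", "<listcomp>", "eval"))
    dis.dis(compile("(x for x in seq)", "<genexp>", "eval"))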
participants (8): Andrew Barnert, Greg Ewing, Guido van Rossum, Joshua Landau, Nick Coghlan, Ron Adam, Shane Green, Steven D'Aprano