From: Ron Adam <ron3200@gmail.com> Sent: Sunday, June 30, 2013 12:01 AM
On 06/30/2013 12:45 AM, Nick Coghlan wrote:
You can't get the same semantics for other comprehensions without introducing the yield/return distinction, which is astonishingly slow by comparison (as you need to suspend and resume the generator frame on each iteration). The overhead of that is only worth it when the cost of having the entire result in memory at the same time is prohibitive.
It seems to me that if we were to rewrite comprehensions as generators yielding into the container. the generator would be private. It can't be taken out and reused, never needs to be suspended, and always yields to the same container.
Because it is private, it can be altered to make it more efficient by removing the unneeded parts and/or substituting alternative byte code that does the same thing in a faster way.
I tried to come up with an efficient way to do that, but failed—and learned a lot from my failure. See http://stupidpythonideas.blogspot.com/2013/06/can-you-optimize-listgenexp.ht... for the gory details, but I'll summarize here. Ultimately, it's the generator feeding a separate FOR_ITER that makes the StopIteration work. (Raising StopIteration in a generator basically feeds a value to the calling FOR_ITER, just like returning or yielding.) But that's exactly the part that makes it slow. Optimizing out the other stuff (mainly the list call) doesn't do much good; you need some way to inline the generator into the outer FOR_ITER so you can just jump back and forth instead of suspending and resuming a generator. Which means adding new FAST_FOR_ITER, FAST_YIELD, and FAST_RETURN opcodes. Making that work in general is very hard (you need some way to let FAST_FOR_ITER know whether the inlined generator did a yield, a return, or an exception), but in the special case where you don't care about the return value and the generator doesn't do anything useful after yielding and doesn't return anything useful (which is the case with an inlined genexpr), it's doable. So, you can directly use the genexpr compiler code for comps and scrap the existing comp-specific code… but you have to make the genexpr code more complicated and do if (genexpr) else in multiple places to make it inlinable, add in a comp-specific wrapper that's longer and more complicated than the code you scrapped, and add 3 new opcodes. Definitely not a win for simplicity. And if you trace through what it's doing, it ends up as just a tangled-up version of the exception-handling listcomp function—exactly equivalent, but with a whole bunch of extra jumping around to slow things down. You can optimize it by straightening everything out, but then you're not sharing existing code anymore—and in fact, when you're done, you've just added a SETUP_EXCEPT to the listcomp code. It's possible that I missed something stupid that would offer a better way to do this, but I think trying to directly optimize list(genexpr) by inlining a "private generator" ends up being just a much more complicated way to add StopIteration handling to the listcomp code. So, you might as well do the easier one.