PEP 380 alternative: A yielding function
Looking at PEP 380 (http://www.python.org/dev/peps/pep-0380/), the need for yield forwarding syntax comes from the inability to delegate yielding functionality in the usual way. For example, if you have a recurring pattern including yields, like this (this is a toy example, please don't take it for more than that):

    if a:
        yield a
    if b:
        yield b

you cannot do the extract-function refactoring in the usual way:

    def yield_if_true(x):
        if x:
            yield x

    yield_if_true(a)
    yield_if_true(b)

because yield_if_true would become a generator.

PEP 380 addresses this by making the workaround - "for x in yield_if_true(a): yield x" - easier to write.

But suppose you could address the source instead? Suppose you could write yield_if_true in such a way that it did not become a generator despite yielding?

Syntactically this could be done with a yielding *function* in addition to the yield statement/expression, either as a builtin or a function in the sys module. Let's call it 'yield_', for lack of a better name. The function would yield the nearest generator on the call stack.

Now the example would work with a slight modification:

    def yield_if_true(x):
        if x:
            yield_(x)

    yield_if_true(a)
    yield_if_true(b)

The real benefits from a yield_ function come with recursive functions. A recursive tree traversal that yields from the leaf nodes currently suffers from a performance penalty: every yield is repeated as many times as the depth of the tree, turning an O(n) traversal algorithm into an O(n lg n) algorithm. PEP 380 does not change that.

But a yield_ function could be O(1), regardless of the forwarding depth.

To be fair, a clever implementation might be able to short-circuit a 'yield from' chain and achieve the same thing.

Two main drawbacks of yield_:

- Difficulty of implementation. Generators would need to keep an entire stack branch alive, instead of merely a single frame, and if that somehow affects the efficiency of simple generators, that would be bad.

- 'The nearest generator on the call stack' is sort of a dynamic scoping thing, which has its problems. For example, if you forget to make the relevant function a generator (the "if 0: yield None" trick might have been needed but was forgotten), then the yield would trickle up to some random generator higher up, with confusing results. And if you use yield_ in a callback, well, let's just say that's an interesting case too.

All the same, if a yield_ function is practical, I think it is a better way to address the problems that motivate PEP 380. I'm guessing you could implement 'yield from' as a pure-Python function using yield_, making yield_ strictly more powerful, although I couldn't say for sure as I haven't studied the enhanced generator protocol.

regards, Anders
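For concreteness, the kind of recursive traversal the O(n lg n) point refers to might look like the sketch below; the Node class and its attribute names are invented for illustration. Each leaf value is re-yielded once per level of the tree on its way up the call chain:

    class Node:
        def __init__(self, value=None, children=()):
            self.value = value
            self.children = list(children)

    def iter_leaves(node):
        if not node.children:              # leaf
            yield node.value
        else:
            for child in node.children:
                # Every value produced further down is re-yielded here,
                # once per level of the tree -- the per-item O(depth)
                # overhead described above.
                for value in iter_leaves(child):
                    yield value

    tree = Node(children=[Node(1), Node(children=[Node(2), Node(3)])])
    print(list(iter_leaves(tree)))         # [1, 2, 3]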
Have you really thought through the implementation in CPython? When a non-generator function is called, there's usually a C stack frame in the way which prevents yielding.

--Guido
-- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
Have you really thought through the implementation in CPython?
Not at all.
When a non-generator function is called, there's usually a C stack frame in the way which prevents yielding.
Of course. I knew there'd be a good reason why you didn't do a fuller coroutine implementation originally, but on a beautiful summer day reckless optimism took over and it didn't come readily to mind.

On platforms with coroutines available at the C level, I think something could be devised, something like keeping a pool of coroutines and switching to a different one at strategic points. But I realise that requiring all platforms to support coroutines is a deal-breaker, so there.

- Anders
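The 'coroutines at the C level' idea can be sketched today with the third-party greenlet library; this is only a rough illustration, not anything proposed in the thread. The names as_generator and g_yield are invented here, and the sketch assumes the generator is consumed from the same greenlet that created it:

    import greenlet

    _DONE = object()

    def as_generator(func, *args, **kwargs):
        # Drive func in its own greenlet; every g_yield(value) made inside
        # func, at any call depth, shows up here as a yielded value.
        def run():
            func(*args, **kwargs)
            return _DONE

        worker = greenlet.greenlet(run)
        while True:
            value = worker.switch()
            if value is _DONE:
                return
            yield value

    def g_yield(value):
        # Yield value out of the nearest enclosing as_generator() driver,
        # however deep the ordinary call stack is at this point.
        greenlet.getcurrent().parent.switch(value)

    def yield_if_true(x):          # an ordinary function, not a generator
        if x:
            g_yield(x)

    def produce():
        yield_if_true(0)
        yield_if_true("a")
        yield_if_true("b")

    print(list(as_generator(produce)))    # ['a', 'b']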
Anders J. Munch wrote:
But suppose you could address the source instead? Suppose you could write yield_if_true in such a way that it did not become a generator despite yielding?
I don't see how this would work. The problem isn't that yield_if_true becomes a generator -- it's that the function calling yield_if_true *doesn't* become a generator, even though it needs to.
Let's call it 'yield_', for lack of a better name. The function would yield the nearest generator on the call stack.
But if none of the calling functions have yields anywhere else, then they're just ordinary functions, and there is *no* generator on the call stack! -- Greg
What would this code do:

    def yield_if_true(x):
        if x:
            yield_(x)

    def maybe_yield(x):
        if calculate_property(x):
            yield_if_true(x)
        else:
            return None

    maybe_yield(5)

When maybe_yield(5) is called, different things need to happen depending on whether it's a function or a generator. If it's a generator, it shouldn't execute calculate_property(x) yet, because generators don't execute their contents until someone says next(maybe_yield(5)) (or maybe_yield(5).next() in Py2.x). On the other hand, if it's not a generator but a function, then it should run calculate_property right away. You could try starting out as a function and then switching to being a generator if anything that you call has a call to yield_ inside of it, but that strikes me as extremely clumsy and complicated, and it could lead to unpleasant surprises if calculate_property has side effects.

ISTM there's no way to do something like PEP 380 without requiring that some special keyword is used to indicate that the def is creating a generator and not creating a function. There are other directions to take this besides 'yield from' (for example, we could replace def with gen or some such and give return a special meaning inside a gen statement), but to try to do it without any kind of keyword at the site of the caller means violating "In the face of ambiguity, refuse the temptation to guess."

-- Carl Johnson
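For reference, generator-ness is currently decided at compile time from the mere presence of yield in a def, which is what makes maybe_yield ambiguous under the proposal. A small self-contained illustration using the standard inspect module:

    import inspect

    def plain(x):
        return x            # no yield anywhere: calling it runs the body

    def gen(x):
        yield x             # the yield keyword alone makes this a generator

    print(inspect.isgeneratorfunction(plain))   # False
    print(inspect.isgeneratorfunction(gen))     # True
    print(plain(5))                             # 5 -- body ran immediately
    print(gen(5))                               # <generator object ...> -- nothing ran yet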
On Tue, Jul 27, 2010 at 9:38 PM, Carl M. Johnson <cmjohnson.mailinglist@gmail.com> wrote:

ISTM there's no way to do something like PEP 380 without requiring that some special keyword is used to indicate that the def is creating a generator and not creating a function.
Well, in a statically typed language the compiler could figure it out based on the type of the things being called -- "generator-ness" could be propagated by the type system just like "throws a certain exception" is in Java.

-- --Guido van Rossum (python.org/~guido)
Carl M. Johnson wrote:
What would this code do:
    def yield_if_true(x):
        if x:
            yield_(x)

    def maybe_yield(x):
        if calculate_property(x):
            yield_if_true(x)
        else:
            return None

    maybe_yield(5)
maybe_yield is not a generator, because it does not use the yield keyword. Nothing changed about that. As there's no generator here, yield_(x) would raise some exception:

    RuntimeError: No generator found for yield_

Regular yield syntax remains the preferred option - with yield_ as a supplement for when delegation is needed. Perhaps a better name for it would be 'yield_outer' or 'nonlocal_yield'.

regards, Anders
On 7/27/2010 1:18 PM, Anders J. Munch wrote:
But suppose you could address the source instead? Suppose you could write yield_if_true in such a way that it did not become a generator despite yielding? ... Now the example would work with a slight modification:
    def yield_if_true(x):
        if x:
            yield_(x)

    yield_if_true(a)
    yield_if_true(b)
The real benefits from a yield_ function come with recursive functions.
Except now yield_if_true() is not a generator. So you have two classes of "generators" (top-level and delegated), which breaks all sorts of things. How would you write izip() with this? You'd have to have a top-level (yield) version and a delegated (yield_()) version to satisfy all of the use cases. I think your yield_() is actually a detriment to recursive functions. The "yield from" solution is superior in that the delegated generators are still normal generators to other call sites that are indifferent to that use case.

There is no escaping that "for v in g: yield v" and small variations are an amazingly common, and amazingly naive, pattern. For many use cases it works just fine; the unfortunate side-effect is that anyone building more clever generators on top finds themselves victimized by other authors. Creating a "yield from" syntax gives those other authors a pleasant and explicit shorthand, and provides for the other use cases automatically.
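The "victimized by other authors" point can be made concrete with a minimal example: the hand-written for-loop delegation forwards plain iteration but silently drops send() (and likewise throw() and close()), which 'yield from' is specified to forward:

    def echo():
        while True:
            received = yield
            print("inner got:", received)

    def delegate(g):
        for v in g:        # the common hand-written delegation pattern
            yield v

    w = delegate(echo())
    next(w)                # prime both generators
    w.send("hello")        # the inner generator never sees "hello":
                           # the for loop resumes it with plain next(),
                           # so it prints "inner got: None"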
I'm guessing you could implement 'yield from' as a pure-Python function using yield_, making yield_ strictly more powerful
PEP 380 already provides a pure-Python description of "yield from" using existing syntax; the assertion that you could implement it using yield_() therefore provides no measure of how "powerful" your suggestion is.

-- Scott Dial scott@scottdial.com scodial@cs.indiana.edu
Scott Dial wrote:
Except now yield_if_true() is not a generator. So, you have two classes of "generators" now (top-level/delegated), which breaks all sorts of things.
Right, yield_if_true is a regular function, that's the whole point. There'd still only be one kind of generator, defined by the presence of the yield keyword. Nothing would break.

The function that calls yield_if_true (or some other function up the call chain) would need to be a generator, if necessary made so using the traditional workaround:

    if 0: yield

regards, Anders
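For reference, the "if 0: yield" workaround is real and works today, because the compiler only checks for the presence of yield, not whether it can ever execute:

    def outer():
        if 0:
            yield            # never executes, but its presence makes
                             # outer() a generator function
        print("body runs only when iterated")

    g = outer()              # nothing printed yet: g is a generator
    print(list(g))           # prints the message above, then []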
All, trust me, this idea is going nowhere. Don't waste your time. -- --Guido van Rossum (python.org/~guido)
Greg Ewing wrote:
Anders J. Munch wrote:
Right, yield_if_true is a regular function, that's the whole point.
What if it needs to call yield_() more than once? If it's just a regular function, then it has no ability to be suspended at the point of yield and resumed.
I meant a regular function from the point of view of the compiler. The implementation would be special, of course.

And therein lies the rub: it's unimplementable in CPython, alas. It could work in an implementation with a non-recursive eval loop, but if I'm not much mistaken, CPython recurses the eval loop even for a pure-Python function call.

regards, Anders
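The suspend-and-resume behaviour Greg refers to is exactly what distinguishes a generator frame from an ordinary call frame today; a minimal illustration:

    def counter():
        print("up to first yield")
        yield 1
        print("between yields")   # runs only when the generator is resumed
        yield 2

    c = counter()
    print(next(c))   # "up to first yield", then 1
    print(next(c))   # "between yields", then 2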
participants (5)

- Anders J. Munch
- Carl M. Johnson
- Greg Ewing
- Guido van Rossum
- Scott Dial