Return from generators in Python 3.2
Hello,

I want to bring up a "forbidden" topic; however, I believe I have some strong points.

There are many ways of doing asynchronous programming in Python: multiprocessing, threads, greenlets, deferred objects (callbacks), and coroutines. The latter is quite a new approach, but it is getting more and more attention. What's really fascinating about coroutines is that the code flow of a program using them reads naturally and straight. Callbacks break that code flow, making it much harder to read and understand; threads don't work well in Python; and greenlets... greenlets are too magical and potentially harmful.

So, coroutines are good, and that is proved by a pleiad of new frameworks that utilize them: Monocle, cogen, and many others.

However, coroutines in Python are a bit incomplete. There is no standard way of returning a result value that makes the coroutine stop.

Let's take a look at the following example:

... @bus.method
... def method1():
...     # some computation
...     return result
...
... @bus.method
... def method2():
...     data = yield memcache.get(...)
...     # some computation
...
...     # and now, the most interesting point. Time to return a result.
...     # Pick the prettiest line:
...     #
...     # yield Return(result)
...     # return_ (result)
...     # raise StopIteration(result)

As you can see, there is no simple way of abstracting coroutines. How nice the 'yield' syntax is here, clearly marking an async call, and how ugly the return code is. With large amounts of code like the above, it's hard to maintain and refactor. Adding one yield statement to some generically decorated handler will force you to fix all the returns, and vice versa. Moreover, the lack of a proper return protocol complicates the underlying code.

A very straightforward solution was proposed in PEP 380, and here is a good place to say that PEP 380 is not all about returns. It's all about the new 'yield from' statement, and the new return syntax for coroutines is a very small part of it. However, in any currently existing framework it is possible to implement the 'yield from' statement (with something like yield From(...)), but there's absolutely no way to correct the return problem, as it raises a SyntaxError, which is impossible to catch. Therefore, I think that we can consider the return problem apart from PEP 380.

The proposed change uses the same type of approach as was introduced in PEP 380, but in a slightly different way. Instead of attaching the return value to the StopIteration exception, we can introduce another one; let's call it GeneratorReturn (derived from BaseException). It is still easy to use in frameworks, but it makes it impossible to break things unintentionally. For example, it will protect us from cases like the following:

... def test():
...     for i in range(10):
...         yield i
...     return 10

In the above, the GeneratorReturn error will be propagated, stopping the program execution.

Strictly speaking, the proposed change just alters the current Python behaviour, making the 'return value' statement raise a catchable error (instead of SyntaxError).

Speaking about PEP 3003: I'm pretty sure that the idea behind the moratorium on serious language changes was to give alternative Python interpreters a chance to catch up with Python 3. Well, the proposed change is very small in CPython, just a few lines of code. It doesn't change the grammar or the AST structure, and it is fully backwards compatible. I've looked at the PyPy code and found that the change is *very* easy to port there, and I'm certain that the situation is the same for Jython and IronPython. (If this new feature were the only reason why we don't see Jython or PyPy supporting version 3.2, we would all be more than happy.) Given all that, I think PEP 3003 is inapplicable to this proposal.

Pros:
- The change on the interpreter side is tiny (reducing the entropy in symtable.c!)
- No effect on grammar or AST structure.
- Easy to port to other interpreters.
- Fully backward compatible.
- On the very basic level it will change current behaviour from raising an uncatchable error to raising a catchable one. Nobody will be confused.
- Another key feature of Python 3 that will probably encourage people to migrate.
- Will make coroutines more attractive and stimulate the rise and development of new frameworks.
- One way of doing things. The same interface in frameworks; code in coroutines looks almost the same as in subroutines, but with yields. Makes the coroutine protocol complete.

If we decide to postpone this feature till Python 3.3, then we'll push it all back for *years*. The change is tiny, but it really means a lot. Those who have tried to work with coroutines will understand me. Let's at least consider it.

PS I'm attaching a patch to the letter; it's far from an ideal state, but it contains the GeneratorReturn exception, code to raise it, and the corresponding unit tests.

- Yury
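The GeneratorReturn idea can be emulated in plain Python by raising the exception explicitly. What follows is a minimal sketch and not the attached patch: the class body, the `run` driver, and the string standing in for `memcache.get(...)` are assumptions made purely for illustration.

```python
class GeneratorReturn(BaseException):
    """Hypothetical exception carrying a generator's result (emulated here)."""

    def __init__(self, value=None):
        super().__init__(value)
        self.value = value


def method2():
    # Stand-in for `data = yield memcache.get(...)`; the request is just a string.
    data = yield "get some-key"
    # Under the proposal this line could be spelled `return data.upper()`.
    raise GeneratorReturn(data.upper())


def run(gen, replies):
    """Toy driver: answer each yielded request with the next canned reply."""
    replies = iter(replies)
    try:
        next(gen)                      # advance to the first yield
        while True:
            gen.send(next(replies))    # resume the coroutine with a reply
    except GeneratorReturn as ret:
        return ret.value


def test():
    for i in range(10):
        yield i
    raise GeneratorReturn(10)          # emulates `return 10` under the proposal


result = run(method2(), ["cached"])

# A plain consumer that does not expect the exception is interrupted by it.
try:
    list(test())
    leaked = None
except GeneratorReturn as ret:
    leaked = ret.value
```

Because the sketch derives from BaseException, an ordinary `except Exception` handler would not swallow it, which is the "impossible to break things unintentionally" property described in the message.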
On 8/26/2010 11:00 AM, Yury Selivanov wrote:
If we decide to postpone this feature till Python 3.3, then we'll push it all back for *years*. The change is tiny, but it really means a lot.
AFAICT, this change was the most controversial part of PEP 380.
PS I'm attaching a patch to the letter; it's far from ideal state, but contains the GeneratorReturn exception, code to raise it and the corresponding unittests.
I believe overloading StopIteration for this purpose was considered more acceptable than creating a new exception. BTW, attaching patches to emails on this list is generally the best way to have few look at your patch. :-p Also, this seems more appropriate for python-ideas. -- Scott Dial scott@scottdial.com scodial@cs.indiana.edu
On 2010-08-26, at 12:20 PM, Scott Dial wrote:
I believe overloading StopIteration for this purpose was considered more acceptable than creating a new exception.
Whatever the Python community decides. I was trying to make several points regarding the problem, and proposed a solution which in my opinion is slightly better than the one described in the PEP.
BTW, attaching patches to emails on this list is generally the best way to have few look at your patch. :-p
Hm, my mailing client clearly indicates that the patch has been attached and sent. In any case, here is a direct link: http://dl.dropbox.com/u/21052/generators_return.patch - Yury
On 8/26/10 12:48 PM, Yury Selivanov wrote:
On 2010-08-26, at 12:20 PM, Scott Dial wrote:
BTW, attaching patches to emails on this list is generally the best way to have few look at your patch. :-p
Hm, my mailing client clearly indicates that the patch has been attached and sent. In any case, here is a direct link: http://dl.dropbox.com/u/21052/generators_return.patch
I think Scott means that you should open an issue and attach the patch there. At least then people can find it. -- Eric.
On 2010-08-26, at 1:10 PM, Eric Smith wrote:
I think Scott means that you should open an issue and attach the patch there. At least then people can find it.
Thank you Eric, I've already done that. Will know next time ;-) - Yury
On Fri, Aug 27, 2010 at 1:00 AM, Yury Selivanov <yselivanov@gmail.com> wrote:
In the above, the GeneratorReturn error will be propagated, stopping the program execution. Strictly speaking, the proposed change just alters the current Python behaviour, making the 'return value' statement raise a catchable error (instead of SyntaxError).
There are fairly extensive discussions of using a new GeneratorReturn exception rather than StopIteration in the python-dev archives. As I recall, one key factor leading to the use of StopIteration was the suggestion's implied breakage of the equivalence between "return" (which would continue to raise StopIteration) and "return None" (which would raise GeneratorReturn with a value of None). Using a different exception also made all generator handling code clumsier, since it now needed to deal with two exceptions rather than just one.

Since the only situations where a return value could be inadvertently ignored were those where the application clearly didn't care about the return value anyway, it was decided that sticking with a single exception type was the better approach.

PEP 380 should probably mention this idea explicitly though, since using a new exception type is a fairly obvious alternative suggestion and the discussion of the idea is scattered all over the place in the archives.

As for breaking the moratorium for it - no, not even close to a big enough win, since people can already write "raise CoroutineReturn(result)".

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
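For readers on Python 3.3 or later, where the PEP 380 behaviour is available, the single-exception design described above can be observed directly. The sketch below is illustrative only; `drain`, `bare_return`, and `return_none` are hypothetical helper names, not part of any library.

```python
def test():
    for i in range(10):
        yield i
    return 10


def bare_return():
    yield 1
    return


def return_none():
    yield 1
    return None


def drain(gen):
    """Exhaust a generator by hand and return (items, return value)."""
    items = []
    while True:
        try:
            items.append(next(gen))
        except StopIteration as stop:
            # The return value travels on the StopIteration instance.
            return items, stop.value


items, value = drain(test())
ignored = list(test())     # list() and for-loops simply drop the 10
```

Here `return` and `return None` are indistinguishable to the caller, and code that only iterates never sees the returned value, which matches the trade-off discussed in the message.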
On 2010-08-26, at 6:11 PM, Nick Coghlan wrote:
There are fairly extensive discussions of using a new GeneratorReturn exception rather than StopIteration in the python-dev archives. As I recall, one key factor leading to the use of StopIteration was the suggestion's implied breakage of the equivalence between "return" (which would continue to raise StopIteration) and "return None" (which would raise GeneratorReturn with a value of None). Using a different exception also made all generator handling code clumsier, since it now needed to deal with two exceptions rather than just one.
Yes, I understand the point about having two different exceptions for basically two closely related things - 'return None' and 'return value'. However, as I outlined in the first message, this was intended to prevent this kind of mistake:

... def test():
...     for i in range(10):
...         yield i
...     return 10

Which will certainly happen, especially with people new to Python. Again, this exception will be used in very specific places by very specific software that expects it. Otherwise, it should be propagated and crash the whole thing.

As for the generator handling code -- are you sure there is that much of it?
Since the only situations where a return value could be inadvertently ignored were those where the application clearly didn't care about the return value anyway, it was decided that sticking with a single exception type was the better approach.
Good point. It's all about our level of care about beginners ;)
PEP 380 should probably mention this idea explicitly though, since using a new exception type is a fairly obvious alternative suggestion and the discussion of the idea is scattered all over the place in the archives.
As for breaking the moratorium for it - no, not even close to a big enough win, since people can already write "raise CoroutineReturn(result)".
Well, people certainly can. But the goal is to make convenient instruments for everyday use. I, for example, deal with a lot of coroutine-based code, and it's very annoying that I have to use some creepy abstractions in order to just return a value! It's especially annoying when you have normal code with normal returns alongside coroutines with 'return_ (value)'. And I don't think it frustrates just me. The coroutine protocol is incomplete, and very little action is required to fix it. All this proposal is suggesting is to replace SyntaxError with GeneratorReturn (or StopIteration). I'd classify it as a minor change rather than some special refactoring that may fall under the moratorium. Correct me if I'm wrong. - Yury
On Fri, Aug 27, 2010 at 8:31 AM, Yury Selivanov <yselivanov@gmail.com> wrote:
All this proposal is suggesting is to replace SyntaxError with GeneratorReturn (or StopIteration). I'd classify it as a minor change rather than some special refactoring that may fall under the moratorium. Correct me if I'm wrong.
It's either a new builtin or affects the API of an existing builtin, and it is moving something from a compile time error to a runtime error. Things that fall under the moratorium aren't "special refactorings" - they're anything that affects the builtins or the language syntax, so trying to separate out this one part of PEP 380 fails on both counts. Coroutine programmers have lived with the status quo for years already, putting up with it for a couple more until PEP 380 goes in isn't going to hurt them all that much. On the GeneratorReturn vs StopIteration front, adding a new builtin exception is a big deal. "Newbie programmers might not notice that their return statement isn't doing anything" isn't a particularly good justification for adding one. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Yury Selivanov wrote:
However, as I outlined in the first message, this was intended to prevent this kind of mistake:

... def test():
...     for i in range(10):
...         yield i
...     return 10

Which will certainly happen, especially with people new to Python.
That very problem was considered in the discussion, and it was concluded that it wasn't severe enough to be worth breaking the symmetry between return <-> StopIteration and return value <-> StopIteration(value).
Good point. It's all about our level of care about beginners ;)
While due consideration is always given to beginners, being beginner-friendly is not a good enough reason to introduce a feature that would be *unfriendly* to experienced programmers.
I, for example, deal with really a lot of coroutine based code, and it's very annoying that I have to use some creepy abstractions in order to just return a value!
Even with your proposal, you'd still have to use a 'creepy abstraction' every time one of your coroutines calls another. That's why PEP 380 deals with 'more than just return'. -- Greg
On 2010-08-26, at 8:04 PM, Greg Ewing wrote:
Even with your proposal, you'd still have to use a 'creepy abstraction' every time one of your coroutines calls another. That's why PEP 380 deals with 'more than just return'.
Nope. In almost any coroutine framework you have a scheduler or trampoline object that basically does all the work of calling, passing values and propagating exceptions, along with many other things that 'yield from' won't help you with (cooperation, deferring to process/thread pools, pausing, etc.). As a developer of one such framework, I can tell you that I can easily live without 'yield from', but dealing with the weird return syntax is a pain. Especially when you use decorators like @bus.method or @protocol.handler that transparently wrap your callable, be it a generator or a regular function. And after that you have to use a different return syntax for each of them. - Yury
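The "transparent wrapping" described above can be sketched as follows on Python 3.3 or later, where generators may use `return value`. Everything here is hypothetical: `method` merely stands in for decorators such as @bus.method, and yielded commands are assumed to be zero-argument callables that a toy scheduler runs immediately.

```python
import functools
import inspect


def method(func):
    """Hypothetical stand-in for decorators such as @bus.method."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not inspect.isgeneratorfunction(func):
            return func(*args, **kwargs)      # plain subroutine
        gen = func(*args, **kwargs)
        reply = None
        try:
            while True:
                command = gen.send(reply)     # coroutine asks for something
                reply = command()             # toy scheduler: run it right away
        except StopIteration as stop:
            return stop.value                 # result of `return value`
    return wrapper


@method
def method1():
    return 1 + 1


@method
def method2():
    data = yield (lambda: "cached")   # stand-in for an asynchronous call
    return data.upper()
```

With this shape, adding or removing a `yield` in a handler does not force any change to its `return` statements, which is the refactoring concern raised in the message.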
On Thu, Aug 26, 2010 at 5:05 PM, Yury Selivanov <yselivanov@gmail.com> wrote:
On 2010-08-26, at 8:04 PM, Greg Ewing wrote:
Even with your proposal, you'd still have to use a 'creepy abstraction' every time one of your coroutines calls another. That's why PEP 380 deals with 'more than just return'.
Nope. In almost any coroutine framework you have a scheduler or trampoline object that basically does all the work of calling, passing values and propagating exceptions. And many other things that 'yield from' won't help you with (cooperation, deferring to process/thread pools, pausing, etc.) Being a developer of one of such frameworks, I can tell you, that I can easily live without 'yield from', but dealing with weird return syntax is a pain.
That's not my experience. I wrote a trampoline myself (not released yet), and found that I had to write a lot more code to deal with the absence of yield-from than to deal with returns. In my framework, users write 'raise Return(value)' where Return is a subclass of StopIteration. The trampoline code that must be written to deal with StopIteration can be extended trivially to deal with this. The only reason I chose to use a subclass is so that I can diagnose when the return value is not used, but I could have chosen to ignore this or just diagnose whenever the argument to StopIteration is not None.
Especially when you use decorators like @bus.method, or @protocol.handler, that transparently wrap your callable be it generator or regular function. And after that you have to use different return syntax for them.
Until PEP 380 is implemented, you have to use different return syntax in generators. You have some choices: raise StopIteration(value), raise SomethingElse(value), or callSomeFunction(value) -- where callSomeFunction raises the exception. I like the raise variants because they signal to tools that the flow control stops here -- e.g. in Emacs, python-mode.el automatically dedents after a 'raise' or 'return' but not after a call (of course). -- --Guido van Rossum (python.org/~guido)
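A trampoline of the kind described in this message can be sketched in a few lines. Note one assumption about modern interpreters: since Python 3.7 (PEP 479), a StopIteration raised inside a generator body is turned into RuntimeError, so the sketch below relies on `return value` (Python 3.3+) rather than `raise Return(value)`. The `trampoline` function and the convention that yielding a generator means "call it" are illustrative choices, not the unreleased code mentioned above.

```python
import inspect


def trampoline(root):
    """Run nested generator-based coroutines on an explicit stack."""
    stack = [root]
    value = None
    while stack:
        try:
            yielded = stack[-1].send(value)
        except StopIteration as stop:
            # The coroutine finished; hand its return value to its caller.
            stack.pop()
            value = stop.value
            continue
        if inspect.isgenerator(yielded):
            # Yielding a generator means "call this sub-coroutine".
            stack.append(yielded)
            value = None
        else:
            # Anything else is simply echoed back in this toy version.
            value = yielded
    return value


def f(n, k=1):
    if n != 0:
        result = yield f(n - 1, k * n)
        return result
    return k


answer = trampoline(f(5000))   # deeper than the default recursion limit
```

The only return-related code in the loop is the `except StopIteration` branch, which is consistent with the observation that handling returns is a trivial extension of handling StopIteration.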
On 2010-08-26, at 8:25 PM, Guido van Rossum wrote:
That's not my experience. I wrote a trampoline myself (not released yet), and found that I had to write a lot more code to deal with the absence of yield-from than to deal with returns. In my framework, users write 'raise Return(value)' where Return is a subclass of StopIteration. The trampoline code that must be written to deal with StopIteration can be extended trivially to deal with this. The only reason I chose to use a subclass is so that I can diagnose when the return value is not used, but I could have chosen to ignore this or just diagnose whenever the argument to StopIteration is not None.
In the framework I'm talking about (not yet released either, but I do plan to open source it one day), everything that can be yielded is an instance of a special class - Command. Generators are wrapped with a subclass of Command - Task; socket methods return Recv & Send commands, etc. Regular Python functions, by the way, can also be wrapped in a Task command - the framework is smart enough to manage them automatically. Of course, the wrapping of Python functions is abstracted in decorators, so be it a simple @coroutine or some asynchronous @bus.method - they are all native objects to the scheduler. And all the work of exception propagation, IO waiting, managing of timeouts and much more is the business of the framework (without 'yield from'). Hence the yield statement is nothing more than a point in the code flow where some command is pushed to the scheduler for execution. And it can be inserted almost anywhere.

This approach differs from the one you showed in PEP 342; it's much more complicated, but it has its own strong advantages. It is not new, though; for instance, almost the same idea is utilized in the 'cogen' framework and a few others (I can't remember all the names, but I did quite a lot of research before writing a single line of code). All those frameworks suffer from the inability to use a native return statement in generators.

Now, imagine a big project. I mean a really big, complicated system, with tens of thousands of lines of code. The code is broken down into small methods: some of them implement asynchronous methods on a message bus, some of them are mapped to serve responses on specific URLs, and so on. In the way of writing code I'm talking about, there is no distinction between coroutines and subroutines. There are some methods which just return some value; some that query potentially blocking code with the 'yield' keyword and after that return the result - it doesn't matter at all. The abstraction is very good and simple: the 'yield' statement just marks suspension points, and that's all. BUT - there is the 'return problem', so if the code gets a new yield statement, you have to go and fix all the returns, and vice versa. It just *breaks the beauty* of the language. I've invested tons of time into it, and I suffer from the weird syntax that differs from one line to another.

Of course I can live with that, and people who have developed other frameworks will too. But considering that the 'return' syntax is almost approved (1); it will almost one hundred percent be merged into 3.3 (2); the change is small and backwards compatible (3); and it's one to two hours of work to port to other interpreters, so it does not 100% contradict the ideas of the moratorium (4) - I've decided to bring this topic up.

Asynchronous programming is booming now. It gets more and more attention day by day. And Python has a unique combination of features that may make it one of the leaders in the field (nodejs is amateur; erlang is hard; java, ruby and family lack a 'yield' statement, so you have to use callbacks - and that's ugly). Wait several years for this simple feature in a world that is changing that fast? I'm not sure.

Probably the last point - this would be one more good advantage of py3k for Python 2.x users.

Sorry for such a long text; I just wanted to make my points clear and provide some examples.
Especially when you use decorators like @bus.method, or @protocol.handler, that transparently wrap your callable be it generator or regular function. And after that you have to use different return syntax for them.
Until PEP 380 is implemented, you have to use different return syntax in generators. You have some choices: raise StopIteration(value), raise SomethingElse(value), or callSomeFunction(value) -- where callSomeFunction raises the exception. I like the raise variants because they signal to tools that the flow control stops here -- e.g. in Emacs, python-mode.el automatically dedents after a 'raise' or 'return' but not after a call (of course).
I'm not asking for the whole of PEP 380, but for a small subset of it. So if it doesn't contradict the moratorium that much, let's discuss the feature. If it does, then OK, I'll stop spamming ;-) - Yury
On Fri, Aug 27, 2010 at 8:25 AM, Guido van Rossum <guido@python.org> wrote:
That's not my experience. I wrote a trampoline myself (not released yet), and found that I had to write a lot more code to deal with the absence of yield-from than to deal with returns. In my framework, users write 'raise Return(value)' where Return is a subclass of StopIteration. The trampoline code that must be written to deal with StopIteration can be extended trivially to deal with this. The only reason I chose to use a subclass is so that I can diagnose when the return value is not used, but I could have chosen to ignore this or just diagnose whenever the argument to StopIteration is not None.
A bit off-topic, but... In my experience the lack of "yield from" makes certain styles of programming both very tedious and very costly for performance. One example would be Genshi, which implements something like pipes or filters. There are many filters that will do something once (e.g. insert a doctype) but have O(N) performance because of the function call overhead of "for x in other_generator: yield x". Nest this a few times and you'll have 10 function calls for every byte of output (not an exaggeration in the case of Trac templates). I think if implemented properly "yield from" could get rid of most of that overhead. -bob
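The filter pattern mentioned above can be illustrated with two equivalent pass-through filters, one using the explicit re-yield loop and one using 'yield from' (Python 3.3+). The function names and the string events are hypothetical and are not Genshi's API; the sketch shows only the shape of the code and makes no timing claims.

```python
def add_doctype_loop(stream):
    """Do one thing up front, then re-yield every item by hand."""
    yield "<!DOCTYPE html>"
    for event in stream:
        yield event


def add_doctype_delegating(stream):
    """Same filter, delegating the pass-through to the inner iterator."""
    yield "<!DOCTYPE html>"
    yield from stream


events = ["<p>", "hello", "</p>"]
looped = list(add_doctype_loop(iter(events)))
delegated = list(add_doctype_delegating(iter(events)))
```

Both filters produce the same output; the difference is that the second one leaves the per-item forwarding to the interpreter instead of a Python-level loop body.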
On 08/26/2010 07:25 PM, Guido van Rossum wrote:
That's not my experience. I wrote a trampoline myself (not released yet), and found that I had to write a lot more code to deal with the absence of yield-from than to deal with returns. In my framework, users write 'raise Return(value)' where Return is a subclass of StopIteration. The trampoline code that must be written to deal with StopIteration can be extended trivially to deal with this. The only reason I chose to use a subclass is so that I can diagnose when the return value is not used, but I could have chosen to ignore this or just diagnose whenever the argument to StopIteration is not None.
I'm currently playing around with a trampoline version based on the example in PEP 342. Some of the things I found are ...

* In my version I separated the trampoline from the scheduler. Having it as a separate class made the code cleaner and easier to read. A trampoline instance can be run without the scheduler. (An example of this is below.) The separate Scheduler only needs a few methods in a coroutine wrapper to run it. It really doesn't matter what's inside it as long as it has a resume method that the scheduler can understand, and it returns in a timely manner so it doesn't starve other coroutines.

* I've found that having a Coroutine class, that has the generator methods on it, is very useful for writing more complex coroutines and generators.

* In a true trampoline, all sub coroutines are yielded out to the trampoline resume loop before their send method is called, so "yield from" isn't needed with a well behaved Trampoline runner. I think "yield from"'s value is that it emulates a trampoline's performance without needing a stack to keep track of caller coroutines. It also saves a lot of looping if you want to write coroutines with sub coroutines without a trampoline runner to run them on.

* Raising StopIteration(value) worked just fine for setting the last value. Getting the value from the exception just before returning it is still a bit clumsy... I currently use:

      return exc.args[0] if exc.args else None

  Maybe I've overlooked something?

My version of the Trampoline class handles the return value since it already has it handy when it gets a StopIteration exception, so the user doesn't need to do this; they just need to yield the last value out the same as they do anywhere else.

I wonder if "yield from" may run into Python's stack limit? For example...

    """
    Factorial Function.
    """
    def f(n, k=1):
        if n != 0:
            return f(n-1, k*n)
        else:
            return k

    def factoral(n):
        return f(n)

    if __name__ == "__main__":
        print(factoral(1000))

This aborts with:

    RuntimeError: maximum recursion depth exceeded in comparison

This one works just fine.

    """
    Factorial Trampoline.
    """
    from coroutine.scheduler import Trampoline

    def tramp(func):
        def wrap(*args, **kwds):
            t = Trampoline(func(*args, **kwds))
            return t.run()
        return wrap

    def f(n, k=1):
        if n != 0:
            yield f(n-1, k*n)
        else:
            yield k

    @tramp
    def factoral(n):
        yield f(n)

    if __name__ == "__main__":
        print(factoral(10000))   # <---- extra zero too!

But if I add another zero, it begins to slow to a crawl as it uses swap space. ;-)

How would a "yield from" version compare?

I'm basically learning this stuff by trying to break this thing, and then trying to fix what breaks as I go. That seems to be working. ;-)

Cheers, Ron Adam
Ron Adam wrote:
I wonder if "yield from" may run into pythons stack limit?
My current implementation wouldn't, because nested yield-froms don't result in nested activations of Python frames. But...
if __name__ == "__main__": print(factoral(10000)) # <---- extra zero too!
But if I add another zero, it begins to slow to a crawl as it uses swap space. ;-)
How would a "yield from" version compare?
... there is still a Python frame in existence for each active invocation of the generator, so it would probably use about the same amount of memory. -- Greg
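For comparison, a "yield from" spelling of the factorial example might look like the sketch below on Python 3.3 or later. The `run` helper is a hypothetical driver; the example is exercised only at a small depth, and how very deep delegation chains behave depends on the interpreter and is not measured here.

```python
def f(n, k=1):
    if n != 0:
        # Delegate to the next level and pass its return value straight up.
        return (yield from f(n - 1, k * n))
    return k


def run(gen):
    """Drive a generator to completion and return its return value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


small = run(f(10))
```

As noted in the reply above, each level of the chain still keeps its own generator frame alive until the innermost call returns, so memory use grows with depth in both versions.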
participants (8)
- Bob Ippolito
- Eric Smith
- Greg Ewing
- Guido van Rossum
- Nick Coghlan
- Ron Adam
- Scott Dial
- Yury Selivanov