Re: [Python-ideas] Revised revised revised PEP on yield-from

Guido van Rossum wrote:
There better be a pretty darn good reason to do this.
I think that making it easy to use generators as lightweight threads is a good enough reason.
It still is. It's just that if you happen to return a value, it gets attached to the StopIteration for the use of anything that wants to care. It will make no difference at all to anything already existing. Also, if a generator that returns something gets called in a context that doesn't know about generator return values, the value is simply discarded, just as with an ordinary function call that ignores the return value.
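The behaviour described here can be seen directly under the proposed semantics (as they later landed in Python 3.3); a minimal sketch:

```python
def g():
    yield 1
    return 42  # under the proposal: equivalent to raise StopIteration(42)

it = g()
print(next(it))        # 1
try:
    next(it)
except StopIteration as e:
    print(e.value)     # 42 -- the return value rides on the exception

# A context that doesn't know about generator return values
# simply discards it, as with an ignored function return value:
print(list(g()))       # [1]
```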
I wouldn't object to raising an exception in that case. Come to think of it, doing that would be more consistent with the idea of the caller talking directly to the subgenerator.
And that could in turn be a generator with another such slot, right?
That's right.
Yes. You could even cache bound methods for these if you wanted.
I recommend that you produce a working implementation of this; who knows what other issues you might run into.
Good idea. I'll see what I can come up with. -- Greg

On Mon, Feb 16, 2009 at 8:53 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I still expect that even with the new syntax this will be pretty cumbersome, and require the user to be aware of all sorts of oddities and restrictions. I think it may be better to leave this to libraries like Greenlets and systems like Stackless which manage to hide the mechanics much better. Also, the asymmetry between "yield expr" (which returns a value passed in by the caller using .send()) and "yield from expr" (which returns a value coming from the sub-generator) really bothers me. Finally, your PEP currently doesn't really do this use case justice; can you provide a more complete motivating example? I don't quite understand how I would write the function that is delegated to as "yield from g(x)" nor do I quite see what the caller of the outer generator should expect from successive next() or .send() calls.
So, "return" is equivalent to "raise StopIteration" and "return <value>" is equivalent to "raise StopIteration(<value>)"? I suppose I could live with that.
Sounds good. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
It should be able to expect whatever would happen if the body of the delegated-to generator were inlined into the delegating generator. That's the core idea behind all of this -- being able to take a chunk of code containing yields, abstract it out and put it in another function, without the outside world being any the wiser. We do this all the time with ordinary functions and don't ever question the utility of being able to do so. I'm at a bit of a loss to understand why people can't see the utility in being able to do the same thing with generator code. I take your point about needing a better generators-as-threads example, though, and I'll see if I can come up with something.
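The inlining argument can be illustrated with a tiny sketch (hypothetical generators, using the "yield from" syntax as it later landed in Python 3.3):

```python
# Inlined version: one generator does everything.
def walk_inline(items):
    for x in items:
        yield x
        yield x * 10

# Refactored: the inner chunk abstracted into its own generator.
def pair(x):
    yield x
    yield x * 10

def walk_refactored(items):
    for x in items:
        yield from pair(x)  # delegate; the outside world can't tell

print(list(walk_inline([1, 2])))      # [1, 10, 2, 20]
print(list(walk_refactored([1, 2])))  # [1, 10, 2, 20]
```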
So, "return" is equivalent to "raise StopIteration" and "return <value>" is equivalent to "raise StopIteration(<value>)"?
Yes. -- Greg

On Mon, Feb 16, 2009 at 10:31 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I understand that when I'm thinking of generators (as you saw in the tree traversal example I posted). My question was in the context of lightweight threads and your proposal for the value returned by "yield from". I believe I now understand what you are trying to do, but the way to think about it in this case seems very different than when you're refactoring generators.

IIUC there will be some kind of "scheduler" that manages a number of lightweight threads, each represented by a suspended stack of generators, and a number of blocking resources like sockets or mutexes. The scheduler knows what resource each thread is waiting for (could also be ready to run or sleeping until a specific time) and when the resource is ready it resumes the generator, passing along whatever value is required using .send(). E.g. on input, it could read the data from the socket, or it could just pass a flag indicating that the resource is ready and let the generator make the actual recv() call.

When a generator wants to access a resource, it uses "yield" (not "yield from"!) to send a description of the resource needed to the scheduler. When a generator wants to call another function that might block, the other function must be written as a generator too, and it is called using "yield from". The other function uses "yield" to access blocking resources, and "return" to return a value to its "caller" (really the generator that used "yield from"). I believe that Twisted has a similar scheme that doesn't have the benefit of arbitrarily nested generators; I recall Phillip Eby talking about this too.

I've never used lightweight threads myself -- I'm a bit "old school" and would typically either use real OS threads, like Java, or event-driven programming possibly with callbacks, like Tcl/Tk. But I can see the utility of this approach and reluctantly admit that the proposed semantics for the "yield from" return value are just right for this approach.
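A minimal sketch of the trampoline scheduler described above (all names hypothetical, and resource requests simulated rather than backed by real sockets): a plain "yield" talks to the scheduler, "yield from" calls a sub-coroutine, and "return" hands a value back to the caller.

```python
from collections import deque

class Scheduler:
    """Round-robin trampoline: each task is a stack of generators
    flattened by 'yield from'; a plain 'yield <request>' reaches us."""
    def __init__(self):
        self.ready = deque()

    def spawn(self, task):
        self.ready.append((task, None))

    def run(self):
        while self.ready:
            task, value = self.ready.popleft()
            try:
                request = task.send(value)
            except StopIteration:
                continue  # task finished
            # A real scheduler would block on a socket, timer, etc.;
            # here we satisfy every request immediately.
            self.ready.append((task, "data for %r" % request))

# A sub-coroutine: 'yield' talks to the scheduler,
# 'return' sends a value to whoever used 'yield from'.
def read_resource(name):
    data = yield name
    return data.upper()

results = []

def worker(name):
    data = yield from read_resource(name)  # delegate, collect return value
    results.append(data)

sched = Scheduler()
sched.spawn(worker("sock1"))
sched.spawn(worker("sock2"))
sched.run()
print(results)  # ["DATA FOR 'SOCK1'", "DATA FOR 'SOCK2'"]
```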
I do think that it still requires the user to be quite aware of what is going on behind the scenes, for example to remember when to use "yield from" (for functions that have been written to cooperate with the scheduler) and when to use regular calls (for functions that cannot block) -- messing this up is quite painful, e.g. forgetting to use "yield from" will probably produce a pretty confusing error message. Also, it would seem you cannot write functions running in lightweight threads that are also "ordinary" generators, since yield is reserved for "calling" the scheduler.

I have a little example in my head that I might as well show here: suppose we have a file-like object with a readline() method that calls a read() method which in turn calls a fillbuf() function. If I want to read a line from the file, I might write (assuming I am executing inside a generator that is really used for light-weight threading, so that "yield" communicates with the scheduler):

    line = yield from f.readline()

The readline() method could naively be implemented as:

    def readline(self):
        line = []
        while True:
            c = yield from self.read(1)
            if not c:
                break
            line.append(c)
            if c == '\n':
                break
        return ''.join(line)

The read() method could be:

    def read(self, n):
        if len(self.buf) < n:
            yield from self.fillbuf(n - len(self.buf))
        result, self.buf = self.buf[:n], self.buf[n:]
        return result

I'm leaving fillbuf() to the imagination of the reader; its implementation depends on the protocol with the scheduler to actually read data. Or there might be a lower-level unbuffered read() generator that encapsulates the scheduler protocol. I don't think I could add a generator to the file-like class that would call readline() until the file is exhausted though, at least not easily; code that is processing lines will have to use a while-loop like this:

    while True:
        line = yield from f.readline()
        if not line:
            break
        ...process line...
Trying to turn this into a generator like I can do with an ordinary file-like object doesn't work:

    def __iter__(self):
        while True:
            line = yield from self.readline()
            if not line:
                break
            yield line  # ???????

This is because lightweight threads use yield to communicate with the scheduler, and they cannot easily also use it to yield successive values to their caller. I could imagine some kind of protocol where yield always returns a tuple whose first value is a string or token indicating what kind of yield it is, e.g. "yield" when it is returning the next value from the readline-loop, and "scheduler" when it is wanting to talk to the scheduler, but the caller would have to look for this and it would become much uglier than just writing out the while-loop.
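The tagged-tuple protocol described here might look like the following sketch (all names hypothetical); the dispatch boilerplate in the driver is exactly the ugliness being objected to:

```python
def tagged_lines():
    """Every yield is a (kind, payload) tuple: 'scheduler' asks for a
    resource, 'yield' hands a real value to the consumer."""
    while True:
        data = yield ("scheduler", "need a chunk")
        if data is None:
            break
        yield ("yield", data.upper())

def drive(gen, chunks):
    """Plays both scheduler and consumer, dispatching on the tag."""
    out = []
    feed = iter(chunks)
    value = None
    try:
        while True:
            kind, payload = gen.send(value)
            if kind == "scheduler":
                value = next(feed, None)  # satisfy the resource request
            else:
                out.append(payload)       # a genuine output value
                value = None
    except StopIteration:
        return out

print(drive(tagged_lines(), ["a", "b"]))  # ['A', 'B']
```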
I do, I do. It's the complication with the return value that I am still questioning, since that goes beyond simply refactoring generators.
Right.
I apologize for even asking that bit, it was very clear in the PEP. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

On Tue, Feb 17, 2009 at 12:47 PM, Guido van Rossum <guido@python.org> wrote:
If a scheduler is used it can treat a chained function as just another resource, either because it has a decorator or simply by default. I can't see any need for new syntax. Overloading return has more merit. The typical approach today would be "yield Return(val)" or "raise Return(val)". However, I'm quite bothered by the risk of silently swallowing the return argument when not using a scheduler. -- Adam Olsen, aka Rhamphoryncus
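The "raise Return(val)" workaround mentioned here might look like this sketch (the Return class and driver protocol are hypothetical, not from any particular library); note how the caller must explicitly unwrap the exception, which "yield from" plus a real return statement would do for free:

```python
class Return(Exception):
    """Carries a 'return value' out of a generator-based task."""
    def __init__(self, value):
        self.value = value

def sub():
    yield "some blocking request"   # handled by the trampoline
    raise Return(42)                # the workaround for 'return 42'

def call(gen):
    """Driver showing how the caller must unwrap the Return exception."""
    value = None
    try:
        while True:
            request = gen.send(value)
            value = "satisfied: %s" % request
    except Return as e:
        return e.value      # explicit return value
    except StopIteration:
        return None         # plain fall-off-the-end exit

print(call(sub()))  # 42
```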

On Wed, Feb 18, 2009 at 3:47 AM, Guido van Rossum <guido@python.org> wrote: that is why I follow this thread with attention, looking for a better way to do it) On a side note, my library didn't use any scheduler, it was simply turning the generator into a function that can be called with two callback arguments, and start this function with the send and throw methods of the calling thread as arguments. I believe that twisted is using a similar mechanism, but I can't tell for sure for I never had a deep look at it. If some people are interested in the code, you can have a look at it here [0], there are a few use examples at the bottom of the file. [0] http://git.openmoko.org/?p=tichy.git;a=blob;f=tichy/tasklet.py; -- http://charlie137.blogspot.com/
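The scheduler-free, callback-based approach described here might be sketched roughly as follows (all names hypothetical; this is a guess at the shape of the tasklet idea, not code from the linked library):

```python
def as_callback_task(genfunc):
    """Turn a generator function into a function started with
    success/error callbacks, stepping the generator itself."""
    def start(on_done, on_error, *args):
        gen = genfunc(*args)
        def step(value=None):
            try:
                request = gen.send(value)
            except StopIteration as e:
                on_done(getattr(e, "value", None))
            except Exception as exc:
                on_error(exc)
            else:
                # A real tasklet would install 'step' as the callback
                # of the asynchronous operation named by 'request'.
                step("result of %s" % request)
        step()
    return start

results = []

def task():
    a = yield "op1"
    b = yield "op2"
    results.append((a, b))

as_callback_task(task)(lambda v: results.append("done"), print)
print(results)
```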

Guido van Rossum wrote:
Yes, that's a problem. I don't have a good answer for that at the moment. BTW, I have another idea for an example (a thread scheduler and an example using it to deal with sockets asynchronously). Is anyone still interested, or have you all seen enough already? -- Greg

On Tue, Feb 17, 2009 at 7:38 PM, Jesse Noller <jnoller@gmail.com> wrote:
I think it would be a better example than the parser example you gave before. Somehow the parser example is not very convincing, perhaps because it feels a bit unnatural to see a parser as a bunch of threads or coroutines -- the conventional way to write it (which you presented as a starting point) works just fine. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

On Wed, Feb 18, 2009 at 12:23 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Implementing a scheduler perhaps, but that can be omitted. Just give us the usage of the scheduler and how it's better than "yield Return(val)". -- Adam Olsen, aka Rhamphoryncus

Adam Olsen wrote:
That's not really going to reveal much more than what you've seen in the previous example. I've started working on the scheduler, and it's actually turning out to be fairly simple. Including the implementation isn't going to make the example much longer, and I think it will be instructive. I'll explain what's going on at each step, so it won't be a code dump. -- Greg

On Wed, Feb 18, 2009 at 5:31 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'm eagerly looking forward to it. We should get some Twisted folks to buy in. After that, I really encourage you to start working on a reference implementation. (Oh, and please send your PEP to the PEP editors. See PEP 1 for how to format it, where to send it, etc.) -- --Guido van Rossum (home page: http://www.python.org/~guido/)

participants (6):
- Adam Olsen
- Greg Ewing
- Guido van Rossum
- Guillaume Chereau
- Jesse Noller
- Raymond Hettinger