Re: [Python-ideas] Revised revised revised PEP on yield-from
Guido van Rossum wrote:
There better be a pretty darn good reason to do this.
I think that making it easy to use generators as lightweight threads is a good enough reason.
I really don't like overloading return this way -- normally returning from a generator is equivalent to falling off the end and raises StopIteration
It still is. It's just that if you happen to return a value, it gets attached to the StopIteration for the use of anything that wants to care. It will make no difference at all to anything already existing. Also, if a generator that returns something gets called in a context that doesn't know about generator return values, the value is simply discarded, just as with an ordinary function call that ignores the return value.
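The behaviour Greg describes can be sketched concretely. This is the proposed semantics, not yet implemented at the time of the thread (it matches what CPython eventually shipped, where the value is exposed as the StopIteration's `value` attribute):

```python
def g():
    yield 1
    return 42  # under the proposal, 42 rides along on the StopIteration

it = g()
first = next(it)        # 1, as usual
try:
    next(it)
except StopIteration as e:
    result = e.value    # 42 -- visible only to callers that look for it
```

A plain for-loop over g() never sees the 42; it is discarded exactly like the ignored return value of an ordinary function call.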
I'm not sure I like this interpretation of .send() -- it looks asymmetrical with the way .send() to non-generator iterators is treated in other contexts, where it is an error.
I wouldn't object to raising an exception in that case. Come to think of it, doing that would be more consistent with the idea of the caller talking directly to the subgenerator.
And that could in turn be a generator with another such slot, right?
That's right.
Hopefully the testing for the presence of .throw, .send and .close could be done once at the start of the yield-from and represented as a set of flags.
Yes. You could even cache bound methods for these if you wanted.
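As a rough illustration of that idea (this is not the PEP's full expansion, and the helper name is invented), a delegation loop can look up the optional method once and cache the bound method:

```python
def pump(sub):
    """Drive a subiterator, checking for .send just once up front."""
    it = iter(sub)
    send = getattr(it, "send", None)   # cached bound method (None if absent)
    try:
        value = next(it)
        while True:
            sent = yield value
            if sent is None or send is None:
                value = next(it)
            else:
                value = send(sent)
    except StopIteration as e:
        return getattr(e, "value", None)
```

This sketch ignores .throw and .close, which the real expansion must also test for and forward.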
I recommend that you produce a working implementation of this; who knows what other issues you might run into
Good idea. I'll see what I can come up with. -- Greg
On Mon, Feb 16, 2009 at 8:53 PM, Greg Ewing
Guido van Rossum wrote:
There better be a pretty darn good reason to do this.
I think that making it easy to use generators as lightweight threads is a good enough reason.
I still expect that even with the new syntax this will be pretty cumbersome, and require the user to be aware of all sorts of oddities and restrictions. I think it may be better to leave this to libraries like Greenlets and systems like Stackless, which manage to hide the mechanics much better.

Also, the asymmetry between "yield expr" (which returns a value passed in by the caller using .send()) and "yield from expr" (which returns a value coming from the sub-generator) really bothers me.

Finally, your PEP currently doesn't really do this use case justice; can you provide a more complete motivating example? I don't quite understand how I would write the function that is delegated to as "yield from g(x)", nor do I quite see what the caller of the outer generator should expect from successive next() or .send() calls.
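The asymmetry being described can be put side by side in a small sketch (using the proposed semantics; all names are illustrative):

```python
def inner():
    got = yield "ready"            # value arrives from the caller's .send()
    return got * 2                 # value travels to the "yield from" below

def outer():
    doubled = yield from inner()   # value comes from the sub-generator
    yield doubled

o = outer()
next(o)          # yields "ready" (from inner, passing through outer)
o.send(5)        # resumes inner; inner returns 10, which outer then yields
```

So "yield" receives from above while "yield from" receives from below, which is exactly the asymmetry in question.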
I really don't like overloading return this way -- normally returning from a generator is equivalent to falling off the end and raises StopIteration
It still is. It's just that if you happen to return a value, it gets attached to the StopIteration for the use of anything that wants to care. It will make no difference at all to anything already existing.
So, "return" is equivalent to "raise StopIteration" and "return <value>" is equivalent to "raise StopIteration(<value>)"? I suppose I could live with that.
Also, if a generator that returns something gets called in a context that doesn't know about generator return values, the value is simply discarded, just as with an ordinary function call that ignores the return value.
I'm not sure I like this interpretation of .send() -- it looks asymmetrical with the way .send() to non-generator iterators is treated in other contexts, where it is an error.
I wouldn't object to raising an exception in that case. Come to think of it, doing that would be more consistent with the idea of the caller talking directly to the subgenerator.
And that could in turn be a generator with another such slot, right?
That's right.
Hopefully the testing for the presence of .throw, .send and .close could be done once at the start of the yield-from and represented as a set of flags.
Yes. You could even cache bound methods for these if you wanted.
I recommend that you produce a working implementation of this; who knows what other issues you might run into
Good idea. I'll see what I can come up with.
Sounds good. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
I don't quite understand how I would write the function that is delegated to as "yield from g(x)" nor do I quite see what the caller of the outer generator should expect from successive next() or .send() calls.
It should be able to expect whatever would happen if the body of the delegated-to generator were inlined into the delegating generator.

That's the core idea behind all of this -- being able to take a chunk of code containing yields, abstract it out and put it in another function, without the outside world being any the wiser. We do this all the time with ordinary functions and don't ever question the utility of being able to do so. I'm at a bit of a loss to understand why people can't see the utility in being able to do the same thing with generator code.

I take your point about needing a better generators-as-threads example, though, and I'll see if I can come up with something.
So, "return" is equivalent to "raise StopIteration" and "return <value>" is equivalent to "raise StopIteration(<value>)"?
Yes. -- Greg
[Greg Ewing]
It should be able to expect whatever would happen if the body of the delegated-to generator were inlined into the delegating generator.
That's the core idea behind all of this -- being able to take a chunk of code containing yields, abstract it out and put it in another function, without the outside world being any the wiser.
That's a very nice synopsis. It should probably be right at the top of the PEP. Raymond
On Mon, Feb 16, 2009 at 10:31 PM, Greg Ewing
Guido van Rossum wrote:
I don't quite understand how I would write the function that is delegated to as "yield from g(x)" nor do I quite see what the caller of the outer generator should expect from successive next() or .send() calls.
It should be able to expect whatever would happen if the body of the delegated-to generator were inlined into the delegating generator.
I understand that when I'm thinking of generators (as you saw in the tree traversal example I posted). My question was in the context of lightweight threads and your proposal for the value returned by "yield from". I believe I now understand what you are trying to do, but the way to think about it in this case seems very different than when you're refactoring generators.

IIUC there will be some kind of "scheduler" that manages a number of lightweight threads, each represented by a suspended stack of generators, and a number of blocking resources like sockets or mutexes. The scheduler knows what resource each thread is waiting for (it could also be ready to run, or sleeping until a specific time), and when the resource is ready it resumes the generator, passing along whatever value is required using .send(). E.g. on input, it could read the data from the socket, or it could just pass a flag indicating that the resource is ready and let the generator make the actual recv() call.

When a generator wants to access a resource, it uses "yield" (not "yield from"!) to send a description of the resource needed to the scheduler. When a generator wants to call another function that might block, the other function must be written as a generator too, and it is called using "yield from". The other function uses "yield" to access blocking resources, and "return" to return a value to its "caller" (really the generator that used "yield from").

I believe that Twisted has a similar scheme that doesn't have the benefit of arbitrarily nested generators; I recall Phillip Eby talking about this too. I've never used lightweight threads myself -- I'm a bit "old school" and would typically either use real OS threads, like Java, or event-driven programming possibly with callbacks, like Tcl/Tk. But I can see the utility of this approach, and reluctantly admit that the proposed semantics for the "yield from" return value are just right for this approach.
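The scheduler described above can be boiled down to a toy round-robin loop. Every name below is invented for illustration; a real scheduler would park each thread on the resource named by its yielded request instead of ignoring it:

```python
import collections

class Scheduler:
    """Toy round-robin scheduler for generator-based threads."""
    def __init__(self):
        self.ready = collections.deque()

    def spawn(self, thread):
        self.ready.append(thread)

    def run(self):
        while self.ready:
            thread = self.ready.popleft()
            try:
                request = thread.send(None)   # resume until the next yield
            except StopIteration:
                continue                      # thread finished
            # A real scheduler would dispatch on `request` (a socket, a
            # mutex, a timeout); here every request means "reschedule me".
            self.ready.append(thread)

def worker(name, log):
    for i in range(2):
        log.append((name, i))
        yield ("reschedule",)     # describe the "resource" we wait on
```

Spawning two workers interleaves them: the log comes out as [("a", 0), ("b", 0), ("a", 1), ("b", 1)].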
I do think that it still requires the user to be quite aware of what is going on behind the scenes, for example to remember when to use "yield from" (for functions that have been written to cooperate with the scheduler) and when to use regular calls (for functions that cannot block) -- messing this up is quite painful, e.g. forgetting to use "yield from" will probably produce a pretty confusing error message. Also, it would seem you cannot write functions running in lightweight threads that are also "ordinary" generators, since yield is reserved for "calling" the scheduler.

I have a little example in my head that I might as well show here: suppose we have a file-like object with a readline() method that calls a read() method which in turn calls a fillbuf() function. If I want to read a line from the file, I might write (assuming I am executing inside a generator that is really used for light-weight threading, so that "yield" communicates with the scheduler):

    line = yield from f.readline()

The readline() method could naively be implemented as:

    def readline(self):
        line = []
        while True:
            c = yield from self.read(1)
            if not c:
                break
            line.append(c)
            if c == '\n':
                break
        return ''.join(line)

The read() method could be:

    def read(self, n):
        if len(self.buf) < n:
            yield from self.fillbuf(n - len(self.buf))
        result, self.buf = self.buf[:n], self.buf[n:]
        return result

I'm leaving fillbuf() to the imagination of the reader; its implementation depends on the protocol with the scheduler to actually read data. Or there might be a lower-level unbuffered read() generator that encapsulates the scheduler protocol.

I don't think I could add a generator to the file-like class that would call readline() until the file is exhausted though, at least not easily; code that is processing lines will have to use a while-loop like this:

    while True:
        line = yield from f.readline()
        if not line:
            break
        ...process line...
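Guido leaves fillbuf() to the reader's imagination; one hypothetical shape for it, plus a toy driver standing in for the scheduler, would be the following (every class, method, and protocol detail here is invented for illustration):

```python
class LightweightFile:
    def __init__(self):
        self.buf = ""

    def fillbuf(self, n):
        # Hypothetical scheduler protocol: yield a request describing the
        # blocking read; the scheduler .send()s the data back in.
        data = yield ("read", n)
        self.buf += data

    def read(self, n):
        if len(self.buf) < n:
            yield from self.fillbuf(n - len(self.buf))
        result, self.buf = self.buf[:n], self.buf[n:]
        return result

    def readline(self):
        line = []
        while True:
            c = yield from self.read(1)
            if not c:
                break
            line.append(c)
            if c == "\n":
                break
        return "".join(line)

def run(thread, data):
    """Toy stand-in for the scheduler: serve each ('read', n) request
    by slicing it out of `data`."""
    pos, value = 0, None
    try:
        while True:
            kind, n = thread.send(value)
            assert kind == "read"
            value, pos = data[pos:pos + n], pos + n
    except StopIteration as e:
        return e.value
```

With this, run(LightweightFile().readline(), "hello\nworld\n") drives the whole readline/read/fillbuf chain and returns "hello\n".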
Trying to turn this into a generator like I can do with an ordinary file-like object doesn't work:

    def __iter__(self):
        while True:
            line = yield from self.readline()
            if not line:
                break
            yield line  ## ???????

This is because lightweight threads use yield to communicate with the scheduler, and they cannot easily also use it to yield successive values to their caller. I could imagine some kind of protocol where yield always returns a tuple whose first value is a string or token indicating what kind of yield it is, e.g. "yield" when it is returning the next value from the readline-loop, and "scheduler" when it is wanting to talk to the scheduler, but the caller would have to look for this and it would become much uglier than just writing out the while-loop.
That's the core idea behind all of this -- being able to take a chunk of code containing yields, abstract it out and put it in another function, without the ouside world being any the wiser.
We do this all the time with ordinary functions and don't ever question the utility of being able to do so. I'm at a bit of a loss to understand why people can't see the utility in being able to do the same thing with generator code.
I do, I do. It's the complication with the return value that I am still questioning, since that goes beyond simply refactoring generators.
I take your point about needing a better generators- as-threads example, though, and I'll see if I can come up with something.
Right.
So, "return" is equivalent to "raise StopIteration" and "return <value>" is equivalent to "raise StopIteration(<value>)"?
Yes.
I apologize for even asking that bit, it was very clear in the PEP. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
On Tue, Feb 17, 2009 at 12:47 PM, Guido van Rossum
My question was in the context of lightweight threads and your proposal for the value returned by "yield from". I believe I now understand what you are trying to do, but the way to think about it in this case seems very different than when you're refactoring generators.

IIUC there will be some kind of "scheduler" that manages a number of lightweight threads, each represented by a suspended stack of generators, and a number of blocking resources like sockets or mutexes. The scheduler knows what resource each thread is waiting for (could also be ready to run or sleeping until a specific time), and when the resource is ready it resumes the generator, passing along whatever value is required using .send(). E.g. on input, it could read the data from the socket, or it could just pass a flag indicating that the resource is ready and let the generator make the actual recv() call.

When a generator wants to access a resource, it uses "yield" (not "yield from"!) to send a description of the resource needed to the scheduler. When a generator wants to call another function that might block, the other function must be written as a generator too, and it is called using "yield from". The other function uses "yield" to access blocking resources, and "return" to return a value to its "caller" (really the generator that used "yield from").
If a scheduler is used it can treat a chained function as just another resource, either because it has a decorator or simply by default. I can't see any need for new syntax. Overloading return has more merit. The typical approach today would be "yield Return(val)" or "raise Return(val)". However, I'm quite bothered by the risk of silently swallowing the return argument when not using a scheduler. -- Adam Olsen, aka Rhamphoryncus
Trying to turn this into a generator like I can do with an ordinary file-like object doesn't work:
    def __iter__(self):
        while True:
            line = yield from self.readline()
            if not line:
                break
            yield line  ## ???????
This is because lightweight threads use yield to communicate with the scheduler, and they cannot easily also use it to yield successive values to their caller. I could imagine some kind of protocol where yield always returns a tuple whose first value is a string or token indicating what kind of yield it is, e.g. "yield" when it is returning the next value from the readline-loop, and "scheduler" when it is wanting to talk to the scheduler, but the caller would have to look for this and it would become much uglier than just writing out the while-loop.

Yes, that was the problem I had when I worked with this kind of thing. My solution was to enforce that yielding another lightweight thread means that we start it and wait for the result, and yielding anything else is a return to the caller (I am not very satisfied with this, and
On Wed, Feb 18, 2009 at 3:47 AM, Guido van Rossum
Guido van Rossum wrote:
Also, it would seem you cannot write functions running in lightweight threads that are also "ordinary" generators, since yield is reserved for "calling" the scheduler.
Yes, that's a problem. I don't have a good answer for that at the moment. BTW, I have another idea for an example (a thread scheduler and an example using it to deal with sockets asynchronously). Is anyone still interested, or have you all seen enough already? -- Greg
On Tue, Feb 17, 2009 at 10:29 PM, Greg Ewing
Guido van Rossum wrote:
Also, it would seem you cannot write functions running in lightweight threads that are also "ordinary" generators, since yield is reserved for "calling" the scheduler.
Yes, that's a problem. I don't have a good answer for that at the moment.
BTW, I have another idea for an example (a thread scheduler and an example using it to deal with sockets asynchronously). Is anyone still interested, or have you all seen enough already?
-- Greg
Oooh, oooh I am! I am! (still interested that is). jesse
On Tue, Feb 17, 2009 at 7:38 PM, Jesse Noller
On Tue, Feb 17, 2009 at 10:29 PM, Greg Ewing
wrote: Guido van Rossum wrote:
Also, it would seem you cannot write functions running in lightweight threads that are also "ordinary" generators, since yield is reserved for "calling" the scheduler.
Yes, that's a problem. I don't have a good answer for that at the moment.
BTW, I have another idea for an example (a thread scheduler and an example using it to deal with sockets asynchronously). Is anyone still interested, or have you all seen enough already?
-- Greg
Oooh, oooh I am! I am! (still interested that is).
I think it would be a better example than the parser example you gave before. Somehow the parser example is not very convincing, perhaps because it feels a bit unnatural to see a parser as a bunch of threads or coroutines -- the conventional way to write it (which you presented as a starting point) works just fine. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
I think it would be a better example than the parser example you gave before. Somehow the parser example is not very convincing, perhaps because it feels a bit unnatural to see a parser as a bunch of threads or coroutines -- the conventional way to write it (which you presented as a starting point) works just fine.
Yes, that's true. I wanted to start with something a bit simpler and easier to follow, though. The scheduler is going to be rather more convoluted. -- Greg
On Wed, Feb 18, 2009 at 12:23 PM, Greg Ewing
Guido van Rossum wrote:
I think it would be a better example than the parser example you gave before. Somehow the parser example is not very convincing, perhaps because it feels a bit unnatural to see a parser as a bunch of threads or coroutines -- the conventional way to write it (which you presented as a starting point) works just fine.
Yes, that's true. I wanted to start with something a bit simpler and easier to follow, though. The scheduler is going to be rather more convoluted.
Implementing a scheduler perhaps, but that can be omitted. Just give us the usage of the scheduler and how it's better than "yield Return(val)". -- Adam Olsen, aka Rhamphoryncus
Adam Olsen wrote:
Implementing a scheduler perhaps, but that can be omitted. Just give us the usage of the scheduler and how it's better than "yield Return(val)".
That's not really going to reveal much more than what you've seen in the previous example. I've started working on the scheduler, and it's actually turning out to be fairly simple. Including the implementation isn't going to make the example much longer, and I think it will be instructive. I'll explain what's going on at each step, so it won't be a code dump. -- Greg
On Wed, Feb 18, 2009 at 5:31 PM, Greg Ewing
Adam Olsen wrote:
Implementing a scheduler perhaps, but that can be omitted. Just give us the usage of the scheduler and how it's better than "yield Return(val)".
That's not really going to reveal much more than what you've seen in the previous example.
I've started working on the scheduler, and it's actually turning out to be fairly simple. Including the implementation isn't going to make the example much longer, and I think it will be instructive.
I'll explain what's going on at each step, so it won't be a code dump.
I'm eagerly looking forward to it. We should get some Twisted folks to buy in. After that, I really encourage you to start working on a reference implementation. (Oh, and please send your PEP to the PEP editors. See PEP 1 for how to format it, where to send it, etc.) -- --Guido van Rossum (home page: http://www.python.org/~guido/)
participants (6)
- Adam Olsen
- Greg Ewing
- Guido van Rossum
- Guillaume Chereau
- Jesse Noller
- Raymond Hettinger