[Python-Dev] Minimal 'stackless' PEP using generators?

Phillip J. Eby pje at telecommunity.com
Mon Aug 23 19:18:28 CEST 2004


At 12:34 PM 8/23/04 -0400, Clark C. Evans wrote:
>On Mon, Aug 23, 2004 at 11:56:04AM -0400, Phillip J. Eby wrote:
>| It doesn't seem to me to actually help anything.  You can already do this
>| using a simple wrapper object that maintains a stack of active
>| generators, as I do in 'peak.events'.
>
>Could you provide an example?  The problem this proposal solves is
>straightforward -- it is tedious and slow to have intermediate
>generators do stuff like:
>
>     def middle():
>         """ intermediate generator _only_ sees one and two """
>         for x in top():
>!            if isinstance(x,X):
>!                yield x
>             print "middle", x
>             yield x
>
>This extra step is tedious and also slow, especially if one has lots of
>yield statements that cooperate.

'peak.events' uses "Task" objects that maintain a stack of active 
generators.  The Task receives yields from the "innermost" generator 
directly, without them being passed through by intermediate generators.  If 
the value yielded is *not* a control value, the Task object pops the 
generator stack and resumes the previously suspended generator.  A "magic" 
function, 'events.resume()', retrieves that value from the Task inside the 
newly resumed generator.

Basically, this mechanism doesn't pass control values through multiple 
tests and generator frames: control values are consumed immediately by the 
Task.  This makes it easy to suspend nested generators while waiting for 
some event, such as socket readability, a timeout, a Twisted "Deferred", 
etc.  Yielding an "event" object like one of the aforementioned items 
causes the Task to return to its caller (the event loop) after requesting a 
callback for the appropriate event.  When the callback re-invokes the Task, 
it saves the value associated with the event, if any, for 'events.resume()' 
to retrieve when the topmost generator is resumed.
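
To make that concrete, here is a rough modern-Python sketch of the mechanism 
just described.  It is *not* the actual peak.events code; 'Task', 
'EventSource', 'resume()' and the other names are purely illustrative, and 
error handling and end-of-iteration details are omitted:

     import types

     _current = None                # Task currently being stepped

     def resume():
         """Fetch the value the current Task saved for the resumed generator."""
         return _current._value

     class EventSource:
         """Something a generator can wait on; fire() delivers the event."""
         def __init__(self):
             self._callbacks = []
         def addCallback(self, cb):
             self._callbacks.append(cb)
         def fire(self, value=None):
             callbacks, self._callbacks = self._callbacks, []
             for cb in callbacks:
                 cb(value)

     class Task:
         def __init__(self, gen):
             self.stack = [gen]      # stack[-1] is the innermost generator
             self._value = None

         def step(self):
             global _current
             _current = self
             while self.stack:
                 try:
                     yielded = next(self.stack[-1])
                 except StopIteration:
                     self.stack.pop()    # generator finished; resume its caller
                     continue
                 if isinstance(yielded, types.GeneratorType):
                     self.stack.append(yielded)         # nested call: push it
                 elif isinstance(yielded, EventSource):
                     yielded.addCallback(self._wakeUp)  # call us back on the event
                     return                             # return to the event loop
                 else:
                     self._value = yielded   # "real" result: save it, pop, and
                     self.stack.pop()        # resume the previously suspended caller

         def _wakeUp(self, value):
             self._value = value     # resume() hands this to the generator
             self.step()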

Also, 'events.resume()' supports passing errors from one generator to the 
next, so that it's "as if" the generators execute in a nested fashion.  The 
drawback is that you must invoke events.resume() after each yield, but this 
is *much* less intrusive than requiring generators to pass through results 
from all nested generators.  Take a look at:

     http://cvs.eby-sarna.com/PEAK/src/peak/events/

In particular, the 'interfaces' and 'event_threads' modules.  Here's a 
usage example, a simple Task procedure:

     @events.taskFactory
     def monitorBusy(self):

         # get a "readable" event on this socket
         untilReadable = self.eventLoop.readable(self)

         while True:
             # Wait until we have stream activity
             yield untilReadable; events.resume()

             # Is everybody busy?
             if self.busyCount()==self.childCount():
                 self.supervisor.requestStart()

             # Wait until the child or busy count changes before proceeding
              yield events.AnyOf(self.busyCount,self.childCount); events.resume()

This task waits until a listener socket is readable (i.e. an incoming 
connection is pending), and then asks the process supervisor to start more 
processes if all the child processes are busy.  It then waits until either 
the busy count or the child process count changes, before it waits for 
another incoming connection.

Basically, if you're invoking a sub-generator, you do:

     yield subGenerator(arguments); result=events.resume()

That works when the sub-generator returns only one "real" result.  You 
needn't worry about passing through control values, because the current 
generator won't be resumed until the sub-generator yields a non-control 
value.
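
Using the toy sketch from earlier, that pattern looks something like this 
('readGreeting' and the rest are made up purely for illustration; in the 
real API the call would be 'events.resume()'):

     def readGreeting(readable):
         # Made-up "subroutine" generator: wait for readability, then hand
         # exactly one real value back to whoever called us.
         yield readable; resume()
         yield "hello"

     def main(readable):
         yield readGreeting(readable); greeting = resume()
         print("got", greeting)

     readable = EventSource()
     Task(main(readable)).step()    # suspends, waiting on 'readable'
     readable.fire()                # resumes the task; prints "got hello"

Note that 'main' never sees the EventSource its sub-generator waited on; the 
Task consumes it directly, which is exactly the pass-through work the quoted 
example above has to do by hand.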

If you're invoking a sub-generator that you intend to *iterate over*, 
however, and that generator can suspend on events, it's a bit more complex:

     iterator = subGenerator(arguments)

     while True:
         yield iterator; nextItem = events.resume()

         if nextItem is NOT_GIVEN:   # sentinel value
             break

         # body of loop goes here, using 'nextItem'

This is not very convenient, but I don't find it all that common to have 
data I'm iterating over in such a fashion, because 'peak.events' programs 
tend to have "infinite" streams that are organized as event sources in 
"pipe and filter" fashion.  So, you tend to end up with Tasks that only 
have one generator running anyway, except for things that are more like 
"subroutines" than real generators, because you only expect one real return 
value from them.

peak.events can work with Twisted, by the way, if you have it 
installed.  For example, this:

     yield aDeferred; result = events.resume()

suspends the generator until the Deferred fires, and the result is then 
placed in 'result' when the generator resumes.  If the Deferred triggers an 
"errback", the call to 'events.resume()' re-raises the error inside the 
current generator.

It would be nice if there were some way to "accept" data and exceptions 
within a generator that didn't require the 'events.resume' hack, e.g.:

     result = yield aDeferred

would be really nice, especially if the resumption could also raise an 
exception inside the generator instead of delivering a value.  I was hoping 
that this was something along the lines of what you were proposing -- e.g., 
if generator-iterators could take arguments to 'next()', that would let you 
do this.  I believe there's already a rejected PEP covering the issue of 
communicating "into" generators.
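
Concretely, the behaviour being wished for might look something like this; 
the 'send()'/'throw()' spelling below is just one possible shape for 
delivering a value or an exception at the paused yield, and the little 
driver is illustrative only:

     def fetcher():
         result = yield "need a value"       # value arrives right at the yield
         print("got", result)
         try:
             yield "now fail, please"
         except RuntimeError as exc:         # an error arrives as an exception
             print("caught", exc)

     gen = fetcher()
     next(gen)                               # run to the first yield
     gen.send(42)                            # resume with a value: "got 42"
     try:
         gen.throw(RuntimeError("boom"))     # resume with an error: "caught boom"
     except StopIteration:
         pass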

Perhaps there should be a "simple coroutines" PEP that doesn't try to 
extend generators into coroutines, but instead treats coroutines as a 
first-class animal that just happens to be implemented using some of the 
same techniques "under the hood".


