[Python-Dev] cooperative generators

Mon Aug 18 06:28:22 EDT 2003

I would like to get some feedback on some thoughts that I've been 
having about cooperative multitasking using generators.  After some 
attempts to change that 'flow' module [1] into a C module, I got thinking 
about how to better support my requirements with a relatively small 
change to the Python interpreter.

Anyway, if you would humor me a bit, perhaps you can see what I'm 
getting at; perhaps the change to Python could be relatively localized 
and simple.  And Python would be quite more useful in cooperative 
multitasking contexts.

First, let me provide an application context.  

  Suppose you are building a webpage in response to an HTTP request.  
  Further suppose that the information for this web page comes from 
  two sources, a PostgreSQL database and an OpenLDAP directory.   Now 
  assume that you want to make your page building modular, that is, 
  broken into several operations with sub operations.   Here is, 
  perhaps a call graph of how the page would be done:

   buildPage
    |       buildHeader
    |       |    writeTopOfHeader
    |       |    writeMetaKeywords
    |       |       queryLDAP* (returns iteratable keyword sequence)
    |       |    writeRestOfHeader
    |       buildNavigator
    |       |    writeTopOfNavigator
    |       |    writeNavigatorRows
    |       |         queryLDAP* (to get items for the Navigator)
    |       |    writeRestOfNavigator
    |       buildGrid
    |       |    writeTopOfGrid
    |       |    writeRows
    |       |       queryDatabase* (returns sequence of rows )
    |       writeFooter

  Assume that each of these items is a generator, and that queryLDAP,
  and queryDatabase generators either yield a value (in the sequence
  that they return) or return a special sentinel 'Cooperate'.   Then,
  assume that you have a reactor which is building N pages at a time.
  To make this magic work, each one of the intermediate generators
  (buildPage, buildGrid, writeRows) on the way to a leaf generator
  (queryDatabase) must explicitly check for this 'Cooperate' special
  value, and then yield themselves with this same value.   In this
  way, you can 'pause' the whole chain of generators.   The flow
  module [1] tries to make this easy, and makes the logic for 
  'Cooperate' to work by scheduling each page to be build.

Second, let me describe the problem (or irritant)

  Well, what happens in the flow module, is that each one of the 
  intermediate generators needs to be 'wrapped' to handle exceptions 
  and other concerns.  This wrapper slows things down substantially,
  especially in tight loops.   Furthermore, each intermediate generator
  must be made 'aware' that it may be paused (an unnecessary ickyness)
  for example, the following code:

      for x in parentGenerator:
           # do something with X

  has to be rewritten as:

     parentGenerator = flow.wrap(parentGenerator)
     yield parentGenerator
     for x in parentGenerator:
         # do something with X
         yeild parentGenerator

  It works, but it is just stuff that I think could be done much 
  better in the guts of python itself.  And without all of the 
  tricks needed to coax exceptions into working as you'd expect.

Third, let me describe what I 'think' would be a nice solution

  I'd like a special value, lets call it Pause, which can be given
  to a yield statement.   When this is encountered by the Python
  interpreter, it bypasses all of the intermediate generators
  and goes to the most recent non-generator.  For example, in the
  a queryDatabase yield of Pause would bypass the chain 
  (buildPage, buildGrid, writeRows) and the caller of buildPage's
  next() method would receive the Pause object.  Then, when the 
  next call to buildPage's next() method is invoked, it would resume 
  the top level generator (queryDatabase) and if that generator 
  yields a non-Pause value, the stack would unwind as normal.  

  So, in an attempt to be more explicit:

  - Let a generator chain C be composed of three generators, g, g', g''
    which are calling each other.   Let f be a function which is 
    iterating over the generator g.

  - Let the generator g'' yield a value p, which is a subclass of a 
    special 'Pause' built-in class.

  - At this point, the Python evaluator goes down the stack frame to
    find the first non-generator in the stack, in this case, f.

  - And then the evaluator creates a new instance zzz of a PauseIter
    object.  Then, every instance of g in f is replaced with zzz.
    At this point, zzz is initialized with g and g'', as the head and
    tail of the paused generator.

  - The function f is given the value p as its result of g.next()

  - When the function f calls g.next() again (which actually calls
    zzz.next() instead), the generator g'' is resumed.   

    - If g'' yield another Pause object, then this object is passed 
      back to the function f, and things continue

    - If g'' yield a non-Pause object, then f() is rewritten to 
      link to g again, and the non-Pause object is handed to g'
      so that normal processing proceeds

    - If g'' creates an exception, then f() is rewritten to 
      link to g again, and the exception object is percolated
      down to g' for normal processing.

Fourth, in conclusion

  I think with a chance similar to above, Cooperative Multitasking
  in Python with the ability to break-down generators into 
  sub-generators becomes easy to manage (a function f can simply
  keep each micro-thread in a queue, etc.).

  I'm not sure if I understand the implications of all of this,
  and in particular, how hard a ceval.c would be, but I'd be
  very interested to hear your opinion.

Kind Regards,

Clark