[Python-Dev] cooperative generators
Clark C. Evans
cce at clarkevans.com
Mon Aug 18 06:28:22 EDT 2003
I would like to get some feedback on some thoughts that I've been
having about cooperative multitasking using generators. After some
attempts to change that 'flow' module  into a C module, I got thinking
about how to better support my requirements with a relatively small
change to the Python interpreter.
Anyway, if you would humor me a bit, perhaps you can see what I'm
getting at; perhaps the change to Python could be relatively localized
and simple. And Python would be quite more useful in cooperative
First, let me provide an application context.
Suppose you are building a webpage in response to an HTTP request.
Further suppose that the information for this web page comes from
two sources, a PostgreSQL database and an OpenLDAP directory. Now
assume that you want to make your page building modular, that is,
broken into several operations with sub operations. Here is,
perhaps a call graph of how the page would be done:
| | writeTopOfHeader
| | writeMetaKeywords
| | queryLDAP* (returns iteratable keyword sequence)
| | writeRestOfHeader
| | writeTopOfNavigator
| | writeNavigatorRows
| | queryLDAP* (to get items for the Navigator)
| | writeRestOfNavigator
| | writeTopOfGrid
| | writeRows
| | queryDatabase* (returns sequence of rows )
Assume that each of these items is a generator, and that queryLDAP,
and queryDatabase generators either yield a value (in the sequence
that they return) or return a special sentinel 'Cooperate'. Then,
assume that you have a reactor which is building N pages at a time.
To make this magic work, each one of the intermediate generators
(buildPage, buildGrid, writeRows) on the way to a leaf generator
(queryDatabase) must explicitly check for this 'Cooperate' special
value, and then yield themselves with this same value. In this
way, you can 'pause' the whole chain of generators. The flow
module  tries to make this easy, and makes the logic for
'Cooperate' to work by scheduling each page to be build.
Second, let me describe the problem (or irritant)
Well, what happens in the flow module, is that each one of the
intermediate generators needs to be 'wrapped' to handle exceptions
and other concerns. This wrapper slows things down substantially,
especially in tight loops. Furthermore, each intermediate generator
must be made 'aware' that it may be paused (an unnecessary ickyness)
for example, the following code:
for x in parentGenerator:
# do something with X
has to be rewritten as:
parentGenerator = flow.wrap(parentGenerator)
for x in parentGenerator:
# do something with X
It works, but it is just stuff that I think could be done much
better in the guts of python itself. And without all of the
tricks needed to coax exceptions into working as you'd expect.
Third, let me describe what I 'think' would be a nice solution
I'd like a special value, lets call it Pause, which can be given
to a yield statement. When this is encountered by the Python
interpreter, it bypasses all of the intermediate generators
and goes to the most recent non-generator. For example, in the
a queryDatabase yield of Pause would bypass the chain
(buildPage, buildGrid, writeRows) and the caller of buildPage's
next() method would receive the Pause object. Then, when the
next call to buildPage's next() method is invoked, it would resume
the top level generator (queryDatabase) and if that generator
yields a non-Pause value, the stack would unwind as normal.
So, in an attempt to be more explicit:
- Let a generator chain C be composed of three generators, g, g', g''
which are calling each other. Let f be a function which is
iterating over the generator g.
- Let the generator g'' yield a value p, which is a subclass of a
special 'Pause' built-in class.
- At this point, the Python evaluator goes down the stack frame to
find the first non-generator in the stack, in this case, f.
- And then the evaluator creates a new instance zzz of a PauseIter
object. Then, every instance of g in f is replaced with zzz.
At this point, zzz is initialized with g and g'', as the head and
tail of the paused generator.
- The function f is given the value p as its result of g.next()
- When the function f calls g.next() again (which actually calls
zzz.next() instead), the generator g'' is resumed.
- If g'' yield another Pause object, then this object is passed
back to the function f, and things continue
- If g'' yield a non-Pause object, then f() is rewritten to
link to g again, and the non-Pause object is handed to g'
so that normal processing proceeds
- If g'' creates an exception, then f() is rewritten to
link to g again, and the exception object is percolated
down to g' for normal processing.
Fourth, in conclusion
I think with a chance similar to above, Cooperative Multitasking
in Python with the ability to break-down generators into
sub-generators becomes easy to manage (a function f can simply
keep each micro-thread in a queue, etc.).
I'm not sure if I understand the implications of all of this,
and in particular, how hard a ceval.c would be, but I'd be
very interested to hear your opinion.
More information about the Python-Dev