[Python-Dev] Generator details

Tim Peters tim_one@email.msn.com
Fri, 9 Jul 1999 03:47:36 -0400


Picking up where we left off, I like Guido's vision of generators fine.  The
"one frame" version I've described is in fact what Icon provides, and what
Guido is doing requires using coroutines instead in that language.  Guido's
is more flexible, and I'm not opposed to that <wink>.

OTOH, I *have* seen many a person (including me!) confused by the semantics
of coroutines in Icon, so I don't know how much of the additional
flexibility converts into additional confusion.  One thing I am sure of:
having debated the fine points of continuations recently, I'm incapable of
judging it harshly today <0.5 wink>.

> ...
>     def inorder(node):
>         if node.left: inorder(node.left)
>         suspend node
>         if node.right: inorder(node.right)

The first thing that struck me there is that I'm not sure to whom the
suspend transfers control.  In the one-frame flavor of generator, it's
always to the caller of the function that (lexically) contains the
"suspend".  Is it possible to keep this all straight if the "suspend" above
is changed to e.g.

    pass_it_back(node)

where

def pass_it_back(x):
    suspend x

?  I'm vaguely picturing some kind of additional frame state, a pointer to
the topmost frame that's "expecting" to receive a suspend.  (I see you
resolve this in a different way later, though.)

> ...
> I thought that tree traversal was one of Tim's first examples of
> generators; would I really have to use an explicit stack to create
> the traversal?

As before, still no <wink>, but the one-frame version does require an
unbroken *implicit* chain back to the intended receiver, with an explicit
"suspend" at every step along the way.  Let me rewrite the one-frame
version in a way that assumes less semantics from "suspend", instead
building on the already-assumed new smarts in "for":

def inorder(node):
    if node:
        for child in inorder(node.left):
            suspend child
        suspend node
        for child in inorder(node.right):
            suspend child

I hope this makes it clearer that the one-frame version spawns two *new*
generators for every non-None node, and in purely stack-like fashion (both
"recursing down" and "suspending up").

> Next, I want more clarity about the initialization and termination
> conditions.

Good idea.

> The Demo/thread/Generator.py version is very explicit about
> initialization: you instantiate the Generator class, passing it a
> function that takes a Generator instance as an argument; the function
> executes in a new thread.  (I guess I would've used a different
> interface now -- perhaps inheriting from the Generator class
> overriding a run() method.)

I would change my coroutine implementation similarly.

> For termination, the normal way to stop seems to be for the generator
> function to return (rather than calling g.put()), the consumer then gets
> an EOFError exception the next time it calls g.get().  There's also a
> way for either side to call g.kill() to stop the generator prematurely.

A perfectly serviceable interface, but "feels clumsy" in comparison to
normal for loops and e.g. reading lines from a file, where *visible*
exceptions aren't raised at the end.  I expect most sequences to terminate
before I do <wink>, so (visible) try/except isn't the best UI here.
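
To make the comparison concrete, here's a minimal sketch of that
thread-based interface -- not the actual Demo/thread/Generator.py code,
just the put/get/EOFError shape described above, with kill() omitted:

import threading

class ThreadGenerator:
    # Sketch only:  func(self) runs in its own thread and calls
    # self.put(value); the consumer calls self.get() and sees EOFError
    # once func returns.
    def __init__(self, func):
        self._value = None
        self._done = 0
        self._produced = threading.Semaphore(0)   # a value is ready
        self._wanted = threading.Semaphore(0)     # the consumer wants one
        self._thread = threading.Thread(target=self._run, args=(func,))
        self._thread.start()

    def _run(self, func):
        self._wanted.acquire()        # don't start until the first get()
        func(self)
        self._done = 1
        self._produced.release()      # let the final get() raise EOFError

    def put(self, value):
        self._value = value
        self._produced.release()
        self._wanted.acquire()        # block until the consumer asks again

    def get(self):
        self._wanted.release()
        self._produced.acquire()
        if self._done:
            raise EOFError
        return self._value

def countdown(g):
    i = 3
    while i > 0:
        g.put(i)
        i = i - 1

g = ThreadGenerator(countdown)
while 1:
    try:
        print g.get()
    except EOFError:
        break

The visible try/except at the end is precisely the wart.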

> Let me try to translate that to a threadless implementation.  We could
> declare a simple generator as follows:
>
>     generator reverse(seq):
>         i = len(seq)
>         while i > 0:
>             i = i-1
>             suspend seq[i]
>
> This could be translated by the Python translator into the following,
> assuming a system class generator which provides the machinery for
> generators:
>
>     class reverse(generator):
>         def run(self, seq):
>             i = len(seq)
>             while i > 0:
>                 i = i-1
>                 self.suspend(seq[i])
>
> (Perhaps the identifiers generator, run and suspend would be spelled
> with __...__, but that's just clutter for now.)
>
> Now where Tim was writing examples like this:
>
>     for c in reverse("Hello world"):
>         print c,
>     print
>
> I'd like to guess what the underlying machinery would look like.  For
> argument's sake, let's assume the for loop recognizes that it's using
> a generator (or better, it always needs a generator, and when it's not
> a generator it silently implies a sequence-iterating generator).

In the end I expect these concepts could be unified, e.g. via a new class
__iterate__ method.  Then

for i in 42:

could fail simply because ints don't have a value in that slot, while lists
and tuples could inherit from SequenceIterator, pushing the generation of
the index range into the type instead of having the eval loop construct it
explicitly.
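
Here's a rough sketch of what such a SequenceIterator might look like,
borrowing the resume/EOGError names from Guido's translation below (the
whole thing is hypothetical, of course):

class EOGError(Exception):
    # "End Of Generator", per Guido's sketch below
    pass

class SequenceIterator:
    # Anything indexable could inherit this, so the eval loop never has
    # to build an index range itself.
    def __init__(self, seq):
        self.seq = seq
        self.i = 0

    def resume(self):
        try:
            item = self.seq[self.i]
        except IndexError:
            raise EOGError
        self.i = self.i + 1
        return item

A list's __iterate__ could then just return SequenceIterator(self), and
"for i in 42" dies cleanly for want of the slot.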

> So the translator could generate the following:
>
>     g = reverse("Hello world") # instantiate class reverse
>     while 1:
>         try:
>             c = g.resume()
>         except EOGError: # End Of Generator
>             break
>         print c,
>     print
>
> (Where g should really be a unique temporary local variable.)
>
> In this model, the g.resume() and g.suspend() calls have all the magic.
> They should not be accessible to the user.

This seems at odds with the later:

> (The user may write this code explicitly if they want to consume the
> generated elements in a different way than through a for loop.)

Whether it's at odds or not, I like the latter better.  When the machinery
is clean & well-designed, expose it!  Else in 2002 we'll be subjected to a
generatorhacks module <wink>.

> They are written in C so they can play games with frame objects.
>
> I guess that the *first* call to g.resume(), for a particular
> generator instance, should start the generator's run() method; run()
> is not activated by the instantiation of the generator.

This can work either way.  If it's more convenient to begin run() as part of
instantiation, the code for run() can start with an equivalent of

    if self.first_time:
        self.first_time = 0
        return

where self.first_time is set true by the constructor.  Then "the frame" will
exist from the start.  The first resume() will skip over that block and
launch into the code, while subsequent resume()s will never even see this
block:  almost free.

> Then run() runs until the first suspend() call, which causes the return
> from the resume() call to happen.  Subsequent resume() calls know that
> there already is a frame (it's stored in the generator instance) and simply
> continue its execution where it was.  If the run() method returns from
> the frame, the resume() call is made to raise EOGError (blah, bogus
> name) which signals the end of the loop.  (The user may write this
> code explicitly if they want to consume the generated elements in a
> different way than through a for loop.)

Yes, that parenthetical comment bears repeating <wink>.

> Looking at this machinery, I think the recursive generator that I
> wanted could be made to work, by explicitly declaring a generator
> subclass (instead of using the generator keyword, which is just
> syntactic sugar) and making calls to methods of self, e.g.:
>
>     class inorder(generator):
>         def run(self, node):
>             if node.left: self.run(node.left)
>             self.suspend(node)
>             if node.right: self.run(node.right)

Going way back to the top, this implies the

def pass_it_back(x):
    suspend x

indirection couldn't work -- unless pass_it_back were also a method of
inorder.  Not complaining, just trying to understand.  Once you generalize,
it's hard to know when to stop.
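
For concreteness, the method version would presumably look like this
(sketch only, same hypothetical generator base class as in Guido's
translation):

class inorder(generator):
    def run(self, node):
        if node.left: self.run(node.left)
        self.pass_it_back(node)
        if node.right: self.run(node.right)

    def pass_it_back(self, x):
        # another frame in the same generator's chain, so suspend still
        # knows which generator it's suspending
        self.suspend(x)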

> The generator machinery would (ab)use the fact that Python frames
> don't necessarily have to be linked in a strict stack order;

If you call *this* abuse, what words remain to vilify what Christian is
doing <wink>?

> the generator gets a pointer to the frame to resume from resume(),

Ah!  That addresses my first question.  Are you implicitly assuming a
"stackless" eval loop here?  Else resuming the receiving frame would appear
to push another C stack frame for each value delivered, ever deeper.  The
"one frame" version of generators doesn't have this headache (since a
suspend *returns* to its immediate caller there -- it doesn't *resume* its
caller).

> and there's a "bottom" frame which, when hit, raises the EOGError
> exception.

Although described at the end, this is something set up at the start, right?
To trap a plain return from the topmost invocation of the generator.

> All currently active frames belonging to the generator stay alive
> while another resume() is possible.

And those form a linear chain from the most-recent suspend() back to the
primal resume().  Which appears to address an earlier issue not brought up
in this message:  this provides a well-defined & intuitively clear path for
exceptions to follow, yes?  I'm not sure about coroutines, but there's
something wrong with a generator implementation if the guy who kicks it off
can't see errors raised by the generator's execution!  This doesn't appear
to be a problem here.

> All this is possible by the introduction of an explicit generator
> object.  I think Tim had an implementation in mind where the standard
> return pointer in the frame is the only thing necessary; actually, I
> think the return pointer is stored in the calling frame, not in the
> called frame

What I've had in mind is what Majewski implemented 5 years ago, but lost
interest in because it couldn't be extended to those blasted continuations
<wink>.  The called frame points back to the calling frame via f->f_back (of
course), and I think that's all the return info the one-frame version needs.
I expect I'm missing your meaning here.

> (Christian?  Is this so in your version?).  That shouldn't make a
> difference, except that it's not clear to me how to reference the frame
> (in the explicitly coded version, which has to exist at least at the
> bytecode level).

"The" frame being which frame specifically, and refrenced from where?
Regardless, it must be solvable, since if Christian can (& he thinks he can,
& I believe him <wink>) expose a call/cc variant, the generator class could
be coded entirely in Python.

> With classic coroutines, I believe that there's no difference between
> the first call and subsequent calls to the coroutine.  This works in
> the Knuth world where coroutines and recursion don't go together;

That's also a world where co-transfers are implemented via funky
self-modifying assembler, custom-crafted for the exact number of coroutines
you expect to be using -- I don't recommend Knuth as a guide to
*implementing* these beasts <0.3 wink>.

That said, yes, provided the coroutine objects all exist, there's nothing
special about the first call.  About "provided that":  if your coroutine
objects A and B have "run" methods, you dare not invoke A.run() before B has
been constructed (else the first instance of B.transfer() in A chokes --
there's no object to transfer *to*).  So, in practice, I think instantiation
is still divorced from initiation.  One possibility is to hide all that in a

    cobegin(list_of_coroutine_classes_to_instantiate_and_run)

function.  But then naming the instances is a puzzle.
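
Roughly (every name here is invented):

def cobegin(classes):
    # construct everything first, so every transfer target exists before
    # anything runs; then kick off the first one and let transfer() move
    # control around from there
    instances = []
    for klass in classes:
        instances.append(klass())
    instances[0].run()
    # ... but how does each instance refer to the others?  By position?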

> but at least for generators I would hope that it's possible for multiple
> instances of the same generator to be active simultaneously (e.g. I
> could be reversing over a list of files and then reverse each of the
> lines in the file; this uses separate instances of the reverse()
> generator).

Since that's the trick the "one frame" generators *rely* on for recursion,
it's surely not a problem in your stronger version.

Note that my old coroutine implementation did allow for multiple instances
of a coroutine, although the examples posted with it didn't illustrate that.

The weakness of coroutines in practice is (in my experience) the requirement
that you *name* the target of a transfer.  This is brittle; e.g., in the
pipeline example I posted, each stage had to know the names of the stages on
either side of it.  By adopting a

    target.transfer(optional_value)

primitive it's possible to *pass in* the target object as an argument to the
coroutine doing the transfer.  Then "the names" are all in the setup, and
don't pollute the bodies of the coroutines (e.g., each coroutine in the
pipeline example could have arguments named "stdin" and "stdout").  I
haven't seen a system that *does* this, but it's so obviously the right
thing to do it's not worth saying any more about <wink>.
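
Not worth more words, but maybe worth a sketch, assuming the
target.transfer(optional_value) primitive above (a transfer with no value
just asks the target for one; the wiring names are invented):

import string

def upcase(stdin, stdout):
    # the body never names another stage; its neighbors arrive as
    # arguments when the pipeline is wired together
    while 1:
        c = stdin.transfer()               # ask upstream for the next item
        stdout.transfer(string.upper(c))   # hand the result downstream

# wiring happens once, in the setup -- something like:
#     src = coroutine(read_stage, ...)
#     snk = coroutine(write_stage, ...)
#     mid = coroutine(upcase, stdin=src, stdout=snk)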

> So we need a way to reference the generator instance separately from
> the generator constructor.  The machinery I sketched above solves this.
>
> After Tim has refined or rebutted this, I think I'll be able to
> suggest what to do for coroutines.

Please do.  Whether or not it's futile, it's fun <wink>.

hmm-haven't-had-enough-of-that-lately!-ly y'rs  - tim