[Python-ideas] Revised revised revised PEP on yield-from

Tue Feb 17 20:47:13 CET 2009

On Mon, Feb 16, 2009 at 10:31 PM, Greg Ewing
<greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
>> I don't quite
>> understand how I would write the function that is delegated to as
>> "yield from g(x)" nor do I quite see what the caller of the outer
>> generator should expect from successive next() or .send() calls.
>
> It should be able to expect whatever would happen if the
> body of the delegated-to generator were inlined into the
> delegating generator.

I understand that when I'm thinking of generators (as you saw in the
tree traversal example I posted).
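
For the plain refactoring case, the inlining rule can be illustrated with a minimal sketch. The function names below are made up for illustration and do not come from the PEP:

```python
def inlined():
    # Everything written out in a single generator.
    yield 1
    yield 2
    yield 3

def tail():
    # The last two yields, factored out into their own generator.
    yield 2
    yield 3

def delegating():
    yield 1
    yield from tail()   # behaves as if tail()'s body were pasted in here
```

Iterating over inlined() and over delegating() produces the same sequence of values, so the caller cannot tell the two apart.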

My question was in the context of lightweight threads and your
proposal for the value returned by "yield from". I believe I now
understand what you are trying to do, but the way to think about it in
this case seems very different than when you're refactoring
generators. IIUC there will be some kind of "scheduler" that manages a
number of lightweight threads, each represented by a suspended stack
of generators, and a number of blocking resources like sockets or
mutexes. The scheduler knows what resource each thread is waiting for
(could also be ready to run or sleeping until a specific time) and
when the resource is ready it resumes the generator passing along
whatever value is required using .send(). E.g. on input, it could read
the data from the socket, or it could just pass a flag indicating that
the resource is ready and let the generator make the actual recv()
call. When a generator wants to access a resource, it uses "yield"
(not "yield from"!) to send a description of the resource needed to
the scheduler. When a generator wants to call another function that
might block, the other function must be written as a generator too,
and it is called using "yield from". The other function uses "yield"
to access blocking resources, and "return" to return a value to its
"caller" (really the generator that used "yield from").

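To make the scheme above concrete, here is a rough sketch of how such a scheduler might look. The request format ("read", key), the names fetch(), worker() and run(), and the round-robin policy are all assumptions of mine for illustration; a real scheduler would wait on actual sockets or timers instead of a dictionary.

```python
from collections import deque

def fetch(key):
    # A "blocking" function: it asks the scheduler for a resource with a
    # plain yield, then hands a result to its caller with return.
    value = yield ("read", key)      # hypothetical request format
    return value.upper()

def worker(key, results):
    # Calls another potentially blocking function using yield from.
    results.append((yield from fetch(key)))

def run(threads, resources):
    # Toy round-robin scheduler: every request is satisfied on the
    # thread's next turn by looking the key up in a dictionary.
    queue = deque((t, None) for t in threads)
    while queue:
        thread, value = queue.popleft()
        try:
            kind, key = thread.send(value)
        except StopIteration:
            continue                 # this lightweight thread has finished
        queue.append((thread, resources[key]))

results = []
run([worker("a", results), worker("b", results)], {"a": "spam", "b": "eggs"})
```

Note that only fetch() talks to the scheduler directly; worker() never sees the request tuple, because "yield from" passes it straight through to whoever is driving the outermost generator.
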
I believe that Twisted has a similar scheme that doesn't have the
benefit of arbitrarily nested generators; I recall Phillip Eby
talking about this too. I've never used lightweight threads myself --
I'm a bit "old school" and would typically either use real OS threads,
like Java, or event-driven programming possibly with callbacks, like
Tcl/Tk. But I can see the utility of this approach and reluctantly
admit that the proposed semantics for the "yield from" return value
are just right for this approach. I do think that it still requires
the user to be quite aware of what is going on behind the scenes, for
example to remember when to use "yield from" (for functions that have
been written to cooperate with the scheduler) and when to use regular
calls (for functions that cannot block) -- messing this up is quite
painful, e.g. forgetting to use "yield from" will probably produce a
pretty confusing error message. Also, it would seem you cannot write
functions running in lightweight threads that are also "ordinary"
generators, since yield is reserved for "calling" the scheduler.
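
As a hedged illustration of the "forgot yield from" mistake, consider the sketch below. The names get_char() and broken() and the request strings are hypothetical, and the exact error text depends on the implementation:

```python
def get_char():
    # Cooperating function: asks the (imaginary) scheduler for input.
    c = yield "need input"
    return c

def broken():
    c = get_char()           # oops: "yield from" is missing here
    yield "need output"      # some unrelated scheduler request
    return c.upper()         # c is a generator object, not a string
```

Nothing goes wrong at the line with the mistake. The failure only shows up later as an AttributeError on c.upper(), complaining that a generator object has no such attribute, which gives little hint about the real cause.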

I have a little example in my head that I might as well show here:
suppose we have a file-like object with a readline() method that calls
a read() method which in turn calls a fillbuf() function. If I want to
read a line from the file, I might write (assuming I am executing
inside a generator that is really used for lightweight threading, so
that "yield" communicates with the scheduler):

line = yield from f.readline()

The readline() method could naively be implemented as:

  def readline(self):
    line = []
    while True:
      c = yield from self.read(1)
      if not c: break
      line.append(c)
      if c == '\n': break
    return ''.join(line)

The read() method could be:

  def read(self, n):
    if len(self.buf) < n:
      yield from self.fillbuf(n - len(self.buf))
    result, self.buf = self.buf[:n], self.buf[n:]
    return result

I'm leaving fillbuf() to the imagination of the reader; its
implementation depends on the protocol with the scheduler to actually
read data. Or there might be a lower-level unbuffered read() generator
that encapsulates the scheduler protocol.
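
One possible way to fill in the blanks, purely as a sketch: assume the protocol is that a generator yields a ("read", n) tuple and the scheduler sends back a string of at most n characters, or '' at end of file. The class name LineReader, that request format, and the drive() helper standing in for the scheduler are all my own assumptions.

```python
import io

class LineReader:
    def __init__(self):
        self.buf = ''

    def fillbuf(self, n):
        # Assumed protocol: yield ("read", n); the scheduler replies with
        # up to n characters, or '' when the input is exhausted.
        data = yield ("read", n)
        self.buf += data

    def read(self, n):
        if len(self.buf) < n:
            yield from self.fillbuf(n - len(self.buf))
        result, self.buf = self.buf[:n], self.buf[n:]
        return result

    def readline(self):
        line = []
        while True:
            c = yield from self.read(1)
            if not c: break
            line.append(c)
            if c == '\n': break
        return ''.join(line)

def drive(gen, source):
    # Stand-in for the scheduler: answers each ("read", n) request
    # immediately from a file-like object and returns gen's return value.
    reply = None
    while True:
        try:
            kind, n = gen.send(reply)
        except StopIteration as e:
            return e.value
        reply = source.read(n)

src = io.StringIO("ab\ncd")
f = LineReader()
first = drive(f.readline(), src)
second = drive(f.readline(), src)
third = drive(f.readline(), src)
```

Here first is the line including its newline, second is the unterminated last line, and third is the empty string that signals end of file.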

I don't think I could add a generator to the file-like class that
would call readline() until the file is exhausted though, at least not
easily; code that is processing lines will have to use a while-loop
like this:

while True:
  line = yield from f.readline()
  if not line: break
  ...process line...

Trying to turn this into a generator like I can do with an ordinary
file-like object doesn't work:

def __iter__(self):
  while True:
    line = yield from self.readline()
    if not line: break
    yield line   ## ???????

This is because lightweight threads use yield to communicate with the
scheduler, and they cannot easily also use it to yield successive
values to their caller. I could imagine some kind of protocol where
yield always returns a tuple whose first value is a string or token
indicating what kind of yield it is, e.g. "yield" when it is returning
the next value from the readline-loop, and "scheduler" when it is
wanting to talk to the scheduler, but the caller would have to look
for this and it would become much uglier than just writing out the
while-loop.
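
Such a tagged protocol might look roughly like the sketch below. The tag strings follow the description above; the function names, the "readline" request, and the consume() helper are hypothetical, and consume() cheats by playing both scheduler and caller at once.

```python
def readline():
    # Hypothetical blocking call: asks the scheduler for the next line.
    line = yield ("scheduler", "readline")
    return line

def iter_lines():
    while True:
        line = yield from readline()
        if not line: break
        yield ("yield", line)        # a value meant for the caller

def consume(gen, source):
    # The caller has to inspect the tag of everything that comes out.
    out = []
    reply = None
    while True:
        try:
            tag, payload = gen.send(reply)
        except StopIteration:
            return out
        if tag == "scheduler":
            reply = next(source, '')   # pretend the scheduler did the I/O
        else:
            out.append(payload)
            reply = None

collected = consume(iter_lines(), iter(["a\n", "b\n"]))
```

In a real lightweight thread the consumer would itself be a generator, so it would have to re-yield every "scheduler" message upward and forward the reply back down by hand, which is exactly the ugliness I mean.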

> That's the core idea behind all of this -- being able to
> take a chunk of code containing yields, abstract it out
> and put it in another function, without the ouside world
> being any the wiser.
>
> We do this all the time with ordinary functions and
> don't ever question the utility of being able to do so.
> I'm at a bit of a loss to understand why people can't
> see the utility in being able to do the same thing
> with generator code.

I do, I do. It's the complication with the return value that I am
still questioning, since that goes beyond simply refactoring
generators.

> I take your point about needing a better generators-
> as-threads example, though, and I'll see if I can come
> up with something.

Right.

>> So, "return" is equivalent to "raise StopIteration" and "return
>> <value>" is equivalent to "raise StopIteration(<value>)"?
>
> Yes.

I apologize for even asking that bit, it was very clear in the PEP.
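
For readers following along, a small sketch of that equivalence as seen from outside the generator, assuming the return value is exposed as a "value" attribute on the StopIteration instance (the function names are made up):

```python
def inner():
    yield 1
    return "done"            # surfaces as StopIteration("done")

def outer():
    result = yield from inner()   # result is inner()'s return value
    yield result

g = inner()
next(g)                      # consume the single yielded value
try:
    next(g)
except StopIteration as e:
    returned = e.value       # the value passed to return
```

A caller driving inner() by hand sees the return value on the exception, while outer() receives it as the value of the "yield from" expression.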

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)