[Python-ideas] Revised**10 PEP on Yield-From

Wed Apr 15 20:29:31 CEST 2009

Greg,

Please forgive me for joining this discussion so late.  Below are
my late comments on your original PEP, followed by some new questions.
I have been writing weightless/compose, which does exactly what your
PEP is trying to accomplish.  I'll check my code against this PEP.

I really appreciate your initiative!  It helps me a lot.

2009/4/15 Greg Ewing <greg.ewing at canterbury.ac.nz>:
> Draft 11 of the PEP.
>
> Changes in this version:
>
> - GeneratorExit always calls close() and is always
>  reraised.
>
> - Special handling of thrown-in StopIterations
>  removed, since Guido doesn't think you should be
>  doing that in the first place.
>
> - Expansion uses next(_i) instead of _i.next() and
>  doesn't mention caching of methods.
>
> --
> Greg
>
> PEP: XXX
> Title: Syntax for Delegating to a Subgenerator
> Version: $Revision$
> Last-Modified: $Date$
> Author: Gregory Ewing <greg.ewing at canterbury.ac.nz>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 13-Feb-2009
> Python-Version: 3.x
> Post-History:
>
>
> Abstract
> ========
>
> A syntax is proposed for a generator to delegate part of its
> operations to another generator. This allows a section of code
> containing 'yield' to be factored out and placed in another
> generator. Additionally, the subgenerator is allowed to return with a
> value, and the value is made available to the delegating generator.
>
> The new syntax also opens up some opportunities for optimisation when
> one generator re-yields values produced by another.
>
>
> Motivation
> ==========
>
> A Python generator is a form of coroutine, but has the limitation that
> it can only yield to its immediate caller.  This means that a piece of
> code containing a ``yield`` cannot be factored out and put into a
> separate function in the same way as other code.  Performing such a
> factoring causes the called function to itself become a generator, and
> it is necessary to explicitly iterate over this second generator and
> re-yield any values that it produces.
>
> If yielding of values is the only concern, this can be performed without
> much difficulty using a loop such as
>
> ::
>
>    for v in g:
>        yield v
>
> However, if the subgenerator is to interact properly with the caller
> in the case of calls to ``send()``, ``throw()`` and ``close()``, things
> become considerably more difficult.  As will be seen later, the necessary
> code is very complicated, and it is tricky to handle all the corner cases
> correctly.
>
> A new syntax will be proposed to address this issue. In the simplest
> use cases, it will be equivalent to the above for-loop, but it will also
> handle the full range of generator behaviour, and allow generator code
> to be refactored in a simple and straightforward way.
>
>
> Proposal
> ========
>
> The following new expression syntax will be allowed in the body of a
> generator:
>
> ::
>
>    yield from <expr>
>

These are the exact problems I can't solve neatly in weightless/compose:

> where <expr> is an expression evaluating to an iterable, from which an
> iterator is extracted. The iterator is run to exhaustion, during which
> time it yields and receives values directly to or from the caller of
> the generator containing the ``yield from`` expression (the
> "delegating generator").

This allows a programmer to express whether the intention is simply to
return a generator or to delegate the work to a 'subgenerator'.
Weightless/compose currently descends into every generator, which is
certainly not always wanted.  Great, I think; I like the syntax.

> Furthermore, when the iterator is another generator, the subgenerator
> is allowed to execute a ``return`` statement with a value, and that
> value becomes the value of the ``yield from`` expression.

In Weightless/compose, after several different tries I settled for
mimicking returning a value by using raise StopIteration(returnvalue).
As return in a generator raises StopIteration(), I think it is very
natural to use return like this in a generator (in fact I sometimes
wished it were possible, not being aware of python-ideas).  So I
like it too.
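
To make the return-value behaviour concrete, here is a minimal sketch,
assuming an interpreter that implements the semantics proposed in the
PEP (Python 3.3 or later).  The names sub and delegating are made up
for illustration.

```python
def sub():
    # A subgenerator that yields twice and then returns a value.
    yield 1
    yield 2
    return "done"  # raised as StopIteration("done") under the proposal


def delegating():
    # The value of the 'yield from' expression is sub()'s return value.
    result = yield from sub()
    yield result


values = list(delegating())
print(values)
```

The caller sees the two values yielded by sub() first, and only then
the value that delegating() yields itself.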

> The full semantics of the ``yield from`` expression can be described
> in terms of the generator protocol as follows:
>
>    * Any values that the iterator yields are passed directly to the
>      caller.

Clear.

>    * Any values sent to the delegating generator using ``send()``
>      are passed directly to the iterator. If the sent value is None,
>      the iterator's ``next()`` method is called. If the sent value is
>      not None, the iterator's ``send()`` method is called. Any exception
>      resulting from attempting to call ``next`` or ``send`` is raised
>      in the delegating generator.

Clear. I have implemented this by just calling send(...) either with
None or with a value.  The VM dispatches that to next() when the value
is None, I assume.
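
On the next()/send() distinction: for a generator, send(None) behaves
like next(), but a plain iterator has no send() method at all, which is
presumably why the PEP falls back to next() when None is sent.  A small
sketch, assuming an interpreter that implements the PEP (Python 3.3 or
later); all names are hypothetical.

```python
def running_total():
    # Subgenerator: receives numbers via send() and yields the running total.
    total = 0
    while True:
        received = yield total
        total += received


def delegate_to_generator():
    yield from running_total()


def delegate_to_list():
    yield from [1, 2, 3]


g = delegate_to_generator()
first = next(g)          # runs to the first yield inside running_total()
after_ten = g.send(10)   # forwarded to the subgenerator's send()
after_five = g.send(5)

h = delegate_to_list()
item = next(h)           # None is sent, so next() is used on the list iterator
try:
    h.send("x")          # a list iterator has no send() method
except AttributeError as exc:
    send_error = exc
else:
    send_error = None

print(first, after_ten, after_five, item, type(send_error).__name__)
```

So always calling send() only works as long as every subiterator is a
generator (or otherwise provides send()).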

>    * Exceptions other than GeneratorExit passed to the ``throw()`` method
>      of the delegating generator are forwarded to the ``throw()`` method of
>      the iterator. Any exception resulting from attempting to call ``throw()``
>      is propagated to the delegating generator.

I let any Exception propagate using the throw() method.  I believe
this will not correctly handle GeneratorExit as outlined in the
discussion before.  I'll have to change this I think.
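
For the non-GeneratorExit case, the forwarding can be sketched as
follows, assuming an interpreter that implements the PEP (Python 3.3 or
later).  The generator names are hypothetical.

```python
def sub():
    try:
        yield "waiting"
    except ValueError as exc:
        # The exception thrown into the delegating generator arrives here.
        yield "sub caught: %s" % exc


def delegating():
    yield from sub()


g = delegating()
first = next(g)
second = g.throw(ValueError("boom"))
print(first, "/", second)
```

The ValueError is raised at the yield inside sub(), not at the
'yield from' in delegating(), so the subgenerator gets the first chance
to handle it.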

>    * If a GeneratorExit exception is thrown into the delegating generator,
>      the ``close()`` method of the iterator is called if it has one. If this
>      call results in an exception, it is propagated to the delegating generator.
>      Otherwise, the GeneratorExit is reraised in the delegating generator.

I have a hard time understanding what this would mean in a pure Python
implementation.  I added both bullets to my unittests to work it out
later.

>      The implicit GeneratorExit resulting from closing the delegating
>      generator is treated as though it were passed in using ``throw()``.

By "closing the delegating generator" you mean "from the outside, call
close() on it"?  It then will raise the GeneratorExit exception, and I
understand it. I added a unittest as well.
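
A test along these lines could look like the sketch below, assuming an
interpreter that implements the PEP (Python 3.3 or later).  The events
list and the generator names are only there for illustration.

```python
events = []


def sub():
    try:
        yield 1
    finally:
        events.append("sub finalised")


def delegating():
    try:
        yield from sub()
    finally:
        events.append("delegating finalised")


g = delegating()
next(g)      # suspend both generators inside the 'yield from'
g.close()    # GeneratorExit is raised at the 'yield from'
print(events)
```

Closing the delegating generator first closes the subgenerator and then
re-raises GeneratorExit in the delegating generator, so the clean-up
runs from the innermost generator outwards.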

>
>    * The value of the ``yield from`` expression is the first argument
>      to the ``StopIteration`` exception raised by the iterator when it
>      terminates.
>
>    * ``return expr`` in a generator causes ``StopIteration(expr)`` to
>      be raised.

I assume that 'return 1, 2, 3' will have one return value, the tuple
(1, 2, 3), which is the single argument to StopIteration(), and which
is unpacked when 'yield from' returns?
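
As far as I can tell, the tuple comes back as one value, and any
unpacking is done by the ordinary assignment on the left-hand side.  A
sketch, assuming an interpreter that implements the PEP (Python 3.3 or
later); the names are hypothetical.

```python
def sub():
    yield "working"
    return 1, 2, 3  # one return value: the tuple (1, 2, 3)


def keep_whole():
    whole = yield from sub()  # the tuple arrives as a single value
    yield whole


def unpack():
    a, b, c = yield from sub()  # plain tuple unpacking by the assignment
    yield a + b + c


whole_result = list(keep_whole())
unpacked_result = list(unpack())
print(whole_result, unpacked_result)
```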

> Enhancements to StopIteration
> -----------------------------
>
> For convenience, the ``StopIteration`` exception will be given a
> ``value`` attribute that holds its first argument, or None if there
> are no arguments.

I am using StopIteration's 'args' attribute.  But after reading the
motivation below, I see it could indeed confuse other generators, and
a separate StopIteration would be better, I think.

>
> Formal Semantics
> ----------------
>
> Python 3 syntax is used in this section.
>
> 1. The statement
>
> ::
>
>    RESULT = yield from EXPR
>
> is semantically equivalent to
>
> ::
>
>    _i = iter(EXPR)
>    try:
>        _y = next(_i)
>    except StopIteration as _e:
>        _r = _e.value
>    else:
>        while 1:
>            try:
>                _s = yield _y
>            except GeneratorExit:
>                _m = getattr(_i, 'close', None)
>                if _m is not None:
>                    _m()
>                raise
>            except:
>                _m = getattr(_i, 'throw', None)
>                if _m is not None:
>                    _y = _m(*sys.exc_info())
>                else:
>                    raise
>            else:
>                try:
>                    if _s is None:
>                        _y = next(_i)
>                    else:
>                        _y = _i.send(_s)
>                except StopIteration as _e:
>                    _r = _e.value
>                    break
>    RESULT = _r
>

I'll take this one with me, as I really need some time to compare it
to my own code. I'll come back to it later.

> 2. In a generator, the statement
>
> ::
>
>    return value
>
> is semantically equivalent to
>
> ::
>
>    raise StopIteration(value)
>
> except that, as currently, the exception cannot be caught by ``except``
> clauses within the returning generator.

Clear.

> 3. The StopIteration exception behaves as though defined thusly:
>
> ::
>
>   class StopIteration(Exception):
>
>       def __init__(self, *args):
>           if len(args) > 0:
>               self.value = args[0]
>           else:
>               self.value = None
>           Exception.__init__(self, *args)
>

I am probably missing the point; could you explain why this is needed?
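
For reference, this is how the attribute looks from the outside when
the subgenerator is driven by hand instead of through 'yield from'.  A
sketch, assuming an interpreter that implements the PEP (Python 3.3 or
later); sub is a hypothetical generator.

```python
def sub():
    yield 1
    return 42


g = sub()
next(g)
try:
    next(g)
except StopIteration as exc:
    returned = exc.value  # the first argument, or None if there is none
    raw_args = exc.args   # the same information via the args tuple

empty = StopIteration().value
print(returned, raw_args, empty)
```

The attribute mainly saves the args[0] lookup and the special case for
an empty args tuple.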

> Rationale
> =========
>
> The Refactoring Principle
> -------------------------
>
> The rationale behind most of the semantics presented above stems from
> the desire to be able to refactor generator code. It should be possible
> to take a section of code containing one or more ``yield`` expressions,
> move it into a separate function (using the usual techniques to deal
> with references to variables in the surrounding scope, etc.), and
> call the new function using a ``yield from`` expression.
>
> The behaviour of the resulting compound generator should be, as far as
> possible, exactly the same as the original unfactored generator in all
> situations, including calls to ``next()``, ``send()``, ``throw()`` and
> ``close()``.
>
> The semantics in cases of subiterators other than generators has been
> chosen as a reasonable generalization of the generator case.

Yes!  Exactly.  I just call this supporting 'program decomposition'.
For clarity, you could probably add the name of the refactoring; it
is called 'extract method', isn't it?

> Finalization
> ------------
>
> There was some debate as to whether explicitly finalizing the delegating
> generator by calling its ``close()`` method while it is suspended at a
> ``yield from`` should also finalize the subiterator. An argument against
> doing so is that it would result in premature finalization of the
> subiterator if references to it exist elsewhere.
>
> Consideration of non-refcounting Python implementations led to the
> decision that this explicit finalization should be performed, so that
> explicitly closing a factored generator has the same effect as doing
> so to an unfactored one in all Python implementations.
>
> The assumption made is that, in the majority of use cases, the subiterator
> will not be shared. The rare case of a shared subiterator can be
> accommodated by means of a wrapper that blocks ``throw()`` and ``close()``
> calls, or by using a means other than ``yield from`` to call the
> subiterator.

I agree completely.  I went to some lengths to get proper
clean-up, and I solved it similarly.
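
The wrapper mentioned for the shared-subiterator case could be as small
as the sketch below, assuming an interpreter that implements the PEP
(Python 3.3 or later).  The class name Shielded is made up; it simply
exposes the iterator protocol and nothing else, so there is no close()
or throw() for 'yield from' to call.

```python
class Shielded:
    """Hypothetical wrapper that hides close() and throw()."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)


def numbers():
    yield 1
    yield 2
    yield 3


def delegating(iterable):
    yield from iterable


# Without the wrapper, closing the delegating generator closes the source.
plain = numbers()
g = delegating(plain)
next(g)
g.close()
left_in_plain = list(plain)

# With the wrapper, the shared source survives the close().
shared = numbers()
g = delegating(Shielded(shared))
next(g)
g.close()
left_in_shared = list(shared)

print(left_in_plain, left_in_shared)
```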

> Generators as Threads
> ---------------------
>
> A motivation for generators being able to return values concerns the
> use of generators to implement lightweight threads.  When using
> generators in that way, it is reasonable to want to spread the
> computation performed by the lightweight thread over many functions.
> One would like to be able to call a subgenerator as though it were an
> ordinary function, passing it parameters and receiving a returned
> value.
>
> Using the proposed syntax, a statement such as
>
> ::
>
>    y = f(x)
>
> where f is an ordinary function, can be transformed into a delegation
> call
>
> ::
>
>    y = yield from g(x)
>
> where g is a generator. One can reason about the behaviour of the
> resulting code by thinking of g as an ordinary function that can be
> suspended using a ``yield`` statement.
>
> When using generators as threads in this way, typically one is not
> interested in the values being passed in or out of the yields.
> However, there are use cases for this as well, where the thread is
> seen as a producer or consumer of items. The ``yield from``
> expression allows the logic of the thread to be spread over as
> many functions as desired, with the production or consumption of
> items occurring in any subfunction, and the items are automatically
> routed to or from their ultimate source or destination.
>
> Concerning ``throw()`` and ``close()``, it is reasonable to expect
> that if an exception is thrown into the thread from outside, it should
> first be raised in the innermost generator where the thread is suspended,
> and propagate outwards from there; and that if the thread is terminated
> from outside by calling ``close()``, the chain of active generators
> should be finalised from the innermost outwards.

Yes, I believe you make sure that:

try:
    x = yield from y()
except SomeError:
    return 'HELP'

actually does catch the SomeError exception when raised in y(), or one
of its descendants?
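
Written out as a complete example, assuming an interpreter that
implements the PEP (Python 3.3 or later), the case looks like this.
SomeError and the bodies of x() and y() are hypothetical.

```python
class SomeError(Exception):
    pass


def y():
    yield 1
    raise SomeError("failed inside y()")


def x():
    try:
        value = yield from y()
    except SomeError:
        return 'HELP'
    return value


g = x()
first = next(g)  # the value yielded by y()
try:
    next(g)      # y() raises SomeError, which x() catches
except StopIteration as exc:
    outcome = exc.value

print(first, outcome)
```

The exception propagates out of the 'yield from' expression like an
exception from an ordinary call, so the except clause in x() sees it.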

>
>
> Syntax
> ------
>
> The particular syntax proposed has been chosen as suggestive of its
> meaning, while not introducing any new keywords and clearly standing
> out as being different from a plain ``yield``.
>

I skipped the next section, if you don't mind.

>
> Optimisations
> -------------
>
> Using a specialised syntax opens up possibilities for optimisation
> when there is a long chain of generators.  Such chains can arise, for
> instance, when recursively traversing a tree structure.  The overhead
> of passing ``next()`` calls and yielded values down and up the chain
> can cause what ought to be an O(n) operation to become, in the worst
> case, O(n\*\*2).
>
> A possible strategy is to add a slot to generator objects to hold a
> generator being delegated to.  When a ``next()`` or ``send()`` call is
> made on the generator, this slot is checked first, and if it is
> nonempty, the generator that it references is resumed instead.  If it
> raises StopIteration, the slot is cleared and the main generator is
> resumed.
>
> This would reduce the delegation overhead to a chain of C function
> calls involving no Python code execution.  A possible enhancement would
> be to traverse the whole chain of generators in a loop and directly
> resume the one at the end, although the handling of StopIteration is
> more complicated then.
>
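
The kind of generator chain meant here can be sketched with a recursive
in-order traversal, assuming an interpreter that implements the PEP
(Python 3.3 or later).  The Node class is a hypothetical minimal tree
node.  Each level of recursion adds one delegating generator to the
chain, and every yielded value passes through all of them.

```python
class Node:
    # Hypothetical minimal binary tree node, only for this illustration.
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def in_order(node):
    # Each recursive call adds one more delegating generator to the chain.
    if node is None:
        return
    yield from in_order(node.left)
    yield node.value
    yield from in_order(node.right)


tree = Node(2, Node(1), Node(4, Node(3), Node(5)))
values = list(in_order(tree))
print(values)
```
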
>
> Use of StopIteration to return values
> -------------------------------------
>
> There are a variety of ways that the return value from the generator
> could be passed back. Some alternatives include storing it as an
> attribute of the generator-iterator object, or returning it as the
> value of the ``close()`` call to the subgenerator. However, the proposed
> mechanism is attractive for a couple of reasons:
>
> * Using a generalization of the StopIteration exception makes it easy
>  for other kinds of iterators to participate in the protocol without
>  having to grow an extra attribute or a close() method.
>
> * It simplifies the implementation, because the point at which the
>  return value from the subgenerator becomes available is the same
>  point at which the exception is raised. Delaying until any later
>  time would require storing the return value somewhere.
>
> Originally it was proposed to simply extend StopIteration to accept
> a value. However, it was felt desirable by some to have a mechanism
> for detecting the erroneous use of a value-returning generator in a
> context that is not aware of generator return values. Using an
> exception that is a superclass of StopIteration means that code
> knowing about generator return values only has one exception to
> catch, and code that does not know about them will fail to catch
> the new exception.

I agree. And I begin to understand the need for that value attribute. Ok.

> [...]

For now I have one more fundamental question left over.

Will the delegating generator remain on the call-stack or not?

The current behaviour is that a stack-frame is created for a
generator, which is not disposed of when next()/send() returns, but
kept somewhere.  When a new call to next()/send() happens, the same
stack-frame is put back on the call-stack.  This is crucial because it
acts as the oh-so-valuable closure.

I came across this problem because I was writing code that traverses
the call-stack in order to find some place to put 'generator-local'
variables (like thread-local).  My implementation in
weightless/compose does not physically keep the generators on the
call-stack (I don't know how to do that in Python), but keeps its own
stack.  I would have to extend 'compose' to not only search the
real call-stack but also traverse its own semi-call-stack if/when it
finds an instance of itself on the real call-stack.

I am writing code like:

def x():
    somevar = 10
    yield from y()

then I write in y:

from inspect import currentframe

def y():
    # Walk up the call-stack until a frame defines 'somevar'.
    frame = currentframe().f_back
    while 'somevar' not in frame.f_locals:
        frame = frame.f_back
    return frame.f_locals['somevar']

would this find the variable in x?
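
For what it is worth, the question can be tried out with the sketch
below.  It assumes CPython with the PEP implemented (Python 3.3 or
later) and relies on frame linkage, which is an implementation detail
rather than something the PEP promises.  Two things differ from the
snippet above and are assumptions of the sketch: y() contains a yield
so that it really is a generator, and the loop stops when it runs out
of frames.

```python
from inspect import currentframe


def y():
    # Walk up the call-stack looking for a frame that defines 'somevar'.
    frame = currentframe().f_back
    while frame is not None and 'somevar' not in frame.f_locals:
        frame = frame.f_back
    # Yield the result so that the caller of x() can observe it.
    yield frame.f_locals['somevar'] if frame is not None else None


def x():
    somevar = 10
    yield from y()


found = list(x())
print(found)
```

In CPython the frame of the delegating generator is the f_back of the
running subgenerator's frame, so the lookup reaches the locals of x().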

Best regards,
Erik