[Python-ideas] Revised**10 PEP on Yield-From
Erik Groeneveld
erik at cq2.org
Wed Apr 15 20:29:31 CEST 2009
Greg,
Please forgive me for hooking into this discussion so late. Below are
my late comments to your original PEP, and below those some new stuff.
I have been writing weightless/compose which does exactly what your
PEP is trying to accomplish. I'll check my stuff against this PEP.
I really appreciate your initiative! It helps me a lot.
2009/4/15 Greg Ewing <greg.ewing at canterbury.ac.nz>:
> Draft 11 of the PEP.
>
> Changes in this version:
>
> - GeneratorExit always calls close() and is always
> reraised.
>
> - Special handling of thrown-in StopIterations
> removed, since Guido doesn't think you should be
> doing that in the first place.
>
> - Expansion uses next(_i) instead of _i.next() and
> doesn't mention caching of methods.
>
> --
> Greg
>
> PEP: XXX
> Title: Syntax for Delegating to a Subgenerator
> Version: $Revision$
> Last-Modified: $Date$
> Author: Gregory Ewing <greg.ewing at canterbury.ac.nz>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 13-Feb-2009
> Python-Version: 3.x
> Post-History:
>
>
> Abstract
> ========
>
> A syntax is proposed for a generator to delegate part of its
> operations to another generator. This allows a section of code
> containing 'yield' to be factored out and placed in another
> generator. Additionally, the subgenerator is allowed to return with a
> value, and the value is made available to the delegating generator.
>
> The new syntax also opens up some opportunities for optimisation when
> one generator re-yields values produced by another.
>
>
> Motivation
> ==========
>
> A Python generator is a form of coroutine, but has the limitation that
> it can only yield to its immediate caller. This means that a piece of
> code containing a ``yield`` cannot be factored out and put into a
> separate function in the same way as other code. Performing such a
> factoring causes the called function to itself become a generator, and
> it is necessary to explicitly iterate over this second generator and
> re-yield any values that it produces.
>
> If yielding of values is the only concern, this can be performed without
> much difficulty using a loop such as
>
> ::
>
>     for v in g:
>         yield v
>
> However, if the subgenerator is to interact properly with the caller
> in the case of calls to ``send()``, ``throw()`` and ``close()``, things
> become considerably more difficult. As will be seen later, the necessary
> code is very complicated, and it is tricky to handle all the corner cases
> correctly.
>
> A new syntax will be proposed to address this issue. In the simplest
> use cases, it will be equivalent to the above for-loop, but it will also
> handle the full range of generator behaviour, and allow generator code
> to be refactored in a simple and straightforward way.
>
>
> Proposal
> ========
>
> The following new expression syntax will be allowed in the body of a
> generator:
>
> ::
>
>     yield from <expr>
>
These are the exact problems I can't solve neatly in weightless/compose:
> where <expr> is an expression evaluating to an iterable, from which an
> iterator is extracted. The iterator is run to exhaustion, during which
> time it yields and receives values directly to or from the caller of
> the generator containing the ``yield from`` expression (the
> "delegating generator").
This allows a programmer to express the intention of either just
returning a generator or delegating the work to a 'subgenerator'.
Weightless/compose currently just descends into every generator, which
is certainly not always wanted. Great, I think; I like the syntax.
> Furthermore, when the iterator is another generator, the subgenerator
> is allowed to execute a ``return`` statement with a value, and that
> value becomes the value of the ``yield from`` expression.
In Weightless/compose, after several different tries, I settled on
mimicking a return value by using raise StopIteration(returnvalue).
As return in a generator raises StopIteration(), I think it is very
natural to use return like this in a generator (in fact I sometimes
wished it were possible, not being aware of python-ideas). So I
like it too.
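To make the comparison concrete, here is a minimal sketch of how a returned value would surface to code that drives the generator by hand. It assumes an interpreter that implements the PEP (CPython 3.3 or later); the names `subgen` and `drive` are invented for illustration.

```python
def subgen():
    # A plain ``return`` with a value inside a generator, as the PEP proposes.
    yield 1
    yield 2
    return "done"


def drive(gen):
    """Run a generator by hand; collect what it yields and what it returns."""
    yielded = []
    while True:
        try:
            yielded.append(next(gen))
        except StopIteration as stop:
            # The return value travels on the exception.
            return yielded, stop.value


result = drive(subgen())
```

Note that later Python versions (PEP 479, the default from 3.7 on) turn a StopIteration raised explicitly inside a generator body into a RuntimeError, so `return value` is the form to use there.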
> The full semantics of the ``yield from`` expression can be described
> in terms of the generator protocol as follows:
>
> * Any values that the iterator yields are passed directly to the
> caller.
Clear.
> * Any values sent to the delegating generator using ``send()``
> are passed directly to the iterator. If the sent value is None,
> the iterator's ``next()`` method is called. If the sent value is
> not None, the iterator's ``send()`` method is called. Any exception
> resulting from attempting to call ``next`` or ``send`` is raised
> in the delegating generator.
Clear. I have implemented this by just calling send(...) either with
None or with a value. The VM dispatches that to next() when the value
is None, I assume.
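A small sketch of that assumption, using made-up names (`echo`, `plain_iterator`): for a generator, `send(None)` and `next()` behave the same, but an ordinary iterator has no `send()` at all, which is presumably why the PEP spells out the two cases.

```python
def echo():
    # Yields back whatever was sent in; the first yield produces "ready".
    received = yield "ready"
    while True:
        received = yield received


g1 = echo()
g2 = echo()
first_via_next = next(g1)       # starts the generator
first_via_send = g2.send(None)  # starts it in exactly the same way

# A non-generator iterator only supports next(); it has no send() method.
plain_iterator = iter([1, 2, 3])
has_send = hasattr(plain_iterator, "send")
```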
> * Exceptions other than GeneratorExit passed to the ``throw()`` method
> of the delegating generator are forwarded to the ``throw()`` method of
> the iterator. Any exception resulting from attempting to call ``throw()``
> is propagated to the delegating generator.
I let any Exception propagate using the throw() method. I believe
this will not correctly handle GeneratorExit as outlined in the
discussion before. I'll have to change this I think.
> * If a GeneratorExit exception is thrown into the delegating generator,
> the ``close()`` method of the iterator is called if it has one. If this
> call results in an exception, it is propagated to the delegating generator.
> Otherwise, the GeneratorExit is reraised in the delegating generator.
I have a hard time understanding what this would mean in a pure python
implementation. I added both bullets to my unittests to work it out
later.
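As a starting point for such unittests, here is a rough pure-Python sketch of only these two bullets (forwarding of throw() and close()), written as a hand-rolled delegating generator. It deliberately ignores send() and return values, and the names `inner`, `delegate` and `log` are hypothetical.

```python
log = []


def inner():
    try:
        yield 1
        yield 2
    except ValueError:
        log.append("inner saw ValueError")
        yield "recovered"
    finally:
        log.append("inner finalized")


def delegate(sub):
    """Forward throw() and close() to ``sub`` by hand (no send() support)."""
    it = iter(sub)
    value = next(it)
    while True:
        try:
            yield value
        except GeneratorExit:
            # close() was called on us: close the subiterator, then re-raise.
            close = getattr(it, "close", None)
            if close is not None:
                close()
            raise
        except BaseException as exc:
            # Anything else is forwarded to the subiterator's throw().
            throw = getattr(it, "throw", None)
            if throw is None:
                raise
            try:
                value = throw(exc)
            except StopIteration:
                return
        else:
            try:
                value = next(it)
            except StopIteration:
                return


d = delegate(inner())
first = next(d)
second = d.throw(ValueError("boom"))  # handled inside inner()
d.close()                             # finalizes inner() as well
```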
> The implicit GeneratorExit resulting from closing the delegating
> generator is treated as though it were passed in using ``throw()``.
By "closing the delegating generator" you mean "from the outside, call
close() on it"? It then will raise the GeneratorExit exception, and I
understand it. I added a unittest as well.
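A sketch of what such a unittest might exercise, assuming an interpreter with the proposed syntax (CPython 3.3 or later); `sub`, `delegating` and `events` are illustrative names only.

```python
events = []


def sub():
    try:
        yield "a"
        yield "b"
    except GeneratorExit:
        events.append("sub: GeneratorExit")
        raise


def delegating():
    try:
        yield from sub()
    except GeneratorExit:
        events.append("delegating: GeneratorExit")
        raise


g = delegating()
first = next(g)
g.close()  # called from the outside; the subgenerator is closed first
```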
>
> * The value of the ``yield from`` expression is the first argument
> to the ``StopIteration`` exception raised by the iterator when it
> terminates.
>
> * ``return expr`` in a generator causes ``StopIteration(expr)`` to
> be raised.
I assume that 'return 1, 2, 3' will have one return value, being a tuple
(1, 2, 3), which is the single argument to StopIteration(), and which is
unpacked when 'yield from' returns?
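A short sketch of how this plays out on an interpreter that implements the PEP (CPython 3.3 or later), with invented names: the tuple arrives as one value, and any unpacking is ordinary tuple unpacking done by the delegating generator.

```python
def sub():
    yield "working"
    return 1, 2, 3  # a single tuple, i.e. StopIteration((1, 2, 3))


def delegating():
    result = yield from sub()  # receives the whole tuple
    a, b, c = result           # unpacking is up to the caller
    yield (result, a + b + c)


g = delegating()
progress = next(g)
outcome = next(g)
```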
> Enhancements to StopIteration
> -----------------------------
>
> For convenience, the ``StopIteration`` exception will be given a
> ``value`` attribute that holds its first argument, or None if there
> are no arguments.
I am using StopIteration's 'args' attribute? But after reading the
motivation below, it could indeed confuse other generators, and a
separate StopIteration would be better, I think.
>
> Formal Semantics
> ----------------
>
> Python 3 syntax is used in this section.
>
> 1. The statement
>
> ::
>
>     RESULT = yield from EXPR
>
> is semantically equivalent to
>
> ::
>
>     _i = iter(EXPR)
>     try:
>         _y = next(_i)
>     except StopIteration as _e:
>         _r = _e.value
>     else:
>         while 1:
>             try:
>                 _s = yield _y
>             except GeneratorExit:
>                 _m = getattr(_i, 'close', None)
>                 if _m is not None:
>                     _m()
>                 raise
>             except:
>                 _m = getattr(_i, 'throw', None)
>                 if _m is not None:
>                     _y = _m(*sys.exc_info())
>                 else:
>                     raise
>             else:
>                 try:
>                     if _s is None:
>                         _y = next(_i)
>                     else:
>                         _y = _i.send(_s)
>                 except StopIteration as _e:
>                     _r = _e.value
>                     break
>     RESULT = _r
>
I'll take this one with me, as I really need some time to compare it
to my own code. I'll come back to it later.
> 2. In a generator, the statement
>
> ::
>
>     return value
>
> is semantically equivalent to
>
> ::
>
>     raise StopIteration(value)
>
> except that, as currently, the exception cannot be caught by ``except``
> clauses within the returning generator.
Clear.
> 3. The StopIteration exception behaves as though defined thusly:
>
> ::
>
>     class StopIteration(Exception):
>
>         def __init__(self, *args):
>             if len(args) > 0:
>                 self.value = args[0]
>             else:
>                 self.value = None
>             Exception.__init__(self, *args)
>
I am probably missing the point; could you explain why this is needed?
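One possible reading, sketched against the StopIteration that ships with CPython 3.3 and later: `value` is always present, whereas indexing `args` fails when the exception was raised without arguments (as a bare `return` does).

```python
no_args = StopIteration()
one_arg = StopIteration("result")
two_args = StopIteration("result", "extra")

# ``value`` is the first argument, or None when there are no arguments.
values = (no_args.value, one_arg.value, two_args.value)

# ``args`` keeps everything, but args[0] does not exist for no_args.
arg_tuples = (no_args.args, one_arg.args, two_args.args)
```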
> Rationale
> =========
>
> The Refactoring Principle
> -------------------------
>
> The rationale behind most of the semantics presented above stems from
> the desire to be able to refactor generator code. It should be possible
> to take a section of code containing one or more ``yield`` expressions,
> move it into a separate function (using the usual techniques to deal
> with references to variables in the surrounding scope, etc.), and
> call the new function using a ``yield from`` expression.
>
> The behaviour of the resulting compound generator should be, as far as
> possible, exactly the same as the original unfactored generator in all
> situations, including calls to ``next()``, ``send()``, ``throw()`` and
> ``close()``.
>
> The semantics in cases of subiterators other than generators has been
> chosen as a reasonable generalization of the generator case.
Yes! Exactly. I just call this supporting 'program decomposition'.
For clarity, you could probably add the name of the refactoring; it
is called 'extract method', isn't it?
> Finalization
> ------------
>
> There was some debate as to whether explicitly finalizing the delegating
> generator by calling its ``close()`` method while it is suspended at a
> ``yield from`` should also finalize the subiterator. An argument against
> doing so is that it would result in premature finalization of the
> subiterator if references to it exist elsewhere.
>
> Consideration of non-refcounting Python implementations led to the
> decision that this explicit finalization should be performed, so that
> explicitly closing a factored generator has the same effect as doing
> so to an unfactored one in all Python implementations.
>
> The assumption made is that, in the majority of use cases, the subiterator
> will not be shared. The rare case of a shared subiterator can be
> accommodated by means of a wrapper that blocks ``throw()`` and ``close()``
> calls, or by using a means other than ``yield from`` to call the
> subiterator.
I agree completely. I went to some lengths to get proper
clean-up, and I solved it similarly.
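For the rare shared case, the wrapper mentioned in the PEP text could look roughly like the sketch below. `Shielded` is a hypothetical name, and its `send()` assumes the wrapped object is a generator; the example needs an interpreter with the proposed syntax (CPython 3.3 or later).

```python
class Shielded:
    """Expose only __next__ and send(); hide throw() and close()."""

    def __init__(self, it):
        self._it = it

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def send(self, value):
        return self._it.send(value)


def numbers():
    yield 1
    yield 2
    yield 3


def delegating(sub):
    yield from sub


# With the wrapper, closing the delegating generator leaves the
# shared subgenerator usable.
shared = numbers()
g = delegating(Shielded(shared))
first = next(g)
g.close()
rest_shielded = list(shared)

# Without it, the subgenerator is finalized along with the delegator.
shared2 = numbers()
g2 = delegating(shared2)
next(g2)
g2.close()
rest_unshielded = list(shared2)
```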
> Generators as Threads
> ---------------------
>
> A motivation for generators being able to return values concerns the
> use of generators to implement lightweight threads. When using
> generators in that way, it is reasonable to want to spread the
> computation performed by the lightweight thread over many functions.
> One would like to be able to call a subgenerator as though it were an
> ordinary function, passing it parameters and receiving a returned
> value.
>
> Using the proposed syntax, a statement such as
>
> ::
>
>     y = f(x)
>
> where f is an ordinary function, can be transformed into a delegation
> call
>
> ::
>
>     y = yield from g(x)
>
> where g is a generator. One can reason about the behaviour of the
> resulting code by thinking of g as an ordinary function that can be
> suspended using a ``yield`` statement.
>
> When using generators as threads in this way, typically one is not
> interested in the values being passed in or out of the yields.
> However, there are use cases for this as well, where the thread is
> seen as a producer or consumer of items. The ``yield from``
> expression allows the logic of the thread to be spread over as
> many functions as desired, with the production or consumption of
> items occurring in any subfunction, and the items are automatically
> routed to or from their ultimate source or destination.
>
> Concerning ``throw()`` and ``close()``, it is reasonable to expect
> that if an exception is thrown into the thread from outside, it should
> first be raised in the innermost generator where the thread is suspended,
> and propagate outwards from there; and that if the thread is terminated
> from outside by calling ``close()``, the chain of active generators
> should be finalised from the innermost outwards.
Yes, I believe you make sure that:
    try:
        x = yield from y()
    except SomeError:
        return 'HELP'
actually does catch the SomeError exception when raised in y(), or one
of its descendants?
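A sketch of the case in question, runnable on an interpreter with the proposed syntax (CPython 3.3 or later). `SomeError` and `deepest` are placeholders, and the error is raised two levels below the generator that catches it.

```python
class SomeError(Exception):
    pass


def deepest():
    yield "started"
    raise SomeError("raised two levels down")


def y():
    result = yield from deepest()
    return result


def x():
    try:
        value = yield from y()
    except SomeError:
        return "HELP"
    return value


g = x()
first = next(g)
try:
    next(g)
except StopIteration as stop:
    # x() caught SomeError and returned normally.
    outcome = stop.value
```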
>
>
> Syntax
> ------
>
> The particular syntax proposed has been chosen as suggestive of its
> meaning, while not introducing any new keywords and clearly standing
> out as being different from a plain ``yield``.
>
I skipped the next section, if you don't mind.
>
> Optimisations
> -------------
>
> Using a specialised syntax opens up possibilities for optimisation
> when there is a long chain of generators. Such chains can arise, for
> instance, when recursively traversing a tree structure. The overhead
> of passing ``next()`` calls and yielded values down and up the chain
> can cause what ought to be an O(n) operation to become, in the worst
> case, O(n\*\*2).
>
> A possible strategy is to add a slot to generator objects to hold a
> generator being delegated to. When a ``next()`` or ``send()`` call is
> made on the generator, this slot is checked first, and if it is
> nonempty, the generator that it references is resumed instead. If it
> raises StopIteration, the slot is cleared and the main generator is
> resumed.
>
> This would reduce the delegation overhead to a chain of C function
> calls involving no Python code execution. A possible enhancement would
> be to traverse the whole chain of generators in a loop and directly
> resume the one at the end, although the handling of StopIteration is
> more complicated then.
>
>
> Use of StopIteration to return values
> -------------------------------------
>
> There are a variety of ways that the return value from the generator
> could be passed back. Some alternatives include storing it as an
> attribute of the generator-iterator object, or returning it as the
> value of the ``close()`` call to the subgenerator. However, the proposed
> mechanism is attractive for a couple of reasons:
>
> * Using a generalization of the StopIteration exception makes it easy
> for other kinds of iterators to participate in the protocol without
> having to grow an extra attribute or a close() method.
>
> * It simplifies the implementation, because the point at which the
> return value from the subgenerator becomes available is the same
> point at which the exception is raised. Delaying until any later
> time would require storing the return value somewhere.
>
> Originally it was proposed to simply extend StopIteration to accept
> a value. However, it was felt desirable by some to have a mechanism
> for detecting the erroneous use of a value-returning generator in a
> context that is not aware of generator return values. Using an
> exception that is a superclass of StopIteration means that code
> knowing about generator return values only has one exception to
> catch, and code that does not know about them will fail to catch
> the new exception.
I agree. And I begin to understand the need for that value attribute. Ok.
> [...]
For now I have one more fundamental question left over.
Will the delegating generator remain on the call-stack or not?
The current behaviour is that a stack-frame is created for a
generator, which is not disposed of when next()/send() returns, but kept
somewhere. When a new call to next()/send() happens, the same
stack-frame is put back on the call-stack. This is crucial because it
acts as the oh-so-valuable closure.
I came across this problem because I was writing code that traverses
the call-stack in order to find some place to put 'generator-local'
variables (like thread-local). My implementation in
weightless/compose does not physically keep the generators on the
call-stack (I don't know how to do that in Python), but keeps its own
stack. I would have to extend 'compose' to not only search the
real call-stack but also traverse its own semi-call-stack if/when it
finds an instance of itself on the real call-stack.
I am writing code like:
    def x():
        somevar = 10
        yield from y()
then if I write in y:
    def y():
        frame = currentframe().f_back
        while 'somevar' not in frame.f_locals:
            frame = frame.f_back
        return frame.f_locals['somevar']
would this find the variable in x?
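A sketch for experimenting with this question on an interpreter that implements the proposed syntax (CPython 3.3 or later). It differs from the snippet above in two assumed ways: y() contains a `yield` so that it really is a generator, and the loop stops at the top of the stack instead of failing. Frame chains are an implementation detail, so the result may differ on other Python implementations.

```python
import inspect


def y():
    # Walk outwards from the caller's frame looking for 'somevar'.
    frame = inspect.currentframe().f_back
    while frame is not None and "somevar" not in frame.f_locals:
        frame = frame.f_back
    found = frame.f_locals["somevar"] if frame is not None else None
    yield "y is running"  # makes y a generator, so it can be delegated to
    return found


def x():
    somevar = 10
    result = yield from y()
    yield result


g = x()
first = next(g)
second = next(g)  # what y() found while walking the frames
```

On CPython the delegating generator's frame is the parent of the subgenerator's frame while the subgenerator runs, so the lookup reaches x().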
Best regards,
Erik