
I've re-worded things yet again to nail the semantics down as per recent discussions. Briefly:

* send(None) is converted to next() upon delegation
* send(not_None) raises an exception if there is no send() method
* throw() and close() ignore missing methods

The semantics are no longer described in terms of "direct communication". Also fixed a bug in the expansion (the return value of throw() was getting lost).

PEP: XXX
Title: Syntax for Delegating to a Subgenerator
Version: $Revision$
Last-Modified: $Date$
Author: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 13-Feb-2009
Python-Version: 2.7
Post-History:

Abstract
========

A syntax is proposed for a generator to delegate part of its operations to another generator. This allows a section of code containing 'yield' to be factored out and placed in another generator. Additionally, the subgenerator is allowed to return with a value, and the value is made available to the delegating generator.

The new syntax also opens up some opportunities for optimisation when one generator re-yields values produced by another.

Proposal
========

The following new expression syntax will be allowed in the body of a generator:

::

    yield from <expr>

where <expr> is an expression evaluating to an iterable, from which an iterator is extracted. The iterator is run to exhaustion, during which time it yields and receives values directly to or from the caller of the generator containing the ``yield from`` expression (the "delegating generator").

When the iterator is another generator, the effect is the same as if the body of the subgenerator were inlined at the point of the ``yield from`` expression. Furthermore, the subgenerator is allowed to execute a ``return`` statement with a value, and that value becomes the value of the ``yield from`` expression.

In general, the semantics can be understood in terms of the iterator protocol as follows:

* Any values that the iterator yields are passed directly to the caller.

* Any values sent to the delegating generator using ``send()`` are passed directly to the iterator. If the sent value is None, the iterator's ``next()`` method is called. If the sent value is not None, the iterator's ``send()`` method is called if it has one, otherwise an exception is raised in the delegating generator.

* Calls to the ``throw()`` method of the delegating generator are forwarded to the iterator. If the iterator does not have a ``throw()`` method, the thrown-in exception is raised in the delegating generator.

* If the delegating generator's ``close()`` method is called, the ``close()`` method of the iterator is called first if it has one, then the delegating generator is finalised.

* The value of the ``yield from`` expression is the first argument to the ``StopIteration`` exception raised by the iterator when it terminates.

* ``return expr`` in a generator causes ``StopIteration(expr)`` to be raised.

For convenience, the ``StopIteration`` exception will be given a ``value`` attribute that holds its first argument, or None if there are no arguments.

Formal Semantics
----------------

1. The statement

::

    result = yield from expr

is semantically equivalent to

::

    _i = iter(expr)
    try:
        _u = _i.next()
        while 1:
            try:
                _v = yield _u
            except Exception, _e:
                if hasattr(_i, 'throw'):
                    _u = _i.throw(_e)
                else:
                    raise
            else:
                if _v is None:
                    _u = _i.next()
                else:
                    _u = _i.send(_v)
    except StopIteration, _e:
        result = _e.value
    finally:
        if hasattr(_i, 'close'):
            _i.close()

2. In a generator, the statement

::

    raise value

is semantically equivalent to

::

    raise StopIteration(value)

except that, as currently, the exception cannot be caught by 'except' clauses within the returning generator.

3. The StopIteration exception behaves as though defined thusly:

::

    class StopIteration(Exception):

        def __init__(self, *args):
            if len(args) > 0:
                self.value = args[0]
            else:
                self.value = None
            Exception.__init__(self, *args)

Rationale
=========

A Python generator is a form of coroutine, but has the limitation that it can only yield to its immediate caller. This means that a piece of code containing a ``yield`` cannot be factored out and put into a separate function in the same way as other code. Performing such a factoring causes the called function to itself become a generator, and it is necessary to explicitly iterate over this second generator and re-yield any values that it produces.

If yielding of values is the only concern, this is not very arduous and can be performed with a loop such as

::

    for v in g:
        yield v

However, if the subgenerator is to interact properly with the caller in the case of calls to ``send()``, ``throw()`` and ``close()``, things become considerably more complicated. As the formal expansion presented above illustrates, the necessary code is very longwinded, and it is tricky to handle all the corner cases correctly. In this situation, the advantages of a specialised syntax should be clear.

Generators as Threads
---------------------

A motivating use case for generators being able to return values concerns the use of generators to implement lightweight threads. When using generators in that way, it is reasonable to want to spread the computation performed by the lightweight thread over many functions. One would like to be able to call a subgenerator as though it were an ordinary function, passing it parameters and receiving a returned value.

Using the proposed syntax, a statement such as

::

    y = f(x)

where f is an ordinary function, can be transformed into a delegation call

::

    y = yield from g(x)

where g is a generator. One can reason about the behaviour of the resulting code by thinking of g as an ordinary function that can be suspended using a ``yield`` statement.

When using generators as threads in this way, typically one is not interested in the values being passed in or out of the yields. However, there are use cases for this as well, where the thread is seen as a producer or consumer of items. The ``yield from`` expression allows the logic of the thread to be spread over as many functions as desired, with the production or consumption of items occurring in any subfunction, and the items are automatically routed to or from their ultimate source or destination.

Concerning ``throw()`` and ``close()``, it is reasonable to expect that if an exception is thrown into the thread from outside, it should first be raised in the innermost generator where the thread is suspended, and propagate outwards from there; and that if the thread is terminated from outside by calling ``close()``, the chain of active generators should be finalised from the innermost outwards.

Syntax
------

The particular syntax proposed has been chosen as suggestive of its meaning, while not introducing any new keywords and clearly standing out as being different from a plain ``yield``.

Optimisations
-------------

Using a specialised syntax opens up possibilities for optimisation when there is a long chain of generators. Such chains can arise, for instance, when recursively traversing a tree structure. The overhead of passing ``next()`` calls and yielded values down and up the chain can cause what ought to be an O(n) operation to become O(n\*\*2).

A possible strategy is to add a slot to generator objects to hold a generator being delegated to. When a ``next()`` or ``send()`` call is made on the generator, this slot is checked first, and if it is nonempty, the generator that it references is resumed instead. If it raises StopIteration, the slot is cleared and the main generator is resumed.

This would reduce the delegation overhead to a chain of C function calls involving no Python code execution. A possible enhancement would be to traverse the whole chain of generators in a loop and directly resume the one at the end, although the handling of StopIteration is more complicated then.

Use of StopIteration to return values
-------------------------------------

There are a variety of ways that the return value from the generator could be passed back. Some alternatives include storing it as an attribute of the generator-iterator object, or returning it as the value of the ``close()`` call to the subgenerator. However, the proposed mechanism is attractive for a couple of reasons:

* Using the StopIteration exception makes it easy for other kinds of iterators to participate in the protocol without having to grow extra attributes or a close() method.

* It simplifies the implementation, because the point at which the return value from the subgenerator becomes available is the same point at which StopIteration is raised. Delaying until any later time would require storing the return value somewhere.

Criticisms
==========

Under this proposal, the value of a ``yield from`` expression would be derived in a very different way from that of an ordinary ``yield`` expression. This suggests that some other syntax not containing the word ``yield`` might be more appropriate, but no acceptable alternative has so far been proposed.

It has been suggested that some mechanism other than ``return`` in the subgenerator should be used to establish the value returned by the ``yield from`` expression. However, this would interfere with the goal of being able to think of the subgenerator as a suspendable function, since it would not be able to return values in the same way as other functions.

The use of an argument to StopIteration to pass the return value has been criticised as an "abuse of exceptions", without any concrete justification of this claim. In any case, this is only one suggested implementation; another mechanism could be used without losing any essential features of the proposal.

It has been suggested that a different exception, such as GeneratorReturn, should be used instead of StopIteration to return a value. However, no convincing practical reason for this has been put forward, and the addition of a ``value`` attribute to StopIteration mitigates any difficulties in extracting a return value from a StopIteration exception that may or may not have one. Also, using a different exception would mean that, unlike ordinary functions, 'return' without a value in a generator would not be equivalent to 'return None'.

Alternative Proposals
=====================

Proposals along similar lines have been made before, some using the syntax ``yield *`` instead of ``yield from``. While ``yield *`` is more concise, it could be argued that it looks too similar to an ordinary ``yield`` and the difference might be overlooked when reading code.

To the author's knowledge, previous proposals have focused only on yielding values, and thereby suffered from the criticism that the two-line for-loop they replace is not sufficiently tiresome to write to justify a new syntax. By also dealing with calls to ``send()``, ``throw()`` and ``close()``, this proposal provides considerably more benefit.

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
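To make the "Generators as Threads" idea concrete, here is a minimal hypothetical sketch of code using the proposed syntax. The names (averager, tally) and the use of None as a termination sentinel are invented for this illustration, and the snippet assumes ``yield from`` and the built-in next() are available:

    def averager():
        # Subgenerator: consumes values sent in, returns their average.
        total = 0.0
        count = 0
        while True:
            value = yield            # receives values sent by the caller
            if value is None:        # invented sentinel for this sketch
                break
            total += value
            count += 1
        return total / count         # becomes the value of 'yield from'

    def tally():
        # Delegating generator: sent values pass straight through to
        # averager(), and its return value is captured here.
        result = yield from averager()
        print("average was", result)

    g = tally()
    next(g)                          # prime the delegating generator
    for v in (10, 20, 30):
        g.send(v)                    # routed directly to averager()
    try:
        g.send(None)                 # finishes averager(); tally() then ends
    except StopIteration:
        pass                         # prints: average was 20.0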

I haven't been following the discussion too closely, but shouldn't bullet 2 of the formal semantics say that "return value" is semantically equivalent to raise StopIteration(value), rather than "raise value"?

=====
--Ryan E. Freckleton

On Sat, Feb 21, 2009 at 3:43 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

Greg Ewing wrote:
Shouldn't this define which exception is raised?

Also, raising the exception within the delegating generator will (unless caught there) finalize the generator. This may cause surprising results if the caller catches the exception and tries to continue to use the generator.

Intuitively, I would expect that the delegating generator would not see this exception, as if the delegating generator itself lacked a send method. The reasoning is that the error is with the caller and not the delegating generator. Also, given that the send method may work while the delegating generator is outside of any yield from, but not work during a yield from, not raising the exception within the delegating generator gives the caller a safe way to test the waters without finalizing the delegating generator.

OTOH, this may be a reason to just translate send to next for non-None values too??? Perhaps the justification in that case would be to think of it like sending to a yield *statement* (which can't accept the sent value) -- which is not an error in generators. (Is it too late to change my vote for send(non-None) generating an exception? :-)
-bruce frederiksen
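A small hypothetical sketch of the scenario described above, assuming the draft semantics (a non-None send fails when the subiterator has no send() method); the generator name is invented, and the exception shown is the AttributeError that falls out of the expansion given in the draft:

    def delegator():
        x = yield "ready"          # a plain yield: send() is accepted here
        yield from iter([1, 2])    # a list iterator has no send() method
        yield "done"

    g = delegator()
    print(next(g))       # "ready"
    print(g.send("a"))   # accepted at the plain yield; prints 1
    g.send("b")          # now suspended inside the yield from: under the
                         # draft this raises inside delegator() and
                         # finalises it -- the behaviour being questioned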

Bruce Frederiksen wrote:
To put it more precisely, whatever exception results from attempting to call the non-existent send() method is propagated into the delegating generator.
This would introduce an inconsistency between delegating to a generator and delegating to some other kind of iterator. When delegating to another generator, the inlining principle requires that any exceptions raised by the subgenerator must be propagated through the delegating generator. This includes whatever exceptions might result from attempting to send values to the subgenerator. My feeling is that other iterators should behave the same way as generators, as closely as possible, when delegated to.

There's also the consideration that the semantics you propose can't be expressed in terms of a Python expansion, since there's no way for a generator to throw an exception right out of itself without triggering any except or finally blocks on the way.

While that's not a fatal flaw, I think it's highly desirable to be able to specify the semantics in terms of an expansion, because of its precision. Currently the expansion in the PEP is the only precise and complete specification. It's very hard to express all the nuances and ramifications in words and be sure that you've covered everything -- as witnessed by your comments above!
That's a distinct possibility. Guido pointed out that there is an existing case where send() refuses to accept anything other than None, and that's when you call it immediately after the generator starts.

But that case doesn't apply here, because the first call to a delegated iterator is always made implicitly by the yield-from expression itself. So a send() that gets delegated to a subiterator is *never* the first call, and therefore it should ignore any sent values that it doesn't care about.

In other words, go back to what I had in the first draft of the PEP:

    if hasattr(_i, 'send'):
        _u = _i.send(_v)
    else:
        _u = _i.next()

or perhaps

    if _v is not None and hasattr(_i, 'send'):
        _u = _i.send(_v)
    else:
        _u = _i.next()

Guido, what do you think about this?

--
Greg
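A minimal sketch of the inlining principle being appealed to here: an exception raised inside the subgenerator surfaces through the delegating generator, running any enclosing finally clauses there on the way out (the function names are invented for illustration):

    def sub():
        yield 1
        raise ValueError("raised inside the subgenerator")

    def outer():
        try:
            yield from sub()
        finally:
            print("outer's cleanup still runs")

    g = outer()
    print(next(g))   # 1
    next(g)          # the ValueError propagates out *through* outer(),
                     # executing its finally clause on the way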

On Sun, Feb 22, 2009 at 2:25 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
And that would be bad -- if the yield-from is inside a try/finally, I'd expect the finally clause to be run.
I think it's all pretty academic as long as it is specified with sufficient exactness that someone else reimplementing it will arrive at the same choices. I don't particularly like the LBYL (Look Before You Leap) idiom, so let's do this:

    if _v is None:
        _u = _i.next()    # Or, in Py3k, _u = next(_i)
    else:
        _u = _i.send(_v)

This means that sending a non-None value into a generator that delegates to a non-generator iterator will fail. I doubt there will be too many use cases that are inconvenienced by this. After all, the presence of a 'send' attribute doesn't mean it can be called anyway, and even if it can, it doesn't mean the call will succeed.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
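Rendering the chosen branch as a stand-alone helper makes the consequence easy to see; the helper name is invented, and in the real expansion this logic is inlined rather than factored out:

    def delegate_send(_i, _v):
        # One send-delegation step, per the expansion chosen above.
        if _v is None:
            return next(_i)      # send(None) behaves like next()
        else:
            return _i.send(_v)   # plain iterators lack send(): this fails

    it = iter([1, 2, 3])
    print(delegate_send(it, None))    # 1
    try:
        delegate_send(it, "hello")
    except AttributeError:
        print("a non-None send to a plain iterator fails, as expected")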

Given that the expansion is quite complicated...

Imagine that we want to alter the values while passing them from the inner generator. For example:

    for x in expr:
        yield x+1

How to let this be as transparent to send/close/etc. as yield from?

If the transformation is an expression, one can use a generator comprehension:

    yield from (x+1 for x in expr)

but this does not work if we have some statements inside:

    seen = set()
    for x in expr:
        if x not in seen:
            seen.add(x)
            yield x

(I'm not familiar with the details of how send works, but I hope the point is valid.)

--
Marcin Kowalczyk
qrczak@knm.org.pl
http://qrnik.knm.org.pl/~qrczak/
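To make the transparency point concrete, a small invented sketch showing that with an explicit re-yield loop a value sent to the outer generator never reaches the inner one -- the gap that ``yield from`` is intended to close:

    def inner():
        received = yield "first"
        yield "inner saw %r" % (received,)

    def relay():
        # the explicit re-yield loop being asked about
        for v in inner():
            yield v        # a value sent here is discarded by the loop

    g = relay()
    print(next(g))          # first
    print(g.send("hello"))  # prints "inner saw None": the sent value
                            # never reached inner()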

participants (5):

- Bruce Frederiksen
- Greg Ewing
- Guido van Rossum
- Marcin 'Qrczak' Kowalczyk
- Ryan Freckleton