[Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

Steven Bethard steven.bethard at gmail.com
Fri May 6 20:16:47 CEST 2005


[Guido]
> I don't think it's necessary to separate this out into a separate PEP;
> that just seems busy-work. I agree these parts are orthogonal and
> uncontroversial; a counter-PEP can suffice by stating that it's not
> countering those items nor repeating them.

[Raymond]
> If someone volunteers to split it out for you, I think it would be
> worthwhile.  Right now, the PEP is hard to swallow in one bite.
> Improving its digestibility would be a big help when the PEP is offered
> up to the tender mercies to comp.lang.python.

Well, busy-work or not, I took the 20 minutes to split them up, so I
figured I might as well make them available.  It was actually really
easy to split them apart, and I think they both read better this way,
but I'm not sure my opinion counts for much here anyway. ;-)  (The
Enhanced Iterators PEP is first, the remainder of PEP 340 follows it.)

----------------------------------------------------------------------
PEP: XXX
Title: Enhanced Iterators
Version: 
Last-Modified: 
Author: Guido van Rossum
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 6-May-2005
Post-History:

Introduction

    This PEP proposes a new iterator API that allows values to be
    passed into an iterator using "continue EXPR". These values are
    received in the iterator as an argument to the new __next__
    method, and can be accessed in a generator with a
    yield-expression.
    
    The content of this PEP is derived from the original content of
    PEP 340, broken off into its own PEP because the new iterator API
    is largely orthogonal to the anonymous block statement
    discussion.

Motivation and Summary

    ...

Use Cases

    See the Examples section near the end.

Specification: the __next__() Method

    A new method for iterators is proposed, called __next__().  It
    takes one optional argument, which defaults to None.  Calling the
    __next__() method without an argument or with None is equivalent
    to using the old iterator API, next().  For backwards compatibility,
    it is recommended that iterators also implement a next() method as
    an alias for calling the __next__() method without an argument.

    The argument to the __next__() method may be used by the iterator
    as a hint on what to do next.

Specification: the next() Built-in Function

    This is a built-in function defined as follows:

        def next(itr, arg=None):
            nxt = getattr(itr, "__next__", None)
            if nxt is not None:
                return nxt(arg)
            if arg is None:
                return itr.next()
            raise TypeError("next() with arg for old-style iterator")

    This function is proposed because there is often a need to call
    the next() method outside a for-loop; the new API, and the
    backwards compatibility code, is too ugly to have to repeat in
    user code.
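
    As a rough illustration only, the sketch below restates the
    definition above in present-day Python under the hypothetical
    name proposed_next (so that the real built-in is not shadowed)
    and exercises it with two made-up iterator classes: Countdown,
    which offers only the old next() method, and Accumulator, which
    implements the proposed __next__(arg) signature.

```python
def proposed_next(itr, arg=None):
    # Same logic as the definition above; only the name differs.
    nxt = getattr(itr, "__next__", None)
    if nxt is not None:
        return nxt(arg)
    if arg is None:
        return itr.next()
    raise TypeError("next() with arg for old-style iterator")


class Countdown:
    """Hypothetical old-style iterator: provides next() only."""

    def __init__(self, start):
        self.current = start

    def next(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


class Accumulator:
    """Hypothetical new-style iterator: __next__() accepts a hint."""

    def __init__(self):
        self.total = 0

    def __next__(self, arg=None):
        # Treat the argument as an amount to add; None means "no hint".
        if arg is not None:
            self.total += arg
        return self.total

    next = __next__  # backwards-compatible alias, as recommended above


old_style = Countdown(2)
new_style = Accumulator()
first = proposed_next(old_style)        # falls back to old_style.next()
running = [proposed_next(new_style), proposed_next(new_style, 5),
           proposed_next(new_style, 2)]
try:
    proposed_next(old_style, 1)         # an argument requires __next__()
    rejected = False
except TypeError:
    rejected = True
```

    Passing an argument to an iterator that lacks __next__() is
    rejected with TypeError, while the argument-less call works for
    both kinds of iterator.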

Specification: a Change to the 'for' Loop

    A small change in the translation of the for-loop is proposed.
    The statement

        for VAR1 in EXPR1:
            BLOCK1
        else:
            BLOCK2

    will be translated as follows:

        itr = iter(EXPR1)
        arg = None    # Set by "continue EXPR2", see below
        brk = False
        while True:
            try:
                VAR1 = next(itr, arg)
            except StopIteration:
                brk = True
                break
            arg = None
            BLOCK1
        if brk:
            BLOCK2

    (However, the variables 'itr' etc. are not user-visible and the
    built-in names used cannot be overridden by the user.)

Specification: the Extended 'continue' Statement

    In the translation of the for-loop, inside BLOCK1, the new syntax

        continue EXPR2

    is legal and is translated into

        arg = EXPR2
        continue

    (Where 'arg' references the corresponding hidden variable from the
    previous section.)

    This is also the case in the body of the block-statement proposed
    in PEP 340.
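
    Since "continue EXPR2" is not valid syntax in any released
    Python, the following is only a sketch that emulates the for-loop
    translation with ordinary functions.  The helper run_for and the
    iterator class Stepper are made-up names; the return value of the
    body function plays the role of EXPR2, and "break" is not
    modelled.

```python
def run_for(iterable, body, orelse=None):
    """Emulate the proposed for-loop translation.

    body(value) stands in for BLOCK1; whatever it returns is treated
    as the EXPR2 of "continue EXPR2" (None means a plain continue).
    orelse() stands in for BLOCK2.
    """
    itr = iter(iterable)
    arg = None
    while True:
        try:
            # Only pass an argument when the body supplied one, so that
            # ordinary iterators keep working.
            value = next(itr) if arg is None else itr.__next__(arg)
        except StopIteration:
            if orelse is not None:
                orelse()
            break
        arg = body(value)


class Stepper:
    """Hypothetical iterator: counts upward; the hint is the step size."""

    def __init__(self, limit):
        self.limit = limit
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self, arg=None):
        self.position += 1 if arg is None else arg
        if self.position > self.limit:
            raise StopIteration
        return self.position


seen = []


def body(value):
    seen.append(value)
    return value          # plays the role of "continue value"


run_for(Stepper(10), body, orelse=lambda: seen.append("done"))
```

    Here each value handed back by the body becomes the next step
    size, so the iterator visits 1, 2, 4 and 8 before stopping.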

    EXPR2 may contain commas; "continue 1, 2, 3" is equivalent to
    "continue (1, 2, 3)".

Specification: Generators and Yield-Expressions

    Generators will implement the new __next__() method API, as well
    as the old argument-less next() method which becomes an alias for
    calling __next__() without an argument.

    The yield-statement will be allowed to be used on the right-hand
    side of an assignment; in that case it is referred to as a
    yield-expression.  The value of this yield-expression is None
    unless __next__() was called with an argument; see below.

    A yield-expression must always be parenthesized except when it
    occurs as the top-level expression on the right-hand side of an
    assignment or as the sole argument of a function call.  So

        x = yield 42
        x = yield
        x = 12 + (yield 42)
        x = 12 + (yield)
        foo(yield 42)
        foo(yield)

    are all legal, but

        x = 12 + yield 42
        x = 12 + yield
        foo(yield 42, 12)
        foo(yield, 12)

    are all illegal.  (Some of the edge cases are motivated by the
    current legality of "yield 12, 42".)

    When __next__() is called with an argument that is not None, the
    yield-expression that it resumes will return the argument.  If it
    resumes a yield-statement, the value is ignored (this is similar
    to ignoring the value returned by a function call).  When the
    *initial* call to __next__() receives an argument that is not
    None, TypeError is raised; this is likely caused by some logic
    error.  When __next__() is called without an argument or with None
    as argument, and a yield-expression is resumed, the
    yield-expression returns None.
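
    These resumption rules can be tried out in present-day Python,
    where generators later gained a send() method (through PEP 342)
    that behaves much like calling the proposed __next__() with an
    argument.  The sketch below uses send() as a stand-in for
    __next__(arg); the generator name running_total is made up.

```python
def running_total():
    total = 0
    while True:
        received = yield total   # yield-expression: value set by the caller
        if received is not None:
            total += received


# The initial call must not carry a value: nothing is waiting to receive it.
gen = running_total()
try:
    gen.send(10)
    initial_error = None
except TypeError as err:
    initial_error = err

# A fresh generator, resumed alternately without and with a value.
gen = running_total()
results = [next(gen), gen.send(5), next(gen), gen.send(2)]
```

    Resuming without a value makes the yield-expression evaluate to
    None, so the total is left unchanged on that step.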

    Note: the syntactic extensions to yield make its use very similar
    to that in Ruby.  This is intentional.  Do note that in Python the
    block passes a value to the generator using "continue EXPR" rather
    than "return EXPR", and the underlying mechanism whereby control
    is passed between the generator and the block is completely
    different.  Blocks in Python are not compiled into thunks; rather,
    yield suspends execution of the generator's frame.  Some edge
    cases work differently; in Python, you cannot save the block for
    later use, and you cannot test whether there is a block or not.

Acknowledgements

    See Acknowledgements of PEP 340.

References

    ...

Copyright

    This document has been placed in the public domain.

----------------------------------------------------------------------
**********************************************************************
----------------------------------------------------------------------

PEP: 340
Title: Anonymous Block Statements
Version: $Revision: 1.24 $
Last-Modified: $Date: 2005/05/05 15:39:19 $
Author: Guido van Rossum
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 27-Apr-2005
Post-History:

Introduction

    This PEP proposes a new type of compound statement which can be
    used for resource management purposes.  The new statement type
    is provisionally called the block-statement because the keyword
    to be used has not yet been chosen.

    This PEP competes with several other PEPs: PEP 288 (Generators
    Attributes and Exceptions; only the second part), PEP 310
    (Reliable Acquisition/Release Pairs), and PEP 325
    (Resource-Release Support for Generators).

    I should clarify that using a generator to "drive" a block
    statement is a separable proposal; with just the definition of
    the block statement from the PEP you could implement all the
    examples using a class (similar to example 6, which is easily
    turned into a template).

    But the key idea is using a generator to drive a block statement;
    the rest is elaboration.

Motivation and Summary

    (Thanks to Shane Hathaway -- Hi Shane!)

    Good programmers move commonly used code into reusable functions.
    Sometimes, however, patterns arise in the structure of the
    functions rather than the actual sequence of statements.  For
    example, many functions acquire a lock, execute some code specific
    to that function, and unconditionally release the lock.  Repeating
    the locking code in every function that uses it is error prone and
    makes refactoring difficult.

    Block statements provide a mechanism for encapsulating patterns of
    structure.  Code inside the block statement runs under the control
    of an object called a block iterator.  Simple block iterators
    execute code before and after the code inside the block statement.
    Block iterators also have the opportunity to execute the
    controlled code more than once (or not at all), catch exceptions,
    or receive data from the body of the block statement.

    A convenient way to write block iterators is to write a generator
    (PEP 255).  A generator looks a lot like a Python function, but
    instead of returning a value immediately, generators pause their
    execution at "yield" statements.  When a generator is used as a
    block iterator, the yield statement tells the Python interpreter
    to suspend the block iterator, execute the block statement body,
    and resume the block iterator when the body has executed.

    The Python interpreter behaves as follows when it encounters a
    block statement based on a generator.  First, the interpreter
    instantiates the generator and begins executing it.  The generator
    does setup work appropriate to the pattern it encapsulates, such
    as acquiring a lock, opening a file, starting a database
    transaction, or starting a loop.  Then the generator yields
    execution to the body of the block statement using a yield
    statement.  When the block statement body completes, raises an
    uncaught exception, or sends data back to the generator using a
    continue statement, the generator resumes.  At this point, the
    generator can either clean up and stop or yield again, causing the
    block statement body to execute again.  When the generator
    finishes, the interpreter leaves the block statement.

Use Cases

    See the Examples section near the end.

Specification: the __exit__() Method

    An optional new method for iterators is proposed, called
    __exit__().  It takes up to three arguments which correspond to
    the three "arguments" to the raise-statement: type, value, and
    traceback.  If all three arguments are None, sys.exc_info() may be
    consulted to provide suitable default values.

Specification: the Anonymous Block Statement

    A new statement is proposed with the syntax

        block EXPR1 as VAR1:
            BLOCK1

    Here, 'block' and 'as' are new keywords; EXPR1 is an arbitrary
    expression (but not an expression-list) and VAR1 is an arbitrary
    assignment target (which may be a comma-separated list).

    The "as VAR1" part is optional; if omitted, the assignments to
    VAR1 in the translation below are omitted (but the expressions
    assigned are still evaluated!).

    The choice of the 'block' keyword is contentious; many
    alternatives have been proposed, including not to use a keyword at
    all (which I actually like).  PEP 310 uses 'with' for similar
    semantics, but I would like to reserve that for a with-statement
    similar to the one found in Pascal and VB.  (Though I just found
    that the C# designers don't like 'with' [2], and I have to agree
    with their reasoning.)  To sidestep this issue momentarily I'm
    using 'block' until we can agree on the right keyword, if any.

    Note that the 'as' keyword is not contentious (it will finally be
    elevated to proper keyword status).

    Note that it is up to the iterator to decide whether a
    block-statement represents a loop with multiple iterations; in the
    most common use case BLOCK1 is executed exactly once.  To the
    parser, however, it is always a loop; break and continue transfer
    control to the block's iterator (see below for details).

    The translation is subtly different from a for-loop: iter() is
    not called, so EXPR1 should already be an iterator (not just an
    iterable); and the iterator is guaranteed to be notified when
    the block-statement is left, regardless of whether this is due
    to a break, return or exception:

        itr = EXPR1  # The iterator
        ret = False  # True if a return statement is active
        val = None   # Return value, if ret == True
        exc = None   # sys.exc_info() tuple if an exception is active
        while True:
            try:
                if exc:
                    ext = getattr(itr, "__exit__", None)
                    if ext is not None:
                        VAR1 = ext(*exc)   # May re-raise *exc
                    else:
                        raise exc[0], exc[1], exc[2]
                else:
                    VAR1 = itr.next()  # May raise StopIteration
            except StopIteration:
                if ret:
                    return val
                break
            try:
                ret = False
                val = exc = None
                BLOCK1
            except:
                exc = sys.exc_info()

    (Again, the variables and built-ins are hidden from the user.)
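
    The translation can be approximated in present-day Python with a
    driver function, shown here purely as a sketch.  The names
    run_block and Tracking are hypothetical, the body is passed in as
    a function, the built-in next() replaces itr.next(), and "break"
    and "return" inside the body are not modelled.

```python
import sys


def run_block(itr, body):
    """Emulate the translation above; body(value) stands in for BLOCK1."""
    exc = None
    while True:
        try:
            if exc:
                ext = getattr(itr, "__exit__", None)
                if ext is not None:
                    value = ext(*exc)      # may re-raise the exception
                else:
                    raise exc[1].with_traceback(exc[2])
            else:
                value = next(itr)          # may raise StopIteration
        except StopIteration:
            break
        try:
            exc = None
            body(value)
        except BaseException:
            exc = sys.exc_info()


class Tracking:
    """Hypothetical block iterator that logs what the driver does to it."""

    def __init__(self, log):
        self.log = log
        self.state = 0

    def __next__(self):
        if self.state:
            self.log.append("finish")
            raise StopIteration
        self.state = 1
        self.log.append("setup")
        return "resource"

    def __exit__(self, type, value=None, traceback=None):
        self.log.append("cleanup after " + type.__name__)
        raise value


ok_log = []
run_block(Tracking(ok_log), lambda res: ok_log.append("body got " + res))

err_log = []
try:
    run_block(Tracking(err_log), lambda res: 1 / 0)
except ZeroDivisionError:
    err_log.append("propagated")
```

    In the second run the iterator is notified through __exit__()
    before the exception leaves the block, which is the guarantee the
    translation is meant to provide.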

    Inside BLOCK1, the following special translations apply:

    - "break" is always legal; it is translated into:

        exc = (StopIteration, None, None)
        continue

    - "return EXPR3" is only legal when the block-statement is
      contained in a function definition; it is translated into:

        exc = (StopIteration, None, None)
        ret = True
        val = EXPR3
        continue

    The net effect is that break and return behave much the same as
    if the block-statement were a for-loop, except that the iterator
    gets a chance at resource cleanup before the block-statement is
    left, through the optional __exit__() method. The iterator also
    gets a chance if the block-statement is left through raising an
    exception.  If the iterator doesn't have an __exit__() method,
    there is no difference from a for-loop (except that a for-loop
    calls iter() on EXPR1).

    Note that a yield-statement in a block-statement is not treated
    differently.  It suspends the function containing the block
    *without* notifying the block's iterator.  The block's iterator
    is entirely unaware of this yield, since the local control flow
    doesn't actually leave the block. In other words, it is *not*
    like a break or return statement.  When the loop that was
    resumed by the yield calls next(), the block is resumed right
    after the yield.  The generator finalization semantics described
    below guarantee (within the limitations of all finalization
    semantics) that the block will be resumed eventually.

    Unlike the for-loop, the block-statement does not have an
    else-clause.  I think it would be confusing, and emphasize the
    "loopiness" of the block-statement, while I want to emphasize its
    *difference* from a for-loop.  In addition, there are several
    possible semantics for an else-clause, and only a very weak use
    case.

Specification: Generator Exit Handling

    Generators will implement the new __exit__() method API.

    Generators will be allowed to have a yield statement inside a
    try-finally statement.

    The expression argument to the yield-statement will become
    optional (defaulting to None).

    When __exit__() is called, the generator is resumed but at the
    point of the yield-statement the exception represented by the
    __exit__ argument(s) is raised.  The generator may re-raise this
    exception, raise another exception, or yield another value,
    except that if the exception passed in to __exit__() was
    StopIteration, it ought to raise StopIteration (otherwise the
    effect would be that a break is turned into continue, which is
    unexpected, to say the least).  When the *initial* call resuming the
    generator is an __exit__() call instead of a next() call, the
    generator's execution is aborted and the exception is re-raised
    without passing control to the generator's body.

    When a generator that has not yet terminated is garbage-collected
    (either through reference counting or by the cyclical garbage
    collector), its __exit__() method is called once with
    StopIteration as its first argument.  Together with the
    requirement that a generator ought to raise StopIteration when
    __exit__() is called with StopIteration, this guarantees the
    eventual activation of any finally-clauses that were active when
    the generator was last suspended.  Of course, under certain
    circumstances the generator may never be garbage-collected.  This
    is no different than the guarantees that are made about finalizers
    (__del__() methods) of other objects.
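
    Present-day generators offer throw() and close() methods (added
    later through PEP 342) whose behaviour is close to what is
    described here for __exit__().  The sketch below uses them as a
    stand-in; note that close() injects GeneratorExit rather than
    StopIteration.  The list-based logging is only there to make the
    order of events visible.

```python
def locking(log):
    log.append("acquire")
    try:
        yield
    finally:
        log.append("release")


# An exception injected at the suspended yield runs the finally-clause.
thrown_log = []
gen = locking(thrown_log)
next(gen)
try:
    gen.throw(ValueError("boom"))
except ValueError:
    thrown_log.append("propagated")

# Explicit finalization, comparable to what garbage collection triggers.
closed_log = []
gen = locking(closed_log)
next(gen)
gen.close()

# Injecting an exception before the first next(): the body never runs.
unstarted_log = []
gen = locking(unstarted_log)
try:
    gen.throw(ValueError("too early"))
except ValueError:
    unstarted_log.append("propagated")
```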

Alternatives Considered and Rejected

    - Many alternatives have been proposed for 'block'.  I haven't
      seen a proposal for another keyword that I like better than
      'block' yet.  Alas, 'block' is also not a good choice; it is a
      rather popular name for variables, arguments and methods.
      Perhaps 'with' is the best choice after all?

    - Instead of trying to pick the ideal keyword, the block-statement
      could simply have the form:

        EXPR1 as VAR1:
            BLOCK1

      This is at first attractive because, together with a good choice
      of function names (like those in the Examples section below)
      used in EXPR1, it reads well, and feels like a "user-defined
      statement".  And yet, it makes me (and many others)
      uncomfortable; without a keyword the syntax is very "bland",
      difficult to look up in a manual (remember that 'as' is
      optional), and it makes the meaning of break and continue in the
      block-statement even more confusing.

    - Phillip Eby has proposed to have the block-statement use
      an entirely different API than the for-loop, to differentiate
      between the two.  A generator would have to be wrapped in a
      decorator to make it support the block API.  IMO this adds more
      complexity with very little benefit; and we can't really deny
      that the block-statement is conceptually a loop -- it supports
      break and continue, after all.

    - This keeps getting proposed: "block VAR1 = EXPR1" instead of
      "block EXPR1 as VAR1".  That would be very misleading, since
      VAR1 does *not* get assigned the value of EXPR1; EXPR1 results
      in a generator which is assigned to an internal variable, and
      VAR1 is the value returned by successive calls to the next()
      method of that iterator.

    - Why not change the translation to apply iter(EXPR1)?  All the
      examples would continue to work.  But this makes the
      block-statement *more* like a for-loop, while the emphasis ought
      to be on the *difference* between the two.  Not calling iter()
      catches a bunch of misunderstandings, like using a sequence as
      EXPR1.

Comparison to Thunks

    Alternative semantics proposed for the block-statement turn the
    block into a thunk (an anonymous function that blends into the
    containing scope).

    The main advantage of thunks that I can see is that you can save
    the thunk for later, like a callback for a button widget (the
    thunk then becomes a closure).  You can't use a yield-based block
    for that (except in Ruby, which uses yield syntax with a
    thunk-based implementation).  But I have to say that I almost see
    this as an advantage: I think I'd be slightly uncomfortable seeing
    a block and not knowing whether it will be executed in the normal
    control flow or later.  Defining an explicit nested function for
    that purpose doesn't have this problem for me, because I already
    know that the 'def' keyword means its body is executed later.

    The other problem with thunks is that once we think of them as the
    anonymous functions they are, we're pretty much forced to say that
    a return statement in a thunk returns from the thunk rather than
    from the containing function.  Doing it any other way would cause
    major weirdness if the thunk were to survive its containing
    function as a closure (perhaps continuations would help, but I'm
    not about to go there :-).

    But then an IMO important use case for the resource cleanup
    template pattern is lost.  I routinely write code like this:

       def findSomething(self, key, default=None):
           self.lock.acquire()
           try:
               for item in self.elements:
                   if item.matches(key):
                       return item
               return default
           finally:
               self.lock.release()

    and I'd be bummed if I couldn't write this as:

       def findSomething(self, key, default=None):
           block locking(self.lock):
               for item in self.elements:
                   if item.matches(key):
                       return item
               return default

    This particular example can be rewritten using a break:

       def findSomething(self, key, default=None):
           block locking(self.lock):
               for item in self.elements:
                   if item.matches(key):
                       break
               else:
                   item = default
           return item

    but it looks forced and the transformation isn't always that easy;
    you'd be forced to rewrite your code in a single-return style
    which feels too restrictive.

    Also note the semantic conundrum of a yield in a thunk -- the only
    reasonable interpretation is that this turns the thunk into a
    generator!

    Greg Ewing believes that thunks "would be a lot simpler, doing
    just what is required without any jiggery pokery with exceptions
    and break/continue/return statements.  It would be easy to explain
    what it does and why it's useful."

    But in order to obtain the required local variable sharing between
    the thunk and the containing function, every local variable used
    or set in the thunk would have to become a 'cell' (our mechanism
    for sharing variables between nested scopes).  Cells slow down
    access compared to regular local variables: access involves an
    extra C function call (PyCell_Get() or PyCell_Set()).

    Perhaps not entirely coincidentally, the last example above
    (findSomething() rewritten to avoid a return inside the block)
    shows that, unlike for regular nested functions, we'll want
    variables *assigned to* by the thunk also to be shared with the
    containing function, even if they are not assigned to outside the
    thunk.
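
    The sharing requirement can be seen with an ordinary nested
    function in present-day Python, used here as a stand-in for a
    thunk.  This is only a sketch: the names find_something and thunk
    are made up, and the nonlocal statement is a later addition to
    the language, not part of this proposal.

```python
def find_something(elements, key, default=None):
    item = default              # assigned in the thunk too, so it lives in a cell

    def thunk():
        nonlocal item           # needed so the assignment reaches the outer 'item'
        for candidate in elements:
            if candidate == key:
                item = candidate
                return          # leaves the thunk only, not find_something()

    thunk()
    # co_freevars lists the names the thunk reaches through cells.
    return item, thunk.__code__.co_freevars


found, shared = find_something(["a", "b", "c"], "b")
missing, _ = find_something(["a", "b", "c"], "z", default="?")
```

    Every name the thunk shares with its containing function,
    including the one it assigns to, ends up being accessed through a
    cell.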

    Greg Ewing again: "generators have turned out to be more powerful,
    because you can have more than one of them on the go at once. Is
    there a use for that capability here?"

    I believe there are definitely uses for this; several people have
    already shown how to do asynchronous light-weight threads using
    generators (e.g. David Mertz quoted in PEP 288, and Fredrik
    Lundh[3]).

    And finally, Greg says: "a thunk implementation has the potential
    to easily handle multiple block arguments, if a suitable syntax
    could ever be devised. It's hard to see how that could be done in
    a general way with the generator implementation."

    However, the use cases for multiple blocks seem elusive.

    (Proposals have since been made to change the implementation of
    thunks to remove most of these objections, but the resulting
    semantics are fairly complex to explain and to implement, so IMO
    that defeats the purpose of using thunks in the first place.)

Examples

    1. A template for ensuring that a lock, acquired at the start of a
       block, is released when the block is left:

        def locking(lock):
            lock.acquire()
            try:
                yield
            finally:
                lock.release()

       Used as follows:

        block locking(myLock):
            # Code here executes with myLock held.  The lock is
            # guaranteed to be released when the block is left (even
            # if via return or by an uncaught exception).

    2. A template for opening a file that ensures the file is closed
       when the block is left:

        def opening(filename, mode="r"):
            f = open(filename, mode)
            try:
                yield f
            finally:
                f.close()

       Used as follows:

        block opening("/etc/passwd") as f:
            for line in f:
                print line.rstrip()

    3. A template for committing or rolling back a database
       transaction:

        def transactional(db):
            try:
                yield
            except:
                db.rollback()
                raise
            else:
                db.commit()

    4. A template that tries something up to n times:

        def auto_retry(n=3, exc=Exception):
            for i in range(n):
                try:
                    yield
                    return
                except exc, err:
                    # perhaps log exception here
                    continue
            raise # re-raise the exception we caught earlier

       Used as follows:

        block auto_retry(3, IOError):
            f = urllib.urlopen("http://python.org/peps/pep-0340.html")
            print f.read()

    5. It is possible to nest blocks and combine templates:

        def locking_opening(lock, filename, mode="r"):
            block locking(lock):
                block opening(filename, mode) as f:
                    yield f

       Used as follows:

        block locking_opening(myLock, "/etc/passwd") as f:
            for line in f:
                print line.rstrip()

       (If this example confuses you, consider that it is equivalent
       to using a for-loop with a yield in its body in a regular
       generator which is invoking another iterator or generator
       recursively; see for example the source code for os.walk().)

    6. It is possible to write a regular iterator with the
       semantics of example 1:

        class locking:
           def __init__(self, lock):
               self.lock = lock
               self.state = 0
           def __next__(self, arg=None):
               # ignores arg
               if self.state:
                   assert self.state == 1
                   self.lock.release()
                   self.state += 1
                   raise StopIteration
               else:
                   self.lock.acquire()
                   self.state += 1
                   return None
           def __exit__(self, type, value=None, traceback=None):
               assert self.state in (0, 1, 2)
               if self.state == 1:
                   self.lock.release()
               raise type, value, traceback

       (This example is easily modified to implement the other
       examples; it shows how much simpler generators are for the same
       purpose.)

    7. Redirect stdout temporarily:

        def redirecting_stdout(new_stdout):
            save_stdout = sys.stdout
            try:
                sys.stdout = new_stdout
                yield
            finally:
                sys.stdout = save_stdout

       Used as follows:

        block opening(filename, "w") as f:
            block redirecting_stdout(f):
                print "Hello world"

    8. A variant on opening() that also returns an error condition:

        def opening_w_error(filename, mode="r"):
            try:
                f = open(filename, mode)
            except IOError, err:
                yield None, err
            else:
                try:
                    yield f, None
                finally:
                    f.close()

       Used as follows:

        block opening_w_error("/etc/passwd", "a") as f, err:
            if err:
                print "IOError:", err
            else:
                f.write("guido::0:0::/:/bin/sh\n")
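
    The looping behaviour of example 4 can be tried out in
    present-day Python with a small driver, given here only as a
    sketch under several assumptions: run_block, flaky and
    always_failing are hypothetical names, the body is passed as a
    function, the generator's throw() method stands in for
    __exit__(), and the caught exception is stored explicitly because
    a bare "raise" outside an except-clause no longer re-raises it.

```python
def auto_retry(n=3, exc=Exception):
    last = None
    for i in range(n):
        try:
            yield
            return
        except exc as err:
            last = err          # perhaps log the exception here
    if last is not None:
        raise last              # re-raise the exception caught last


def run_block(gen, body):
    """Hypothetical driver: run body() under control of generator gen."""
    try:
        next(gen)
    except StopIteration:
        return
    while True:
        try:
            body()
        except Exception as err:
            try:
                gen.throw(err)  # resumes gen with err raised at its yield
            except StopIteration:
                return
        else:
            try:
                next(gen)
            except StopIteration:
                return


attempts = []


def flaky():
    # Fails on the first two calls, succeeds on the third.
    attempts.append(len(attempts) + 1)
    if len(attempts) < 3:
        raise IOError("temporary failure")


run_block(auto_retry(3, IOError), flaky)

failures = []


def always_failing():
    failures.append("try")
    raise IOError("permanent failure")


try:
    run_block(auto_retry(2, IOError), always_failing)
    gave_up = False
except IOError:
    gave_up = True
```

    Each time the generator yields again after catching the
    exception, the driver runs the body once more; when the retries
    are used up, the last exception propagates to the caller.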

Acknowledgements

    In no useful order: Alex Martelli, Barry Warsaw, Bob Ippolito,
    Brett Cannon, Brian Sabbey, Chris Ryland, Doug Landauer, Duncan
    Booth, Fredrik Lundh, Greg Ewing, Holger Krekel, Jason Diamond,
    Jim Jewett, Josiah Carlson, Ka-Ping Yee, Michael Chermside,
    Michael Hudson, Neil Schemenauer, Nick Coghlan, Paul Moore,
    Phillip Eby, Raymond Hettinger, Reinhold Birkenfeld, Samuele
    Pedroni, Shannon Behrens, Skip Montanaro, Steven Bethard, Terry
    Reedy, Tim Delaney, Aahz, and others.  Thanks all for the valuable
    contributions!

References

    [1] http://mail.python.org/pipermail/python-dev/2005-April/052821.html

    [2] http://msdn.microsoft.com/vcsharp/programming/language/ask/withstatement/

    [3] http://effbot.org/zone/asyncore-generators.htm


Copyright

    This document has been placed in the public domain.

