[Python-checkins] CVS: python/nondist/peps pep-0279.txt,NONE,1.1

Barry Warsaw bwarsaw@users.sourceforge.net
Thu, 31 Jan 2002 21:59:16 -0800


Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv14252

Added Files:
	pep-0279.txt 
Log Message:
PEP 279, Enhanced Generators, Raymond D. Hettinger.

(Minor spell checking and formatting by BAW)


--- NEW FILE: pep-0279.txt ---
PEP: 279
Title: Enhanced Generators
Version: $Revision: 1.1 $
Last-Modified: $Date: 2002/02/01 05:59:14 $
Author: othello@javanet.com (Raymond D. Hettinger)
Status: Draft
Type: Standards Track
Created: 30-Jan-2002
Python-Version: 2.3
Post-History: 


Abstract

    This PEP introduces four orthogonal (not mutually exclusive) ideas
    for enhancing the generators as introduced in Python version 2.2
    [1].  The goal is increase the convenience, utility, and power of
    generators.


Rationale

    Starting with xrange() and xreadlines(), Python has been evolving
    toward a model that provides lazy evaluation as an alternative
    when complete evaluation is not desired because of memory
    restrictions or availability of data.

    Starting with Python 2.2, a second evolutionary direction came in
    the form of iterators and generators.  The iter() factory function
    and generators were provided as convenient means of creating
    iterators.  Deep changes were made to use iterators as a unifying
    theme throughout Python.  The unification came in the form of
    establishing a common iterable interface for mappings, sequences,
    and file objects.  In the case of mappings and file objects, lazy
    evaluation was made available.

    The next steps in the evolution of generators are:

    1. Add built-in functions which provide lazy alternatives to their
       complete evaluation counterparts and one other convenience
       function which was made possible once iterators and generators
       became available.  The new functions are xzip, xmap, xfilter,
       and indexed.

    2. Provide a generator alternative to list comprehensions [3]
       making generator creation as convenient as list creation.

    3. Extend the syntax of the 'yield' keyword to enable two way
       parameter passing.  The resulting increase in power simplifies
       the creation of consumer streams which have a complex execution
       state and/or variable state.

    4. Add a generator method to enable exceptions to be passed to a
       generator.  Currently, there is no clean method for triggering
       exceptions from outside the generator.

    All of the suggestions are designed to take advantage of the
    existing implementation and require little additional effort to
    incorporate.  Each is backward compatible and requires no new
    keywords.


Specification for new built-ins:

    def xfilter( pred, gen ):
        '''
        xfilter(...)
        xfilter(function, sequence) -> list

        Return an iterator containing those items of sequence for
        which function is true.  If function is None, return a list of
        items that are true.
        '''
        if pred is None:
            for i in gen:
                if i:
                    yield i
        else:
            for i in gen:
                if pred(i):
                    yield i

    def xmap( fun, *collections ):
        '''
        xmap(...)
        xmap(function, sequence[, sequence, ...]) -> list

        Return an iterator applying the function to the items of the
        argument collection(s).  If more than one collection is given,
        the function is called with an argument list consisting of the
        corresponding item of each collection, substituting None for
        missing values when not all collections have the same length.
        If the function is None, return a list of the items of the
        collection (or a list of tuples if more than one collection).
        '''
        gens = map(iter, collections)
        values_left = [1]
        def values():
            # Emulate map behaviour, i.e. shorter
            # sequences are padded with None when
            # they run out of values.
            values_left[0] = 0
            for i in range(len(gens)):
                iterator = gens[i]
                if iterator is None:
                    yield None
                else:
                    try:
                        yield iterator.next()
                        values_left[0] = 1
                    except StopIteration:
                        gens[i] = None
                        yield None
        while 1:
            args = tuple(values())
            if not values_left[0]:
                raise StopIteration
            yield func(*args)

    def xzip( *collections ):
        '''
        xzip(...)
        xzip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

        Return a iterator of tuples, where each tuple contains the
        i-th element from each of the argument sequences or iterable.
        The returned iterator is truncated in length to the length of
        the shortest argument collection.
        '''
        gens = map(iter, collections)
        while 1:
            yield tuple( [g.next() for g in gens] )

    def indexed( collection, cnt=0, limit=None ):
        'Generates an indexed series:  (0,seqn[0]), (1,seqn[1]) ...'
        gen = iter(collection)
        while limit is None or cnt<limit:
            yield (cnt, collection.next())
            cnt += 1

    Note A: PEP 212 Loop Counter Iteration [2] discussed several
    proposals for achieving indexing.  Some of the proposals only work
    for lists unlike the above function which works for any generator,
    xrange, sequence, or iterable object.  Also, those proposals were
    presented and evaluated in the world prior to Python 2.2 which did
    not include generators.


Specification for Generator Comprehensions:

    If a list comprehension starts with a 'yield' keyword, then
    express the remainder of the statement as generator.  For example:

        g = [yield (len(line),line) for line in file.readline() if len(line)>5]
        print g.next()
        print g.next()

    This would be implemented as if it had been written:

        def __temp_gen():
            for line in file.readline():
                if len(line) > 5:
                    yield (len(line), line)
        g = __temp_gen()
        print g.next()
        print g.next()

    Note A: There is a difference in the above implementation as
    compared to list comprehensions.  For a generator comprehension,
    the variables are created in a separate scope while list
    comprehensions use the enclosing scope.  If this PEP is accepted,
    the parser should generate byte code that eliminates this
    difference by passing the line variable in the enclosing scope and
    using that same variable passed by reference inside the generator.
    This will make the behavior of generator comprehension identical
    to that of list comprehensions.

    Note B: There is some debate about whether the enclosing brackets
    should be part of the syntax for generator comprehensions.  On the
    plus side, it neatly parallels list comprehensions and would be
    immediately recognizable as a similar form with similar internal
    syntax (taking maximum advantage of what people already know).
    More importantly, it sets off the generator comprehension from the
    rest of the function so as to not suggest that the enclosing
    function is a generator (currently the only cue that a function is
    really a generator is the presence of the yield keyword).  On the
    minus side, the brackets may falsely suggest that the whole
    expression returns a list.  All of the feedback received to date
    indicates that brackets do not make a false suggestion and are
    in fact helpful.


Specification for two-way Generator Parameter Passing:

    1. Allow 'yield' to assign a value as in:

        def mygen():
            while 1:
                x = yield None
                print x

    2. Let the .next() method take a value to pass to generator as in:

        g = mygen()
        g.next()       # runs the generators until the first 'yield'
        g.next(1)      # the '1' gets bound to 'x' in mygen()
        g.next(2)      # the '2' gets bound to 'x' in mygen()

    Note A: An early question arose, when would you need this?  The
    answer is that existing generators make it easy to write lazy
    producers which may have a complex execution state and/or complex
    variable state.  This proposal makes it equally easy to write lazy
    consumers which may also have a complex execution or variable
    state.

    For instance, when writing an encoder for arithmetic compression,
    a series of fractional values are sent to a function which has
    periodic output and a complex state which depends on previous
    inputs.  Also, that encoder requires a flush() function when no
    additional fractions are to be output.  It is helpful to think of
    the following parallel with file output streams:

        ostream = file('mydest.txt','w')
        ostream.write(firstdat)
        ostream.write(seconddat)
        ostream.flush()

    With the proposed extensions, it could be written like this:

        def filelike(packagename, appendOrOverwrite):
            cum = []
            if appendOrOverwrite == 'w+':
                cum.extend( packages[packagename] )
            try:
                while 1:
                    dat = yield None
                    cum.append(dat)
            except FlushStream:
                packages[packagename] = cum
        ostream = filelike('mydest','w')
        ostream.next()
        ostream.next(firstdat)
        ostream.next(seconddat)
        ostream.throw( FlushStream )        # this feature discussed below

    Note C: Almost all of the machinery necessary to implement this
    extension is already in place.  The parse syntax needs to be
    modified to accept the new x = yield None syntax and the .next()
    method needs to allow an argument.

    Note D: Some care must be used when writing a values to the
    generator because execution starts at the top of the generator not
    at the first yield.

    Consider the usual flow using .next() without an argument.

        g = mygen(p1) will bind p1 to a local variable and then return a
                generator to be bound to g and NOT run any code in mygen().
        y = g.next() runs the generator from the first line until it
                encounters a yield when it suspends execution and a returns
                a value to be bound to y

    Since the same flow applies when you are submitting values, the
    first call to .next() should have no argument since there is no
    place to put it.

        g = mygen(p1) will bind p1 to a local variable and then return a
                generator to be bound to g and NOT run any code in mygen()
        g.next() will START execution in mygen() from the first line.  Note,
        that there is nowhere to bind any potential arguments that
        might have been supplied to next().  Execution continues
        until the first yield is encountered and control is returned
        to the caller.  
        g.next(val) resumes execution at the yield and binds val to the
                left hand side of the yield assignment and continues running
                until another yield is encountered.  This makes sense because
                you submit values expecting them to be processed right away.


    Q.  Two-way generator parameter passing seems awfully bold.  To
        my mind, one of the great things about generators is that they
        meet the (very simple) definition of an iterator.  With this,
        they no longer do.  I like lazy consumers -- really I do --
        but I'd rather be conservative about putting something like
        this in the language.

    A.  If you don't use x = yield expr, then nothing changes and you
        haven't lost anything.  So, it isn't really bold.  It simply
        adds an option to pass in data as well as take it out.  Other
        generator implementations (like the thread based generator.py)
        already have provisions for two-way parameter passing so that
        consumers are put on an equal footing with producers.  Two-way
        is the norm, not the exception.
        
        Yield is not just a simple iterator creator.  It does
        something else truly wonderful -- it suspends execution and
        saves state.  It is good for a lot more than its original
        purpose.  Dr. Mertz's article [5] shows how they can be used
        to create general purpose co-routines.

        Besides, 98% of the mechanism is already in place.  Only the
        communication needs to be added.  Remember GOSUB which neither
        took nor returned data.  Routines which accepted parameters
        and returned values were a major step forward.

        When you first need to pass information into a generator, the
        existing alternative is clumsy.  It involves setting a global
        variable, calling .next(), and assigning the local from the
        global.


    Q.  Why not introduce another keyword 'accept' for lazy consumers?

    A.  To avoid conflicts with 'yield', to avoid creating a new
        keyword, and to take advantage of the explicit clarity of the
        '=' operator.


    Q.  How often does one need to write a lazy consumer or a co-routine?

    A.  Not often.  But, when you DO have to write one, this approach
        is the easiest to implement, read, and debug.

        It clearly beats using existing generators and passing data
        through global variables.  It is much clearer and easier to
        debug than an equivalent approach using threading, mutexes,
        semaphores, and data queues.  A class based approach competes
        well when there are no complex execution states or variable
        states.  When the complexity increases, generators with
        two-way communication are much simpler because they
        automatically save state unlike classes which must explicitly
        store variable and execution state in instance variables.


    Q.  Why does yield require an argument?  Isn't yield None too wordy?

    A.  It doesn't matter for the purposes of this PEP.  For
        information purposes, here is the reasoning as I understand
        it.  Though return allows an implicit None, some now consider
        this to be weak design.  There is some spirit of "Explicit is
        better than Implicit".  More importantly, in most uses of
        yield, a missing argument is more likely to be a bug than an
        intended yield None.


Specification for Generator Exception Passing:

    Add a .throw(exception) method to the resulting generator as in:

        def mygen():
            try:
                while 1:
                    x = yield None
                    print x
            except FlushStream:
                print 'Done'

        g = mygen()
        g.next(5)
        g.throw(FlushStream)

    There is no existing work around for triggering an exception
    inside a generator.  This is a true deficiency.  It is the only
    case in Python where active code cannot be excepted to or through.
    Even if .next(arg) is not adopted, we should add the .throw()
    method.

    Note A: The name of the throw method was selected for several
    reasons.  Raise is a keyword and so cannot be used as a method
    name.  Unlike raise which immediately raises an exception from the
    current execution point, throw will first return to the generator
    and then raise the exception.  The word throw is suggestive of
    putting the exception in another location.  The word throw is
    already associated with exceptions in other languages.


References
  
    [1] PEP 255 Simple Generators
        http://python.sourceforge.net/peps/pep-0255.html

    [2] PEP 212 Loop Counter Iteration
        http://python.sourceforge.net/peps/pep-0212.html

    [3] PEP 202 List Comprehensions         
        http://python.sourceforge.net/peps/pep-0202.html

    [4] There have been several discussion on comp.lang.python which helped
        tease out these proposals:

        Indexed Function    
        http://groups.google.com/groups?hl=en&th=33f778d92dd5720a

        Xmap, Xfilter, Xzip and Two-way Generator Communication
        http://groups.google.com/groups?hl=en&th=b5e576b02894bb04&rnum=1

        Two-way Generator Communication -- Revised Version
        http://groups.google.com/groups?hl=en&th=cb1d86e68850c592&rnum=1

        Generator Comprehensions
        http://groups.google.com/groups?hl=en&th=215e6e5a7bfd526&rnum=2

        
        http://groups.google.com/groups?hl=en&th=df8b5e7709957eb7  

    [5] Dr. David Mertz's draft column for Charming Python.
        href="http://gnosis.cx/publish/programming/charming_python_b5.txt


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: