question about generators

Bengt Richter bokr at oz.net
Fri Aug 16 04:40:26 CEST 2002


On Thu, 15 Aug 2002 17:30:37 GMT, Andrew Koenig <ark at research.att.com> wrote:

>Aha!  I just realized part of the origin of my puzzlement.
>
>Yield actually does one of two very different things depending
>on context.  Consider:
>
>        def f():
>                yield 1
>                yield 2
>
>The "yield 1" statement does two things:
AFAIK, actually not. You don't reach a yield statement by calling f().
Calling f() returns a generator object; you reach the first yield by
calling that object's .next() method, and the second by calling .next()
again on the same object. E.g.,
 >>> def f():
 ...     yield 1
 ...     yield 2
 ...
 >>> fgen = f()
 >>> fgen.next()
 1
 >>> fgen.next()
 2
 >>> fgen.next()
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 StopIteration

But successive calls to f() just get you more fresh generator objects:
 >>> f()
 <generator object at 0x00852BB0>
 >>> f()
 <generator object at 0x008526D0>
 >>> f()
 <generator object at 0x00852BB0>
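Each of those fresh generator objects also carries its own independent state; advancing one doesn't affect another. A quick sketch (in modern Python 3, where the built-in next() replaces the old .next() method):

```python
def f():
    yield 1
    yield 2

# Two independent generator objects from the same "factory".
a = f()
b = f()

print(next(a))  # 1
print(next(a))  # 2
print(next(b))  # 1 -- b's state is untouched by advancing a
```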


>
>        1) It creates a generator object and returns that object to
            ^^ f() yes, yield no
>           the caller;
>
>        2) When the caller calls the generator's "next" method,
>           it passes 1 back to the caller.
>
>The "yield 2" statement does only one thing: It passes 2 back to the
>caller.
>
>Let's call these two statements "type 1" and "type 2" yield
>statements.
AFAIK there aren't two kinds. And you don't get to any yield by calling f().

>
>How do we know which yield statement does what?  The first statement
>executed in a function is type 1, all others, including re-executions
>of that same statement, are type 2.
>
>If a function calls another function that also contains a yield
>statement, the first yield statement executed in that function is
>also type 1.
>
>There is no way of executing a type 2 yield statement in a function
>without first executing a type 1 yield statement in the same function.
>
Well, I don't think it's the yield statements per se that get executed
to do different things. Personally, I don't like f() returning a generator object,
when -- prior to the insertion of one or more yields in its body -- f() would do the
usual function execution. That effectively makes def f(...) bind the name f
to a factory function instead of a function. You need a class (or maybe a closure-maker) so
that a generator instance can have state and keep track of it, and the factory function
can return an instance of the generator class; but I think it's misleading to substitute
a factory function bound to the same name as the function that previously had no
yield statements. I.e.,

    def foo():
        yield 1
        yield 2

logically makes foo into a factory function, not the original function. I.e.,

    foogen = foo()

does not call the apparent foo() function at all. foo() now generates a class
instance that has a method .next(), which is used to execute successive chunks
of the purported foo whose name was stolen. With a fresh foogen instance, naturally
foogen.next() executes the code in the original foo up to the first yield and returns.
The foogen state can then save a start location just past the yield for use when
foogen.next() is called next time.
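That factory-plus-state arrangement can be spelled out by hand. A rough class-based equivalent of the two-yield foo above (the names FooGen and state are mine, and "resuming just past the yield" is modeled crudely with an explicit counter rather than a saved code location):

```python
class FooGen:
    """Hand-rolled stand-in for the object foo() returns."""
    def __init__(self):
        self.state = 0  # tracks which yield point we've reached

    def __iter__(self):
        return self

    def __next__(self):  # spelled .next() in the Python of 2002
        self.state += 1
        if self.state == 1:
            return 1    # code up to the first yield
        if self.state == 2:
            return 2    # code up to the second yield
        raise StopIteration

def foo():
    # The name foo is now bound to a factory, not the original code.
    return FooGen()

foogen = foo()
print(next(foogen))  # 1
print(next(foogen))  # 2
```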

I think I would rather have had the compiler introduce a __makegen__ attribute on
foo when it saw a yield and wanted to make a factory function. Thus foo.__makegen__()
would return what foo() does now, leaving foo() to raise a warning exception complaining
about an unbound iterator method or something like that if called by itself.
One could speculate about contexts for dynamically associating foo with a generator
instance, and advancing its state for the result of its next yield point, but that
would be a longer discussion... 

The
    for i in foo(): ...
construct would look for foo.__makegen__() first, but lacking __makegen__ it would
of course call foo() as an ordinary function and expect an iterable object back
as it does now. This obviously has to be preserved so as to make e.g.,

    def foo(): return [1,2,3]
work as expected in
    for i in foo(): ...
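The lookup order that for-construct would need can be sketched as a helper function. This is entirely hypothetical -- __makegen__ does not exist in any Python; the attribute assignment on bar below just simulates what the compiler would have done for a function containing yields:

```python
def iterate(func):
    """Hypothetical dispatch: prefer func.__makegen__(), else call func()."""
    makegen = getattr(func, '__makegen__', None)
    if makegen is not None:
        return makegen()     # explicit generator factory
    return iter(func())      # ordinary call; expect an iterable back

def foo():
    return [1, 2, 3]         # no yields: called as an ordinary function

print(list(iterate(foo)))    # [1, 2, 3]

def bar():
    pass
# Simulate the compiler-added attribute on a function with yields:
bar.__makegen__ = lambda: iter([10, 20])
print(list(iterate(bar)))    # [10, 20]
```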

But IMO implicit switcheroos using the same name are not too Pythonic ;-/
If you disassemble foo you get no hint in the byte codes that foo() is not going to
execute that code, but instead is going to return an object whose .next()
method will somehow make controlled use of the code you see disassembled.

BTW, ISTM the semantics of generator creation are very analogous to those of thread creation.
I.e., there is a computing thing whose execution state advances, and whose state needs
to be kept track of. A yield is analogous to setting a condition variable that the 'caller' is
waiting on, and suspending execution until getting a signal to resume. Except with a generator
the context switches are synchronous.
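The analogy can be made concrete with a crude generator built from a real thread and a size-1 queue standing in for the condition-variable handshake. A sketch only: real generators do this far more cheaply with no OS thread, and the producer here runs one step ahead rather than switching perfectly synchronously:

```python
import threading
import queue

_DONE = object()  # sentinel marking exhaustion

def thread_gen(body):
    """Run body(emit) in a thread; each emit(v) acts like a yield."""
    q = queue.Queue(maxsize=1)  # size-1: producer blocks until consumed

    def run():
        body(q.put)             # body calls emit(value) to 'yield' value
        q.put(_DONE)

    threading.Thread(target=run, daemon=True).start()

    def next_value():
        v = q.get()             # wait for the producer's next value
        if v is _DONE:
            raise StopIteration
        return v
    return next_value

def body(emit):
    emit(1)
    emit(2)

nxt = thread_gen(body)
print(nxt())  # 1
print(nxt())  # 2
```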

>These facts mean that yield statements break a form of abstraction
>that is commonplace in many other contexts: the ability to take a
>collection of statements and put them in a function.  For example:
>
>        def f():
>                <statement 1>
>                <statement 2>
>                <statement 3>
>                <statement 4>
>
>Under ordinary circumstances, I can rewrite this as
>
>        def f():
>                <statement 1>
>                g()
>                <statement 4>
>
>        def g():
>                <statement 2>
>                <statement 3>
>
>without changing the meaning of the program (provided that statement 2
>and statement 3 do not refer to local variables of f).
>
>However, if I use yield statements, the abstraction breaks down:
>
>        def f():
>                yield 1
>                yield 2
>                yield 3
>                yield 4
>
>is not equivalent to
>
>        def f():
>                yield 1
>                g()
>                yield 4
>
>        def g():
>                yield 2
>                yield 3
>
I think that's because a function definition is implicitly converted to
a factory function definition by putting yield(s) in the body.
(What happened to the 'explicit is good' zen ;-)

So your first call to f() doesn't give you 1, and neither does a second call to f():
each gives you a fresh object created by the factory function going by the name f.
You have to set theobj = f() and call theobj.next() to get the action defined by the
original function body.

>I think that Tim's "yield every g()" notion is really a way of saying
>``I want to call g, but I want every yield in g and anything it calls
>to be considered type 2, not type 1.''

I think f() and g() are always 'type 1' if there's a yield in their definition.
You have to use the objects they return to get 'type 2' results.
Successive 'type 2' (yield) results are produced by repeatedly calling
the .next() method of a single 'type 1' object returned by f() or g().
Calling f() repeatedly gives you successive freshly initialized 'type 1'
objects.

IOW "yield every g()" is nice sugar for doing the type 1 iterobj=g() call
and then yielding every available (and always type 2) iterobj.next() result.
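That hand-desugaring can be written directly: drive the type 1 object returned by g() and re-yield each of its type 2 results. (Modern Python 3.3 eventually adopted essentially this sugar as "yield from", per PEP 380.)

```python
def g():
    yield 2
    yield 3

def f():
    yield 1
    # Hand-desugared 'yield every g()': consume g()'s generator
    # object and re-yield each value it produces.
    for item in g():
        yield item
    yield 4

print(list(f()))  # [1, 2, 3, 4]
```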

Regards,
Bengt Richter



More information about the Python-list mailing list