[Python-ideas] should there be a difference between generators and iterators?

Bruce Frederiksen dangyogi at gmail.com
Fri Sep 5 15:08:02 CEST 2008


OK, so point 5 is out.  That leaves the other 4 points...  :-)

My specific use case involves using generators to establish variable 
bindings prior to each yield.  These bindings are then undone after the 
yield in preparation for the next iteration.  When the generator is 
finished, no bindings should remain.  This is basically a "shallow 
binding" scheme (from the Lisp community) using generators.  So I have a 
"finally" clause in the generator to clean things up.

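A minimal sketch of the kind of generator I mean.  The 'bindings' dict 
and the 'bind' function are hypothetical names made up for illustration, 
and the sketch assumes the name is unbound before the generator starts:

```python
# Hypothetical global binding table (illustration only).
bindings = {}

def bind(name, values):
    """Bind `name` to each value in turn, undoing the binding after each yield."""
    for value in values:
        bindings[name] = value      # establish the binding before the yield
        try:
            yield value
        finally:
            # Runs when the generator is resumed, closed, or thrown into,
            # so no binding is left behind once the generator is finished.
            del bindings[name]
```

Calling close() on a suspended 'bind' generator raises GeneratorExit at 
the yield, so the finally clause removes the binding that is in effect.
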
And then, to complicate matters, I am not just interested in the output 
of a single application of this generator, but in the output of applying 
it successively to the items in a tuple (the generator applied to each 
item may yield several times).  So I use 'chain' from itertools, with the 
'from_iterable' alternate constructor (which is greatly appreciated!):

   for y in chain.from_iterable(gen(x) for x in sometuple):
       ...

Now, it's possible that the for loop executes a 'break' or that an 
exception will terminate the loop prematurely.  But I still need the 
bindings established before the last yield from 'gen' to be undone 
(i.e., I need its finally clause run).

Currently, when the 'for' loop terminates on CPython, the generator's 
reference count drops to zero, so its __del__ is called immediately, 
which calls close, which runs the finally clause, and all is well!

But on Jython and IronPython (both of which are close to releasing a 2.5 
compatible version with these extra generator methods in them for the 
first time), finalization is left to a garbage collector that makes no 
promise about when it runs, so that doesn't work and never will.  So I 
need to fix this to not rely on the garbage collector.

I can't see any way to force the 'finally' clause to run, given that the 
last used 'gen' is hidden inside chain.

This is what prompted my point #1.  It would be nice, in more general 
terms, if using itertools didn't hide the extra behavior that generators 
provide.  It would be nice if I could still use the additional generator 
methods on the objects returned by the itertools and by map if I give 
them a generator as input.  Then I could do:

   with closing(chain.from_iterable(gen(x) for x in sometuple)) as g:
       for y in g:
           ...
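
Until something like that exists, one possible workaround is to write 
the chaining step as a generator, so that closing it reaches the inner 
generator that is currently active.  This is only a sketch: 
'closing_chain' is a hypothetical helper, not part of itertools, and 
'gen' here is a stand-in for my real generator:

```python
from contextlib import closing

def closing_chain(iterables):
    """Like chain.from_iterable, but close() is passed on to the active inner iterator."""
    for iterable in iterables:
        it = iter(iterable)
        try:
            for item in it:
                yield item
        finally:
            # Reached on normal exhaustion and when this generator is closed.
            close = getattr(it, "close", None)
            if close is not None:
                close()

log = []

def gen(x):
    # Stand-in generator with a cleanup step in its finally clause.
    try:
        yield x
        yield x * 10
    finally:
        log.append(("cleanup", x))

seen = []
with closing(closing_chain(gen(x) for x in (1, 2, 3))) as g:
    for y in g:
        seen.append(y)
        if y == 2:
            break
```

Here the 'with' block closes 'closing_chain', whose own finally clause 
closes the suspended 'gen(2)', without relying on the garbage collector.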

OK, so moving up another level (getting to point #2), what is happening 
is that the 'finally' clause in a generator isn't being honored by 'for' 
statements.  So 'finally' doesn't really mean much in generators (unless 
you're running on CPython, in which case the garbage collector covers you):

   for x in gen(y):
       ... break  # doesn't run finally in gen on jython or ironpython!
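
The effect can be seen even on CPython by keeping a reference to the 
generator, so that nothing finalizes it after the 'break'.  A small 
sketch, where 'gen' and 'log' are illustrative names:

```python
log = []

def gen(n):
    try:
        for i in range(n):
            yield i
    finally:
        log.append("cleanup")

g = gen(3)
for x in g:
    break

# 'g' is still referenced and suspended at its yield: finally has not run.
after_break = list(log)

# Only an explicit close() (or finalization of 'g') runs the finally clause.
g.close()
after_close = list(log)
```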

So if point #2 were also adopted (and since 'throw' also runs the 
'finally' clause in the generator), and if 'break' sent GeneratorExit 
to the 'for' loop's iterable (via 'throw') and then ignored the 
exception when it came back from the 'throw', then the 'with' would 
never be required!  In this case, the 'for' statement would completely 
honor the 'finally' clause of generators passed to it regardless of how 
the loop terminates, and 'finally' for generators would mean finally 
(like it does everywhere else), even in Jython and IronPython.

Adopting point #2 does basically prohibit the straightforward use of the 
same generator in multiple 'for' loops:

   g = gen(x)
   for i in g:
       ... break
   for j in g:
       ...

But the only real use case that I can see for this is with files, which 
don't have a 'throw' method, so point #2 doesn't break that use case.  A 
wrapper that shields the generator from 'throw' would allow the use case 
above to still be done.
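
Such a wrapper could be as simple as an iterator that exposes nothing 
but the basic iterator protocol.  A sketch, where 'no_throw' is a 
hypothetical name and not an existing builtin:

```python
class no_throw:
    """Wrap an iterable so that only __iter__ and __next__ are visible.

    The wrapper has no throw, close or send methods, so a 'for' loop that
    looked for them would find nothing to call on the wrapped generator.
    """

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

def gen(n):
    for i in range(n):
        yield i

g = no_throw(gen(5))
first = []
for i in g:
    first.append(i)
    if i == 1:
        break
rest = list(g)   # the second loop picks up where the first one stopped
```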

I don't see any "gotchas" in implementing point #1, and only a remote 
gotcha with point #2.  And I see these first two points as the important 
ones, because they fix a "bug" in the current definition of the 
language.  But adopting point #1 alone allows me to fix my specific 
problem using 'with closing'.

I do see gotchas with point 3, assuming that the above is already an 
established use case with files (hence point 4).  But I threw both 
points out there anyway for discussion.  I don't really expect that 
they'll be adopted, unless somebody else sees something in them, or some 
other way around the problem of multiple use of the same iterable.

If point #2 is not adopted, adding the context manager methods to 
generators (like files have) would be a nice touch too!  (point #6 in a 
later post).

I hope all of this helps!

-bruce

Guido van Rossum wrote:
> The proposal to extend all iterators with an additional protocol goes
> directly against one of the design goals for iterators, i.e., that it
> should be easy to implement a new iterator, without subclassing
> something. So any proposal that wants to add new methods to *all*
> iterators (let alone all iterables, which is a much larger set -- read
> abc.py in the 3.0 stdlib for the difference) is doomed. That said,
> having *optional* additions to the protocol might still be reasonably
> debated. On the 3rd hand, I'd like to understand more about your use
> case.
>
> On Thu, Sep 4, 2008 at 7:58 AM, Bruce Frederiksen <dangyogi at gmail.com> wrote:
>   
>> There are several points here that might or might not be adopted
>> individually.
>>
>> The specific prompt for this message is that I've run into a snag using
>> generators with 'itertools.chain.from_iterable'.  I need to be able to
>> control when the generators called by 'chain' get closed (so that their
>> 'finally' clauses are run).
>>
>> Unfortunately, chain does not propagate 'close' (or 'throw' or 'send') back
>> to the generators that it's using.
>>
>> Nor, for that matter do any of the other tools in itertools, or the builtin
>> map function (to my knowledge).
>>
>> I propose the following.  Each of these is independent of the others, but
>> all related to cleaning up how this works:
>>
>> 1.  All of the itertools and map (and I've no doubt left some others out
>> here) be extended to propagate the extra generator methods: close, throw
>> and send.
>>
>> 2.  That the 'for' loop be extended so that if an exception occurs within
>> its body, it calls 'throw' on its iterable (if it has a throw method).
>>
>> 3.  That the 'for' loop be extended to call the 'close' method on its
>> iterable (if it has a close method) when the loop terminates (either
>> normally, with break, or with an exception).
>>
>> 4.  That a 'not_closing' builtin function be added that takes an iterable
>> and shields it from a 'close' call.
>>
>> 5.  That 'close' and 'throw' be added to all iterables.
>>
>>
>> Motivation:
>>
>> The motivation for each proposal is (by their number):
>>
>> 1.  This is the one that I'm specifically stuck on.  I was relying on
>> garbage collection to do this, but this doesn't work in jython and
>> ironpython...  Since the chain function is dealing with two iterables (an
>> inner and outer iterable), I think that it makes sense for it to check
>> whether each of these has the extra methods or not.  For example, the inner
>> iterable may have a 'close', but not the outer iterable (or vice versa).
>> This shouldn't cause an error if 'close' is called on the chain.
>>
>> This one, specifically, would be helpful for me to move to Python 3K; so the
>> sooner the better!  (Please!)
>>
>> 2.  There has been some discussion here about extending 'for' loops that has
>> touched on non-local continue/break capability.  If step 2 is provided, this
>> capability could be provided as follows:
>>
>>   class funky:
>>       ...
>>       def continue_(self): raise ContinueError(self)
>>       def break_(self): raise BreakError(self)
>>       def throw(self, type, value, tb):
>>           if issubclass(type, ContinueError) and value.who is self:
>>               return next(self)
>>           if issubclass(type, BreakError) and value.who is self:
>>               raise StopIteration
>>
>>   top = funky(iterable1)
>>   for x in top:
>>       middle = funky(iterable2)
>>       for y in middle:
>>           bottom = funky(iterable3)
>>           for z in bottom:
>>               ...
>>               middle.continue_()
>>
>> 3.  But this is a problem with:
>>
>>    for line in filex:
>>        if test1(line): break
>>    for line in filex:
>>        ...
>>
>>    which brings us to:
>>
>> 4.  A solution to the above:
>>
>>    for line in not_closing(filex):
>>        if test1(line): break
>>    for line in filex:
>>        ...
>>
>> 5.  I thought that I may as well throw this in for discussion...  This might
>> cause some consternation to those who have written their own iterables...
>>