should there be a difference between generators and iterators?
There are several points here that might or might not be adopted individually. The specific prompt for this message is that I've run into a snag using generators with 'itertools.chain.from_iterable'. I need to be able to control when the generators called by 'chain' get closed (so that their 'finally' clauses are run). Unfortunately, chain does not propagate 'close' (or 'throw' or 'send') back to the generators that it's using. Nor, for that matter, do any of the other tools in itertools, or the builtin map function (to my knowledge).

I propose the following. Each of these is independent of the others, but all related to cleaning up how this works:

1. All of the itertools and map (and I've no doubt left some others out here) be extended to propagate the extra generator methods: close, throw and send.
2. That the 'for' loop be extended so that if an exception occurs within its body, it calls 'throw' on its iterable (if it has a throw method).
3. That the 'for' loop be extended to call the 'close' method on its iterable (if it has a close method) when the loop terminates (either normally, with break, or with an exception).
4. That a 'not_closing' builtin function be added that takes an iterable and shields it from a 'close' call.
5. That 'close' and 'throw' be added to all iterables.

Motivation:

The motivation for each proposal is (by their number):

1. This is the one that I'm specifically stuck on. I was relying on garbage collection to do this, but this doesn't work in jython and ironpython... Since the chain function is dealing with two iterables (an inner and outer iterable), I think that it makes sense for it to check whether each of these has the extra methods or not. For example, the inner iterable may have a 'close', but not the outer iterable (or vice versa). This shouldn't cause an error if 'close' is called on the chain. This one, specifically, would be helpful for me to move to Python 3K; so the sooner the better! (Please!)

2. There has been some discussion here about extending 'for' loops that has touched on non-local continue/break capability. If step 2 is provided, this capability could be provided as follows:

    class funky:
        ...
        def continue_(self):
            raise ContinueError(self)
        def break_(self):
            raise BreakError(self)
        def throw(self, type, value, tb):
            if issubclass(type, ContinueError) and value.who is self:
                return next(self)
            if issubclass(type, BreakError) and value.who is self:
                raise StopIteration

    top = funky(iterable1)
    for x in top:
        middle = funky(iterable2)
        for y in middle:
            bottom = funky(iterable3)
            for z in bottom:
                ...
                middle.continue_()

3. But this is a problem with:

    for line in filex:
        if test1(line):
            break
    for line in filex:
        ...

which brings us to:

4. A solution to the above:

    for line in not_closing(filex):
        if test1(line):
            break
    for line in filex:
        ...

5. I thought that I may as well throw this in for discussion... This might cause some consternation to those who have written their own iterables...
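The snag in point 1 can be seen in a few lines (a minimal sketch with invented names; 'gen' stands in for any generator with cleanup in a 'finally' clause):

```python
from itertools import chain

log = []

def gen(x):
    try:
        yield x
        yield x + 1
    finally:
        log.append(('closed', x))   # the cleanup whose timing must be controlled

# Closing a generator directly runs its 'finally' clause:
g = gen(1)
next(g)        # advance to the first yield
g.close()      # GeneratorExit is raised at the yield; 'finally' runs
print(log)     # [('closed', 1)]

# But a chain object exposes no 'close' of its own, so there is no way
# to reach and finalize the generator it is currently consuming:
c = chain.from_iterable(gen(x) for x in (10, 20))
next(c)
print(hasattr(c, 'close'))   # False
```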
From: "Bruce Frederiksen"
1. All of the itertools and map (and I've no doubt left some others out here) be extended to propagate the extra generator methods: close, throw and send.
Try wrapping each input iterator with an autocloser:

    def closer(it):
        for elem in it:
            yield elem
        if hasattr(it, 'close'):
            it.close()

    map(somefunc, closer(someiterable))

Raymond
Raymond Hettinger wrote:
From: "Bruce Frederiksen"
1. All of the itertools and map (and I've no doubt left some others out here) be extended to propagate the extra generator methods: close, throw and send.
Try wrapping each input iterator with an autocloser:
    def closer(it):
        for elem in it:
            yield elem
        if hasattr(it, 'close'):
            it.close()
    map(somefunc, closer(someiterable))

I don't think that this adds anything. If the 'it' iterable is run to completion (and, presumably, does what 'close' will do anyway), this adds nothing. And if the 'it' iterable is not run to completion, then neither is closer, and so it still adds nothing. (And placing the 'if hasattr' clause under a 'finally' doesn't help, because the 'finally' is only run if the generator runs to completion or 'close' is called, and nobody is ever going to call close here on closer). What I'm looking for is making the following work:
    with contextlib.closing(map(somefunc, someiterable)) as it:
        for i in it:
            if somecondition:
                break
    # at this point someiterable.close should have been run

Intuitively, I would think that the above should work, but it doesn't. Intuitively, I would think that if Python defines 'close' and 'throw' methods to allow generators to properly clean up after themselves, then itertools and the 'for' statement would both honor these so that the 'with' statement isn't even required here. I can see arguments for not calling 'close' automatically in 'for' statements (though none for not calling 'throw' automatically); but I don't see any arguments against itertools honoring these methods. If any of the itertools is passing values back from a generator (rather than a simple iterator), it would be very nice to retain the full semantics of the generator.

Now, for map, you could suggest that Python programmers understand these subtleties and write instead:

    with contextlib.closing(someiterable) as it:
        for i in map(somefunc, it):
            ...

and this works for map. But what about chain?

    with contextlib.closing(chain.from_iterable(map(somegenerator, someiterable))) as it:
        for i in it:
            ...

This is my specific situation, and I need 'close' to be called on the currently active somegenerator within chain before the line following 'with' is executed. In my particular case, I don't think I need close on someiterable, but it seems to make sense to do this too.

Currently, I'm not using the 'with' clause and am relying on CPython's garbage collector to immediately call __del__ (which is defined to call close) when the 'for' statement abandons chain. But this doesn't work in jython or ironpython...

-bruce

And, BTW, thank you for adding from_iterable to chain!
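What the post relies on can be shown concretely (a sketch reusing the 'somegenerator' name from above; on CPython the refcount drop alone finalizes the abandoned generator):

```python
from itertools import chain
import gc

events = []

def somegenerator(x):
    try:
        yield x
    finally:
        events.append(('finally', x))

it = chain.from_iterable(somegenerator(x) for x in (1, 2, 3))
next(it)       # somegenerator(1) is now suspended inside the chain
del it         # CPython: the chain's refcount hits zero, it drops the
               # suspended generator, and the generator's close() runs
gc.collect()   # needed on runtimes without reference counting
print(events)  # [('finally', 1)]
```

This is exactly the behavior that disappears on jython and ironpython, where finalization is not prompt.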
I guess while we're at it, I'd like to add the following to the discussion:
6. Add the context manager methods (__enter__ and __exit__) to
generators and, by extension, to itertools. This makes it easier to
use 'with' to get the clean up ('finally' clauses) done.
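Until and unless point 6 is adopted, the same effect can be had with a tiny wrapper (a sketch, not a proposed stdlib API; contextlib.closing does the same job, and 'managed' is an invented name):

```python
# Minimal context-manager wrapper that closes a generator on exit,
# guaranteeing its 'finally' clause runs even after a 'break'.
class managed:
    def __init__(self, gen):
        self.gen = gen
    def __enter__(self):
        return self.gen
    def __exit__(self, *exc):
        self.gen.close()    # run the generator's 'finally' clause
        return False        # never swallow exceptions

done = []

def counter(n):
    try:
        for i in range(n):
            yield i
    finally:
        done.append('cleaned up')

with managed(counter(100)) as g:
    for i in g:
        if i == 2:
            break           # the wrapper still finalizes the generator

print(done)   # ['cleaned up']
```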
On Thu, Sep 4, 2008 at 10:58 AM, Bruce Frederiksen
There are several points here that might or might not be adopted individually.
[...]
I propose the following. Each of these is independent of the others, but all related to cleaning up how this works:
1. All of the itertools and map (and I've no doubt left some others out here) be extended to propagate the extra generator methods: close, throw and send.
2. That the 'for' loop be extended so that if an exception occurs within its body, it calls 'throw' on its iterable (if it has a throw method).
3. That the 'for' loop be extended to call the 'close' method on its iterable (if it has a close method) when the loop terminates (either normally, with break, or with an exception).
4. That a 'not_closing' builtin function be added that takes an iterable and shields it from a 'close' call.
5. That 'close' and 'throw' be added to all iterables.
The proposal to extend all iterators with an additional protocol goes
directly against one of the design goals for iterators, i.e., that it
should be easy to implement a new iterator, without subclassing
something. So any proposal that wants to add new methods to *all*
iterators (let alone all iterables, which is a much larger set -- read
abc.py in the 3.0 stdlib for the difference) is doomed. That said,
having *optional* additions to the protocol might still be reasonably
debated. On the 3rd hand, I'd like to understand more about your use
case.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
OK, so point 5 is out. That leaves the other 4 points... :-)

My specific use case involves using generators to establish variable bindings prior to each yield. These bindings are then undone after the yield in preparation for the next iteration. When the generator is finished, no bindings should remain. This is basically a "shallow binding" scheme (from the lisp community) using generators. So I have a "finally" clause in the generator to clean things up.

And then, to complicate matters, I am not just interested in the output of a single application of this generator, but am interested in the output applied successively to the items in a tuple (the generator applied to each item yielding several times). So I use 'chain' from itertools, with the 'from_iterable' option (which is greatly appreciated!):

    for y in chain.from_iterable(gen(x) for x in sometuple):
        ...

Now, it's possible that the for loop executes a 'break' or that an exception will terminate the loop prematurely. But I still need the bindings undone by the last yield from 'gen' (i.e., I need its finally clause run). Currently, when the 'for' loop terminates on CPython, __del__ is immediately called, which calls close, which runs the finally clause, and all is well! But on jython and ironpython (both of which are close to releasing a 2.5 compatible version with these extra generator functions in them for the first time), that doesn't work and never will. So I need to fix this to not rely on the garbage collector. I can't see any way to force the 'finally' clause to run, given that the last used 'gen' is hidden inside chain. This is what prompted my point #1.

It would be nice, in more general terms, if using itertools didn't hide the extra behavior that generators provide. It would be nice if I could still use the additional generator methods on the objects returned by the itertools and by map if I give them a generator as input.
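The shallow-binding use case described above might look roughly like this (a hypothetical sketch; 'bindings' and 'bind' are invented names, not the author's actual code):

```python
from itertools import chain
import gc

bindings = {}

def bind(var, value):
    # Establish a binding before yielding; the 'finally' clause undoes it,
    # so no bindings remain once the generator is finalized.
    old = bindings.get(var)
    bindings[var] = value
    try:
        yield value
    finally:
        if old is None:
            del bindings[var]
        else:
            bindings[var] = old

for y in chain.from_iterable(bind('x', v) for v in (1, 2, 3)):
    if y == 2:
        break    # abandons the chain with bind('x', 2) still suspended

gc.collect()     # on CPython the refcount drop alone is enough
print(bindings)  # {} -- the abandoned generator's 'finally' ran
```

The whole problem is that the last two lines only come out clean because the garbage collector finalized the abandoned generator.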
Then I could do:

    with closing(chain.from_iterable(gen(x) for x in sometuple)) as g:
        for y in g:
            ...

OK, so moving up another level (getting to point #2), what is happening is that the 'finally' clause in a generator isn't being honored by 'for' statements. So 'finally' doesn't really mean much in generators (unless you're running on CPython, in which case the garbage collector covers you):

    for x in gen(y):
        ...
        break    # doesn't run finally in gen on jython or ironpython!

So if point #2 were also adopted (and since 'throw' also runs the 'finally' clause in the generator) and if 'break' sends GeneratorExit to 'for's iterable (via 'throw') and then ignores the exception when it comes back from the 'throw', then the 'with' should never be required! In this case, the 'for' statement would completely honor the 'finally' clause of generators passed to it regardless of loop termination, and 'finally' for generators would mean finally (like it does everywhere else) -- even in jython and ironpython.

Adopting point #2 does basically prohibit the straightforward use of the same generator in multiple 'for' loops:

    g = gen(x)
    for i in g:
        ...
        break
    for j in g:
        ...

But the only real use case that I can see for this is with files, which don't have a 'throw' method, so point #2 doesn't break that use case. A wrapper that shields the generator from 'throw' would allow the use case above to still be done.

I don't see any "gotchas" implementing point #1 and a remote gotcha on point #2. And I see these first two points as the important ones, because they fix a "bug" in the current definition of the language. But adopting point #1 alone allows me to fix my specific problem using 'with closing'. I do see gotchas with point 3, assuming that the above is already an established use case with files (hence point 4). But I threw them out anyway for discussion.
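The 'throw'-shielding wrapper mentioned above is easy to sketch (the name 'no_throw' is an assumption; it simply forwards iteration while hiding the generator's extra methods):

```python
# A wrapper that looks like a plain iterator: a hypothetical throw-aware
# 'for' statement would find no 'throw' (or 'close') method to call,
# leaving the underlying generator reusable in a later loop.
class no_throw:
    def __init__(self, it):
        self._it = iter(it)
    def __iter__(self):
        return self
    def __next__(self):
        return next(self._it)

def gen(x):
    for v in (x, x + 1, x + 2):
        yield v

g = gen(1)
shielded = no_throw(g)
print(hasattr(shielded, 'throw'))   # False: nothing extra to propagate
print(next(shielded))               # 1
print(next(g))                      # 2 -- the same generator carries on
```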
I don't really expect that they'll be adopted, unless somebody else sees something in them, or some other way around the problem of multiple use of the same iterable. If point #2 is not adopted, adding the context manager methods to generators (like files have) would be a nice touch too! (point #6 in a later post). I hope all of this helps! -bruce Guido van Rossum wrote:
The proposal to extend all iterators with an additional protocol goes directly against one of the design goals for iterators, i.e., that it should be easy to implement a new iterator, without subclassing something. So any proposal that wants to add new methods to *all* iterators (let alone all iterables, which is a much larger set -- read abc.py in the 3.0 stdlib for the difference) is doomed. That said, having *optional* additions to the protocol might still be reasonably debated. On the 3rd hand, I'd like to understand more about your use case.
Bruce Frederiksen wrote:
what is happening is that the 'finally' clause in a generator isn't being honored by 'for' statements.
It's not all that clear whether the for-loop should be doing anything special to force a generator to finalize if the loop is exited prematurely. The for-loop isn't necessarily the only thing using the iterator -- other code may want to carry on getting items from it, in which case you *don't* want it forcibly terminated.

More generally, I'm a bit worried by all the extra complications that generators seem to be accruing. If it becomes a general expectation that an iterator based on other iterators is supposed to pass on all these special conditions, it's going to put a big burden on implementors of iterators, and turn what ought to be very simple and straightforward code into something convoluted.

-- Greg
Greg Ewing wrote:
Bruce Frederiksen wrote:
what is happening is that the 'finally' clause in a generator isn't being honored by 'for' statements.
It's not all that clear whether the for-loop should be doing anything special to force a generator to finalize if the loop is exited prematurely. The for-loop isn't necessarily the only thing using the iterator -- other code may want to carry on getting items from it, in which case you *don't* want it forcibly terminated.
More generally, I'm a bit worried by all the extra complications that generators seem to be accruing. If it becomes a general expectation that an iterator based on other iterators is supposed to pass on all these special conditions, it's going to put a big burden on implementors of iterators, and turn what ought to be very simple and straightforward code into something convoluted.
It doesn't seem that difficult to pass 'close' on to your child iterator. Passing 'throw' on might require interpreting non-error results from the child iterator, which would depend on what your iterator is doing. If there are places where this is too complicated, then don't implement 'throw' for that iterator. And the same thing for 'send' as for 'throw'. But the most important one is 'close', because this runs the 'finally' clause in the generator.

Looking through the itertools, a user of any of these except 'chain' can capture the input iterable(s) in a 'with closing' clause prior to calling the itertools function. So for everything except the 'chain' function the 'close' is more of a "nice to have" rather than a "must have". But for the 'chain' function, the caller of 'chain' does not have access to the iterables generated by the iterable passed to 'chain' (specifically with 'from_iterable'). Thus, adding 'close' to 'chain' is more of a "must have". (Well, I guess you could argue that the user should write his own chain function with a 'close' capability rather than using the itertools function.)

So 'chain' would become equivalent to:

    def chain(*iterables):
        # chain('ABC', 'DEF') --> A B C D E F
        it = None
        try:
            for it in iterables:    # <---- 'it' is the inaccessible iterator!
                for element in it:
                    yield element
        finally:
            if hasattr(it, 'close'):
                it.close()
            if hasattr(iterables, 'close'):
                iterables.close()

-bruce
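As a sanity check, the close-propagating chain sketched above can be exercised directly (renamed 'closing_chain' here to avoid shadowing itertools.chain; the generator names are illustrative):

```python
from contextlib import closing

closed = []

def gen(x):
    try:
        yield x
        yield x * 10
    finally:
        closed.append(x)

def closing_chain(*iterables):
    # Pure-Python chain that forwards 'close' to its current inner iterator.
    it = None
    try:
        for it in iterables:
            for element in it:
                yield element
    finally:
        if hasattr(it, 'close'):
            it.close()
        if hasattr(iterables, 'close'):
            iterables.close()

with closing(closing_chain(gen(1), gen(2))) as c:
    for value in c:
        break          # abandon the chain after the first element

print(closed)          # [1]: gen(1)'s finally ran; gen(2) was never started
```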
I think it makes sense that itertools should pass on throw, etc. I'd be interested in whether anything would break on this change.

I don't think I like the other suggestions. Having an exception in some random part of a loop throw *into* the iterator of the loop just seems weird. For the examples you give, couldn't break_ do the throw itself?

    for line in filex:
        if test1(line):
            break
    for line in filex:
        ...

This example convinces me that 3 could introduce untold chaos into existing code. Wouldn't something like this make more sense?

    for line in filex:
        if test1(line):
            break
    finally:
        filex.close()

(And yes, I know the time machine can do that with one extra word and perhaps 'with' handles this better.)

To get __enter__ and __exit__ behavior for an iterator, can't you just wrap it in a class that provides that capability and calls close? You might need itertools to have some support for that extended iterator class, but that seems simpler.

--- Bruce
Bruce Leban wrote:
I don't think I like the other suggestions. Having an exception in some random part of a loop throw *into* the iterator of the loop, just seems weird. For the examples, you give, couldn't break_ do the throw itself?
First, the easy part. Yes, break could also do a throw (see my response to Guido).

There are two ways to try to visualize this.

First explanation: A function, say 'bar', has input and produces output. When it raises an exception, there could be two reasons for this:

1. It doesn't like its input (in which case the input might be fixed and new values provided), or
2. It's unable to produce its output.

With traditional functions, the caller of 'bar' is responsible for providing its input and also receives its output:

    input = foo(...)
    output = bar(input)

or, simply:

    output = bar(foo(...))

Since the function called to produce the input ('foo') is no longer around when 'bar' is called, the distinction above hasn't been important, because either way the caller has to deal with the problem. In the first case, the caller might produce some other input value and call 'bar' again. In the second case, the caller must proceed without the output.

But when generators provide input to a function, the generator is still around when the function is run. So it makes sense, in the first case, to raise the exception in the generator and give it a chance to fix the input value. And this is exactly how the new (in 2.5) 'throw' method is defined to act on the generator side (in PEP 342).

If we knew which exceptions meant "bad input" vs "output not possible", we could only raise the first kind in the generator. But we don't know this. So it makes sense to first raise all exceptions on the input side in the generator. If the generator recognizes the exception (i.e., as an 'input error' exception) and can fix the problem, then 'bar' may still be able to produce output. If not, then forward the exception on to the output side of 'bar' (as an 'output not possible' exception).
Applying this logic to the 'for' statement is what leads to my point #2:

    for input in foo(...):
        output = bar(input)

If 'bar' raises an exception, it should first go to 'foo' (if 'foo' has a 'throw' method), and then to the outer block containing the 'for' statement. If the generator's 'throw' method returns a value, then the 'for' statement would assign this value to 'input' and run its body again, proceeding normally (the exception has been taken care of). If the generator's 'throw' method does not handle the exception, then it is re-raised in the outer block containing the 'for' statement.

Second explanation: Reading the definition of the 'throw' method for generators in PEP 342, I naturally thought that the 'for' statement would abide by this new protocol. I was surprised to learn that it didn't. Since generators are nearly always used in a 'for' statement, how is this new method to be utilized? This isn't easily done. The code ends up looking like:

    g = foo(...)
    for input in g:
        while True:
            try:
                output = bar(input)
                break    # from 'while', can't easily break from 'for' anymore...
            except Exception:
                input = g.throw(*sys.exc_info())

Yikes!
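The PEP 342 'throw' semantics that this argument relies on can be demonstrated with a toy generator ('tolerant' is an invented name): the exception is raised at the suspended yield, and the generator may recover and yield a replacement value, which becomes throw's return value.

```python
def tolerant(values):
    for v in values:
        while True:
            try:
                yield v
                break               # consumer accepted v; move to the next
            except ValueError:
                v = 0               # "fix" the bad input and retry

g = tolerant([1, 2, 3])
print(next(g))              # 1
print(g.throw(ValueError))  # 0 -- the generator caught it and recovered
print(next(g))              # 2 -- iteration carries on normally
```

This is the input-side recovery described above; point #2 asks that the 'for' statement perform the throw automatically.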
To get __enter__ and __exit__ behavior for an iterator, can't you just wrap it in class that provides that capability and calls close?
Sure, contextlib.closing. But, just as it's nice that files support __enter__ and __exit__, it would be nice if other objects that need to be closed (sockets, generators, etc) did too. And, with the example set by 'file', one is led to expect this support in these other cases... Since there is no need to clean up after iterators in general, but only for generators specifically; and since the BDFL has nixed my point #5, it makes sense to only add the __enter__ and __exit__ to generators. (And, by extension, itertools.)
You might need itertools to have some support for that extended iterator class, but that seems simpler.

I don't follow you here.
-bruce
participants (5)
- Bruce Frederiksen
- Bruce Leban
- Greg Ewing
- Guido van Rossum
- Raymond Hettinger