[Python-ideas] Deterministic iterator cleanup

Neil Girdhar mistersheik at gmail.com
Wed Oct 19 16:29:54 EDT 2016


Ohhh, sorry, you want __iterclose__ to happen when iteration is terminated 
by a break statement as well?   Okay, I understand, and that's fair.

However, I would rather that people be explicit about when they're 
iterating (use the iteration protocol) and when they're managing a resource 
(use a context manager).  Trying to figure out where the context manager 
should go automatically (which is what it sounds like the proposal amounts 
to) is too difficult to get right, and when you get it wrong you close too 
early, and then what's the user supposed to do?  Suppress the early close 
with an even more convoluted notation?

If there is a problem with people iterating over things without a 
generator, my suggestion is to force them to use the generator.  For 
example, don't make your object iterable: make the value yielded by the 
context manager iterable.

Best,

Neil

(On preview, Re: Chris Angelico's refactoring of my code, nice!!)

On Wednesday, October 19, 2016 at 4:14:32 PM UTC-4, Neil Girdhar wrote:
>
>
>
> On Wed, Oct 19, 2016 at 2:11 PM Nathaniel Smith <njs at pobox.com> wrote:
>
>> On Wed, Oct 19, 2016 at 10:08 AM, Neil Girdhar <mistersheik at gmail.com> 
>> wrote:
>> >
>> >
>> > On Wed, Oct 19, 2016 at 11:08 AM Todd <toddrjen at gmail.com> wrote:
>> >>
>> >> On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar <mistersheik at gmail.com>
>> >> wrote:
>> >>>
>> >>> This is a very interesting proposal.  I just wanted to share 
>> something I
>> >>> found in my quick search:
>> >>>
>> >>>
>> >>> 
>> http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration
>> >>>
>> >>> Could you explain why the accepted answer there doesn't address this
>> >>> issue?
>> >>>
>> >>> class Parse(object):
>> >>>     """A generator that iterates through a file"""
>> >>>     def __init__(self, path):
>> >>>         self.path = path
>> >>>
>> >>>   def __iter__(self):
>> >>>         with open(self.path) as f:
>> >>>             yield from f
>>
>> BTW it may make this easier to read if we notice that it's essentially
>> a verbose way of writing:
>>
>> def parse(path):
>>     with open(path) as f:
>>         yield from f
>>
>> >>
>> >> I think the difference is that this new approach guarantees cleanup the
>> >> exact moment the loop ends, no matter how it ends.
>> >>
>> >> If I understand correctly, your approach will do cleanup when the loop
>> >> ends only if the iterator is exhausted.  But if someone zips it with a
>> >> shorter iterator, uses itertools.islice or something similar, breaks 
>> the
>> >> loop, returns inside the loop, or in some other way ends the loop 
>> before the
>> >> iterator is exhausted, the cleanup won't happen when the iterator is 
>> garbage
>> >> collected.  And for non-reference-counting python implementations, 
>> when this
>> >> happens is completely unpredictable.
>> >>
>> >> --
>> >
>> >
>> > I don't see that.  The "cleanup" will happen when collection is 
>> interrupted
>> > by an exception.  This has nothing to do with garbage collection either
>> > since the cleanup happens deterministically when the block is ended.  If
>> > this is the only example, then I would say this behavior is already 
>> provided
>> > and does not need to be added.
>>
>> I think there might be a misunderstanding here. Consider code like
>> this, that breaks out from the middle of the for loop:
>>
>> def use_that_generator():
>>     for line in parse(...):
>>         if found_the_line_we_want(line):
>>             break
>>     # -- mark --
>>     do_something_with_that_line(line)
>>
>> With current Python, what will happen is that when we reach the marked
>> line, then the for loop has finished and will drop its reference to
>> the generator object. At this point, the garbage collector comes into
>> play. On CPython, with its reference counting collector, the garbage
>> collector will immediately collect the generator object, and then the
>> generator object's __del__ method will restart 'parse' by having the
>> last 'yield' raise a GeneratorExit, and *that* exception will trigger
>> the 'with' block's cleanup. But in order to get there, we're
>> absolutely depending on the garbage collector to inject that
>> GeneratorExit. And on an implementation like PyPy that doesn't use
>> reference counting, the generator object will become collect*ible* at
>> the marked line, but might not actually be collect*ed* for an
>> arbitrarily long time afterwards. And until it's collected, the file
>> will remain open. 'with' blocks guarantee that the resources they hold
>> will be cleaned up promptly when the enclosing stack frame gets
>> cleaned up, but for a 'with' block inside a generator then you still
>> need something to guarantee that the enclosing stack frame gets
>> cleaned up promptly!
>>
>
> Yes, I understand that.  Maybe this is clearer.  This class adds an 
> iterclose to any iterator so that when iteration ends, iterclose is 
> automatically called:
>
> def my_iterclose():
>     print("Closing!")
>
>
> class AddIterclose:
>
>     def __init__(self, iterable, iterclose):
>         self.iterable = iterable
>         self.iterclose = iterclose
>
>     def __iter__(self):
>         try:
>             for x in self.iterable:
>                 yield x
>         finally:
>             self.iterclose()
>
>
> try:
>     for x in AddIterclose(range(10), my_iterclose):
>         print(x)
>         if x == 5:
>             raise ValueError
> except:
>     pass
>
>  
>
>>
>> This proposal is about providing that thing -- with __(a)iterclose__,
>> the end of the for loop immediately closes the generator object, so
>> the garbage collector doesn't need to get involved.
>>
>> Essentially the same thing happens if we replace the 'break' with a
>> 'raise'. Though with exceptions, things can actually get even messier,
>> even on CPython. Here's a similar example except that (a) it exits
>> early due to an exception (which then gets caught elsewhere), and (b)
>> the invocation of the generator function ended up being kind of long,
>> so I split the for loop into two lines with a temporary variable:
>>
>> def use_that_generator2():
>>     it = 
>> parse("/a/really/really/really/really/really/really/really/long/path")
>>     for line in it:
>>         if not valid_format(line):
>>             raise ValueError()
>>
>> def catch_the_exception():
>>     try:
>>         use_that_generator2()
>>     except ValueError:
>>         # -- mark --
>>         ...
>>
>> Here the ValueError() is raised from use_that_generator2(), and then
>> caught in catch_the_exception(). At the marked line,
>> use_that_generator2's stack frame is still pinned in memory by the
>> exception's traceback. And that means that all the local variables are
>> also pinned in memory, including our temporary 'it'. Which means that
>> parse's stack frame is also pinned in memory, and the file is not
>> closed.
>>
>> With the __(a)iterclose__ proposal, when the exception is thrown then
>> the 'for' loop in use_that_generator2() immediately closes the
>> generator object, which in turn triggers parse's 'with' block, and
>> that closes the file handle. And then after the file handle is closed,
>> the exception continues propagating. So at the marked line, it's still
>> the case that 'it' will be pinned in memory, but now 'it' is a closed
>> generator object that has already relinquished its resources.
>>
>> -n
>>
>> --
>> Nathaniel J. Smith -- https://vorpus.org
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20161019/a7310756/attachment-0001.html>


More information about the Python-ideas mailing list