[Python-ideas] Deterministic iterator cleanup

Neil Girdhar mistersheik at gmail.com
Wed Oct 19 16:14:19 EDT 2016


On Wed, Oct 19, 2016 at 2:11 PM Nathaniel Smith <njs at pobox.com> wrote:

> On Wed, Oct 19, 2016 at 10:08 AM, Neil Girdhar <mistersheik at gmail.com>
> wrote:
> >
> >
> > On Wed, Oct 19, 2016 at 11:08 AM Todd <toddrjen at gmail.com> wrote:
> >>
> >> On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar <mistersheik at gmail.com>
> >> wrote:
> >>>
> >>> This is a very interesting proposal.  I just wanted to share something
> I
> >>> found in my quick search:
> >>>
> >>>
> >>>
> http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration
> >>>
> >>> Could you explain why the accepted answer there doesn't address this
> >>> issue?
> >>>
> >>> class Parse(object):
> >>>     """A generator that iterates through a file"""
> >>>     def __init__(self, path):
> >>>         self.path = path
> >>>
> >>>   def __iter__(self):
> >>>         with open(self.path) as f:
> >>>             yield from f
>
> BTW it may make this easier to read if we notice that it's essentially
> a verbose way of writing:
>
> def parse(path):
>     with open(path) as f:
>         yield from f
>
> >>
> >> I think the difference is that this new approach guarantees cleanup the
> >> exact moment the loop ends, no matter how it ends.
> >>
> >> If I understand correctly, your approach will do cleanup when the loop
> >> ends only if the iterator is exhausted.  But if someone zips it with a
> >> shorter iterator, uses itertools.islice or something similar, breaks the
> >> loop, returns inside the loop, or in some other way ends the loop
> before the
> >> iterator is exhausted, the cleanup won't happen when the iterator is
> garbage
> >> collected.  And for non-reference-counting python implementations, when
> this
> >> happens is completely unpredictable.
> >>
> >> --
> >
> >
> > I don't see that.  The "cleanup" will happen when collection is
> interrupted
> > by an exception.  This has nothing to do with garbage collection either
> > since the cleanup happens deterministically when the block is ended.  If
> > this is the only example, then I would say this behavior is already
> provided
> > and does not need to be added.
>
> I think there might be a misunderstanding here. Consider code like
> this, that breaks out from the middle of the for loop:
>
> def use_that_generator():
>     for line in parse(...):
>         if found_the_line_we_want(line):
>             break
>     # -- mark --
>     do_something_with_that_line(line)
>
> With current Python, what will happen is that when we reach the marked
> line, then the for loop has finished and will drop its reference to
> the generator object. At this point, the garbage collector comes into
> play. On CPython, with its reference counting collector, the garbage
> collector will immediately collect the generator object, and then the
> generator object's __del__ method will restart 'parse' by having the
> last 'yield' raise a GeneratorExit, and *that* exception will trigger
> the 'with' block's cleanup. But in order to get there, we're
> absolutely depending on the garbage collector to inject that
> GeneratorExit. And on an implementation like PyPy that doesn't use
> reference counting, the generator object will become collect*ible* at
> the marked line, but might not actually be collect*ed* for an
> arbitrarily long time afterwards. And until it's collected, the file
> will remain open. 'with' blocks guarantee that the resources they hold
> will be cleaned up promptly when the enclosing stack frame gets
> cleaned up, but for a 'with' block inside a generator then you still
> need something to guarantee that the enclosing stack frame gets
> cleaned up promptly!
>

Yes, I understand that.  Maybe this is clearer.  This class adds an
iterclose to any iterator so that when iteration ends, iterclose is
automatically called:

def my_iterclose():
    print("Closing!")


class AddIterclose:

    def __init__(self, iterable, iterclose):
        self.iterable = iterable
        self.iterclose = iterclose

    def __iter__(self):
        try:
            for x in self.iterable:
                yield x
        finally:
            self.iterclose()


try:
    for x in AddIterclose(range(10), my_iterclose):
        print(x)
        if x == 5:
            raise ValueError
except:
    pass



>
> This proposal is about providing that thing -- with __(a)iterclose__,
> the end of the for loop immediately closes the generator object, so
> the garbage collector doesn't need to get involved.
>
> Essentially the same thing happens if we replace the 'break' with a
> 'raise'. Though with exceptions, things can actually get even messier,
> even on CPython. Here's a similar example except that (a) it exits
> early due to an exception (which then gets caught elsewhere), and (b)
> the invocation of the generator function ended up being kind of long,
> so I split the for loop into two lines with a temporary variable:
>
> def use_that_generator2():
>     it =
> parse("/a/really/really/really/really/really/really/really/long/path")
>     for line in it:
>         if not valid_format(line):
>             raise ValueError()
>
> def catch_the_exception():
>     try:
>         use_that_generator2()
>     except ValueError:
>         # -- mark --
>         ...
>
> Here the ValueError() is raised from use_that_generator2(), and then
> caught in catch_the_exception(). At the marked line,
> use_that_generator2's stack frame is still pinned in memory by the
> exception's traceback. And that means that all the local variables are
> also pinned in memory, including our temporary 'it'. Which means that
> parse's stack frame is also pinned in memory, and the file is not
> closed.
>
> With the __(a)iterclose__ proposal, when the exception is thrown then
> the 'for' loop in use_that_generator2() immediately closes the
> generator object, which in turn triggers parse's 'with' block, and
> that closes the file handle. And then after the file handle is closed,
> the exception continues propagating. So at the marked line, it's still
> the case that 'it' will be pinned in memory, but now 'it' is a closed
> generator object that has already relinquished its resources.
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20161019/28344ac0/attachment-0001.html>


More information about the Python-ideas mailing list