Ohhh, sorry, you want __iterclose__ to happen when iteration is terminated by a break statement as well?   Okay, I understand, and that's fair.

However, I would rather that people be explicit about when they're iterating (use the iteration protocol) and when they're managing a resource (use a context manager).  Trying to figure out where the context manager should go automatically (which is what it sounds like the proposal amounts to) is too difficult to get right, and when you get it wrong you close too early, and then what's the user supposed to do?  Suppress the early close with an even more convoluted notation?

If there is a problem with people iterating over things without a generator, my suggestion is to force them to use the generator.  For example, don't make your object iterable: make the value yielded by the context manager iterable.
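A sketch of that pattern (names hypothetical): the object is only a context manager, and the thing you iterate is the value it yields, so iteration is impossible outside the managed scope.

```python
class ParseFile:
    """Context manager whose *yielded value* is the iterable, so the
    caller is forced to manage the file's lifetime explicitly."""

    def __init__(self, path):
        self.path = path

    def __enter__(self):
        self.f = open(self.path)
        return self.f          # the iterable is the open file, not self

    def __exit__(self, *exc):
        self.f.close()
        return False           # don't suppress exceptions

# Usage: ParseFile itself has no __iter__, so
# 'for line in ParseFile(path)' fails immediately; the only way to
# iterate is inside the with block, where cleanup is guaranteed:
#
# with ParseFile("data.txt") as lines:
#     for line in lines:
#         ...
```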

Best,

Neil

(On preview, Re: Chris Angelico's refactoring of my code, nice!!)

On Wednesday, October 19, 2016 at 4:14:32 PM UTC-4, Neil Girdhar wrote:


On Wed, Oct 19, 2016 at 2:11 PM Nathaniel Smith <njs@pobox.com> wrote:
On Wed, Oct 19, 2016 at 10:08 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
>
>
> On Wed, Oct 19, 2016 at 11:08 AM Todd <toddrjen@gmail.com> wrote:
>>
>> On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar <mistersheik@gmail.com>
>> wrote:
>>>
>>> This is a very interesting proposal.  I just wanted to share something I
>>> found in my quick search:
>>>
>>>
>>> http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration
>>>
>>> Could you explain why the accepted answer there doesn't address this
>>> issue?
>>>
>>> class Parse(object):
>>>     """A generator that iterates through a file"""
>>>     def __init__(self, path):
>>>         self.path = path
>>>
>>>     def __iter__(self):
>>>         with open(self.path) as f:
>>>             yield from f

BTW, this may be easier to read if we notice that it's essentially
a verbose way of writing:

def parse(path):
    with open(path) as f:
        yield from f
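For what this version does guarantee: when the loop runs the generator to exhaustion, the cleanup fires immediately. A sketch, with a marker list standing in for the open file so the cleanup is observable:

```python
closed = []

def parse_demo(path):
    # same shape as parse() above, with a marker list standing in
    # for the open file so we can watch the cleanup happen
    try:
        yield from ["line1\n", "line2\n"]
    finally:
        closed.append(True)

for line in parse_demo("dummy"):
    pass

# Exhausting the generator runs the 'finally' right away, so in the
# happy path the cleanup is prompt:
assert closed == [True]
```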

>>
>> I think the difference is that this new approach guarantees cleanup the
>> exact moment the loop ends, no matter how it ends.
>>
>> If I understand correctly, your approach will do cleanup when the loop
>> ends only if the iterator is exhausted.  But if someone zips it with a
>> shorter iterator, uses itertools.islice or something similar, breaks the
>> loop, returns inside the loop, or in some other way ends the loop before the
>> iterator is exhausted, the cleanup won't happen when the iterator is garbage
>> collected.  And for non-reference-counting python implementations, when this
>> happens is completely unpredictable.
>>
>> --
>
>
> I don't see that.  The "cleanup" will happen when collection is interrupted
> by an exception.  This has nothing to do with garbage collection either
> since the cleanup happens deterministically when the block is ended.  If
> this is the only example, then I would say this behavior is already provided
> and does not need to be added.

I think there might be a misunderstanding here. Consider code like
this, that breaks out from the middle of the for loop:

def use_that_generator():
    for line in parse(...):
        if found_the_line_we_want(line):
            break
    # -- mark --
    do_something_with_that_line(line)

With current Python, what will happen is that when we reach the marked
line, then the for loop has finished and will drop its reference to
the generator object. At this point, the garbage collector comes into
play. On CPython, with its reference counting collector, the garbage
collector will immediately collect the generator object, and then the
generator object's __del__ method will restart 'parse' by having the
last 'yield' raise a GeneratorExit, and *that* exception will trigger
the 'with' block's cleanup. But in order to get there, we're
absolutely depending on the garbage collector to inject that
GeneratorExit. And on an implementation like PyPy that doesn't use
reference counting, the generator object will become collect*ible* at
the marked line, but might not actually be collect*ed* for an
arbitrarily long time afterwards. And until it's collected, the file
will remain open. 'with' blocks guarantee that the resources they hold
will be cleaned up promptly when the enclosing stack frame gets
cleaned up, but for a 'with' block inside a generator then you still
need something to guarantee that the enclosing stack frame gets
cleaned up promptly!
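(For completeness: the way to get that guarantee today is to take over that cleanup job yourself, e.g. with `contextlib.closing` around the generator. A sketch, using a stand-in for `parse`:)

```python
from contextlib import closing

def parse_lines(path):
    # stand-in for 'parse' above: cleanup runs when the generator
    # itself is closed
    with open(path) as f:
        yield from f

def first_matching(path, predicate):
    # closing() calls .close() on the generator when the block exits,
    # raising GeneratorExit at the 'yield' and running the 'with'
    # cleanup promptly on any Python implementation -- no reliance
    # on the garbage collector.
    with closing(parse_lines(path)) as lines:
        for line in lines:
            if predicate(line):
                return line
```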

Yes, I understand that.  Maybe this is clearer.  This class wraps any iterable so that when iteration ends -- however it ends -- a given iterclose callback is automatically called:

def my_iterclose():
    print("Closing!")


class AddIterclose:
    """Wrap an iterable so that iterclose() is called when iteration
    finishes, whether it ends normally or via an exception."""

    def __init__(self, iterable, iterclose):
        self.iterable = iterable
        self.iterclose = iterclose

    def __iter__(self):
        try:
            for x in self.iterable:
                yield x
        finally:
            self.iterclose()


try:
    for x in AddIterclose(range(10), my_iterclose):
        print(x)
        if x == 5:
            raise ValueError
except ValueError:
    pass


This proposal is about providing that thing -- with __(a)iterclose__,
the end of the for loop immediately closes the generator object, so
the garbage collector doesn't need to get involved.
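(Roughly, under the proposal `for x in iterable: body(x)` would behave like this sketch -- a paraphrase of the proposed semantics, not actual interpreter code:)

```python
def for_loop_with_iterclose(iterable, body):
    # Approximate semantics of 'for x in iterable: body(x)'
    # under the __iterclose__ proposal.
    it = iter(iterable)
    try:
        while True:
            try:
                x = next(it)
            except StopIteration:
                break
            body(x)
    finally:
        # The new step: the loop itself closes the iterator,
        # instead of leaving that to the garbage collector.
        iterclose = getattr(type(it), "__iterclose__", None)
        if iterclose is not None:
            iterclose(it)
```

The `finally` runs whether the loop exhausts the iterator, `break`s out, or an exception propagates from the body, which is exactly the guarantee the proposal is after.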

Essentially the same thing happens if we replace the 'break' with a
'raise'. Though with exceptions, things can actually get even messier,
even on CPython. Here's a similar example except that (a) it exits
early due to an exception (which then gets caught elsewhere), and (b)
the invocation of the generator function ended up being kind of long,
so I split the for loop into two lines with a temporary variable:

def use_that_generator2():
    it = parse("/a/really/really/really/really/really/really/really/long/path")
    for line in it:
        if not valid_format(line):
            raise ValueError()

def catch_the_exception():
    try:
        use_that_generator2()
    except ValueError:
        # -- mark --
        ...

Here the ValueError() is raised from use_that_generator2(), and then
caught in catch_the_exception(). At the marked line,
use_that_generator2's stack frame is still pinned in memory by the
exception's traceback. And that means that all the local variables are
also pinned in memory, including our temporary 'it'. Which means that
parse's stack frame is also pinned in memory, and the file is not
closed.
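(The pinning is directly observable. In this sketch a flag set in the generator's `finally` stands in for closing the file; the final assertion relies on CPython's reference counting.)

```python
closed = []

def resource_gen():
    try:
        yield 1
        yield 2
    finally:
        closed.append(True)   # stands in for closing the file

def use_it():
    it = resource_gen()       # local variable, pinned by the traceback
    for x in it:
        raise ValueError()

try:
    use_it()
except ValueError:
    # The traceback still references use_it's frame, which references
    # 'it', so the generator has not been finalized yet:
    assert closed == []
# When the except block ends, the traceback is released; on CPython
# the generator is then collected at once and its 'finally' runs.
assert closed == [True]
```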

With the __(a)iterclose__ proposal, when the exception is thrown then
the 'for' loop in use_that_generator2() immediately closes the
generator object, which in turn triggers parse's 'with' block, and
that closes the file handle. And then after the file handle is closed,
the exception continues propagating. So at the marked line, it's still
the case that 'it' will be pinned in memory, but now 'it' is a closed
generator object that has already relinquished its resources.

-n

--
Nathaniel J. Smith -- https://vorpus.org