[Python-ideas] Deterministic iterator cleanup

Nick Coghlan ncoghlan at gmail.com
Sat Oct 22 12:02:50 EDT 2016


On 20 October 2016 at 07:02, Nathaniel Smith <njs at pobox.com> wrote:
> The first change is to replace the outer for loop with a while/pop
> loop, so that if an exception occurs we'll know which iterables remain
> to be processed:
>
> def chain(*iterables):
>     try:
>         while iterables:
>             for element in iterables.pop(0):
>                 yield element
>     ...
>
> Now, what do we do if an exception does occur? We need to call
> iterclose on all of the remaining iterables, but the tricky bit is
> that this might itself raise new exceptions. If this happens, we don't
> want to abort early; instead, we want to continue until we've closed
> all the iterables, and then raise a chained exception. Basically what
> we want is:
>
> def chain(*iterables):
>     try:
>         while iterables:
>             for element in iterables.pop(0):
>                 yield element
>     finally:
>         try:
>             operators.iterclose(iter(iterables[0]))
>         finally:
>             try:
>                 operators.iterclose(iter(iterables[1]))
>             finally:
>                 try:
>                     operators.iterclose(iter(iterables[2]))
>                 finally:
>                     ...
>
> but of course that's not valid syntax. Fortunately, it's not too hard
> to rewrite that into real Python -- but it's a little dense:
>
> def chain(*iterables):
>     try:
>         while iterables:
>             for element in iterables.pop(0):
>                 yield element
>     # This is equivalent to the nested-finally chain above:
>     except BaseException as last_exc:
>         for iterable in iterables:
>             try:
>                 operators.iterclose(iter(iterable))
>             except BaseException as new_exc:
>                 if new_exc.__context__ is None:
>                     new_exc.__context__ = last_exc
>                 last_exc = new_exc
>         raise last_exc
>
> It's probably worth wrapping that bottom part into an iterclose_all()
> helper, since the pattern probably occurs in other cases as well.
> (Actually, now that I think about it, the map() example in the text
> should be doing this instead of what it's currently doing... I'll fix
> that.)

At this point your code is starting to look a whole lot like the code
in contextlib.ExitStack.__exit__ :)
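
For reference, pulling that bottom part out into a reusable helper
along those lines might look something like this (an untested sketch;
it assumes the "operators.iterclose" function from your proposal, and
mirrors the exception chaining that ExitStack does):

    def iterclose_all(iterables, last_exc=None):
        # Close every remaining iterable, chaining any exceptions
        # raised along the way, then re-raise the most recent one.
        for iterable in iterables:
            try:
                operators.iterclose(iter(iterable))
            except BaseException as new_exc:
                if new_exc.__context__ is None:
                    new_exc.__context__ = last_exc
                last_exc = new_exc
        if last_exc is not None:
            raise last_exc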

Accordingly, I'm going to suggest that while I agree the problem you
describe is one that genuinely emerges in large production
applications and other complex systems, this particular solution is
simply far too intrusive to be accepted as a language change for
Python - you're talking about a fundamental change to the meaning of
iteration for the sake of the relatively small portion of the
community that either works on such complex services, or insists on
writing their code as if it might become part of such a service, even
when it currently isn't. Given that simple applications vastly
outnumber complex ones, and always will, I think making such a change
would be a bad trade-off that didn't come close to justifying the
costs imposed on the rest of the ecosystem to adjust to it.

A potentially more fruitful direction of research to pursue for 3.7
would be the notion of "frame local resources", where each
Python-level execution frame implicitly provided a lazily
instantiated ExitStack instance (or an equivalent) for resource
management. Assuming that it offered an "enter_frame_context"
function that mapped to "contextlib.ExitStack.enter_context", such a
system would let us do things like:

    from frame_resources import enter_frame_context

    def readlines_1(fname):
        return enter_frame_context(open(fname)).readlines()

    def readlines_2(fname):
        return [*enter_frame_context(open(fname))]

    def readlines_3(fname):
        return [line for line in enter_frame_context(open(fname))]

    def iterlines_1(fname):
        yield from enter_frame_context(open(fname))

    def iterlines_2(fname):
        for line in enter_frame_context(open(fname)):
            yield line

    def iterlines_3(fname):
        f = enter_frame_context(open(fname))
        while True:
            try:
                yield next(f)
            except StopIteration:
                break

to indicate "clean up this file handle when this frame terminates,
regardless of the GC implementation used by the interpreter". Such a
feature already gets you a long way towards the determinism you want,
as frames are already likely to be cleaned up deterministically even
in Python implementations that don't use automatic reference counting
- the bit that's non-deterministic is cleaning up the local variables
referenced *from* those frames.
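
For anyone who wants to experiment with the idea today, the effect
can be approximated (minus the implicit per-frame part, which is the
bit that needs interpreter support) by threading an ExitStack through
explicitly - a rough sketch, with made-up function names:

    from contextlib import ExitStack

    def readlines_explicit(fname, stack):
        # "stack" stands in for the implicit frame-level ExitStack
        return stack.enter_context(open(fname)).readlines()

    def process(fname):
        with ExitStack() as stack:
            lines = readlines_explicit(fname, stack)
            print(len(lines))
        # the file handle is closed here, when the stand-in for the
        # calling frame's resources is torn down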

And then further down the track, once such a system had proven its
utility, *then* we could talk about expanding the iteration protocol
to allow for implicit registration of iterable cleanup functions as
frame local resources. With the cleanup functions not firing until
the *frame* exits, the backwards compatibility break would be
substantially reduced (for __main__ module code there'd essentially
be no compatibility break at all, and similarly for CPython local
variables), and the level of impact on language implementations would
also be much lower (reduced to supporting the registration of cleanup
functions with frame objects, and executing those cleanup functions
when the frame terminates).
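
To make that second step concrete: with a frame-level stack
available, iteration could (hypothetically) register an iterable's
cleanup function in much the same way we can already do by hand with
ExitStack.callback - again just an illustrative sketch, not a
proposed API:

    from contextlib import ExitStack

    def squares(n):
        for i in range(n):
            yield i * i

    def first_square_over(limit):
        # The explicit ExitStack stands in for the implicit frame
        # stack; stack.callback(gen.close) is what implicit
        # registration of the iterable's cleanup would arrange.
        with ExitStack() as stack:
            gen = squares(1000)
            stack.callback(gen.close)
            for value in gen:
                if value > limit:
                    return value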

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

