<div dir="ltr"><div>NOTE: This is my first post to this mailing list; I'm not really sure<br>      how to post a message, so I'm attempting a reply-all.<br><br>I like Nathaniel's idea for __iterclose__.<br><br>I suggest the following changes to deal with a few of the complex issues<br>he discussed.<br><br>1.  A missing __iterclose__, or a value of None, works as before;<br>    no changes.<br><br>2.  An iterator can be used in one of three ways:<br><br>    A.  'for' loop, which will call __iterclose__ when it exits.<br><br>    B.  User controlled, in which case the user is responsible for using the<br>        iterator inside a with statement.<br><br>    C.  Old style.  The user is responsible for calling __iterclose__.<br><br>3.  An iterator keeps track of __iter__ calls; this allows it to know<br>    when to clean up.<br><br><br>The two key additions, above, are:<br><br>    #2B. The user can use the iterator with __enter__ & __exit__ cleanly.<br><br>    #3.  Tracking __iter__ calls makes complex use cases easier<br>         to handle.<br><br>Specification<br>=============<br><br>An iterator may implement the following method: __iterclose__.  A missing<br>method, or a value of None, is allowed.<br><br>When the user wants to control the iterator, the user is expected to<br>use the iterator in a with statement.<br><br>The core proposal is the change in behavior of ``for`` loops. 
Given this<br>Python code:<br><br>  for VAR in ITERABLE:<br>      LOOP-BODY<br>  else:<br>      ELSE-BODY<br><br>we desugar to the equivalent of:<br><br>  _iter = iter(ITERABLE)<br>  _iterclose = getattr(type(_iter), '__iterclose__', None)<br><br>  if _iterclose is None:<br>      traditional-for VAR in _iter:<br>          LOOP-BODY<br>      else:<br>          ELSE-BODY<br>  else:<br>      _stop_exception_seen = False<br>      try:<br>          traditional-for VAR in _iter:<br>              LOOP-BODY<br>          else:<br>              _stop_exception_seen = True<br>              ELSE-BODY<br>      finally:<br>          if not _stop_exception_seen:<br>              _iterclose(_iter)<br><br>The test for None allows us to skip the setup of a try/finally clause.<br><br>Also, we don't bother to call __iterclose__ if the iterator raised<br>StopIteration.<br><br>Modifications to basic iterator types<br>=====================================<br><br>An iterator will implement something like the following:<br><br>  _cleanup       - Private function, does the following:<br><br>                        _enter_count = _iter_count = -1<br><br>                        Do any necessary cleanup, release resources, etc.<br><br>                   NOTE: Is also called internally by the iterator,<br>                   before raising StopIteration.<br><br>  _iter_count    - Private value, starts at 0.<br><br>  _enter_count   - Private value, starts at 0.<br><br>  __iter__       - if _iter_count >= 0:<br>                       _iter_count += 1<br><br>                   return self<br><br>  __iterclose__  - if _iter_count == 0:<br>                       if _enter_count == 0:<br>                           _cleanup()<br>                   elif _iter_count > 0:<br>                       _iter_count -= 1<br><br>  __enter__      - if _enter_count >= 0:<br>                       _enter_count += 1<br><br>                   return self<br><br>  __exit__       - if _enter_count > 0:<br>                       
_enter_count -= 1<br><br>                       if _enter_count == _iter_count == 0:<br>                            _cleanup()<br><br>The suggestions on _iter_count & _enter_count are just an example; internal<br>details can differ (and would need better error handling).<br><br><br>Examples:<br>=========<br><br>NOTE: Examples are given using range() or [1, 2, 3, 4, 5, 6, 7] for<br>      simplicity.  For real use, the iterator would have resources such<br>      as open files it needs to close on cleanup.<br><br><br>1.  Simple example:<br><br>        for v in range(7):<br>            print(v)<br><br>    Creates an iterator with an _iter_count of 0.  The iterator exits<br>    normally (by raising StopIteration), so we don't bother to call<br>    __iterclose__.<br><br><br>2.  Break example:<br><br>        for v in [1, 2, 3, 4, 5, 6, 7]:<br>            print(v)<br><br>            if v == 3:<br>                break<br><br>    Creates an iterator with an _iter_count of 0.<br><br>    The iterator exits after generating 3 numbers; we then call<br>    __iterclose__ & the iterator does any necessary cleanup.<br><br>3.  Convert example #2 to print the next value:<br><br>        with iter([1, 2, 3, 4, 5, 6, 7]) as seven:<br>            for v in seven:<br>                print(v)<br><br>                if v == 3:<br>                    break<br><br>            print('Next value is:', next(seven))<br><br>    This will print:<br><br>            1<br>            2<br>            3<br>            Next value is: 4<br><br>    How this works:<br><br>        1.  We create an iterator named seven (by calling list.__iter__).<br><br>        2.  We call seven.__enter__.<br><br>        3.  The for loop calls seven.__next__() 3 times, and then calls<br>            seven.__iterclose__.<br><br>            Since the _enter_count is 1, the iterator does not do<br>            cleanup yet.<br><br>        4.  We call next(seven).<br><br>        5.  We call seven.__exit__.  The iterator does its cleanup now.<br><br>4.  
More complicated example:<br><br>        with iter([1, 2, 3, 4, 5, 6, 7]) as seven:<br>            for v in seven:<br>                print(v)<br><br>                if v == 1:<br>                    for v in seven:<br>                        print('stolen:', v)<br><br>                        if v == 3:<br>                            break<br><br>                if v == 5:<br>                    break<br><br>            for v in seven:<br>                print(v * v)<br><br>    This will print:<br><br>        1<br>        stolen: 2<br>        stolen: 3<br>        4<br>        5<br>        36<br>        49<br><br>    How this works:<br><br>        1.  Same as #3 above; cleanup is done by __exit__.<br><br>5.  Alternate way of doing #4:<br><br>        seven = iter([1, 2, 3, 4, 5, 6, 7])<br><br>        for v in seven:<br>            print(v)<br><br>            if v == 1:<br>                for v in seven:<br>                    print('stolen:', v)<br><br>                    if v == 3:<br>                        break<br><br>            if v == 5:<br>                break<br><br>        for v in seven:<br>            print(v * v)<br>            break           #   Different from #4<br><br>        seven.__iterclose__()<br><br>    This will print:<br><br>        1<br>        stolen: 2<br>        stolen: 3<br>        4<br>        5<br>        36<br><br></div>    How this works:<br><br>        1.  We create an iterator named seven.<br><br>        2.  The for loops all call seven.__iter__, causing _iter_count<br>            to increment.<br><br>        3.  The for loops all call seven.__iterclose__ on exit, decrementing<br>            _iter_count.<br><br>        4.  
The user calls the final __iterclose__, which closes the<br>            iterator.<br><br>    NOTE:<br>        Method #5 is NOT recommended; the 'with' syntax is better.<br><br>        However, something like zip() (or itertools.zip_longest()) could call __iterclose__<br>        during cleanup.<br><br><br>Change to iterators<br>===================<br><br>All Python iterators would need to add __iterclose__ (possibly with a<br>value of None), __enter__, & __exit__.<br><br>Third-party iterators that do not implement __iterclose__ cannot be<br>used in a with clause.  A new function could be added to itertools,<br>something like:<br><br>    with itertools.with_wrapper(third_party_iterator) as x:<br>        ...<br><br>The 'with_wrapper' would attempt to call __iterclose__ when its __exit__<br>function is called.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 19, 2016 at 12:38 AM, Nathaniel Smith <span dir="ltr"><<a href="mailto:njs@pobox.com" target="_blank">njs@pobox.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>

<br>

I'd like to propose that Python's iterator protocol be enhanced to add<br>

a first-class notion of completion / cleanup.<br>

<br>

This is mostly motivated by thinking about the issues around async<br>

generators and cleanup. Unfortunately even though PEP 525 was accepted<br>

I found myself unable to stop pondering this, and the more I've<br>

pondered the more convinced I've become that the GC hooks added in PEP<br>

525 are really not enough, and that we'll regret it if we stick with<br>

them, or at least with them alone :-/. The strategy here is pretty<br>

different -- it's an attempt to dig down and make a fundamental<br>

improvement to the language that fixes a number of long-standing rough<br>

spots, including async generators.<br>

<br>

The basic concept is relatively simple: just adding a '__iterclose__'<br>

method that 'for' loops call upon completion, even if that's via break<br>

or exception. But, the overall issue is fairly complicated + iterators<br>

have a large surface area across the language, so the text below is<br>

pretty long. Mostly I wrote it all out to convince myself that there<br>

wasn't some weird showstopper lurking somewhere :-). For a first pass<br>

discussion, it probably makes sense to mainly focus on whether the<br>

basic concept makes sense? The main rationale is at the top, but the<br>

details are there too for those who want them.<br>

<br>

Also, for *right* now I'm hoping -- probably unreasonably -- to try to<br>

get the async iterator parts of the proposal in ASAP, ideally for<br>

3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal<br>

like this, which I apologize for -- though async generators are<br>

provisional in 3.6, so at least in theory changing them is not out of<br>

the question.) So again, it might make sense to focus especially on<br>

the async parts, which are a pretty small and self-contained part, and<br>

treat the rest of the proposal as a longer-term plan provided for<br>

context. The comparison to PEP 525 GC hooks comes right after the<br>

initial rationale.<br>

<br>

Anyway, I'll be interested to hear what you think!<br>

<br>

-n<br>

<br>

------------------<br>

<br>

Abstract<br>

========<br>

<br>

We propose to extend the iterator protocol with a new<br>

``__(a)iterclose__`` slot, which is called automatically on exit from<br>

``(async) for`` loops, regardless of how they exit. This allows for<br>

convenient, deterministic cleanup of resources held by iterators<br>

without reliance on the garbage collector. This is especially valuable<br>

for asynchronous generators.<br>

<br>

<br>

Note on timing<br>

==============<br>

<br>

In practical terms, the proposal here is divided into two separate<br>

parts: the handling of async iterators, which should ideally be<br>

implemented ASAP, and the handling of regular iterators, which is a<br>

larger but more relaxed project that can't start until 3.7 at the<br>

earliest. But since the changes are closely related, and we probably<br>

don't want to end up with async iterators and regular iterators<br>

diverging in the long run, it seems useful to look at them together.<br>

<br>

<br>

Background and motivation<br>

=========================<br>

<br>

Python iterables often hold resources which require cleanup. For<br>

example: ``file`` objects need to be closed; the `WSGI spec<br>

<<a href="https://www.python.org/dev/peps/pep-0333/" rel="noreferrer" target="_blank">https://www.python.org/dev/<wbr>peps/pep-0333/</a>>`_ adds a ``close`` method<br>

on top of the regular iterator protocol and demands that consumers<br>

call it at the appropriate time (though forgetting to do so is a<br>

`frequent source of bugs<br>

<<a href="http://blog.dscpl.com.au/2012/10/obligations-for-calling-close-on.html" rel="noreferrer" target="_blank">http://blog.dscpl.com.au/<wbr>2012/10/obligations-for-<wbr>calling-close-on.html</a>>`_);<br>

and PEP 342 (based on PEP 325) extended generator objects to add a<br>

``close`` method to allow generators to clean up after themselves.<br>

<br>

Generally, objects that need to clean up after themselves also define<br>

a ``__del__`` method to ensure that this cleanup will happen<br>

eventually, when the object is garbage collected. However, relying on<br>

the garbage collector for cleanup like this causes serious problems in<br>

at least two cases:<br>

<br>

- In Python implementations that do not use reference counting (e.g.<br>

PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet<br>

many situations require *prompt* cleanup of resources. Delayed cleanup<br>

produces problems like crashes due to file descriptor exhaustion, or<br>

WSGI timing middleware that collects bogus times.<br>

<br>

- Async generators (PEP 525) can only perform cleanup under the<br>

supervision of the appropriate coroutine runner. ``__del__`` doesn't<br>

have access to the coroutine runner; indeed, the coroutine runner<br>

might be garbage collected before the generator object. So relying on<br>

the garbage collector is effectively impossible without some kind of<br>

language extension. (PEP 525 does provide such an extension, but it<br>

has a number of limitations that this proposal fixes; see the<br>

"alternatives" section below for discussion.)<br>

<br>

Fortunately, Python provides a standard tool for doing resource<br>

cleanup in a more structured way: ``with`` blocks. For example, this<br>

code opens a file but relies on the garbage collector to close it::<br>

<br>

  def read_newline_separated_json(<wbr>path):<br>

      for line in open(path):<br>

          yield json.loads(line)<br>

<br>

  for document in read_newline_separated_json(<wbr>path):<br>

      ...<br>

<br>

and recent versions of CPython will point this out by issuing a<br>

``ResourceWarning``, nudging us to fix it by adding a ``with`` block::<br>

<br>

  def read_newline_separated_json(<wbr>path):<br>

      with open(path) as file_handle:      # <-- with block<br>

          for line in file_handle:<br>

              yield json.loads(line)<br>

<br>

  for document in read_newline_separated_json(<wbr>path):  # <-- outer for loop<br>

      ...<br>

<br>

But there's a subtlety here, caused by the interaction of ``with``<br>

blocks and generators. ``with`` blocks are Python's main tool for<br>

managing cleanup, and they're a powerful one, because they pin the<br>

lifetime of a resource to the lifetime of a stack frame. But this<br>

assumes that someone will take care of cleaning up the stack frame...<br>

and for generators, this requires that someone ``close`` them.<br>

<br>

In this case, adding the ``with`` block *is* enough to shut up the<br>

``ResourceWarning``, but this is misleading -- the file object cleanup<br>

here is still dependent on the garbage collector. The ``with`` block<br>

will only be unwound when the ``read_newline_separated_json`<wbr>`<br>

generator is closed. If the outer ``for`` loop runs to completion then<br>

the cleanup will happen immediately; but if this loop is terminated<br>

early by a ``break`` or an exception, then the ``with`` block won't<br>

fire until the generator object is garbage collected.<br>

<br>

The correct solution requires that all *users* of this API wrap every<br>

``for`` loop in its own ``with`` block::<br>

<br>

  with closing(read_newline_<wbr>separated_json(path)) as genobj:<br>

      for document in genobj:<br>

          ...<br>
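<br>
As a minimal sketch of the difference (the names ``numbers`` and ``events`` are illustrative assumptions, not part of the proposal), the following shows that a generator's cleanup code does not run when a ``for`` loop breaks early while the generator is still referenced, and that wrapping the loop in ``contextlib.closing`` makes the cleanup deterministic:<br>
<br>
```python
from contextlib import closing

events = []

def numbers():
    # The finally clause stands in for a ``with open(...)`` block.
    try:
        for i in range(5):
            yield i
    finally:
        events.append("cleanup")

# Plain for loop: breaking out leaves the generator suspended.
gen = numbers()
for n in gen:
    if n == 1:
        break
cleanup_ran_after_break = "cleanup" in events   # still suspended here
gen.close()                                     # explicit close runs the finally block

# Loop wrapped in closing(): cleanup runs as soon as the with block exits.
events.clear()
with closing(numbers()) as gen2:
    for n in gen2:
        if n == 1:
            break
cleanup_ran_with_closing = "cleanup" in events
```
<br>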

<br>

This gets even worse if we consider the idiom of decomposing a complex<br>

pipeline into multiple nested generators::<br>

<br>

  def read_users(path):<br>

      with closing(read_newline_<wbr>separated_json(path)) as gen:<br>

          for document in gen:<br>

              yield User.from_json(document)<br>

<br>

  def users_in_group(path, group):<br>

      with closing(read_users(path)) as gen:<br>

          for user in gen:<br>

              if user.group == group:<br>

                  yield user<br>

<br>

In general if you have N nested generators then you need N+1 ``with``<br>

blocks to clean up 1 file. And good defensive programming would<br>

suggest that any time we use a generator, we should assume the<br>

possibility that there could be at least one ``with`` block somewhere<br>

in its (potentially transitive) call stack, either now or in the<br>

future, and thus always wrap it in a ``with``. But in practice,<br>

basically nobody does this, because programmers would rather write<br>

buggy code than tiresome repetitive code. In simple cases like this<br>

there are some workarounds that good Python developers know (e.g. in<br>

this simple case it would be idiomatic to pass in a file handle<br>

instead of a path and move the resource management to the top level),<br>

but in general we cannot avoid the use of ``with``/``finally`` inside<br>

of generators, and thus dealing with this problem one way or another.<br>

When beauty and correctness fight then beauty tends to win, so it's<br>

important to make correct code beautiful.<br>

<br>

Still, is this worth fixing? Until async generators came along I would<br>

have argued yes, but that it was a low priority, since everyone seems<br>

to be muddling along okay -- but async generators make it much more<br>

urgent. Async generators cannot do cleanup *at all* without some<br>

mechanism for deterministic cleanup that people will actually use, and<br>

async generators are particularly likely to hold resources like file<br>

descriptors. (After all, if they weren't doing I/O, they'd be<br>

generators, not async generators.) So we have to do something, and it<br>

might as well be a comprehensive fix to the underlying problem. And<br>

it's much easier to fix this now when async generators are first<br>

rolling out, than it will be to fix it later.<br>

<br>

The proposal itself is simple in concept: add a ``__(a)iterclose__``<br>

method to the iterator protocol, and have (async) ``for`` loops call<br>

it when the loop is exited, even if this occurs via ``break`` or<br>

exception unwinding. Effectively, we're taking the current cumbersome<br>

idiom (``with`` block + ``for`` loop) and merging them together into a<br>

fancier ``for``. This may seem non-orthogonal, but makes sense when<br>

you consider that the existence of generators means that ``with``<br>

blocks actually depend on iterator cleanup to work reliably, plus<br>

experience showing that iterator cleanup is often a desirable feature<br>

in its own right.<br>

<br>

<br>

Alternatives<br>

============<br>

<br>

PEP 525 asyncgen hooks<br>

----------------------<br>

<br>

PEP 525 proposes a `set of global thread-local hooks managed by new<br>

``sys.{get/set}_asyncgen_<wbr>hooks()`` functions<br>

<<a href="https://www.python.org/dev/peps/pep-0525/#finalization" rel="noreferrer" target="_blank">https://www.python.org/dev/<wbr>peps/pep-0525/#finalization</a>>`_<wbr>, which<br>

allow event loops to integrate with the garbage collector to run<br>

cleanup for async generators. In principle, this proposal and PEP 525<br>

are complementary, in the same way that ``with`` blocks and<br>

``__del__`` are complementary: this proposal takes care of ensuring<br>

deterministic cleanup in most cases, while PEP 525's GC hooks clean up<br>

anything that gets missed. But ``__aiterclose__`` provides a number of<br>

advantages over GC hooks alone:<br>

<br>

- The GC hook semantics aren't part of the abstract async iterator<br>

protocol, but are instead restricted `specifically to the async<br>

generator concrete type <XX find and link Yury's email saying this>`_.<br>

If you have an async iterator implemented using a class, like::<br>

<br>

    class MyAsyncIterator:<br>

        async def __anext__(self):<br>

            ...<br>

<br>

  then you can't refactor this into an async generator without<br>

changing its semantics, and vice-versa. This seems very unpythonic.<br>

(It also leaves open the question of what exactly class-based async<br>

iterators are supposed to do, given that they face exactly the same<br>

cleanup problems as async generators.) ``__aiterclose__``, on the<br>

other hand, is defined at the protocol level, so it's duck-type<br>

friendly and works for all iterators, not just generators.<br>

<br>

- Code that wants to work on non-CPython implementations like PyPy<br>

cannot in general rely on GC for cleanup. Without ``__aiterclose__``,<br>

it's more or less guaranteed that developers who develop and test on<br>

CPython will produce libraries that leak resources when used on PyPy.<br>

Developers who do want to target alternative implementations will<br>

either have to take the defensive approach of wrapping every ``for``<br>

loop in a ``with`` block, or else carefully audit their code to figure<br>

out which generators might possibly contain cleanup code and add<br>

``with`` blocks around those only. With ``__aiterclose__``, writing<br>

portable code becomes easy and natural.<br>

<br>

- An important part of building robust software is making sure that<br>

exceptions always propagate correctly without being lost. One of the<br>

most exciting things about async/await compared to traditional<br>

callback-based systems is that instead of requiring manual chaining,<br>

the runtime can now do the heavy lifting of propagating errors, making<br>

it *much* easier to write robust code. But, this beautiful new picture<br>

has one major gap: if we rely on the GC for generator cleanup, then<br>

exceptions raised during cleanup are lost. So, again, without<br>

``__aiterclose__``, developers who care about this kind of robustness<br>

will either have to take the defensive approach of wrapping every<br>

``for`` loop in a ``with`` block, or else carefully audit their code<br>

to figure out which generators might possibly contain cleanup code.<br>

``__aiterclose__`` plugs this hole by performing cleanup in the<br>

caller's context, so writing more robust code becomes the path of<br>

least resistance.<br>

<br>

- The WSGI experience suggests that there exist important<br>

iterator-based APIs that need prompt cleanup and cannot rely on the<br>

GC, even in CPython. For example, consider a hypothetical WSGI-like<br>

API based around async/await and async iterators, where a response<br>

handler is an async generator that takes request headers + an async<br>

iterator over the request body, and yields response headers + the<br>

response body. (This is actually the use case that got me interested<br>

in async generators in the first place, i.e. this isn't hypothetical.)<br>

If we follow WSGI in requiring that child iterators must be closed<br>

properly, then without ``__aiterclose__`` the absolute most<br>

minimalistic middleware in our system looks something like::<br>

<br>

    async def noop_middleware(handler, request_header, request_body):<br>

        async with aclosing(handler(request_header, request_body)) as aiter:<br>

            async for response_item in aiter:<br>

                yield response_item<br>

<br>

  Arguably in regular code one can get away with skipping the ``with``<br>

block around ``for`` loops, depending on how confident one is that one<br>

understands the internal implementation of the generator. But here we<br>

have to cope with arbitrary response handlers, so without<br>

``__aiterclose__``, this ``with`` construction is a mandatory part of<br>

every middleware.<br>

<br>

  ``__aiterclose__`` allows us to eliminate the mandatory boilerplate<br>

and an extra level of indentation from every middleware::<br>

<br>

    async def noop_middleware(handler, request_header, request_body):<br>

        async for response_item in handler(request_header, request_body):<br>

            yield response_item<br>

<br>

So the ``__aiterclose__`` approach provides substantial advantages<br>

over GC hooks.<br>

<br>

This leaves open the question of whether we want a combination of GC<br>

hooks + ``__aiterclose__``, or just ``__aiterclose__`` alone. Since<br>

the vast majority of generators are iterated over using a ``for`` loop<br>

or equivalent, ``__aiterclose__`` handles most situations before the<br>

GC has a chance to get involved. The case where GC hooks provide<br>

additional value is in code that does manual iteration, e.g.::<br>

<br>

    agen = fetch_newline_separated_json_<wbr>from_url(...)<br>

    while True:<br>

        document = await type(agen).__anext__(agen)<br>

        if document["id"] == needle:<br>

            break<br>

    # doesn't do 'await agen.aclose()'<br>

<br>

If we go with the GC-hooks + ``__aiterclose__`` approach, this<br>

generator will eventually be cleaned up by GC calling the generator<br>

``__del__`` method, which then will use the hooks to call back into<br>

the event loop to run the cleanup code.<br>

<br>

If we go with the no-GC-hooks approach, this generator will eventually<br>

be garbage collected, with the following effects:<br>

<br>

- its ``__del__`` method will issue a warning that the generator was<br>

not closed (similar to the existing "coroutine never awaited"<br>

warning).<br>

<br>

- The underlying resources involved will still be cleaned up, because<br>

the generator frame will still be garbage collected, causing it to<br>

drop references to any file handles or sockets it holds, and then<br>

those objects' ``__del__`` methods will release the actual operating<br>

system resources.<br>

<br>

- But, any cleanup code inside the generator itself (e.g. logging,<br>

buffer flushing) will not get a chance to run.<br>

<br>

The solution here -- as the warning would indicate -- is to fix the<br>

code so that it calls ``__aiterclose__``, e.g. by using a ``with``<br>

block::<br>

<br>

    async with aclosing(fetch_newline_<wbr>separated_json_from_url(...)) as agen:<br>

        while True:<br>

            document = await type(agen).__anext__(agen)<br>

            if document["id"] == needle:<br>

                break<br>

<br>

Basically in this approach, the rule would be that if you want to<br>

manually implement the iterator protocol, then it's your<br>

responsibility to implement all of it, and that now includes<br>

``__(a)iterclose__``.<br>

<br>

GC hooks add non-trivial complexity in the form of (a) new global<br>

interpreter state, (b) a somewhat complicated control flow (e.g.,<br>

async generator GC always involves resurrection, so the details of PEP<br>

442 are important), and (c) a new public API in asyncio (``await<br>

loop.shutdown_asyncgens()``) that users have to remember to call at<br>

the appropriate time. (This last point in particular somewhat<br>

undermines the argument that GC hooks provide a safe backup to<br>

guarantee cleanup, since if ``shutdown_asyncgens()`` isn't called<br>

correctly then I *think* it's possible for generators to be silently<br>

discarded without their cleanup code being called; compare this to the<br>

``__aiterclose__``-only approach where in the worst case we still at<br>

least get a warning printed. This might be fixable.) All this<br>

considered, GC hooks arguably aren't worth it, given that the only<br>

people they help are those who want to manually call ``__anext__`` yet<br>

don't want to manually call ``__aiterclose__``. But Yury disagrees<br>

with me on this :-). And both options are viable.<br>

<br>

<br>

Always inject resources, and do all cleanup at the top level<br>

------------------------------<wbr>------------------------------<br>

<br>

It was suggested on python-dev (XX find link) that a pattern to avoid<br>

these problems is to always pass resources in from above, e.g.<br>

``read_newline_separated_json`<wbr>` should take a file object rather than<br>

a path, with cleanup handled at the top level::<br>

<br>

  def read_newline_separated_json(<wbr>file_handle):<br>

      for line in file_handle:<br>

          yield json.loads(line)<br>

<br>

  def read_users(file_handle):<br>

      for document in read_newline_separated_json(<wbr>file_handle):<br>

          yield User.from_json(document)<br>

<br>

  with open(path) as file_handle:<br>

      for user in read_users(file_handle):<br>

          ...<br>

<br>

This works well in simple cases; here it lets us avoid the "N+1<br>

``with`` blocks problem". But unfortunately, it breaks down quickly<br>

when things get more complex. Consider if instead of reading from a<br>

file, our generator was reading from a streaming HTTP GET request --<br>

while handling redirects and authentication via OAuth. Then we'd<br>

really want the sockets to be managed down inside our HTTP client<br>

library, not at the top level. Plus there are other cases where<br>

``finally`` blocks embedded inside generators are important in their<br>

own right: db transaction management, emitting logging information<br>

during cleanup (one of the major motivating use cases for WSGI<br>

``close``), and so forth. So this is really a workaround for simple<br>

cases, not a general solution.<br>

<br>

<br>

More complex variants of __(a)iterclose__<br>

------------------------------<wbr>-----------<br>

<br>

The semantics of ``__(a)iterclose__`` are somewhat inspired by<br>

``with`` blocks, but context managers are more powerful:<br>

``__(a)exit__`` can distinguish between a normal exit versus exception<br>

unwinding, and in the case of an exception it can examine the<br>

exception details and optionally suppress propagation.<br>

``__(a)iterclose__`` as proposed here does not have these powers, but<br>

one can imagine an alternative design where it did.<br>

<br>

However, this seems like unwarranted complexity: experience suggests<br>

that it's common for iterables to have ``close`` methods, and even to<br>

have ``__exit__`` methods that call ``self.close()``, but I'm not<br>

aware of any common cases that make use of ``__exit__``'s full power.<br>

I also can't think of any examples where this would be useful. And it<br>

seems unnecessarily confusing to allow iterators to affect flow<br>

control by swallowing exceptions -- if you're in a situation where you<br>

really want that, then you should probably use a real ``with`` block<br>

anyway.<br>

<br>

<br>

Specification<br>

=============<br>

<br>

This section describes where we want to eventually end up, though<br>

there are some backwards compatibility issues that mean we can't jump<br>

directly here. A later section describes the transition plan.<br>

<br>

<br>

Guiding principles<br>

------------------<br>

<br>

Generally, ``__(a)iterclose__`` implementations should:<br>

<br>

- be idempotent,<br>

- perform any cleanup that is appropriate on the assumption that the<br>

iterator will not be used again after ``__(a)iterclose__`` is called.<br>

In particular, once ``__(a)iterclose__`` has been called then calling<br>

``__(a)next__`` produces undefined behavior.<br>

<br>

And generally, any code which starts iterating through an iterable<br>

with the intention of exhausting it, should arrange to make sure that<br>

``__(a)iterclose__`` is eventually called, whether or not the iterator<br>

is actually exhausted.<br>
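<br>
To illustrate the first principle, here is a minimal sketch of an idempotent ``__iterclose__``. The class ``LineIterator`` and its ``close_calls`` counter are hypothetical, and raising ``StopIteration`` after closing is just one possible choice, since behavior after ``__iterclose__`` is left undefined above:<br>
<br>
```python
class LineIterator:
    """Hypothetical iterator that tracks whether its 'resource' is closed."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self.closed = False
        self.close_calls = 0   # counts real cleanups, for illustration only

    def __iter__(self):
        return self

    def __next__(self):
        if self.closed:
            # One possible choice; the proposal leaves this undefined.
            raise StopIteration
        return next(self._lines)

    def __iterclose__(self):
        # Idempotent: a second call is a no-op.
        if self.closed:
            return
        self.closed = True
        self.close_calls += 1

it = LineIterator(["a", "b", "c"])
first = next(it)
it.__iterclose__()
it.__iterclose__()   # harmless repeat
```
<br>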

<br>

<br>

Changes to iteration<br>

--------------------<br>

<br>

The core proposal is the change in behavior of ``for`` loops. Given<br>

this Python code::<br>

<br>

  for VAR in ITERABLE:<br>

      LOOP-BODY<br>

  else:<br>

      ELSE-BODY<br>

<br>

we desugar to the equivalent of::<br>

<br>

  _iter = iter(ITERABLE)<br>

  _iterclose = getattr(type(_iter), "__iterclose__", lambda it: None)<br>

  try:<br>

      traditional-for VAR in _iter:<br>

          LOOP-BODY<br>

      else:<br>

          ELSE-BODY<br>

  finally:<br>

      _iterclose(_iter)<br>

<br>

where the "traditional-for statement" here is meant as a shorthand for<br>

the classic 3.5-and-earlier ``for`` loop semantics.<br>
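
<br>

The desugaring can be approximated in today's Python as a helper<br>
function. This is only a sketch: ``closing_for`` and ``Countdown`` are<br>
hypothetical names, and a callback returning true stands in for<br>
``break``. The fallback takes one argument because it is invoked as<br>
``_iterclose(_iter)``.<br>

```python
def closing_for(iterable, body):
    """Emulate the proposed ``for`` semantics; ``body(item)`` returning
    a true value plays the role of ``break`` (an assumption of this sketch)."""
    _iter = iter(iterable)
    # Look the method up on the type, as the desugaring does.
    _iterclose = getattr(type(_iter), "__iterclose__", lambda it: None)
    try:
        for item in _iter:  # the "traditional-for" part
            if body(item):
                break
    finally:
        _iterclose(_iter)  # runs on exhaustion, break, and exceptions


class Countdown:
    """Hypothetical iterator that records whether it was closed."""

    def __init__(self, start):
        self.n = start
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.n == 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

    def __iterclose__(self):
        self.closed = True
```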

<br>

Besides the top-level ``for`` statement, Python also contains several<br>

other places where iterators are consumed. For consistency, these<br>

should call ``__iterclose__`` as well using semantics equivalent to<br>

the above. This includes:<br>

<br>

- ``for`` loops inside comprehensions<br>

- ``*`` unpacking<br>

- functions which accept and fully consume iterables, like<br>

``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and<br>

others.<br>

<br>

<br>

Changes to async iteration<br>

--------------------------<br>

<br>

We also make the analogous changes to async iteration constructs,<br>

except that the new slot is called ``__aiterclose__``, and it's an<br>

async method that gets ``await``\ed.<br>
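
<br>

The async analogue can be sketched the same way. ``closing_async_for``<br>
and ``Ticker`` are hypothetical names used only for illustration; the<br>
point is that the close method is looked up on the type and awaited in<br>
the ``finally`` block.<br>

```python
import asyncio


async def closing_async_for(aiterable, body):
    """Emulate the proposed ``async for``; a true result from ``body``
    stands in for ``break`` (an assumption of this sketch)."""
    _ait = aiterable.__aiter__()
    _aiterclose = getattr(type(_ait), "__aiterclose__", None)
    try:
        async for item in _ait:
            if body(item):
                break
    finally:
        if _aiterclose is not None:
            await _aiterclose(_ait)  # awaited, unlike __iterclose__


class Ticker:
    """Hypothetical async iterator that records whether it was closed."""

    def __init__(self, limit):
        self.i = 0
        self.limit = limit
        self.closed = False

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.i == self.limit:
            raise StopAsyncIteration
        self.i += 1
        return self.i

    async def __aiterclose__(self):
        self.closed = True
```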

<br>

<br>

Modifications to basic iterator types<br>

-------------------------------------<br>

<br>

Generator objects (including those created by generator comprehensions):<br>

- ``__iterclose__`` calls ``self.close()``<br>

- ``__del__`` calls ``self.close()`` (same as now), and additionally<br>

issues a ``ResourceWarning`` if the generator wasn't exhausted. This<br>

warning is hidden by default, but can be enabled for those who want to<br>

make sure they aren't inadvertently relying on CPython-specific GC<br>

semantics.<br>

<br>

Async generator objects (including those created by async generator<br>

comprehensions):<br>

- ``__aiterclose__`` calls ``self.aclose()``<br>

- ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been<br>

called, since this probably indicates a latent bug, similar to the<br>

"coroutine never awaited" warning.<br>

<br>

QUESTION: should file objects implement ``__iterclose__`` to close the<br>

file? On the one hand this would make this change more disruptive; on<br>

the other hand people really like writing ``for line in open(...):<br>

...``, and if we get used to iterators taking care of their own<br>

cleanup then it might become very weird if files don't.<br>

<br>

<br>

New convenience functions<br>

-------------------------<br>

<br>

The ``itertools`` module gains a new iterator wrapper that can be used<br>

to selectively disable the new ``__iterclose__`` behavior::<br>

<br>

  # QUESTION: I feel like there might be a better name for this one?<br>

  class preserve:<br>

      def __init__(self, iterable):<br>

          self._it = iter(iterable)<br>

<br>

      def __iter__(self):<br>

          return self<br>

<br>

      def __next__(self):<br>

          return next(self._it)<br>

<br>

      def __iterclose__(self):<br>

          # Swallow __iterclose__ without passing it on<br>

          pass<br>

<br>

Example usage (assuming that file objects implement ``__iterclose__``)::<br>

<br>

  with open(...) as handle:<br>

      # Iterate through the same file twice:<br>

      for line in itertools.preserve(handle):<br>

          ...<br>

      handle.seek(0)<br>

      for line in itertools.preserve(handle):<br>

          ...<br>

<br>

The ``operator`` module gains two new functions, with semantics<br>

equivalent to the following::<br>

<br>

  def iterclose(it):<br>

      if hasattr(type(it), "__iterclose__"):<br>

          type(it).__iterclose__(it)<br>

<br>

  async def aiterclose(ait):<br>

      if hasattr(type(ait), "__aiterclose__"):<br>

          await type(ait).__aiterclose__(ait)<br>

<br>

These are particularly useful when implementing the changes in the next section:<br>

<br>

<br>

__iterclose__ implementations for iterator wrappers<br>

---------------------------------------------------<br>

<br>

Python ships a number of iterator types that act as wrappers around<br>

other iterators: ``map``, ``zip``, ``itertools.accumulate``,<br>

``csv.reader``, and others. These iterators should define a<br>

``__iterclose__`` method which calls ``__iterclose__`` in turn on<br>

their underlying iterators. For example, ``map`` could be implemented<br>

as::<br>

<br>

  class map:<br>

      def __init__(self, fn, *iterables):<br>

          self._fn = fn<br>

          self._iters = [iter(iterable) for iterable in iterables]<br>

<br>

      def __iter__(self):<br>

          return self<br>

<br>

      def __next__(self):<br>

          return self._fn(*[next(it) for it in self._iters])<br>

<br>

      def __iterclose__(self):<br>

          for it in self._iters:<br>

              operator.iterclose(it)<br>

<br>

In some cases this requires some subtlety; for example,<br>

```itertools.tee``<br>

<<a href="https://docs.python.org/3/library/itertools.html#itertools.tee" rel="noreferrer" target="_blank">https://docs.python.org/3/library/itertools.html#itertools.tee</a>>`_<br>

should not call ``__iterclose__`` on the underlying iterator until it<br>

has been called on *all* of the clone iterators.<br>
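
<br>

One way this could work is a shared counter of clones that are still<br>
open. The sketch below wraps today's ``itertools.tee``; the names<br>
``closing_tee``, ``_SharedSource`` and ``_Clone`` are hypothetical and<br>
this is not how ``tee`` is actually implemented.<br>

```python
import itertools


class _SharedSource:
    """Tracks how many clones of one underlying iterator are still open."""

    def __init__(self, iterator, n):
        self.iterator = iterator
        self.open_clones = n

    def release(self):
        self.open_clones -= 1
        if self.open_clones == 0:  # last clone closed: close the source
            close = getattr(type(self.iterator), "__iterclose__", None)
            if close is not None:
                close(self.iterator)


class _Clone:
    def __init__(self, branch, shared):
        self._branch = branch
        self._shared = shared
        self._closed = False

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._branch)

    def __iterclose__(self):
        if self._closed:  # idempotent, so a clone is only counted once
            return
        self._closed = True
        self._shared.release()


def closing_tee(iterable, n=2):
    """Like itertools.tee, but the clones forward __iterclose__ once all
    of them have been closed."""
    source = iter(iterable)
    shared = _SharedSource(source, n)
    return tuple(_Clone(b, shared) for b in itertools.tee(source, n))
```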

<br>

<br>

Example / Rationale<br>

-------------------<br>

<br>

The payoff for all this is that we can now write straightforward code like::<br>

<br>

  def read_newline_separated_json(path):<br>

      for line in open(path):<br>

          yield json.loads(line)<br>

<br>

and be confident that the file will receive deterministic cleanup<br>

*without the end-user having to take any special effort*, even in<br>

complex cases. For example, consider this silly pipeline::<br>

<br>

  list(map(lambda key: key.upper(),<br>

           (doc["key"] for doc in read_newline_separated_json(path))))<br>

<br>

If our file contains a document where ``doc["key"]`` turns out to be<br>

an integer, then the following sequence of events will happen:<br>

<br>

1. ``key.upper()`` raises an ``AttributeError``, which propagates out<br>

of the ``map`` and triggers the implicit ``finally`` block inside<br>

``list``.<br>

2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the<br>

map object.<br>

3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator<br>

comprehension object.<br>

4. This injects a ``GeneratorExit`` exception into the generator<br>

comprehension body, which is currently suspended inside the<br>

comprehension's ``for`` loop body.<br>

5. The exception propagates out of the ``for`` loop, triggering the<br>

``for`` loop's implicit ``finally`` block, which calls<br>

``__iterclose__`` on the generator object representing the call to<br>

``read_newline_separated_json``.<br>

6. This injects an inner ``GeneratorExit`` exception into the body of<br>

``read_newline_separated_json``, currently suspended at the ``yield``.<br>

7. The inner ``GeneratorExit`` propagates out of the ``for`` loop,<br>

triggering the ``for`` loop's implicit ``finally`` block, which calls<br>

``__iterclose__()`` on the file object.<br>

8. The file object is closed.<br>

9. The inner ``GeneratorExit`` resumes propagating, hits the boundary<br>

of the generator function, and causes<br>

``read_newline_separated_json``'s ``__iterclose__()`` method to return<br>

successfully.<br>

10. Control returns to the generator comprehension body, and the outer<br>

``GeneratorExit`` continues propagating, allowing the comprehension's<br>

``__iterclose__()`` to return successfully.<br>

11. The rest of the ``__iterclose__()`` calls unwind without incident,<br>

back into the body of ``list``.<br>

12. The original ``AttributeError`` resumes propagating.<br>

<br>

(The details above assume that we implement ``file.__iterclose__``; if<br>

not then add a ``with`` block to ``read_newline_separated_json`` and<br>

essentially the same logic goes through.)<br>

<br>

Of course, from the user's point of view, this can be simplified down to just:<br>

<br>

1. ``int.upper()`` raises an ``AttributeError``<br>

2. The file object is closed.<br>

3. The ``AttributeError`` propagates out of ``list``<br>

<br>

So we've accomplished our goal of making this "just work" without the<br>

user having to think about it.<br>

<br>

<br>

Transition plan<br>

===============<br>

<br>

While the majority of existing ``for`` loops will continue to produce<br>

identical results, the proposed changes will produce<br>

backwards-incompatible behavior in some cases. Example::<br>

<br>

  def read_csv_with_header(lines_iterable):<br>

      lines_iterator = iter(lines_iterable)<br>

      for line in lines_iterator:<br>

          column_names = line.strip().split("\t")<br>

          break<br>

      for line in lines_iterator:<br>

          values = line.strip().split("\t")<br>

          record = dict(zip(column_names, values))<br>

          yield record<br>

<br>

This code used to be correct, but after this proposal is implemented<br>

will require an ``itertools.preserve`` call added to the first ``for``<br>

loop.<br>
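
<br>

For illustration, the adjusted function might look like the sketch<br>
below. Since ``itertools.preserve`` does not exist yet, a local<br>
``preserve`` class mirroring the definition given earlier stands in for<br>
it; under today's semantics the wrapper changes nothing, so the code<br>
behaves the same before and after the transition.<br>

```python
class preserve:
    """Local stand-in for the proposed ``itertools.preserve``."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __iterclose__(self):
        # Swallow __iterclose__ without passing it on
        pass


def read_csv_with_header(lines_iterable):
    lines_iterator = iter(lines_iterable)
    # The header loop exits early, so shield the iterator from being closed.
    for line in preserve(lines_iterator):
        column_names = line.strip().split("\t")
        break
    # The second loop is allowed to close the iterator when it finishes.
    for line in lines_iterator:
        values = line.strip().split("\t")
        record = dict(zip(column_names, values))
        yield record
```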

<br>

[QUESTION: currently, if you close a generator and then try to iterate<br>

over it then it just raises ``Stop(Async)Iteration``, so code that<br>

passes the same generator object to multiple ``for`` loops but forgets<br>

to use ``itertools.preserve`` won't see an obvious error -- the second<br>

``for`` loop will just exit immediately. Perhaps it would be better if<br>

iterating a closed generator raised a ``RuntimeError``? Note that<br>

files don't have this problem -- attempting to iterate a closed file<br>

object already raises ``ValueError``.]<br>
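
<br>

The difference described in the question above can be observed in<br>
current Python. The variable names here are only for illustration:<br>

```python
import tempfile


def numbers():
    yield 1
    yield 2


gen = numbers()
first = next(gen)
gen.close()
# A closed generator just looks exhausted: no error, no items.
leftover = list(gen)

with tempfile.TemporaryFile("w+") as handle:
    handle.write("a\nb\n")
    handle.seek(0)

# The file is closed once the with block exits; iterating it is an error.
closed_file_error = None
try:
    for line in handle:
        pass
except ValueError as exc:
    closed_file_error = exc
```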

<br>

Specifically, the incompatibility happens when all of these factors<br>

come together:<br>

<br>

- The automatic calling of ``__(a)iterclose__`` is enabled<br>

- The iterable did not previously define ``__(a)iterclose__``<br>

- The iterable does now define ``__(a)iterclose__``<br>

- The iterable is re-used after the ``for`` loop exits<br>

<br>

So the problem is how to manage this transition, and those are the<br>

levers we have to work with.<br>

<br>

First, observe that the only async iterables where we propose to add<br>

``__aiterclose__`` are async generators, and there is currently no<br>

existing code using async generators (though this will start changing<br>

very soon), so the async changes do not produce any backwards<br>

incompatibilities. (There is existing code using async iterators, but<br>

using the new async for loop on an old async iterator is harmless,<br>

because old async iterators don't have ``__aiterclose__``.) In<br>

addition, PEP 525 was accepted on a provisional basis, and async<br>

generators are by far the biggest beneficiary of this PEP's proposed<br>

changes. Therefore, I think we should strongly consider enabling<br>

``__aiterclose__`` for ``async for`` loops and async generators ASAP,<br>

ideally for 3.6.0 or 3.6.1.<br>

<br>

For the non-async world, things are harder, but here's a potential<br>

transition path:<br>

<br>

In 3.7:<br>

<br>

Our goal is that existing unsafe code will start emitting warnings,<br>

while those who want to opt-in to the future can do that immediately:<br>

<br>

- We immediately add all the ``__iterclose__`` methods described above.<br>

- If ``from __future__ import iterclose`` is in effect, then ``for``<br>

loops and ``*`` unpacking call ``__iterclose__`` as specified above.<br>

- If the future is *not* enabled, then ``for`` loops and ``*``<br>

unpacking do *not* call ``__iterclose__``. But they do call some other<br>

method instead, e.g. ``__iterclose_warning__``.<br>

- Similarly, functions like ``list`` use stack introspection (!!) to<br>

check whether their direct caller has ``__future__.iterclose``<br>

enabled, and use this to decide whether to call ``__iterclose__`` or<br>

``__iterclose_warning__``.<br>

- For all the wrapper iterators, we also add ``__iterclose_warning__``<br>

methods that forward to the ``__iterclose_warning__`` method of the<br>

underlying iterator or iterators.<br>

- For generators (and files, if we decide to do that),<br>

``__iterclose_warning__`` is defined to set an internal flag, and<br>

other methods on the object are modified to check for this flag. If<br>

they find the flag set, they issue a ``PendingDeprecationWarning`` to<br>

inform the user that in the future this sequence would have led to a<br>

use-after-close situation and the user should use ``preserve()``.<br>
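
<br>

A rough sketch of the flag mechanism from the last bullet, for<br>
illustration only. ``WarnOnReuse`` and ``legacy_for`` are hypothetical<br>
names, and ``__iterclose_warning__`` is the tentative method name<br>
suggested above, not an existing protocol.<br>

```python
import warnings


class WarnOnReuse:
    """Hypothetical iterator showing the flag-and-warn behaviour."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._would_be_closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._would_be_closed:
            warnings.warn(
                "iterator used after a loop that will close it in the "
                "future; wrap it in preserve()",
                PendingDeprecationWarning,
                stacklevel=2,
            )
        return next(self._it)

    def __iterclose_warning__(self):
        # Do not close anything yet; only remember that a loop finished.
        self._would_be_closed = True


def legacy_for(iterable, body):
    """Emulates a ``for`` loop without the __future__ import enabled;
    a true result from ``body`` stands in for ``break``."""
    it = iter(iterable)
    warn_hook = getattr(type(it), "__iterclose_warning__", None)
    try:
        for item in it:
            if body(item):
                break
    finally:
        if warn_hook is not None:
            warn_hook(it)
```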

<br>

In 3.8:<br>

<br>

- Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning``<br>

<br>

In 3.9:<br>

<br>

- Enable the ``__future__`` unconditionally and remove all the<br>

``__iterclose_warning__`` stuff.<br>

<br>

I believe that this satisfies the normal requirements for this kind of<br>

transition -- opt-in initially, with warnings targeted precisely to<br>

the cases that will be affected, and a long deprecation cycle.<br>

<br>

Probably the most controversial / risky part of this is the use of<br>

stack introspection to make the iterable-consuming functions sensitive<br>

to a ``__future__`` setting, though I haven't thought of any situation<br>

where it would actually go wrong yet...<br>

<br>

<br>

Acknowledgements<br>

================<br>

<br>

Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for<br>

helpful discussion on earlier versions of this idea.<br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Nathaniel J. Smith -- <a href="https://vorpus.org" rel="noreferrer" target="_blank">https://vorpus.org</a><br>


</font></span></blockquote></div><br></div>