[Python-ideas] Deterministic iterator cleanup

Amit Green amit.mixie at gmail.com
Fri Oct 21 18:48:30 EDT 2016


NOTE: This is my first post to this mailing list, I'm not really sure
      how to post a message, so I'm attempting a reply-all.

I like Nathaniel's idea for __iterclose__.

I suggest the following changes to deal with a few of the complex issues
he discussed.

1.  Missing __iterclose__, or a value of none, works as before,
    no changes.

2.  An iterator can be used in one of three ways:

    A. 'for' loop, which will call __iterclose__ when it exits

    B.  User controlled, in which case the user is responsible to use the
        iterator inside a with statement.

    C.  Old style.  The user is responsible for calling __iterclose__

3.  An iterator keeps track of __iter__ calls, this allows it to know
    when to cleanup.


The two key additions, above, are:

    #2B. User can use iterator with __enter__ & __exit cleanly.

    #3.  By tracking __iter__ calls, it makes complex user cases easier
         to handle.

Specification
=============

An iterator may implement the following method: __iterclose__.  A missing
method, or a value of None is allowed.

When the user wants to control the iterator, the user is expected to
use the iterator with a with clause.

The core proposal is the change in behavior of ``for`` loops. Given this
Python code:

  for VAR in ITERABLE:
      LOOP-BODY
  else:
      ELSE-BODY

we desugar to the equivalent of:

  _iter = iter(ITERABLE)
  _iterclose = getattr(_iter, '__iterclose__', None)

  if _iterclose is none:
      traditional-for VAR in _iter:
         LOOP-BODY
      else:
         ELSE-BODY
  else:
     _stop_exception_seen = False try:
         traditional-for VAR in _iter:
             LOOP-BODY
         else:
             _stop_exception_seen = True
             ELSE-BODY
     finally:
        if not _stop_exception_seen:
            _iterclose(_iter)

The test for 'none' allows us to skip the setup of a try/finally clause.

Also we don't bother to call __iterclose__ if the iterator threw
StopException at us.

Modifications to basic iterator types
=====================================

An iterator will implement something like the following:

  _cleanup       - Private funtion, does the following:

                        _enter_count = _itercount = -1

                        Do any neccessary cleanup, release resources, etc.

                   NOTE: Is also called internally by the iterator,
                   before throwing StopIterator

  _iter_count    - Private value, starts at 0.

  _enter_count   - Private value, starts at 0.

  __iter__       - if _iter_count >= 0:
                       _iter_count += 1

                   return self

  __iterclose__  - if _iter_count is 0:
                       if _enter_count is 0:
                           _cleanup()
                   elif _iter_count > 0:
                       _iter_count -= 1

  __enter__      - if _enter_count >= 0:
                       _enter_count += 1

                   Return itself.

  __exit__       - if _enter_count is > 0
                       _enter_count -= 1

                       if _enter_count is _iter_count is 0:
                            _cleanup()

The suggetions on _iter_count & _enter_count are just example; internal
details can differ (and better error handling).


Examples:
=========

NOTE: Example are givin using xrange() or [1, 2, 3, 4, 5, 6, 7] for
      simplicity.  For real use, the iterator would have resources such
      as open files it needs to close on cleanup.


1.  Simple example:

        for v in xrange(7):
            print v

    Creates an iterator with a _usage_count of 0.  The iterator exits
    normally (by throwing StopException), we don't bother to call
    __iterclose__


2.  Break example:

        for v in [1, 2, 3, 4, 5, 6, 7]:
            print v

            if v == 3:
                break

    Creates an iterator with a _usage_count of 0.

    The iterator exists after generating 4 numbers, we then call
    __iterclose__ & the iterator does any necessary cleanup.

3.  Convert example #2 to print the next value:

        with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
            for v in seven:
                print v

                if v == 3:
                    break

            print 'Next value is: ', seven.next()

    This will print:

            1
            2
            3
            Next value is: 4

    How this works:

        1.  We create an iterator named seven (by calling list.__iter__).

        2.  We call seven.__enter__

        3.  The for loop calls: seven.next() 3 times, and then calls:
            seven.__iterclose__

            Since the _enter_count is 1, the iterator does not do
            cleanup yet.

        4.  We call seven.next()

        5.  We call seven.__exit.  The iterator does its cleanup now.

4.  More complicated example:

        with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
            for v in seven:
                print v

                if v == 1:
                    for v in seven:
                        print 'stolen: ', v

                        if v == 3:
                            break

                if v == 5:
                    break

            for v in seven:
                print v * v

    This will print:

        1
        stolen: 2
        stolen: 3
        4
        5
        36
        49

    How this works:

        1.  Same as #3 above, cleanup is done by the __exit__

5.  Alternate way of doing #4.

        seven = iter([1, 2, 3, 4, 5, 6, 7])

        for v in seven:
            print v

            if v == 1:
                for v in seven:
                    print 'stolen: ', v

                    if v == 3:
                        break

            if v == 5:
                break

        for v in seven:
            print v * v
            break           #   Different from #4

        seven.__iterclose__()

    This will print:

        1
        stolen: 2
        stolen: 3
        4
        5
        36

    How this works:

        1.  We create an iterator named seven.

        2.  The for loops all call seven.__iter__, causing _iter_count
            to increment.

        3.  The for loops all call seven.__iterclose__ on exit, decrement
            _iter_count.

        4.  The user calls the final __iterclose_, which close the
            iterator.

    NOTE:
        Method #5 is NOT recommended, the 'with' syntax is better.

        However, something like itertools.zip could call __iterclose__
        during cleanup


Change to iterators
===================

All python iterators would need to add __iterclose__ (possibly with a
value of None), __enter__, & __exit__.

Third party iterators that do not implenent __iterclose__ cannot be
used in a with clause.  A new function could be added to itertools,
something like:

    with itertools.with_wrapper(third_party_iterator) as x:
        ...

The 'with_wrapper' would attempt to call __iterclose__ when its __exit__
function is called.

On Wed, Oct 19, 2016 at 12:38 AM, Nathaniel Smith <njs at pobox.com> wrote:

> Hi all,
>
> I'd like to propose that Python's iterator protocol be enhanced to add
> a first-class notion of completion / cleanup.
>
> This is mostly motivated by thinking about the issues around async
> generators and cleanup. Unfortunately even though PEP 525 was accepted
> I found myself unable to stop pondering this, and the more I've
> pondered the more convinced I've become that the GC hooks added in PEP
> 525 are really not enough, and that we'll regret it if we stick with
> them, or at least with them alone :-/. The strategy here is pretty
> different -- it's an attempt to dig down and make a fundamental
> improvement to the language that fixes a number of long-standing rough
> spots, including async generators.
>
> The basic concept is relatively simple: just adding a '__iterclose__'
> method that 'for' loops call upon completion, even if that's via break
> or exception. But, the overall issue is fairly complicated + iterators
> have a large surface area across the language, so the text below is
> pretty long. Mostly I wrote it all out to convince myself that there
> wasn't some weird showstopper lurking somewhere :-). For a first pass
> discussion, it probably makes sense to mainly focus on whether the
> basic concept makes sense? The main rationale is at the top, but the
> details are there too for those who want them.
>
> Also, for *right* now I'm hoping -- probably unreasonably -- to try to
> get the async iterator parts of the proposal in ASAP, ideally for
> 3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal
> like this, which I apologize for -- though async generators are
> provisional in 3.6, so at least in theory changing them is not out of
> the question.) So again, it might make sense to focus especially on
> the async parts, which are a pretty small and self-contained part, and
> treat the rest of the proposal as a longer-term plan provided for
> context. The comparison to PEP 525 GC hooks comes right after the
> initial rationale.
>
> Anyway, I'll be interested to hear what you think!
>
> -n
>
> ------------------
>
> Abstract
> ========
>
> We propose to extend the iterator protocol with a new
> ``__(a)iterclose__`` slot, which is called automatically on exit from
> ``(async) for`` loops, regardless of how they exit. This allows for
> convenient, deterministic cleanup of resources held by iterators
> without reliance on the garbage collector. This is especially valuable
> for asynchronous generators.
>
>
> Note on timing
> ==============
>
> In practical terms, the proposal here is divided into two separate
> parts: the handling of async iterators, which should ideally be
> implemented ASAP, and the handling of regular iterators, which is a
> larger but more relaxed project that can't start until 3.7 at the
> earliest. But since the changes are closely related, and we probably
> don't want to end up with async iterators and regular iterators
> diverging in the long run, it seems useful to look at them together.
>
>
> Background and motivation
> =========================
>
> Python iterables often hold resources which require cleanup. For
> example: ``file`` objects need to be closed; the `WSGI spec
> <https://www.python.org/dev/peps/pep-0333/>`_ adds a ``close`` method
> on top of the regular iterator protocol and demands that consumers
> call it at the appropriate time (though forgetting to do so is a
> `frequent source of bugs
> <http://blog.dscpl.com.au/2012/10/obligations-for-calling-close-on.html
> >`_);
> and PEP 342 (based on PEP 325) extended generator objects to add a
> ``close`` method to allow generators to clean up after themselves.
>
> Generally, objects that need to clean up after themselves also define
> a ``__del__`` method to ensure that this cleanup will happen
> eventually, when the object is garbage collected. However, relying on
> the garbage collector for cleanup like this causes serious problems in
> at least two cases:
>
> - In Python implementations that do not use reference counting (e.g.
> PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet
> many situations require *prompt* cleanup of resources. Delayed cleanup
> produces problems like crashes due to file descriptor exhaustion, or
> WSGI timing middleware that collects bogus times.
>
> - Async generators (PEP 525) can only perform cleanup under the
> supervision of the appropriate coroutine runner. ``__del__`` doesn't
> have access to the coroutine runner; indeed, the coroutine runner
> might be garbage collected before the generator object. So relying on
> the garbage collector is effectively impossible without some kind of
> language extension. (PEP 525 does provide such an extension, but it
> has a number of limitations that this proposal fixes; see the
> "alternatives" section below for discussion.)
>
> Fortunately, Python provides a standard tool for doing resource
> cleanup in a more structured way: ``with`` blocks. For example, this
> code opens a file but relies on the garbage collector to close it::
>
>   def read_newline_separated_json(path):
>       for line in open(path):
>           yield json.loads(line)
>
>   for document in read_newline_separated_json(path):
>       ...
>
> and recent versions of CPython will point this out by issuing a
> ``ResourceWarning``, nudging us to fix it by adding a ``with`` block::
>
>   def read_newline_separated_json(path):
>       with open(path) as file_handle:      # <-- with block
>           for line in file_handle:
>               yield json.loads(line)
>
>   for document in read_newline_separated_json(path):  # <-- outer for loop
>       ...
>
> But there's a subtlety here, caused by the interaction of ``with``
> blocks and generators. ``with`` blocks are Python's main tool for
> managing cleanup, and they're a powerful one, because they pin the
> lifetime of a resource to the lifetime of a stack frame. But this
> assumes that someone will take care of cleaning up the stack frame...
> and for generators, this requires that someone ``close`` them.
>
> In this case, adding the ``with`` block *is* enough to shut up the
> ``ResourceWarning``, but this is misleading -- the file object cleanup
> here is still dependent on the garbage collector. The ``with`` block
> will only be unwound when the ``read_newline_separated_json``
> generator is closed. If the outer ``for`` loop runs to completion then
> the cleanup will happen immediately; but if this loop is terminated
> early by a ``break`` or an exception, then the ``with`` block won't
> fire until the generator object is garbage collected.
>
> The correct solution requires that all *users* of this API wrap every
> ``for`` loop in its own ``with`` block::
>
>   with closing(read_newline_separated_json(path)) as genobj:
>       for document in genobj:
>           ...
>
> This gets even worse if we consider the idiom of decomposing a complex
> pipeline into multiple nested generators::
>
>   def read_users(path):
>       with closing(read_newline_separated_json(path)) as gen:
>           for document in gen:
>               yield User.from_json(document)
>
>   def users_in_group(path, group):
>       with closing(read_users(path)) as gen:
>           for user in gen:
>               if user.group == group:
>                   yield user
>
> In general if you have N nested generators then you need N+1 ``with``
> blocks to clean up 1 file. And good defensive programming would
> suggest that any time we use a generator, we should assume the
> possibility that there could be at least one ``with`` block somewhere
> in its (potentially transitive) call stack, either now or in the
> future, and thus always wrap it in a ``with``. But in practice,
> basically nobody does this, because programmers would rather write
> buggy code than tiresome repetitive code. In simple cases like this
> there are some workarounds that good Python developers know (e.g. in
> this simple case it would be idiomatic to pass in a file handle
> instead of a path and move the resource management to the top level),
> but in general we cannot avoid the use of ``with``/``finally`` inside
> of generators, and thus dealing with this problem one way or another.
> When beauty and correctness fight then beauty tends to win, so it's
> important to make correct code beautiful.
>
> Still, is this worth fixing? Until async generators came along I would
> have argued yes, but that it was a low priority, since everyone seems
> to be muddling along okay -- but async generators make it much more
> urgent. Async generators cannot do cleanup *at all* without some
> mechanism for deterministic cleanup that people will actually use, and
> async generators are particularly likely to hold resources like file
> descriptors. (After all, if they weren't doing I/O, they'd be
> generators, not async generators.) So we have to do something, and it
> might as well be a comprehensive fix to the underlying problem. And
> it's much easier to fix this now when async generators are first
> rolling out, then it will be to fix it later.
>
> The proposal itself is simple in concept: add a ``__(a)iterclose__``
> method to the iterator protocol, and have (async) ``for`` loops call
> it when the loop is exited, even if this occurs via ``break`` or
> exception unwinding. Effectively, we're taking the current cumbersome
> idiom (``with`` block + ``for`` loop) and merging them together into a
> fancier ``for``. This may seem non-orthogonal, but makes sense when
> you consider that the existence of generators means that ``with``
> blocks actually depend on iterator cleanup to work reliably, plus
> experience showing that iterator cleanup is often a desireable feature
> in its own right.
>
>
> Alternatives
> ============
>
> PEP 525 asyncgen hooks
> ----------------------
>
> PEP 525 proposes a `set of global thread-local hooks managed by new
> ``sys.{get/set}_asyncgen_hooks()`` functions
> <https://www.python.org/dev/peps/pep-0525/#finalization>`_, which
> allow event loops to integrate with the garbage collector to run
> cleanup for async generators. In principle, this proposal and PEP 525
> are complementary, in the same way that ``with`` blocks and
> ``__del__`` are complementary: this proposal takes care of ensuring
> deterministic cleanup in most cases, while PEP 525's GC hooks clean up
> anything that gets missed. But ``__aiterclose__`` provides a number of
> advantages over GC hooks alone:
>
> - The GC hook semantics aren't part of the abstract async iterator
> protocol, but are instead restricted `specifically to the async
> generator concrete type <XX find and link Yury's email saying this>`_.
> If you have an async iterator implemented using a class, like::
>
>     class MyAsyncIterator:
>         async def __anext__():
>             ...
>
>   then you can't refactor this into an async generator without
> changing its semantics, and vice-versa. This seems very unpythonic.
> (It also leaves open the question of what exactly class-based async
> iterators are supposed to do, given that they face exactly the same
> cleanup problems as async generators.) ``__aiterclose__``, on the
> other hand, is defined at the protocol level, so it's duck-type
> friendly and works for all iterators, not just generators.
>
> - Code that wants to work on non-CPython implementations like PyPy
> cannot in general rely on GC for cleanup. Without ``__aiterclose__``,
> it's more or less guaranteed that developers who develop and test on
> CPython will produce libraries that leak resources when used on PyPy.
> Developers who do want to target alternative implementations will
> either have to take the defensive approach of wrapping every ``for``
> loop in a ``with`` block, or else carefully audit their code to figure
> out which generators might possibly contain cleanup code and add
> ``with`` blocks around those only. With ``__aiterclose__``, writing
> portable code becomes easy and natural.
>
> - An important part of building robust software is making sure that
> exceptions always propagate correctly without being lost. One of the
> most exciting things about async/await compared to traditional
> callback-based systems is that instead of requiring manual chaining,
> the runtime can now do the heavy lifting of propagating errors, making
> it *much* easier to write robust code. But, this beautiful new picture
> has one major gap: if we rely on the GC for generator cleanup, then
> exceptions raised during cleanup are lost. So, again, with
> ``__aiterclose__``, developers who care about this kind of robustness
> will either have to take the defensive approach of wrapping every
> ``for`` loop in a ``with`` block, or else carefully audit their code
> to figure out which generators might possibly contain cleanup code.
> ``__aiterclose__`` plugs this hole by performing cleanup in the
> caller's context, so writing more robust code becomes the path of
> least resistance.
>
> - The WSGI experience suggests that there exist important
> iterator-based APIs that need prompt cleanup and cannot rely on the
> GC, even in CPython. For example, consider a hypothetical WSGI-like
> API based around async/await and async iterators, where a response
> handler is an async generator that takes request headers + an async
> iterator over the request body, and yields response headers + the
> response body. (This is actually the use case that got me interested
> in async generators in the first place, i.e. this isn't hypothetical.)
> If we follow WSGI in requiring that child iterators must be closed
> properly, then without ``__aiterclose__`` the absolute most
> minimalistic middleware in our system looks something like::
>
>     async def noop_middleware(handler, request_header, request_body):
>         async with aclosing(handler(request_body, request_body)) as aiter:
>             async for response_item in aiter:
>                 yield response_item
>
>   Arguably in regular code one can get away with skipping the ``with``
> block around ``for`` loops, depending on how confident one is that one
> understands the internal implementation of the generator. But here we
> have to cope with arbitrary response handlers, so without
> ``__aiterclose__``, this ``with`` construction is a mandatory part of
> every middleware.
>
>   ``__aiterclose__`` allows us to eliminate the mandatory boilerplate
> and an extra level of indentation from every middleware::
>
>     async def noop_middleware(handler, request_header, request_body):
>         async for response_item in handler(request_header, request_body):
>             yield response_item
>
> So the ``__aiterclose__`` approach provides substantial advantages
> over GC hooks.
>
> This leaves open the question of whether we want a combination of GC
> hooks + ``__aiterclose__``, or just ``__aiterclose__`` alone. Since
> the vast majority of generators are iterated over using a ``for`` loop
> or equivalent, ``__aiterclose__`` handles most situations before the
> GC has a chance to get involved. The case where GC hooks provide
> additional value is in code that does manual iteration, e.g.::
>
>     agen = fetch_newline_separated_json_from_url(...)
>     while True:
>         document = await type(agen).__anext__(agen)
>         if document["id"] == needle:
>             break
>     # doesn't do 'await agen.aclose()'
>
> If we go with the GC-hooks + ``__aiterclose__`` approach, this
> generator will eventually be cleaned up by GC calling the generator
> ``__del__`` method, which then will use the hooks to call back into
> the event loop to run the cleanup code.
>
> If we go with the no-GC-hooks approach, this generator will eventually
> be garbage collected, with the following effects:
>
> - its ``__del__`` method will issue a warning that the generator was
> not closed (similar to the existing "coroutine never awaited"
> warning).
>
> - The underlying resources involved will still be cleaned up, because
> the generator frame will still be garbage collected, causing it to
> drop references to any file handles or sockets it holds, and then
> those objects's ``__del__`` methods will release the actual operating
> system resources.
>
> - But, any cleanup code inside the generator itself (e.g. logging,
> buffer flushing) will not get a chance to run.
>
> The solution here -- as the warning would indicate -- is to fix the
> code so that it calls ``__aiterclose__``, e.g. by using a ``with``
> block::
>
>     async with aclosing(fetch_newline_separated_json_from_url(...)) as
> agen:
>         while True:
>             document = await type(agen).__anext__(agen)
>             if document["id"] == needle:
>                 break
>
> Basically in this approach, the rule would be that if you want to
> manually implement the iterator protocol, then it's your
> responsibility to implement all of it, and that now includes
> ``__(a)iterclose__``.
>
> GC hooks add non-trivial complexity in the form of (a) new global
> interpreter state, (b) a somewhat complicated control flow (e.g.,
> async generator GC always involves resurrection, so the details of PEP
> 442 are important), and (c) a new public API in asyncio (``await
> loop.shutdown_asyncgens()``) that users have to remember to call at
> the appropriate time. (This last point in particular somewhat
> undermines the argument that GC hooks provide a safe backup to
> guarantee cleanup, since if ``shutdown_asyncgens()`` isn't called
> correctly then I *think* it's possible for generators to be silently
> discarded without their cleanup code being called; compare this to the
> ``__aiterclose__``-only approach where in the worst case we still at
> least get a warning printed. This might be fixable.) All this
> considered, GC hooks arguably aren't worth it, given that the only
> people they help are those who want to manually call ``__anext__`` yet
> don't want to manually call ``__aiterclose__``. But Yury disagrees
> with me on this :-). And both options are viable.
>
>
> Always inject resources, and do all cleanup at the top level
> ------------------------------------------------------------
>
> It was suggested on python-dev (XX find link) that a pattern to avoid
> these problems is to always pass resources in from above, e.g.
> ``read_newline_separated_json`` should take a file object rather than
> a path, with cleanup handled at the top level::
>
>   def read_newline_separated_json(file_handle):
>       for line in file_handle:
>           yield json.loads(line)
>
>   def read_users(file_handle):
>       for document in read_newline_separated_json(file_handle):
>           yield User.from_json(document)
>
>   with open(path) as file_handle:
>       for user in read_users(file_handle):
>           ...
>
> This works well in simple cases; here it lets us avoid the "N+1
> ``with`` blocks problem". But unfortunately, it breaks down quickly
> when things get more complex. Consider if instead of reading from a
> file, our generator was reading from a streaming HTTP GET request --
> while handling redirects and authentication via OAUTH. Then we'd
> really want the sockets to be managed down inside our HTTP client
> library, not at the top level. Plus there are other cases where
> ``finally`` blocks embedded inside generators are important in their
> own right: db transaction management, emitting logging information
> during cleanup (one of the major motivating use cases for WSGI
> ``close``), and so forth. So this is really a workaround for simple
> cases, not a general solution.
>
>
> More complex variants of __(a)iterclose__
> -----------------------------------------
>
> The semantics of ``__(a)iterclose__`` are somewhat inspired by
> ``with`` blocks, but context managers are more powerful:
> ``__(a)exit__`` can distinguish between a normal exit versus exception
> unwinding, and in the case of an exception it can examine the
> exception details and optionally suppress propagation.
> ``__(a)iterclose__`` as proposed here does not have these powers, but
> one can imagine an alternative design where it did.
>
> However, this seems like unwarranted complexity: experience suggests
> that it's common for iterables to have ``close`` methods, and even to
> have ``__exit__`` methods that call ``self.close()``, but I'm not
> aware of any common cases that make use of ``__exit__``'s full power.
> I also can't think of any examples where this would be useful. And it
> seems unnecessarily confusing to allow iterators to affect flow
> control by swallowing exceptions -- if you're in a situation where you
> really want that, then you should probably use a real ``with`` block
> anyway.
>
>
> Specification
> =============
>
> This section describes where we want to eventually end up, though
> there are some backwards compatibility issues that mean we can't jump
> directly here. A later section describes the transition plan.
>
>
> Guiding principles
> ------------------
>
> Generally, ``__(a)iterclose__`` implementations should:
>
> - be idempotent,
> - perform any cleanup that is appropriate on the assumption that the
> iterator will not be used again after ``__(a)iterclose__`` is called.
> In particular, once ``__(a)iterclose__`` has been called then calling
> ``__(a)next__`` produces undefined behavior.
>
> And generally, any code which starts iterating through an iterable
> with the intention of exhausting it, should arrange to make sure that
> ``__(a)iterclose__`` is eventually called, whether or not the iterator
> is actually exhausted.
>
>
> Changes to iteration
> --------------------
>
> The core proposal is the change in behavior of ``for`` loops. Given
> this Python code::
>
>   for VAR in ITERABLE:
>       LOOP-BODY
>   else:
>       ELSE-BODY
>
> we desugar to the equivalent of::
>
>   _iter = iter(ITERABLE)
>   _iterclose = getattr(type(_iter), "__iterclose__", lambda: None)
>   try:
>       traditional-for VAR in _iter:
>           LOOP-BODY
>       else:
>           ELSE-BODY
>   finally:
>       _iterclose(_iter)
>
> where the "traditional-for statement" here is meant as a shorthand for
> the classic 3.5-and-earlier ``for`` loop semantics.
>
> Besides the top-level ``for`` statement, Python also contains several
> other places where iterators are consumed. For consistency, these
> should call ``__iterclose__`` as well using semantics equivalent to
> the above. This includes:
>
> - ``for`` loops inside comprehensions
> - ``*`` unpacking
> - functions which accept and fully consume iterables, like
> ``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and
> others.
>
>
> Changes to async iteration
> --------------------------
>
> We also make the analogous changes to async iteration constructs,
> except that the new slot is called ``__aiterclose__``, and it's an
> async method that gets ``await``\ed.
>
>
> Modifications to basic iterator types
> -------------------------------------
>
> Generator objects (including those created by generator comprehensions):
> - ``__iterclose__`` calls ``self.close()``
> - ``__del__`` calls ``self.close()`` (same as now), and additionally
> issues a ``ResourceWarning`` if the generator wasn't exhausted. This
> warning is hidden by default, but can be enabled for those who want to
> make sure they aren't inadverdantly relying on CPython-specific GC
> semantics.
>
> Async generator objects (including those created by async generator
> comprehensions):
> - ``__aiterclose__`` calls ``self.aclose()``
> - ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been
> called, since this probably indicates a latent bug, similar to the
> "coroutine never awaited" warning.
>
> QUESTION: should file objects implement ``__iterclose__`` to close the
> file? On the one hand this would make this change more disruptive; on
> the other hand people really like writing ``for line in open(...):
> ...``, and if we get used to iterators taking care of their own
> cleanup then it might become very weird if files don't.
>
>
> New convenience functions
> -------------------------
>
> The ``itertools`` module gains a new iterator wrapper that can be used
> to selectively disable the new ``__iterclose__`` behavior::
>
>   # QUESTION: I feel like there might be a better name for this one?
>   class preserve(iterable):
>       def __init__(self, iterable):
>           self._it = iter(iterable)
>
>       def __iter__(self):
>           return self
>
>       def __next__(self):
>           return next(self._it)
>
>       def __iterclose__(self):
>           # Swallow __iterclose__ without passing it on
>           pass
>
> Example usage (assuming that file objects implements ``__iterclose__``)::
>
>   with open(...) as handle:
>       # Iterate through the same file twice:
>       for line in itertools.preserve(handle):
>           ...
>       handle.seek(0)
>       for line in itertools.preserve(handle):
>           ...
>
> The ``operator`` module gains two new functions, with semantics
> equivalent to the following::
>
>   def iterclose(it):
>       if hasattr(type(it), "__iterclose__"):
>           type(it).__iterclose__(it)
>
>   async def aiterclose(ait):
>       if hasattr(type(ait), "__aiterclose__"):
>           await type(ait).__aiterclose__(ait)
>
> These are particularly useful when implementing the changes in the next
> section:
>
>
> __iterclose__ implementations for iterator wrappers
> ---------------------------------------------------
>
> Python ships a number of iterator types that act as wrappers around
> other iterators: ``map``, ``zip``, ``itertools.accumulate``,
> ``csv.reader``, and others. These iterators should define a
> ``__iterclose__`` method which calls ``__iterclose__`` in turn on
> their underlying iterators. For example, ``map`` could be implemented
> as::
>
>   class map:
>       def __init__(self, fn, *iterables):
>           self._fn = fn
>           self._iters = [iter(iterable) for iterable in iterables]
>
>       def __iter__(self):
>           return self
>
>       def __next__(self):
>           return self._fn(*[next(it) for it in self._iters])
>
>       def __iterclose__(self):
>           for it in self._iters:
>               operator.iterclose(it)
>
> In some cases this requires some subtlety; for example,
> ```itertools.tee``
> <https://docs.python.org/3/library/itertools.html#itertools.tee>`_
> should not call ``__iterclose__`` on the underlying iterator until it
> has been called on *all* of the clone iterators.
>
>
> Example / Rationale
> -------------------
>
> The payoff for all this is that we can now write straightforward code
> like::
>
>   def read_newline_separated_json(path):
>       for line in open(path):
>           yield json.loads(line)
>
> and be confident that the file will receive deterministic cleanup
> *without the end-user having to take any special effort*, even in
> complex cases. For example, consider this silly pipeline::
>
>   list(map(lambda key: key.upper(),
>            doc["key"] for doc in read_newline_separated_json(path)))
>
> If our file contains a document where ``doc["key"]`` turns out to be
> an integer, then the following sequence of events will happen:
>
> 1. ``key.upper()`` raises an ``AttributeError``, which propagates out
> of the ``map`` and triggers the implicit ``finally`` block inside
> ``list``.
> 2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the
> map object.
> 3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator
> comprehension object.
> 4. This injects a ``GeneratorExit`` exception into the generator
> comprehension body, which is currently suspended inside the
> comprehension's ``for`` loop body.
> 5. The exception propagates out of the ``for`` loop, triggering the
> ``for`` loop's implicit ``finally`` block, which calls
> ``__iterclose__`` on the generator object representing the call to
> ``read_newline_separated_json``.
> 6. This injects an inner ``GeneratorExit`` exception into the body of
> ``read_newline_separated_json``, currently suspended at the ``yield``.
> 7. The inner ``GeneratorExit`` propagates out of the ``for`` loop,
> triggering the ``for`` loop's implicit ``finally`` block, which calls
> ``__iterclose__()`` on the file object.
> 8. The file object is closed.
> 9. The inner ``GeneratorExit`` resumes propagating, hits the boundary
> of the generator function, and causes
> ``read_newline_separated_json``'s ``__iterclose__()`` method to return
> successfully.
> 10. Control returns to the generator comprehension body, and the outer
> ``GeneratorExit`` continues propagating, allowing the comprehension's
> ``__iterclose__()`` to return successfully.
> 11. The rest of the ``__iterclose__()`` calls unwind without incident,
> back into the body of ``list``.
> 12. The original ``AttributeError`` resumes propagating.
>
> (The details above assume that we implement ``file.__iterclose__``; if
> not then add a ``with`` block to ``read_newline_separated_json`` and
> essentially the same logic goes through.)
>
> Of course, from the user's point of view, this can be simplified down to
> just:
>
> 1. ``int.upper()`` raises an ``AttributeError``
> 1. The file object is closed.
> 2. The ``AttributeError`` propagates out of ``list``
>
> So we've accomplished our goal of making this "just work" without the
> user having to think about it.
>
>
> Transition plan
> ===============
>
> While the majority of existing ``for`` loops will continue to produce
> identical results, the proposed changes will produce
> backwards-incompatible behavior in some cases. Example::
>
>   def read_csv_with_header(lines_iterable):
>       lines_iterator = iter(lines_iterable)
>       for line in lines_iterator:
>           column_names = line.strip().split("\t")
>           break
>       for line in lines_iterator:
>           values = line.strip().split("\t")
>           record = dict(zip(column_names, values))
>           yield record
>
> This code used to be correct, but after this proposal is implemented
> will require an ``itertools.preserve`` call added to the first ``for``
> loop.
>
> [QUESTION: currently, if you close a generator and then try to iterate
> over it then it just raises ``Stop(Async)Iteration``, so code the
> passes the same generator object to multiple ``for`` loops but forgets
> to use ``itertools.preserve`` won't see an obvious error -- the second
> ``for`` loop will just exit immediately. Perhaps it would be better if
> iterating a closed generator raised a ``RuntimeError``? Note that
> files don't have this problem -- attempting to iterate a closed file
> object already raises ``ValueError``.]
>
> Specifically, the incompatibility happens when all of these factors
> come together:
>
> - The automatic calling of ``__(a)iterclose__`` is enabled
> - The iterable did not previously define ``__(a)iterclose__``
> - The iterable does now define ``__(a)iterclose__``
> - The iterable is re-used after the ``for`` loop exits
>
> So the problem is how to manage this transition, and those are the
> levers we have to work with.
>
> First, observe that the only async iterables where we propose to add
> ``__aiterclose__`` are async generators, and there is currently no
> existing code using async generators (though this will start changing
> very soon), so the async changes do not produce any backwards
> incompatibilities. (There is existing code using async iterators, but
> using the new async for loop on an old async iterator is harmless,
> because old async iterators don't have ``__aiterclose__``.) In
> addition, PEP 525 was accepted on a provisional basis, and async
> generators are by far the biggest beneficiary of this PEP's proposed
> changes. Therefore, I think we should strongly consider enabling
> ``__aiterclose__`` for ``async for`` loops and async generators ASAP,
> ideally for 3.6.0 or 3.6.1.
>
> For the non-async world, things are harder, but here's a potential
> transition path:
>
> In 3.7:
>
> Our goal is that existing unsafe code will start emitting warnings,
> while those who want to opt-in to the future can do that immediately:
>
> - We immediately add all the ``__iterclose__`` methods described above.
> - If ``from __future__ import iterclose`` is in effect, then ``for``
> loops and ``*`` unpacking call ``__iterclose__`` as specified above.
> - If the future is *not* enabled, then ``for`` loops and ``*``
> unpacking do *not* call ``__iterclose__``. But they do call some other
> method instead, e.g. ``__iterclose_warning__``.
> - Similarly, functions like ``list`` use stack introspection (!!) to
> check whether their direct caller has ``__future__.iterclose``
> enabled, and use this to decide whether to call ``__iterclose__`` or
> ``__iterclose_warning__``.
> - For all the wrapper iterators, we also add ``__iterclose_warning__``
> methods that forward to the ``__iterclose_warning__`` method of the
> underlying iterator or iterators.
> - For generators (and files, if we decide to do that),
> ``__iterclose_warning__`` is defined to set an internal flag, and
> other methods on the object are modified to check for this flag. If
> they find the flag set, they issue a ``PendingDeprecationWarning`` to
> inform the user that in the future this sequence would have led to a
> use-after-close situation and the user should use ``preserve()``.
>
> In 3.8:
>
> - Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning``
>
> In 3.9:
>
> - Enable the ``__future__`` unconditionally and remove all the
> ``__iterclose_warning__`` stuff.
>
> I believe that this satisfies the normal requirements for this kind of
> transition -- opt-in initially, with warnings targeted precisely to
> the cases that will be effected, and a long deprecation cycle.
>
> Probably the most controversial / risky part of this is the use of
> stack introspection to make the iterable-consuming functions sensitive
> to a ``__future__`` setting, though I haven't thought of any situation
> where it would actually go wrong yet...
>
>
> Acknowledgements
> ================
>
> Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for
> helpful discussion on earlier versions of this idea.
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20161021/36bbd05e/attachment-0001.html>


More information about the Python-ideas mailing list