NOTE: This is my first post to this mailing list; I'm not really sure
      how to post a message, so I'm attempting a reply-all.

I like Nathaniel's idea for __iterclose__.

I suggest the following changes to deal with a few of the complex issues
he discussed.

1.  A missing __iterclose__, or a value of None, works as before;
    no changes.

2.  An iterator can be used in one of three ways:

    A.  A 'for' loop, which will call __iterclose__ when it exits.

    B.  User controlled, in which case the user is responsible for using
        the iterator inside a with statement.

    C.  Old style.  The user is responsible for calling __iterclose__
        directly.

3.  An iterator keeps track of __iter__ calls; this allows it to know
    when to clean up.


The two key additions, above, are:

    #2B. The user can use the iterator with __enter__ & __exit__ cleanly.

    #3.  Tracking __iter__ calls makes complex use cases easier
         to handle.

Specification
=============

An iterator may implement the following method: __iterclose__.  A missing
method, or a value of None, is allowed.

When the user wants to control the iterator, the user is expected to
use the iterator inside a with statement.

The core proposal is the change in behavior of ``for`` loops. Given this
Python code:

  for VAR in ITERABLE:
      LOOP-BODY
  else:
      ELSE-BODY

we desugar to the equivalent of:

  _iter = iter(ITERABLE)
  _iterclose = getattr(_iter, '__iterclose__', None)

  if _iterclose is None:
      traditional-for VAR in _iter:
          LOOP-BODY
      else:
          ELSE-BODY
  else:
      _stop_exception_seen = False
      try:
          traditional-for VAR in _iter:
              LOOP-BODY
          else:
              _stop_exception_seen = True
              ELSE-BODY
      finally:
          if not _stop_exception_seen:
              _iterclose(_iter)

The test for None allows us to skip the setup of a try/finally block.

Also, we don't bother to call __iterclose__ if the iterator threw
StopIteration at us.

Modifications to basic iterator types
=====================================

An iterator will implement something like the following:

  _cleanup       - Private function; does the following:

                        _enter_count = _iter_count = -1

                        Do any necessary cleanup, release resources, etc.

                   NOTE: Is also called internally by the iterator,
                   before raising StopIteration.

  _iter_count    - Private value, starts at 0.

  _enter_count   - Private value, starts at 0.

  __iter__       - if _iter_count >= 0:
                       _iter_count += 1

                   return self

  __iterclose__  - if _iter_count == 0:
                       if _enter_count == 0:
                           _cleanup()
                   elif _iter_count > 0:
                       _iter_count -= 1

  __enter__      - if _enter_count >= 0:
                       _enter_count += 1

                   return self

  __exit__       - if _enter_count > 0:
                       _enter_count -= 1

                       if _enter_count == _iter_count == 0:
                           _cleanup()

The suggestions on _iter_count & _enter_count are just examples; the
internal details can differ (and would need better error handling).
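
As a concrete (if simplified) sketch of the above -- the class name and
the wrapping of a plain list are only for illustration; a real iterator
would hold resources such as files or sockets:

  class CountingIterator:

      def __init__(self, values):
          self._values = iter(values)
          self._iter_count = 0
          self._enter_count = 0
          self._closed = False

      def _cleanup(self):
          # Idempotent; after this the iterator must not be used again.
          if not self._closed:
              self._closed = True
              self._iter_count = self._enter_count = -1
              # release any real resources here

      def __iter__(self):
          if self._iter_count >= 0:
              self._iter_count += 1
          return self

      def __next__(self):
          try:
              return next(self._values)
          except StopIteration:
              self._cleanup()   # clean up before propagating StopIteration
              raise

      def __iterclose__(self):
          if self._iter_count == 0:
              if self._enter_count == 0:
                  self._cleanup()
          elif self._iter_count > 0:
              self._iter_count -= 1

      def __enter__(self):
          if self._enter_count >= 0:
              self._enter_count += 1
          return self

      def __exit__(self, *exc_info):
          if self._enter_count > 0:
              self._enter_count -= 1
              if self._enter_count == self._iter_count == 0:
                  self._cleanup()
          return False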


Examples
========

NOTE: Examples are given using range() or [1, 2, 3, 4, 5, 6, 7] for
      simplicity.  For real use, the iterator would have resources such
      as open files it needs to close on cleanup.


1.  Simple example:

        for v in range(7):
            print(v)

    Creates an iterator with an _iter_count of 0.  The iterator exits
    normally (by raising StopIteration), so we don't bother to call
    __iterclose__.


2.  Break example:

        for v in [1, 2, 3, 4, 5, 6, 7]:
            print(v)

            if v == 3:
                break

    Creates an iterator with an _iter_count of 0.

    The loop exits after the iterator generates three values; we then call
    __iterclose__ & the iterator does any necessary cleanup.

3.  Convert example #2 to print the next value:

        with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
            for v in seven:
                print(v)

                if v == 3:
                    break

            print('Next value is:', next(seven))

    This will print:

            1
            2
            3
            Next value is: 4

    How this works:

        1.  We create an iterator named seven (by calling list.__iter__).

        2.  We call seven.__enter__

        3.  The for loop calls seven.__next__() 3 times, and then calls
            seven.__iterclose__().

            Since the _enter_count is 1, the iterator does not do
            cleanup yet.

        4.  We call next(seven).

        5.  We call seven.__exit__().  The iterator does its cleanup now.

4.  More complicated example:

        with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
            for v in seven:
                print(v)

                if v == 1:
                    for v in seven:
                        print('stolen:', v)

                        if v == 3:
                            break

                if v == 5:
                    break

            for v in seven:
                print(v * v)

    This will print:

        1
        stolen: 2
        stolen: 3
        4
        5
        36
        49

    How this works:

        1.  Same as #3 above; cleanup is done by __exit__.

5.  Alternate way of doing #4.

        seven = iter([1, 2, 3, 4, 5, 6, 7])

        for v in seven:
            print(v)

            if v == 1:
                for v in seven:
                    print('stolen:', v)

                    if v == 3:
                        break

            if v == 5:
                break

        for v in seven:
            print(v * v)
            break           #   Different from #4

        seven.__iterclose__()

    This will print:

        1
        stolen: 2
        stolen: 3
        4
        5
        36

    How this works:

        1.  We create an iterator named seven.

        2.  The for loops all call seven.__iter__, causing _iter_count
            to increment.

        3.  The for loops all call seven.__iterclose__ on exit, which
            decrements _iter_count.

        4.  The user calls the final __iterclose__(), which closes the
            iterator.

    NOTE:
        Method #5 is NOT recommended; the 'with' syntax is better.

        However, something like zip() or itertools.zip_longest() could
        call __iterclose__ during cleanup.


Change to iterators
===================

All Python iterators would need to add __iterclose__ (possibly with a
value of None), __enter__, & __exit__.

Third-party iterators that do not implement __iterclose__ cannot be
used in a with statement.  A new function could be added to itertools,
something like:

    with itertools.with_wrapper(third_party_iterator) as x:
        ...

The 'with_wrapper' would attempt to call __iterclose__ when its __exit__
function is called.
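
A rough sketch of what such a wrapper might look like (the name
'with_wrapper' and the fallback to a plain close() method are only
illustrative):

  class with_wrapper:
      """Adapt any iterator to the 'with' protocol (sketch only)."""

      def __init__(self, it):
          self._it = iter(it)

      def __iter__(self):
          return iter(self._it)

      def __enter__(self):
          return self._it

      def __exit__(self, *exc_info):
          # Prefer __iterclose__ if present; otherwise fall back to a
          # plain close() method; otherwise do nothing.
          close = getattr(self._it, '__iterclose__', None)
          if close is None:
              close = getattr(self._it, 'close', None)
          if close is not None:
              close()
          return False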

On Wed, Oct 19, 2016 at 12:38 AM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,

I'd like to propose that Python's iterator protocol be enhanced to add
a first-class notion of completion / cleanup.

This is mostly motivated by thinking about the issues around async
generators and cleanup. Unfortunately even though PEP 525 was accepted
I found myself unable to stop pondering this, and the more I've
pondered the more convinced I've become that the GC hooks added in PEP
525 are really not enough, and that we'll regret it if we stick with
them, or at least with them alone :-/. The strategy here is pretty
different -- it's an attempt to dig down and make a fundamental
improvement to the language that fixes a number of long-standing rough
spots, including async generators.

The basic concept is relatively simple: just adding a '__iterclose__'
method that 'for' loops call upon completion, even if that's via break
or exception. But, the overall issue is fairly complicated + iterators
have a large surface area across the language, so the text below is
pretty long. Mostly I wrote it all out to convince myself that there
wasn't some weird showstopper lurking somewhere :-). For a first pass
discussion, it probably makes sense to mainly focus on whether the
basic concept makes sense? The main rationale is at the top, but the
details are there too for those who want them.

Also, for *right* now I'm hoping -- probably unreasonably -- to try to
get the async iterator parts of the proposal in ASAP, ideally for
3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal
like this, which I apologize for -- though async generators are
provisional in 3.6, so at least in theory changing them is not out of
the question.) So again, it might make sense to focus especially on
the async parts, which are a pretty small and self-contained part, and
treat the rest of the proposal as a longer-term plan provided for
context. The comparison to PEP 525 GC hooks comes right after the
initial rationale.

Anyway, I'll be interested to hear what you think!

-n

------------------

Abstract
========

We propose to extend the iterator protocol with a new
``__(a)iterclose__`` slot, which is called automatically on exit from
``(async) for`` loops, regardless of how they exit. This allows for
convenient, deterministic cleanup of resources held by iterators
without reliance on the garbage collector. This is especially valuable
for asynchronous generators.


Note on timing
==============

In practical terms, the proposal here is divided into two separate
parts: the handling of async iterators, which should ideally be
implemented ASAP, and the handling of regular iterators, which is a
larger but more relaxed project that can't start until 3.7 at the
earliest. But since the changes are closely related, and we probably
don't want to end up with async iterators and regular iterators
diverging in the long run, it seems useful to look at them together.


Background and motivation
=========================

Python iterables often hold resources which require cleanup. For
example: ``file`` objects need to be closed; the `WSGI spec
<https://www.python.org/dev/peps/pep-0333/>`_ adds a ``close`` method
on top of the regular iterator protocol and demands that consumers
call it at the appropriate time (though forgetting to do so is a
`frequent source of bugs
<http://blog.dscpl.com.au/2012/10/obligations-for-calling-close-on.html>`_);
and PEP 342 (based on PEP 325) extended generator objects to add a
``close`` method to allow generators to clean up after themselves.

Generally, objects that need to clean up after themselves also define
a ``__del__`` method to ensure that this cleanup will happen
eventually, when the object is garbage collected. However, relying on
the garbage collector for cleanup like this causes serious problems in
at least two cases:

- In Python implementations that do not use reference counting (e.g.
PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet
many situations require *prompt* cleanup of resources. Delayed cleanup
produces problems like crashes due to file descriptor exhaustion, or
WSGI timing middleware that collects bogus times.

- Async generators (PEP 525) can only perform cleanup under the
supervision of the appropriate coroutine runner. ``__del__`` doesn't
have access to the coroutine runner; indeed, the coroutine runner
might be garbage collected before the generator object. So relying on
the garbage collector is effectively impossible without some kind of
language extension. (PEP 525 does provide such an extension, but it
has a number of limitations that this proposal fixes; see the
"alternatives" section below for discussion.)

Fortunately, Python provides a standard tool for doing resource
cleanup in a more structured way: ``with`` blocks. For example, this
code opens a file but relies on the garbage collector to close it::

  def read_newline_separated_json(path):
      for line in open(path):
          yield json.loads(line)

  for document in read_newline_separated_json(path):
      ...

and recent versions of CPython will point this out by issuing a
``ResourceWarning``, nudging us to fix it by adding a ``with`` block::

  def read_newline_separated_json(path):
      with open(path) as file_handle:      # <-- with block
          for line in file_handle:
              yield json.loads(line)

  for document in read_newline_separated_json(path):  # <-- outer for loop
      ...

But there's a subtlety here, caused by the interaction of ``with``
blocks and generators. ``with`` blocks are Python's main tool for
managing cleanup, and they're a powerful one, because they pin the
lifetime of a resource to the lifetime of a stack frame. But this
assumes that someone will take care of cleaning up the stack frame...
and for generators, this requires that someone ``close`` them.

In this case, adding the ``with`` block *is* enough to shut up the
``ResourceWarning``, but this is misleading -- the file object cleanup
here is still dependent on the garbage collector. The ``with`` block
will only be unwound when the ``read_newline_separated_json``
generator is closed. If the outer ``for`` loop runs to completion then
the cleanup will happen immediately; but if this loop is terminated
early by a ``break`` or an exception, then the ``with`` block won't
fire until the generator object is garbage collected.
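
For example (``needle`` here is just an illustrative stand-in value),
this early exit leaves the generator -- and hence the file -- waiting
on the garbage collector::

  for document in read_newline_separated_json(path):
      if document["id"] == needle:
          break   # generator left suspended inside its "with" block;
                  # the file stays open until the generator is GC'ed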

The correct solution requires that all *users* of this API wrap every
``for`` loop in its own ``with`` block::

  with closing(read_newline_separated_json(path)) as genobj:
      for document in genobj:
          ...

This gets even worse if we consider the idiom of decomposing a complex
pipeline into multiple nested generators::

  def read_users(path):
      with closing(read_newline_separated_json(path)) as gen:
          for document in gen:
              yield User.from_json(document)

  def users_in_group(path, group):
      with closing(read_users(path)) as gen:
          for user in gen:
              if user.group == group:
                  yield user

In general if you have N nested generators then you need N+1 ``with``
blocks to clean up 1 file. And good defensive programming would
suggest that any time we use a generator, we should assume the
possibility that there could be at least one ``with`` block somewhere
in its (potentially transitive) call stack, either now or in the
future, and thus always wrap it in a ``with``. But in practice,
basically nobody does this, because programmers would rather write
buggy code than tiresome repetitive code. In simple cases like this
there are some workarounds that good Python developers know (e.g. in
this simple case it would be idiomatic to pass in a file handle
instead of a path and move the resource management to the top level),
but in general we cannot avoid the use of ``with``/``finally`` inside
of generators, and thus dealing with this problem one way or another.
When beauty and correctness fight then beauty tends to win, so it's
important to make correct code beautiful.

Still, is this worth fixing? Until async generators came along I would
have argued yes, but that it was a low priority, since everyone seems
to be muddling along okay -- but async generators make it much more
urgent. Async generators cannot do cleanup *at all* without some
mechanism for deterministic cleanup that people will actually use, and
async generators are particularly likely to hold resources like file
descriptors. (After all, if they weren't doing I/O, they'd be
generators, not async generators.) So we have to do something, and it
might as well be a comprehensive fix to the underlying problem. And
it's much easier to fix this now when async generators are first
rolling out, than it will be to fix it later.

The proposal itself is simple in concept: add a ``__(a)iterclose__``
method to the iterator protocol, and have (async) ``for`` loops call
it when the loop is exited, even if this occurs via ``break`` or
exception unwinding. Effectively, we're taking the current cumbersome
idiom (``with`` block + ``for`` loop) and merging them together into a
fancier ``for``. This may seem non-orthogonal, but makes sense when
you consider that the existence of generators means that ``with``
blocks actually depend on iterator cleanup to work reliably, plus
experience showing that iterator cleanup is often a desirable feature
in its own right.


Alternatives
============

PEP 525 asyncgen hooks
----------------------

PEP 525 proposes a `set of global thread-local hooks managed by new
``sys.{get/set}_asyncgen_hooks()`` functions
<https://www.python.org/dev/peps/pep-0525/#finalization>`_, which
allow event loops to integrate with the garbage collector to run
cleanup for async generators. In principle, this proposal and PEP 525
are complementary, in the same way that ``with`` blocks and
``__del__`` are complementary: this proposal takes care of ensuring
deterministic cleanup in most cases, while PEP 525's GC hooks clean up
anything that gets missed. But ``__aiterclose__`` provides a number of
advantages over GC hooks alone:

- The GC hook semantics aren't part of the abstract async iterator
protocol, but are instead restricted `specifically to the async
generator concrete type <XX find and link Yury's email saying this>`_.
If you have an async iterator implemented using a class, like::

    class MyAsyncIterator:
        async def __anext__():
            ...

  then you can't refactor this into an async generator without
changing its semantics, and vice-versa. This seems very unpythonic.
(It also leaves open the question of what exactly class-based async
iterators are supposed to do, given that they face exactly the same
cleanup problems as async generators.) ``__aiterclose__``, on the
other hand, is defined at the protocol level, so it's duck-type
friendly and works for all iterators, not just generators.

- Code that wants to work on non-CPython implementations like PyPy
cannot in general rely on GC for cleanup. Without ``__aiterclose__``,
it's more or less guaranteed that developers who develop and test on
CPython will produce libraries that leak resources when used on PyPy.
Developers who do want to target alternative implementations will
either have to take the defensive approach of wrapping every ``for``
loop in a ``with`` block, or else carefully audit their code to figure
out which generators might possibly contain cleanup code and add
``with`` blocks around those only. With ``__aiterclose__``, writing
portable code becomes easy and natural.

- An important part of building robust software is making sure that
exceptions always propagate correctly without being lost. One of the
most exciting things about async/await compared to traditional
callback-based systems is that instead of requiring manual chaining,
the runtime can now do the heavy lifting of propagating errors, making
it *much* easier to write robust code. But, this beautiful new picture
has one major gap: if we rely on the GC for generator cleanup, then
exceptions raised during cleanup are lost. So, again, without
``__aiterclose__``, developers who care about this kind of robustness
will either have to take the defensive approach of wrapping every
``for`` loop in a ``with`` block, or else carefully audit their code
to figure out which generators might possibly contain cleanup code.
``__aiterclose__`` plugs this hole by performing cleanup in the
caller's context, so writing more robust code becomes the path of
least resistance.

- The WSGI experience suggests that there exist important
iterator-based APIs that need prompt cleanup and cannot rely on the
GC, even in CPython. For example, consider a hypothetical WSGI-like
API based around async/await and async iterators, where a response
handler is an async generator that takes request headers + an async
iterator over the request body, and yields response headers + the
response body. (This is actually the use case that got me interested
in async generators in the first place, i.e. this isn't hypothetical.)
If we follow WSGI in requiring that child iterators must be closed
properly, then without ``__aiterclose__`` the absolute most
minimalistic middleware in our system looks something like::

    async def noop_middleware(handler, request_header, request_body):
        async with aclosing(handler(request_header, request_body)) as aiter:
            async for response_item in aiter:
                yield response_item

  Arguably in regular code one can get away with skipping the ``with``
block around ``for`` loops, depending on how confident one is that one
understands the internal implementation of the generator. But here we
have to cope with arbitrary response handlers, so without
``__aiterclose__``, this ``with`` construction is a mandatory part of
every middleware.

  ``__aiterclose__`` allows us to eliminate the mandatory boilerplate
and an extra level of indentation from every middleware::

    async def noop_middleware(handler, request_header, request_body):
        async for response_item in handler(request_header, request_body):
            yield response_item

So the ``__aiterclose__`` approach provides substantial advantages
over GC hooks.

This leaves open the question of whether we want a combination of GC
hooks + ``__aiterclose__``, or just ``__aiterclose__`` alone. Since
the vast majority of generators are iterated over using a ``for`` loop
or equivalent, ``__aiterclose__`` handles most situations before the
GC has a chance to get involved. The case where GC hooks provide
additional value is in code that does manual iteration, e.g.::

    agen = fetch_newline_separated_json_from_url(...)
    while True:
        document = await type(agen).__anext__(agen)
        if document["id"] == needle:
            break
    # doesn't do 'await agen.aclose()'

If we go with the GC-hooks + ``__aiterclose__`` approach, this
generator will eventually be cleaned up by GC calling the generator
``__del__`` method, which then will use the hooks to call back into
the event loop to run the cleanup code.

If we go with the no-GC-hooks approach, this generator will eventually
be garbage collected, with the following effects:

- its ``__del__`` method will issue a warning that the generator was
not closed (similar to the existing "coroutine never awaited"
warning).

- The underlying resources involved will still be cleaned up, because
the generator frame will still be garbage collected, causing it to
drop references to any file handles or sockets it holds, and then
those objects' ``__del__`` methods will release the actual operating
system resources.

- But, any cleanup code inside the generator itself (e.g. logging,
buffer flushing) will not get a chance to run.

The solution here -- as the warning would indicate -- is to fix the
code so that it calls ``__aiterclose__``, e.g. by using a ``with``
block::

    async with aclosing(fetch_newline_separated_json_from_url(...)) as agen:
        while True:
            document = await type(agen).__anext__(agen)
            if document["id"] == needle:
                break

Basically in this approach, the rule would be that if you want to
manually implement the iterator protocol, then it's your
responsibility to implement all of it, and that now includes
``__(a)iterclose__``.

GC hooks add non-trivial complexity in the form of (a) new global
interpreter state, (b) a somewhat complicated control flow (e.g.,
async generator GC always involves resurrection, so the details of PEP
442 are important), and (c) a new public API in asyncio (``await
loop.shutdown_asyncgens()``) that users have to remember to call at
the appropriate time. (This last point in particular somewhat
undermines the argument that GC hooks provide a safe backup to
guarantee cleanup, since if ``shutdown_asyncgens()`` isn't called
correctly then I *think* it's possible for generators to be silently
discarded without their cleanup code being called; compare this to the
``__aiterclose__``-only approach where in the worst case we still at
least get a warning printed. This might be fixable.) All this
considered, GC hooks arguably aren't worth it, given that the only
people they help are those who want to manually call ``__anext__`` yet
don't want to manually call ``__aiterclose__``. But Yury disagrees
with me on this :-). And both options are viable.


Always inject resources, and do all cleanup at the top level
------------------------------------------------------------

It was suggested on python-dev (XX find link) that a pattern to avoid
these problems is to always pass resources in from above, e.g.
``read_newline_separated_json`` should take a file object rather than
a path, with cleanup handled at the top level::

  def read_newline_separated_json(file_handle):
      for line in file_handle:
          yield json.loads(line)

  def read_users(file_handle):
      for document in read_newline_separated_json(file_handle):
          yield User.from_json(document)

  with open(path) as file_handle:
      for user in read_users(file_handle):
          ...

This works well in simple cases; here it lets us avoid the "N+1
``with`` blocks problem". But unfortunately, it breaks down quickly
when things get more complex. Consider if instead of reading from a
file, our generator was reading from a streaming HTTP GET request --
while handling redirects and authentication via OAUTH. Then we'd
really want the sockets to be managed down inside our HTTP client
library, not at the top level. Plus there are other cases where
``finally`` blocks embedded inside generators are important in their
own right: db transaction management, emitting logging information
during cleanup (one of the major motivating use cases for WSGI
``close``), and so forth. So this is really a workaround for simple
cases, not a general solution.


More complex variants of __(a)iterclose__
-----------------------------------------

The semantics of ``__(a)iterclose__`` are somewhat inspired by
``with`` blocks, but context managers are more powerful:
``__(a)exit__`` can distinguish between a normal exit versus exception
unwinding, and in the case of an exception it can examine the
exception details and optionally suppress propagation.
``__(a)iterclose__`` as proposed here does not have these powers, but
one can imagine an alternative design where it did.

However, this seems like unwarranted complexity: experience suggests
that it's common for iterables to have ``close`` methods, and even to
have ``__exit__`` methods that call ``self.close()``, but I'm not
aware of any common cases that make use of ``__exit__``'s full power.
I also can't think of any examples where this would be useful. And it
seems unnecessarily confusing to allow iterators to affect flow
control by swallowing exceptions -- if you're in a situation where you
really want that, then you should probably use a real ``with`` block
anyway.


Specification
=============

This section describes where we want to eventually end up, though
there are some backwards compatibility issues that mean we can't jump
directly here. A later section describes the transition plan.


Guiding principles
------------------

Generally, ``__(a)iterclose__`` implementations should:

- be idempotent,
- perform any cleanup that is appropriate on the assumption that the
iterator will not be used again after ``__(a)iterclose__`` is called.
In particular, once ``__(a)iterclose__`` has been called then calling
``__(a)next__`` produces undefined behavior.

And generally, any code which starts iterating through an iterable
with the intention of exhausting it, should arrange to make sure that
``__(a)iterclose__`` is eventually called, whether or not the iterator
is actually exhausted.
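
To illustrate (a hand-written sketch, not a proposed stdlib type), an
iterator following these principles might look like::

  class line_reader:
      """Yield lines from a file; a minimal __iterclose__-aware iterator."""

      def __init__(self, path):
          self._file = open(path)

      def __iter__(self):
          return self

      def __next__(self):
          line = self._file.readline()
          if not line:
              self.__iterclose__()
              raise StopIteration
          return line

      def __iterclose__(self):
          # Idempotent: safe to call any number of times; after the first
          # call the iterator must not be iterated again.
          if not self._file.closed:
              self._file.close()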


Changes to iteration
--------------------

The core proposal is the change in behavior of ``for`` loops. Given
this Python code::

  for VAR in ITERABLE:
      LOOP-BODY
  else:
      ELSE-BODY

we desugar to the equivalent of::

  _iter = iter(ITERABLE)
  _iterclose = getattr(type(_iter), "__iterclose__", lambda it: None)
  try:
      traditional-for VAR in _iter:
          LOOP-BODY
      else:
          ELSE-BODY
  finally:
      _iterclose(_iter)

where the "traditional-for statement" here is meant as a shorthand for
the classic 3.5-and-earlier ``for`` loop semantics.

Besides the top-level ``for`` statement, Python also contains several
other places where iterators are consumed. For consistency, these
should call ``__iterclose__`` as well using semantics equivalent to
the above. This includes:

- ``for`` loops inside comprehensions
- ``*`` unpacking
- functions which accept and fully consume iterables, like
``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and
others (see the sketch below).
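
For instance, a function like ``list`` could behave roughly like the
following sketch (not the actual implementation; the ``getattr`` dance
mirrors the desugaring above)::

  def list_(iterable):
      _iter = iter(iterable)
      try:
          result = []
          while True:
              try:
                  result.append(next(_iter))
              except StopIteration:
                  return result
      finally:
          # Close the iterator no matter how we exited.
          _iterclose = getattr(type(_iter), "__iterclose__", None)
          if _iterclose is not None:
              _iterclose(_iter)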


Changes to async iteration
--------------------------

We also make the analogous changes to async iteration constructs,
except that the new slot is called ``__aiterclose__``, and it's an
async method that gets ``await``\ed.
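
By analogy with the desugaring in the previous section, an ``async
for`` loop would be roughly equivalent to (again using the
"traditional" loop as pseudocode shorthand)::

  _ait = ITERABLE.__aiter__()
  _aiterclose = getattr(type(_ait), "__aiterclose__", None)
  try:
      traditional-async-for VAR in _ait:
          LOOP-BODY
      else:
          ELSE-BODY
  finally:
      if _aiterclose is not None:
          await _aiterclose(_ait)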


Modifications to basic iterator types
-------------------------------------

Generator objects (including those created by generator comprehensions):
- ``__iterclose__`` calls ``self.close()``
- ``__del__`` calls ``self.close()`` (same as now), and additionally
issues a ``ResourceWarning`` if the generator wasn't exhausted. This
warning is hidden by default, but can be enabled for those who want to
make sure they aren't inadvertently relying on CPython-specific GC
semantics.

Async generator objects (including those created by async generator
comprehensions):
- ``__aiterclose__`` calls ``self.aclose()``
- ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been
called, since this probably indicates a latent bug, similar to the
"coroutine never awaited" warning.

QUESTION: should file objects implement ``__iterclose__`` to close the
file? On the one hand this would make this change more disruptive; on
the other hand people really like writing ``for line in open(...):
...``, and if we get used to iterators taking care of their own
cleanup then it might become very weird if files don't.


New convenience functions
-------------------------

The ``itertools`` module gains a new iterator wrapper that can be used
to selectively disable the new ``__iterclose__`` behavior::

  # QUESTION: I feel like there might be a better name for this one?
  class preserve:
      def __init__(self, iterable):
          self._it = iter(iterable)

      def __iter__(self):
          return self

      def __next__(self):
          return next(self._it)

      def __iterclose__(self):
          # Swallow __iterclose__ without passing it on
          pass

Example usage (assuming that file objects implement ``__iterclose__``)::

  with open(...) as handle:
      # Iterate through the same file twice:
      for line in itertools.preserve(handle):
          ...
      handle.seek(0)
      for line in itertools.preserve(handle):
          ...

The ``operator`` module gains two new functions, with semantics
equivalent to the following::

  def iterclose(it):
      if hasattr(type(it), "__iterclose__"):
          type(it).__iterclose__(it)

  async def aiterclose(ait):
      if hasattr(type(ait), "__aiterclose__"):
          await type(ait).__aiterclose__(ait)

These are particularly useful when implementing the changes in the next section:


__iterclose__ implementations for iterator wrappers
---------------------------------------------------

Python ships a number of iterator types that act as wrappers around
other iterators: ``map``, ``zip``, ``itertools.accumulate``,
``csv.reader``, and others. These iterators should define a
``__iterclose__`` method which calls ``__iterclose__`` in turn on
their underlying iterators. For example, ``map`` could be implemented
as::

  class map:
      def __init__(self, fn, *iterables):
          self._fn = fn
          self._iters = [iter(iterable) for iterable in iterables]

      def __iter__(self):
          return self

      def __next__(self):
          return self._fn(*[next(it) for it in self._iters])

      def __iterclose__(self):
          for it in self._iters:
              operator.iterclose(it)

In some cases this requires some subtlety; for example,
`itertools.tee
<https://docs.python.org/3/library/itertools.html#itertools.tee>`_
should not call ``__iterclose__`` on the underlying iterator until it
has been called on *all* of the clone iterators.
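
One way ``tee`` could arrange this, sketched very loosely (this is not
the real ``tee`` implementation, and ``operator.iterclose`` is the
proposed helper from above)::

  class _tee_state:
      """Shared bookkeeping for one group of tee clones."""

      def __init__(self, it, n_clones):
          self.it = it
          self.open_clones = n_clones

      def clone_closed(self):
          self.open_clones -= 1
          if self.open_clones == 0:      # last clone closed: close source
              operator.iterclose(self.it)

  class _tee_clone:
      def __init__(self, shared):
          self._shared = shared
          self._closed = False
          # (buffering of not-yet-consumed items omitted)

      def __iter__(self):
          return self

      def __iterclose__(self):
          if not self._closed:
              self._closed = True
              self._shared.clone_closed()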


Example / Rationale
-------------------

The payoff for all this is that we can now write straightforward code like::

  def read_newline_separated_json(path):
      for line in open(path):
          yield json.loads(line)

and be confident that the file will receive deterministic cleanup
*without the end-user having to take any special effort*, even in
complex cases. For example, consider this silly pipeline::

  list(map(lambda key: key.upper(),
           doc["key"] for doc in read_newline_separated_json(path)))

If our file contains a document where ``doc["key"]`` turns out to be
an integer, then the following sequence of events will happen:

1. ``key.upper()`` raises an ``AttributeError``, which propagates out
of the ``map`` and triggers the implicit ``finally`` block inside
``list``.
2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the
map object.
3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator
comprehension object.
4. This injects a ``GeneratorExit`` exception into the generator
comprehension body, which is currently suspended inside the
comprehension's ``for`` loop body.
5. The exception propagates out of the ``for`` loop, triggering the
``for`` loop's implicit ``finally`` block, which calls
``__iterclose__`` on the generator object representing the call to
``read_newline_separated_json``.
6. This injects an inner ``GeneratorExit`` exception into the body of
``read_newline_separated_json``, currently suspended at the ``yield``.
7. The inner ``GeneratorExit`` propagates out of the ``for`` loop,
triggering the ``for`` loop's implicit ``finally`` block, which calls
``__iterclose__()`` on the file object.
8. The file object is closed.
9. The inner ``GeneratorExit`` resumes propagating, hits the boundary
of the generator function, and causes
``read_newline_separated_json``'s ``__iterclose__()`` method to return
successfully.
10. Control returns to the generator comprehension body, and the outer
``GeneratorExit`` continues propagating, allowing the comprehension's
``__iterclose__()`` to return successfully.
11. The rest of the ``__iterclose__()`` calls unwind without incident,
back into the body of ``list``.
12. The original ``AttributeError`` resumes propagating.

(The details above assume that we implement ``file.__iterclose__``; if
not then add a ``with`` block to ``read_newline_separated_json`` and
essentially the same logic goes through.)

Of course, from the user's point of view, this can be simplified down to just:

1. ``int.upper()`` raises an ``AttributeError``.
2. The file object is closed.
3. The ``AttributeError`` propagates out of ``list``.

So we've accomplished our goal of making this "just work" without the
user having to think about it.


Transition plan
===============

While the majority of existing ``for`` loops will continue to produce
identical results, the proposed changes will produce
backwards-incompatible behavior in some cases. Example::

  def read_csv_with_header(lines_iterable):
      lines_iterator = iter(lines_iterable)
      for line in lines_iterator:
          column_names = line.strip().split("\t")
          break
      for line in lines_iterator:
          values = line.strip().split("\t")
          record = dict(zip(column_names, values))
          yield record

This code used to be correct, but after this proposal is implemented
will require an ``itertools.preserve`` call added to the first ``for``
loop.
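
That is, under this proposal the function would need to look something
like::

  def read_csv_with_header(lines_iterable):
      lines_iterator = iter(lines_iterable)
      for line in itertools.preserve(lines_iterator):   # don't close yet
          column_names = line.strip().split("\t")
          break
      for line in lines_iterator:
          values = line.strip().split("\t")
          record = dict(zip(column_names, values))
          yield record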

[QUESTION: currently, if you close a generator and then try to iterate
over it then it just raises ``Stop(Async)Iteration``, so code that
passes the same generator object to multiple ``for`` loops but forgets
to use ``itertools.preserve`` won't see an obvious error -- the second
``for`` loop will just exit immediately. Perhaps it would be better if
iterating a closed generator raised a ``RuntimeError``? Note that
files don't have this problem -- attempting to iterate a closed file
object already raises ``ValueError``.]

Specifically, the incompatibility happens when all of these factors
come together:

- The automatic calling of ``__(a)iterclose__`` is enabled
- The iterable did not previously define ``__(a)iterclose__``
- The iterable does now define ``__(a)iterclose__``
- The iterable is re-used after the ``for`` loop exits

So the problem is how to manage this transition, and those are the
levers we have to work with.

First, observe that the only async iterables where we propose to add
``__aiterclose__`` are async generators, and there is currently no
existing code using async generators (though this will start changing
very soon), so the async changes do not produce any backwards
incompatibilities. (There is existing code using async iterators, but
using the new async for loop on an old async iterator is harmless,
because old async iterators don't have ``__aiterclose__``.) In
addition, PEP 525 was accepted on a provisional basis, and async
generators are by far the biggest beneficiary of this PEP's proposed
changes. Therefore, I think we should strongly consider enabling
``__aiterclose__`` for ``async for`` loops and async generators ASAP,
ideally for 3.6.0 or 3.6.1.

For the non-async world, things are harder, but here's a potential
transition path:

In 3.7:

Our goal is that existing unsafe code will start emitting warnings,
while those who want to opt-in to the future can do that immediately:

- We immediately add all the ``__iterclose__`` methods described above.
- If ``from __future__ import iterclose`` is in effect, then ``for``
loops and ``*`` unpacking call ``__iterclose__`` as specified above.
- If the future is *not* enabled, then ``for`` loops and ``*``
unpacking do *not* call ``__iterclose__``. But they do call some other
method instead, e.g. ``__iterclose_warning__``.
- Similarly, functions like ``list`` use stack introspection (!!) to
check whether their direct caller has ``__future__.iterclose``
enabled, and use this to decide whether to call ``__iterclose__`` or
``__iterclose_warning__``.
- For all the wrapper iterators, we also add ``__iterclose_warning__``
methods that forward to the ``__iterclose_warning__`` method of the
underlying iterator or iterators.
- For generators (and files, if we decide to do that),
``__iterclose_warning__`` is defined to set an internal flag, and
other methods on the object are modified to check for this flag. If
they find the flag set, they issue a ``PendingDeprecationWarning`` to
inform the user that in the future this sequence would have led to a
use-after-close situation and the user should use ``preserve()`` (a
rough sketch of this flag machinery follows).
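
A very rough sketch of that flag machinery (all names here are made up
for illustration)::

  import warnings

  class _iterclose_warning_mixin:
      _would_have_been_closed = False

      def __iterclose_warning__(self):
          # Called in the places where __iterclose__ *would* have been
          # called under the __future__ import.
          self._would_have_been_closed = True

      def _warn_if_reused(self):
          if self._would_have_been_closed:
              warnings.warn(
                  "this iterator would already have been closed under "
                  "'from __future__ import iterclose'; wrap it in "
                  "itertools.preserve() to keep using it",
                  PendingDeprecationWarning, stacklevel=3)

      # e.g. a generator-like object's __next__ would start with:
      #     self._warn_if_reused()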

In 3.8:

- Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning``

In 3.9:

- Enable the ``__future__`` unconditionally and remove all the
``__iterclose_warning__`` stuff.

I believe that this satisfies the normal requirements for this kind of
transition -- opt-in initially, with warnings targeted precisely to
the cases that will be affected, and a long deprecation cycle.

Probably the most controversial / risky part of this is the use of
stack introspection to make the iterable-consuming functions sensitive
to a ``__future__`` setting, though I haven't thought of any situation
where it would actually go wrong yet...


Acknowledgements
================

Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for
helpful discussion on earlier versions of this idea.

--
Nathaniel J. Smith -- https://vorpus.org