Deterministic iterator cleanup

Hi all,

I'd like to propose that Python's iterator protocol be enhanced to add a first-class notion of completion / cleanup.

This is mostly motivated by thinking about the issues around async generators and cleanup. Unfortunately even though PEP 525 was accepted I found myself unable to stop pondering this, and the more I've pondered the more convinced I've become that the GC hooks added in PEP 525 are really not enough, and that we'll regret it if we stick with them, or at least with them alone :-/. The strategy here is pretty different -- it's an attempt to dig down and make a fundamental improvement to the language that fixes a number of long-standing rough spots, including async generators.

The basic concept is relatively simple: just adding a '__iterclose__' method that 'for' loops call upon completion, even if that's via break or exception. But, the overall issue is fairly complicated + iterators have a large surface area across the language, so the text below is pretty long. Mostly I wrote it all out to convince myself that there wasn't some weird showstopper lurking somewhere :-). For a first pass discussion, it probably makes sense to mainly focus on whether the basic concept makes sense? The main rationale is at the top, but the details are there too for those who want them.

Also, for *right* now I'm hoping -- probably unreasonably -- to try to get the async iterator parts of the proposal in ASAP, ideally for 3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal like this, which I apologize for -- though async generators are provisional in 3.6, so at least in theory changing them is not out of the question.) So again, it might make sense to focus especially on the async parts, which are a pretty small and self-contained part, and treat the rest of the proposal as a longer-term plan provided for context. The comparison to PEP 525 GC hooks comes right after the initial rationale.

Anyway, I'll be interested to hear what you think!

-n

------------------

Abstract
========

We propose to extend the iterator protocol with a new ``__(a)iterclose__`` slot, which is called automatically on exit from ``(async) for`` loops, regardless of how they exit. This allows for convenient, deterministic cleanup of resources held by iterators without reliance on the garbage collector. This is especially valuable for asynchronous generators.

Note on timing
==============

In practical terms, the proposal here is divided into two separate parts: the handling of async iterators, which should ideally be implemented ASAP, and the handling of regular iterators, which is a larger but more relaxed project that can't start until 3.7 at the earliest. But since the changes are closely related, and we probably don't want to end up with async iterators and regular iterators diverging in the long run, it seems useful to look at them together.

Background and motivation
=========================

Python iterables often hold resources which require cleanup. For example: ``file`` objects need to be closed; the `WSGI spec <https://www.python.org/dev/peps/pep-0333/>`_ adds a ``close`` method on top of the regular iterator protocol and demands that consumers call it at the appropriate time (though forgetting to do so is a `frequent source of bugs <http://blog.dscpl.com.au/2012/10/obligations-for-calling-close-on.html>`_); and PEP 342 (based on PEP 325) extended generator objects to add a ``close`` method to allow generators to clean up after themselves.
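To make the WSGI-style ``close`` obligation concrete, here is a minimal sketch of the pattern a spec-compliant consumer has to follow today; the ``consume_wsgi_result`` name and the ``write`` callback are illustrative placeholders, not part of WSGI or of this proposal::

    def consume_wsgi_result(result, write):
        """Consume a WSGI response iterable the way the spec demands."""
        try:
            for chunk in result:
                write(chunk)  # hand each body chunk to the transport
        finally:
            # If the iterable has a close() method, the server must call it,
            # regardless of how iteration ended (exhaustion, break, exception).
            close = getattr(result, "close", None)
            if close is not None:
                close()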
Generally, objects that need to clean up after themselves also define a ``__del__`` method to ensure that this cleanup will happen eventually, when the object is garbage collected. However, relying on the garbage collector for cleanup like this causes serious problems in at least two cases:

- In Python implementations that do not use reference counting (e.g. PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet many situations require *prompt* cleanup of resources. Delayed cleanup produces problems like crashes due to file descriptor exhaustion, or WSGI timing middleware that collects bogus times.

- Async generators (PEP 525) can only perform cleanup under the supervision of the appropriate coroutine runner. ``__del__`` doesn't have access to the coroutine runner; indeed, the coroutine runner might be garbage collected before the generator object. So relying on the garbage collector is effectively impossible without some kind of language extension. (PEP 525 does provide such an extension, but it has a number of limitations that this proposal fixes; see the "alternatives" section below for discussion.)

Fortunately, Python provides a standard tool for doing resource cleanup in a more structured way: ``with`` blocks. For example, this code opens a file but relies on the garbage collector to close it::

    def read_newline_separated_json(path):
        for line in open(path):
            yield json.loads(line)

    for document in read_newline_separated_json(path):
        ...

and recent versions of CPython will point this out by issuing a ``ResourceWarning``, nudging us to fix it by adding a ``with`` block::

    def read_newline_separated_json(path):
        with open(path) as file_handle:      # <-- with block
            for line in file_handle:
                yield json.loads(line)

    for document in read_newline_separated_json(path):  # <-- outer for loop
        ...

But there's a subtlety here, caused by the interaction of ``with`` blocks and generators. ``with`` blocks are Python's main tool for managing cleanup, and they're a powerful one, because they pin the lifetime of a resource to the lifetime of a stack frame. But this assumes that someone will take care of cleaning up the stack frame... and for generators, this requires that someone ``close`` them.

In this case, adding the ``with`` block *is* enough to shut up the ``ResourceWarning``, but this is misleading -- the file object cleanup here is still dependent on the garbage collector. The ``with`` block will only be unwound when the ``read_newline_separated_json`` generator is closed. If the outer ``for`` loop runs to completion then the cleanup will happen immediately; but if this loop is terminated early by a ``break`` or an exception, then the ``with`` block won't fire until the generator object is garbage collected.

The correct solution requires that all *users* of this API wrap every ``for`` loop in its own ``with`` block::

    with closing(read_newline_separated_json(path)) as genobj:
        for document in genobj:
            ...

This gets even worse if we consider the idiom of decomposing a complex pipeline into multiple nested generators::

    def read_users(path):
        with closing(read_newline_separated_json(path)) as gen:
            for document in gen:
                yield User.from_json(document)

    def users_in_group(path, group):
        with closing(read_users(path)) as gen:
            for user in gen:
                if user.group == group:
                    yield user

In general if you have N nested generators then you need N+1 ``with`` blocks to clean up 1 file.
And good defensive programming would suggest that any time we use a generator, we should assume the possibility that there could be at least one ``with`` block somewhere in its (potentially transitive) call stack, either now or in the future, and thus always wrap it in a ``with``. But in practice, basically nobody does this, because programmers would rather write buggy code than tiresome repetitive code. In simple cases like this there are some workarounds that good Python developers know (e.g. in this simple case it would be idiomatic to pass in a file handle instead of a path and move the resource management to the top level), but in general we cannot avoid the use of ``with``/``finally`` inside of generators, and thus dealing with this problem one way or another. When beauty and correctness fight then beauty tends to win, so it's important to make correct code beautiful.

Still, is this worth fixing? Until async generators came along I would have argued yes, but that it was a low priority, since everyone seems to be muddling along okay -- but async generators make it much more urgent. Async generators cannot do cleanup *at all* without some mechanism for deterministic cleanup that people will actually use, and async generators are particularly likely to hold resources like file descriptors. (After all, if they weren't doing I/O, they'd be generators, not async generators.) So we have to do something, and it might as well be a comprehensive fix to the underlying problem. And it's much easier to fix this now when async generators are first rolling out, than it will be to fix it later.

The proposal itself is simple in concept: add a ``__(a)iterclose__`` method to the iterator protocol, and have (async) ``for`` loops call it when the loop is exited, even if this occurs via ``break`` or exception unwinding. Effectively, we're taking the current cumbersome idiom (``with`` block + ``for`` loop) and merging them together into a fancier ``for``. This may seem non-orthogonal, but it makes sense when you consider that the existence of generators means that ``with`` blocks actually depend on iterator cleanup to work reliably, plus experience showing that iterator cleanup is often a desirable feature in its own right.

Alternatives
============

PEP 525 asyncgen hooks
----------------------

PEP 525 proposes a `set of global thread-local hooks managed by new ``sys.{get/set}_asyncgen_hooks()`` functions <https://www.python.org/dev/peps/pep-0525/#finalization>`_, which allow event loops to integrate with the garbage collector to run cleanup for async generators. In principle, this proposal and PEP 525 are complementary, in the same way that ``with`` blocks and ``__del__`` are complementary: this proposal takes care of ensuring deterministic cleanup in most cases, while PEP 525's GC hooks clean up anything that gets missed.

But ``__aiterclose__`` provides a number of advantages over GC hooks alone:

- The GC hook semantics aren't part of the abstract async iterator protocol, but are instead restricted `specifically to the async generator concrete type <XX find and link Yury's email saying this>`_. If you have an async iterator implemented using a class, like::

      class MyAsyncIterator:
          async def __anext__(self):
              ...

  then you can't refactor this into an async generator without changing its semantics, and vice-versa. This seems very unpythonic. (It also leaves open the question of what exactly class-based async iterators are supposed to do, given that they face exactly the same cleanup problems as async generators.)
  ``__aiterclose__``, on the other hand, is defined at the protocol level, so it's duck-type friendly and works for all iterators, not just generators.

- Code that wants to work on non-CPython implementations like PyPy cannot in general rely on GC for cleanup. Without ``__aiterclose__``, it's more or less guaranteed that developers who develop and test on CPython will produce libraries that leak resources when used on PyPy. Developers who do want to target alternative implementations will either have to take the defensive approach of wrapping every ``for`` loop in a ``with`` block, or else carefully audit their code to figure out which generators might possibly contain cleanup code and add ``with`` blocks around those only. With ``__aiterclose__``, writing portable code becomes easy and natural.

- An important part of building robust software is making sure that exceptions always propagate correctly without being lost. One of the most exciting things about async/await compared to traditional callback-based systems is that instead of requiring manual chaining, the runtime can now do the heavy lifting of propagating errors, making it *much* easier to write robust code. But, this beautiful new picture has one major gap: if we rely on the GC for generator cleanup, then exceptions raised during cleanup are lost. So, again, without ``__aiterclose__``, developers who care about this kind of robustness will either have to take the defensive approach of wrapping every ``for`` loop in a ``with`` block, or else carefully audit their code to figure out which generators might possibly contain cleanup code. ``__aiterclose__`` plugs this hole by performing cleanup in the caller's context, so writing more robust code becomes the path of least resistance.

- The WSGI experience suggests that there exist important iterator-based APIs that need prompt cleanup and cannot rely on the GC, even in CPython. For example, consider a hypothetical WSGI-like API based around async/await and async iterators, where a response handler is an async generator that takes request headers + an async iterator over the request body, and yields response headers + the response body. (This is actually the use case that got me interested in async generators in the first place, i.e. this isn't hypothetical.) If we follow WSGI in requiring that child iterators must be closed properly, then without ``__aiterclose__`` the absolute most minimalistic middleware in our system looks something like::

      async def noop_middleware(handler, request_header, request_body):
          async with aclosing(handler(request_header, request_body)) as aiter:
              async for response_item in aiter:
                  yield response_item

  Arguably in regular code one can get away with skipping the ``with`` block around ``for`` loops, depending on how confident one is that one understands the internal implementation of the generator. But here we have to cope with arbitrary response handlers, so without ``__aiterclose__``, this ``with`` construction is a mandatory part of every middleware. ``__aiterclose__`` allows us to eliminate the mandatory boilerplate and an extra level of indentation from every middleware::

      async def noop_middleware(handler, request_header, request_body):
          async for response_item in handler(request_header, request_body):
              yield response_item

So the ``__aiterclose__`` approach provides substantial advantages over GC hooks. This leaves open the question of whether we want a combination of GC hooks + ``__aiterclose__``, or just ``__aiterclose__`` alone.
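(As an aside: the ``aclosing`` helper used in the middleware example above is not something this proposal defines; it's assumed to be a small async analogue of ``contextlib.closing``. A minimal sketch of such a helper might look like::

    class aclosing:
        # Assumed helper, analogous to contextlib.closing: ensure that an
        # async generator's aclose() runs when the async with block exits.
        def __init__(self, agen):
            self._agen = agen

        async def __aenter__(self):
            return self._agen

        async def __aexit__(self, exc_type, exc, tb):
            await self._agen.aclose()

)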
Since the vast majority of generators are iterated over using a ``for`` loop or equivalent, ``__aiterclose__`` handles most situations before the GC has a chance to get involved. The case where GC hooks provide additional value is in code that does manual iteration, e.g.::

    agen = fetch_newline_separated_json_from_url(...)
    while True:
        document = await type(agen).__anext__(agen)
        if document["id"] == needle:
            break
    # doesn't do 'await agen.aclose()'

If we go with the GC-hooks + ``__aiterclose__`` approach, this generator will eventually be cleaned up by GC calling the generator ``__del__`` method, which then will use the hooks to call back into the event loop to run the cleanup code.

If we go with the no-GC-hooks approach, this generator will eventually be garbage collected, with the following effects:

- its ``__del__`` method will issue a warning that the generator was not closed (similar to the existing "coroutine never awaited" warning).

- The underlying resources involved will still be cleaned up, because the generator frame will still be garbage collected, causing it to drop references to any file handles or sockets it holds, and then those objects' ``__del__`` methods will release the actual operating system resources.

- But, any cleanup code inside the generator itself (e.g. logging, buffer flushing) will not get a chance to run.

The solution here -- as the warning would indicate -- is to fix the code so that it calls ``__aiterclose__``, e.g. by using a ``with`` block::

    async with aclosing(fetch_newline_separated_json_from_url(...)) as agen:
        while True:
            document = await type(agen).__anext__(agen)
            if document["id"] == needle:
                break

Basically in this approach, the rule would be that if you want to manually implement the iterator protocol, then it's your responsibility to implement all of it, and that now includes ``__(a)iterclose__``.

GC hooks add non-trivial complexity in the form of (a) new global interpreter state, (b) a somewhat complicated control flow (e.g., async generator GC always involves resurrection, so the details of PEP 442 are important), and (c) a new public API in asyncio (``await loop.shutdown_asyncgens()``) that users have to remember to call at the appropriate time. (This last point in particular somewhat undermines the argument that GC hooks provide a safe backup to guarantee cleanup, since if ``shutdown_asyncgens()`` isn't called correctly then I *think* it's possible for generators to be silently discarded without their cleanup code being called; compare this to the ``__aiterclose__``-only approach where in the worst case we still at least get a warning printed. This might be fixable.) All this considered, GC hooks arguably aren't worth it, given that the only people they help are those who want to manually call ``__anext__`` yet don't want to manually call ``__aiterclose__``. But Yury disagrees with me on this :-). And both options are viable.

Always inject resources, and do all cleanup at the top level
------------------------------------------------------------

It was suggested on python-dev (XX find link) that a pattern to avoid these problems is to always pass resources in from above, e.g.
``read_newline_separated_json`` should take a file object rather than a path, with cleanup handled at the top level::

    def read_newline_separated_json(file_handle):
        for line in file_handle:
            yield json.loads(line)

    def read_users(file_handle):
        for document in read_newline_separated_json(file_handle):
            yield User.from_json(document)

    with open(path) as file_handle:
        for user in read_users(file_handle):
            ...

This works well in simple cases; here it lets us avoid the "N+1 ``with`` blocks problem". But unfortunately, it breaks down quickly when things get more complex. Consider if instead of reading from a file, our generator was reading from a streaming HTTP GET request -- while handling redirects and authentication via OAUTH. Then we'd really want the sockets to be managed down inside our HTTP client library, not at the top level. Plus there are other cases where ``finally`` blocks embedded inside generators are important in their own right: db transaction management, emitting logging information during cleanup (one of the major motivating use cases for WSGI ``close``), and so forth. So this is really a workaround for simple cases, not a general solution.

More complex variants of __(a)iterclose__
-----------------------------------------

The semantics of ``__(a)iterclose__`` are somewhat inspired by ``with`` blocks, but context managers are more powerful: ``__(a)exit__`` can distinguish between a normal exit versus exception unwinding, and in the case of an exception it can examine the exception details and optionally suppress propagation. ``__(a)iterclose__`` as proposed here does not have these powers, but one can imagine an alternative design where it did.

However, this seems like unwarranted complexity: experience suggests that it's common for iterables to have ``close`` methods, and even to have ``__exit__`` methods that call ``self.close()``, but I'm not aware of any common cases that make use of ``__exit__``'s full power. I also can't think of any examples where this would be useful. And it seems unnecessarily confusing to allow iterators to affect flow control by swallowing exceptions -- if you're in a situation where you really want that, then you should probably use a real ``with`` block anyway.

Specification
=============

This section describes where we want to eventually end up, though there are some backwards compatibility issues that mean we can't jump directly here. A later section describes the transition plan.

Guiding principles
------------------

Generally, ``__(a)iterclose__`` implementations should:

- be idempotent,
- perform any cleanup that is appropriate on the assumption that the iterator will not be used again after ``__(a)iterclose__`` is called. In particular, once ``__(a)iterclose__`` has been called then calling ``__(a)next__`` produces undefined behavior.

And generally, any code which starts iterating through an iterable with the intention of exhausting it, should arrange to make sure that ``__(a)iterclose__`` is eventually called, whether or not the iterator is actually exhausted.

Changes to iteration
--------------------

The core proposal is the change in behavior of ``for`` loops.
Given this Python code::

    for VAR in ITERABLE:
        LOOP-BODY
    else:
        ELSE-BODY

we desugar to the equivalent of::

    _iter = iter(ITERABLE)
    _iterclose = getattr(type(_iter), "__iterclose__", lambda it: None)
    try:
        traditional-for VAR in _iter:
            LOOP-BODY
        else:
            ELSE-BODY
    finally:
        _iterclose(_iter)

where the "traditional-for statement" here is meant as a shorthand for the classic 3.5-and-earlier ``for`` loop semantics.

Besides the top-level ``for`` statement, Python also contains several other places where iterators are consumed. For consistency, these should call ``__iterclose__`` as well using semantics equivalent to the above. This includes:

- ``for`` loops inside comprehensions
- ``*`` unpacking
- functions which accept and fully consume iterables, like ``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and others.

Changes to async iteration
--------------------------

We also make the analogous changes to async iteration constructs, except that the new slot is called ``__aiterclose__``, and it's an async method that gets ``await``\ed.

Modifications to basic iterator types
-------------------------------------

Generator objects (including those created by generator comprehensions):

- ``__iterclose__`` calls ``self.close()``
- ``__del__`` calls ``self.close()`` (same as now), and additionally issues a ``ResourceWarning`` if the generator wasn't exhausted. This warning is hidden by default, but can be enabled for those who want to make sure they aren't inadvertently relying on CPython-specific GC semantics.

Async generator objects (including those created by async generator comprehensions):

- ``__aiterclose__`` calls ``self.aclose()``
- ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been called, since this probably indicates a latent bug, similar to the "coroutine never awaited" warning.

QUESTION: should file objects implement ``__iterclose__`` to close the file? On the one hand this would make this change more disruptive; on the other hand people really like writing ``for line in open(...): ...``, and if we get used to iterators taking care of their own cleanup then it might become very weird if files don't.

New convenience functions
-------------------------

The ``itertools`` module gains a new iterator wrapper that can be used to selectively disable the new ``__iterclose__`` behavior::

    # QUESTION: I feel like there might be a better name for this one?
    class preserve:
        def __init__(self, iterable):
            self._it = iter(iterable)

        def __iter__(self):
            return self

        def __next__(self):
            return next(self._it)

        def __iterclose__(self):
            # Swallow __iterclose__ without passing it on
            pass

Example usage (assuming that file objects implement ``__iterclose__``)::

    with open(...) as handle:
        # Iterate through the same file twice:
        for line in itertools.preserve(handle):
            ...
        handle.seek(0)
        for line in itertools.preserve(handle):
            ...

The ``operator`` module gains two new functions, with semantics equivalent to the following::

    def iterclose(it):
        if hasattr(type(it), "__iterclose__"):
            type(it).__iterclose__(it)

    async def aiterclose(ait):
        if hasattr(type(ait), "__aiterclose__"):
            await type(ait).__aiterclose__(ait)

These are particularly useful when implementing the changes in the next section:

__iterclose__ implementations for iterator wrappers
---------------------------------------------------

Python ships a number of iterator types that act as wrappers around other iterators: ``map``, ``zip``, ``itertools.accumulate``, ``csv.reader``, and others.
These iterators should define a ``__iterclose__`` method which calls ``__iterclose__`` in turn on their underlying iterators. For example, ``map`` could be implemented as::

    class map:
        def __init__(self, fn, *iterables):
            self._fn = fn
            self._iters = [iter(iterable) for iterable in iterables]

        def __iter__(self):
            return self

        def __next__(self):
            return self._fn(*[next(it) for it in self._iters])

        def __iterclose__(self):
            for it in self._iters:
                operator.iterclose(it)

In some cases this requires some subtlety; for example, `itertools.tee <https://docs.python.org/3/library/itertools.html#itertools.tee>`_ should not call ``__iterclose__`` on the underlying iterator until it has been called on *all* of the clone iterators.

Example / Rationale
-------------------

The payoff for all this is that we can now write straightforward code like::

    def read_newline_separated_json(path):
        for line in open(path):
            yield json.loads(line)

and be confident that the file will receive deterministic cleanup *without the end-user having to take any special effort*, even in complex cases. For example, consider this silly pipeline::

    list(map(lambda key: key.upper(),
             (doc["key"] for doc in read_newline_separated_json(path))))

If our file contains a document where ``doc["key"]`` turns out to be an integer, then the following sequence of events will happen:

1. ``key.upper()`` raises an ``AttributeError``, which propagates out of the ``map`` and triggers the implicit ``finally`` block inside ``list``.
2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the map object.
3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator comprehension object.
4. This injects a ``GeneratorExit`` exception into the generator comprehension body, which is currently suspended inside the comprehension's ``for`` loop body.
5. The exception propagates out of the ``for`` loop, triggering the ``for`` loop's implicit ``finally`` block, which calls ``__iterclose__`` on the generator object representing the call to ``read_newline_separated_json``.
6. This injects an inner ``GeneratorExit`` exception into the body of ``read_newline_separated_json``, currently suspended at the ``yield``.
7. The inner ``GeneratorExit`` propagates out of the ``for`` loop, triggering the ``for`` loop's implicit ``finally`` block, which calls ``__iterclose__()`` on the file object.
8. The file object is closed.
9. The inner ``GeneratorExit`` resumes propagating, hits the boundary of the generator function, and causes ``read_newline_separated_json``'s ``__iterclose__()`` method to return successfully.
10. Control returns to the generator comprehension body, and the outer ``GeneratorExit`` continues propagating, allowing the comprehension's ``__iterclose__()`` to return successfully.
11. The rest of the ``__iterclose__()`` calls unwind without incident, back into the body of ``list``.
12. The original ``AttributeError`` resumes propagating.

(The details above assume that we implement ``file.__iterclose__``; if not then add a ``with`` block to ``read_newline_separated_json`` and essentially the same logic goes through.)

Of course, from the user's point of view, this can be simplified down to just:

1. ``int.upper()`` raises an ``AttributeError``
2. The file object is closed.
3. The ``AttributeError`` propagates out of ``list``

So we've accomplished our goal of making this "just work" without the user having to think about it.
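To round out this section, here is a minimal sketch of what a hand-written, class-based iterator playing along with the proposed protocol might look like, following the guiding principles above (idempotent close, iterator treated as finished after close). The ``LineIterator`` name and the choice to raise ``RuntimeError`` on use-after-close are illustrative assumptions, not part of the proposal::

    class LineIterator:
        """Iterate over the lines of a file, implementing the proposed
        __iterclose__ protocol (illustrative sketch only)."""

        def __init__(self, path):
            self._file = open(path)
            self._closed = False

        def __iter__(self):
            return self

        def __next__(self):
            if self._closed:
                # After __iterclose__ the behavior is up to the iterator;
                # this sketch chooses to fail loudly rather than silently
                # report exhaustion.
                raise RuntimeError("iterator already closed")
            line = self._file.readline()
            if not line:
                raise StopIteration
            return line

        def __iterclose__(self):
            # Idempotent: safe to call more than once.
            if not self._closed:
                self._closed = True
                self._file.close()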
Transition plan
===============

While the majority of existing ``for`` loops will continue to produce identical results, the proposed changes will produce backwards-incompatible behavior in some cases. Example::

    def read_csv_with_header(lines_iterable):
        lines_iterator = iter(lines_iterable)
        for line in lines_iterator:
            column_names = line.strip().split("\t")
            break
        for line in lines_iterator:
            values = line.strip().split("\t")
            record = dict(zip(column_names, values))
            yield record

This code used to be correct, but after this proposal is implemented will require an ``itertools.preserve`` call added to the first ``for`` loop.

[QUESTION: currently, if you close a generator and then try to iterate over it then it just raises ``Stop(Async)Iteration``, so code that passes the same generator object to multiple ``for`` loops but forgets to use ``itertools.preserve`` won't see an obvious error -- the second ``for`` loop will just exit immediately. Perhaps it would be better if iterating a closed generator raised a ``RuntimeError``? Note that files don't have this problem -- attempting to iterate a closed file object already raises ``ValueError``.]

Specifically, the incompatibility happens when all of these factors come together:

- The automatic calling of ``__(a)iterclose__`` is enabled
- The iterable did not previously define ``__(a)iterclose__``
- The iterable does now define ``__(a)iterclose__``
- The iterable is re-used after the ``for`` loop exits

So the problem is how to manage this transition, and those are the levers we have to work with.

First, observe that the only async iterables where we propose to add ``__aiterclose__`` are async generators, and there is currently no existing code using async generators (though this will start changing very soon), so the async changes do not produce any backwards incompatibilities. (There is existing code using async iterators, but using the new async for loop on an old async iterator is harmless, because old async iterators don't have ``__aiterclose__``.) In addition, PEP 525 was accepted on a provisional basis, and async generators are by far the biggest beneficiary of this PEP's proposed changes. Therefore, I think we should strongly consider enabling ``__aiterclose__`` for ``async for`` loops and async generators ASAP, ideally for 3.6.0 or 3.6.1.

For the non-async world, things are harder, but here's a potential transition path:

In 3.7:

Our goal is that existing unsafe code will start emitting warnings, while those who want to opt-in to the future can do that immediately:

- We immediately add all the ``__iterclose__`` methods described above.
- If ``from __future__ import iterclose`` is in effect, then ``for`` loops and ``*`` unpacking call ``__iterclose__`` as specified above.
- If the future is *not* enabled, then ``for`` loops and ``*`` unpacking do *not* call ``__iterclose__``. But they do call some other method instead, e.g. ``__iterclose_warning__``.
- Similarly, functions like ``list`` use stack introspection (!!) to check whether their direct caller has ``__future__.iterclose`` enabled, and use this to decide whether to call ``__iterclose__`` or ``__iterclose_warning__``.
- For all the wrapper iterators, we also add ``__iterclose_warning__`` methods that forward to the ``__iterclose_warning__`` method of the underlying iterator or iterators.
- For generators (and files, if we decide to do that), ``__iterclose_warning__`` is defined to set an internal flag, and other methods on the object are modified to check for this flag.
  If they find the flag set, they issue a ``PendingDeprecationWarning`` to inform the user that in the future this sequence would have led to a use-after-close situation and the user should use ``preserve()``.

In 3.8:

- Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning``

In 3.9:

- Enable the ``__future__`` unconditionally and remove all the ``__iterclose_warning__`` stuff.

I believe that this satisfies the normal requirements for this kind of transition -- opt-in initially, with warnings targeted precisely to the cases that will be affected, and a long deprecation cycle. Probably the most controversial / risky part of this is the use of stack introspection to make the iterable-consuming functions sensitive to a ``__future__`` setting, though I haven't thought of any situation where it would actually go wrong yet...

Acknowledgements
================

Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for helpful discussion on earlier versions of this idea.

-- Nathaniel J. Smith -- https://vorpus.org

This is a very interesting proposal. I just wanted to share something I found in my quick search:

http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-f...

Could you explain why the accepted answer there doesn't address this issue?

    class Parse(object):
        """A generator that iterates through a file"""
        def __init__(self, path):
            self.path = path

        def __iter__(self):
            with open(self.path) as f:
                yield from f

Best,

Neil

On Wednesday, October 19, 2016 at 12:39:34 AM UTC-4, Nathaniel Smith wrote:

On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
I think the difference is that this new approach guarantees cleanup the exact moment the loop ends, no matter how it ends.

If I understand correctly, your approach will do cleanup when the loop ends only if the iterator is exhausted. But if someone zips it with a shorter iterator, uses itertools.islice or something similar, breaks the loop, returns inside the loop, or in some other way ends the loop before the iterator is exhausted, the cleanup won't happen until the iterator is garbage collected. And for non-reference-counting python implementations, when this happens is completely unpredictable.

On Wed, Oct 19, 2016 at 11:08 AM Todd <toddrjen@gmail.com> wrote:
I don't see that. The "cleanup" will happen when collection is interrupted by an exception. This has nothing to do with garbage collection either since the cleanup happens deterministically when the block is ended. If this is the only example, then I would say this behavior is already provided and does not need to be added.

On Wed, Oct 19, 2016 at 10:08 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
BTW it may make this easier to read if we notice that it's essentially a verbose way of writing:

    def parse(path):
        with open(path) as f:
            yield from f
I think there might be a misunderstanding here. Consider code like this, that breaks out from the middle of the for loop:

    def use_that_generator():
        for line in parse(...):
            if found_the_line_we_want(line):
                break
        # -- mark --
        do_something_with_that_line(line)

With current Python, what will happen is that when we reach the marked line, then the for loop has finished and will drop its reference to the generator object. At this point, the garbage collector comes into play. On CPython, with its reference counting collector, the garbage collector will immediately collect the generator object, and then the generator object's __del__ method will restart 'parse' by having the last 'yield' raise a GeneratorExit, and *that* exception will trigger the 'with' block's cleanup. But in order to get there, we're absolutely depending on the garbage collector to inject that GeneratorExit. And on an implementation like PyPy that doesn't use reference counting, the generator object will become collect*ible* at the marked line, but might not actually be collect*ed* for an arbitrarily long time afterwards. And until it's collected, the file will remain open. 'with' blocks guarantee that the resources they hold will be cleaned up promptly when the enclosing stack frame gets cleaned up, but for a 'with' block inside a generator then you still need something to guarantee that the enclosing stack frame gets cleaned up promptly!

This proposal is about providing that thing -- with __(a)iterclose__, the end of the for loop immediately closes the generator object, so the garbage collector doesn't need to get involved.

Essentially the same thing happens if we replace the 'break' with a 'raise'. Though with exceptions, things can actually get even messier, even on CPython. Here's a similar example except that (a) it exits early due to an exception (which then gets caught elsewhere), and (b) the invocation of the generator function ended up being kind of long, so I split the for loop into two lines with a temporary variable:

    def use_that_generator2():
        it = parse("/a/really/really/really/really/really/really/really/long/path")
        for line in it:
            if not valid_format(line):
                raise ValueError()

    def catch_the_exception():
        try:
            use_that_generator2()
        except ValueError:
            # -- mark --
            ...

Here the ValueError() is raised from use_that_generator2(), and then caught in catch_the_exception(). At the marked line, use_that_generator2's stack frame is still pinned in memory by the exception's traceback. And that means that all the local variables are also pinned in memory, including our temporary 'it'. Which means that parse's stack frame is also pinned in memory, and the file is not closed.

With the __(a)iterclose__ proposal, when the exception is thrown then the 'for' loop in use_that_generator2() immediately closes the generator object, which in turn triggers parse's 'with' block, and that closes the file handle. And then after the file handle is closed, the exception continues propagating. So at the marked line, it's still the case that 'it' will be pinned in memory, but now 'it' is a closed generator object that has already relinquished its resources.

-n

-- Nathaniel J. Smith -- https://vorpus.org

On Wed, Oct 19, 2016 at 2:11 PM Nathaniel Smith <njs@pobox.com> wrote:
Yes, I understand that. Maybe this is clearer. This class adds an iterclose to any iterator so that when iteration ends, iterclose is automatically called:

    def my_iterclose():
        print("Closing!")


    class AddIterclose:
        def __init__(self, iterable, iterclose):
            self.iterable = iterable
            self.iterclose = iterclose

        def __iter__(self):
            try:
                for x in self.iterable:
                    yield x
            finally:
                self.iterclose()


    try:
        for x in AddIterclose(range(10), my_iterclose):
            print(x)
            if x == 5:
                raise ValueError
    except:
        pass

Ohhh, sorry, you want __iterclose__ to happen when iteration is terminated by a break statement as well? Okay, I understand, and that's fair. However, I would rather that people be explicit about when they're iterating (use the iteration protocol) and when they're managing a resource (use a context manager). Trying to figure out where the context manager should go automatically (which is what it sounds like the proposal amounts to) is too difficult to get right, and when you get it wrong you close too early, and then what's the user supposed to do? Suppress the early close with an even more convoluted notation? If there is a problem with people iterating over things without a generator, my suggestion is to force them to use the generator. For example, don't make your object iterable: make the value yielded by the context manager iterable. Best, Neil (On preview, Re: Chris Angelico's refactoring of my code, nice!!) On Wednesday, October 19, 2016 at 4:14:32 PM UTC-4, Neil Girdhar wrote:

Thanks Nathaniel for this great proposal. As I went through your mail, I realized all the comments I wanted to make were already covered in later paragraphs. And I don't think there's a single point I disagree with.

I don't have a strong opinion about the synchronous part of the proposal. I actually wouldn't mind the disparity between asynchronous and synchronous iterators if '__aiterclose__' were to be accepted and '__iterclose__' rejected. However, I would like very much to see the asynchronous part happening in python 3.6.

I can add another example for the reference: aioreactive (a fresh implementation of Rx for asyncio) is planning to handle subscriptions to a producer using a context manager:

https://github.com/dbrattli/aioreactive#subscriptions-are-async-iterables

    async with listen(xs) as ys:
        async for x in ys:
            do_something(x)

Like the proposal points out, this happens in the *user* code. With '__aiterclose__', the former example could be simplified as:

    async for x in listen(xs):
        do_something(x)

Or even better:

    async for x in xs:
        do_something(x)

Cheers,

/Vincent

On 10/19/2016 06:38 AM, Nathaniel Smith wrote:

On 17 October 2016 at 09:08, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
Hi Nathaniel. I'm just reposting what I wrote on pypy-dev (as requested) but under the assumption that you didn't substantially alter your draft - I apologise if some of the quoted text below has already been edited.
I suggested this and I still think that it is the best idea.
I haven't written the kind of code that you're describing so I can't say exactly how I would do it. I imagine though that helpers could be used to solve some of the problems that you're referring to.

Here's a case I do know where the above suggestion is awkward:

    def concat(filenames):
        for filename in filenames:
            with open(filename) as inputfile:
                yield from inputfile

    for line in concat(filenames):
        ...

It's still possible to safely handle this use case by creating a helper though. fileinput.input almost does what you want:

    with fileinput.input(filenames) as lines:
        for line in lines:
            ...

Unfortunately if filenames is empty this will default to sys.stdin so it's not perfect, but really I think introducing useful helpers for common cases (rather than core language changes) should be considered as the obvious solution here. Generally it would have been better if the discussion for PEP 525 had focussed more on helping people to debug/fix dependence on __del__ rather than trying to magically fix broken code.
It would be much simpler to reverse this suggestion and say let's introduce a helper that selectively *enables* the new behaviour you're proposing i.e.:

    for line in itertools.closeafter(open(...)):
        ...
        if not line.startswith('#'):
            break  # <--------------- file gets closed here

Then we can leave (async) for loops as they are and there are no backward compatibility problems etc.

-- Oscar

On 19 October 2016 at 12:33, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Looking more closely at this I realise that there is no way to implement closeafter like this without depending on closeafter.__del__ to do the closing. So actually this is not a solution to the problem at all. Sorry for the noise there! -- Oscar

I'm -1 on the idea. Here's why:

1. Python is a very dynamic language with GC and that is one of its fundamental properties. This proposal might make GC of iterators more deterministic, but that is only one case.

For instance, in some places in asyncio source code we have statements like this: "self = None". Why? When an exception occurs and we want to save it (for instance to log it), it holds a reference to the Traceback object. Which in turn references frame objects. Which means that a lot of objects in those frames will be alive while the exception object is alive. So in asyncio we go to great lengths to avoid unnecessary runs of GC, but this is an exception! Most Python code out there today doesn't do these sorts of tricks.

And this is just one example of how you can have cycles that require a run of GC. It is not possible to have deterministic GC in real life Python applications. This proposal addresses only *one* use case, leaving 100s of others unresolved.

IMO, while GC-related issues can be annoying to debug sometimes, it's not worth it to change the behaviour of iteration in Python only to slightly improve on this.

2. This proposal will make writing iterators significantly harder. Consider 'itertools.chain'. We will have to rewrite it to add the proposed __iterclose__ method. The Chain iterator object will have to track all of its iterators, and call __iterclose__ on them when it's necessary (there are a few corner cases). Given that this object is implemented in C, it's quite a bit of work. And we'll have a lot of objects to fix.

We can probably update all iterators in the standard library (in 3.7), but what about third-party code? It will take many years until you can say with certainty that most Python code supports __iterclose__ / __aiterclose__.

3. This proposal changes the behaviour of 'for' and 'async for' statements significantly. To do partial iteration you will have to use a special builtin function to guard the iterator from being closed. This is completely non-obvious to any existing Python user and will be hard to explain to newcomers.

4. This proposal only addresses iteration with 'for' and 'async for' statements. If you iterate using a 'while' loop and the 'next()' function, this proposal wouldn't help you. Also see point #2 about third-party code.

5. Asynchronous generators (AG) introduced by PEP 525 are finalized in a very similar fashion to synchronous generators. There is an API to help Python call the event loop to finalize AGs. asyncio in 3.6 (and other event loops in the near future) already uses this API to ensure that *all AGs in a long-running program are properly finalized* while it is being run.

There is an extra loop method (`loop.shutdown_asyncgens`) that should be called right before stopping the loop (exiting the program) to make sure that all AGs are finalized, but if you forget to call it the world won't end. The process will end and the interpreter will shut down, maybe issuing a couple of ResourceWarnings. No exception will pass silently in the current PEP 525 implementation. And if some AG isn't properly finalized a warning will be issued.

The current AG finalization mechanism must stay even if this proposal gets accepted, as it ensures that even manually iterated AGs are properly finalized.

6. If this proposal gets accepted, I think we shouldn't introduce it in any form in 3.6. It's too late to implement it for both sync- and async-generators. Implementing it only for async-generators will only add cognitive overhead.
Even implementing this only for async-generators will (and should!) delay the 3.6 release significantly.

7. To conclude: I'm not convinced that this proposal fully solves the issue of non-deterministic GC of iterators. It cripples iteration protocols to partially solve the problem for 'for' and 'async for' statements, leaving manual iteration unresolved. It will make it harder to write *correct* (async-) iterators. It introduces some *implicit* context management to 'for' and 'async for' statements -- something that IMO should be done by the user with an explicit 'with' or 'async with'.

Yury

On 2016-10-19 12:38 PM, Random832 wrote:
I understand, but both topics are closely tied together. Cleanup code can be implemented in some __del__ method of some non-iterator object. This proposal doesn't address such cases, it focuses only on iterators. My point is that it's not worth it to *significantly* change iteration (protocols and statements) in Python to only *partially* address the issue. Yury

On Thu, Oct 20, 2016 at 3:38 AM, Random832 <random832@fastmail.com> wrote:
Currently, iterators get passed around casually - you can build on them, derive from them, etc, etc, etc. If you change the 'for' loop to explicitly close an iterator, will you also change 'yield from'? What about other forms of iteration? Will the iterator be closed when it runs out normally?

This proposal is to iterators what 'with' is to open files and other resources. I can build on top of an open file fairly easily:

    @contextlib.contextmanager
    def file_with_header(fn):
        with open(fn, "w") as f:
            f.write("Header Row")
            yield f

    def main():
        with file_with_header("asdf") as f:
            """do stuff"""

I create a context manager based on another context manager, and I have a guarantee that the end of the main() 'with' block is going to properly close the file.

Now, what happens if I do something similar with an iterator?

    def every_second(it):
        try:
            next(it)
        except StopIteration:
            return
        for value in it:
            yield value
            try:
                next(it)
            except StopIteration:
                break

This will work, because it's built on a 'for' loop. What if it's built on a 'while' loop instead?

    def every_second_broken(it):
        try:
            while True:
                next(it)
                yield next(it)
        except StopIteration:
            pass

Now it *won't* correctly call the end-of-iteration function, because there's no 'for' loop. This is going to either (a) require that EVERY consumer of an iterator follow this new protocol, or (b) introduce a ton of edge cases.

ChrisA

On 19 October 2016 at 19:13, Chris Angelico <rosuav@gmail.com> wrote:
Also, unless I'm misunderstanding the proposal, there's a fairly major compatibility break. At present we have:
With the proposed behaviour, if I understand it, "it" would be closed after the first loop, so resuming "it" for the second loop wouldn't work. Am I right in that? I know there's a proposed itertools function to bring back the old behaviour, but it's still a compatibility break. And code like this, that partially consumes an iterator, is not uncommon. Paul

On 10/19/2016 11:38 AM, Paul Moore wrote:
Agreed. I like the idea in general, but this particular break feels like a deal-breaker. I'd be okay with not having break close the iterator, and either introducing a 'break_and_close' type of keyword or some other way of signalling that we will not be using the iterator any more so go ahead and close it. Does that invalidate, or take away most of value of, the proposal? -- ~Ethan~

On Wed, Oct 19, 2016 at 11:38 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Right -- did you reach the "transition plan" section? (I know it's wayyy down there.) The proposal is to hide this behind a __future__ at first + a mechanism during the transition period to catch code that depends on the old behavior and issue deprecation warnings. But it is a compatibility break, yes. -n -- Nathaniel J. Smith -- https://vorpus.org

On Wed, Oct 19, 2016 at 12:21 PM, Nathaniel Smith <njs@pobox.com> wrote:
I should also say, regarding your specific example, I guess it's an open question whether we would want list_iterator.__iterclose__ to actually do anything. It could flip the iterator to a state where it always raises StopIteration, or RuntimeError, or it could just be a no-op that allows iteration to continue normally afterwards. list_iterator doesn't have a close method right now, and it certainly can't "close" the underlying list (whatever that would even mean), so I don't think there's a strong expectation that it should do anything in particular. The __iterclose__ contract is that you're not supposed to call __next__ afterwards, so there's no real rule about what happens if you do. And there aren't strong conventions right now about what happens when you try to iterate an explicitly closed iterator -- files raise an error, generators just act like they were exhausted. So there's a few options that all seem more-or-less reasonable and I don't know that it's very important which one we pick. -n -- Nathaniel J. Smith -- https://vorpus.org

On 2016-10-19 3:33 PM, Nathaniel Smith wrote:
Making 'for' loop to behave differently for built-in containers (i.e. make __iterclose__ a no-op for them) will only make this whole thing even more confusing. It has to be consistent: if you partially iterate over *anything* without wrapping it with `preserve()`, it should always close the iterator. Yury

On Wed, Oct 19, 2016 at 1:33 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
You're probably right. My gut is leaning the same way, I'm just hesitant to commit because I haven't thought about it for long. But I do stand by the claim that this is probably not *that* important either way :-). -n -- Nathaniel J. Smith -- https://vorpus.org

You know, I'm actually starting to lean towards this proposal and away from my earlier objections... On Wed, Oct 19, 2016 at 12:33:57PM -0700, Nathaniel Smith wrote:
That seems like the most obvious. [...]
If I recall correctly, in your proposal you use language like "behaviour is undefined". I don't like that language, because it sounds like undefined behaviour in C, which is something to be avoided like the plague. I hope I don't need to explain why, but for those who may not understand the dangers of "undefined behaviour" as per the C standard, you can start here:

https://randomascii.wordpress.com/2014/05/19/undefined-behavior-can-format-y...

So let's make it clear that what we actually mean is not C-ish undefined behaviour, where the compiler is free to open a portal to the Dungeon Dimensions or use Guido's time machine to erase code that executes before the undefined code:

https://blogs.msdn.microsoft.com/oldnewthing/20140627-00/?p=633/

but rather ordinary, standard "implementation-dependent behaviour". If you call next() on a closed iterator, you'll get whatever the iterator happens to do when it is closed. That will be *recommended* to raise whatever error is appropriate to the iterator, but not enforced.

That makes it just like the part of the iterator protocol that says that once an iterator raises StopIteration, it should always raise StopIteration. Those that don't are officially called "broken", but they are allowed and you can write one if you want to.

Shorter version:

- calling next() on a closed iterator is expected to be an error of some sort, often a RuntimeError, but the iterator is free to use a different error if that makes sense (e.g. closed files);

- if your own iterator classes break that convention, they will be called "broken", but nobody will stop you from writing such "broken" iterators.

-- Steve

On 21 October 2016 at 10:53, Steven D'Aprano <steve@pearwood.info> wrote:
So - does this mean "unless you understand what preserve() does, you're OK to not use it and your code will continue to work as before"? If so, then I'd be happy with this. But I genuinely don't know (without going rummaging through docs) what that statement means in any practical sense. Paul

On Fri, Oct 21, 2016 at 11:07:46AM +0100, Paul Moore wrote:
I've changed my mind -- I think maybe it should do nothing, and preserve the current behaviour of lists.

I'm now more concerned with keeping current behaviour as much as possible than creating some sort of consistent error condition for all iterators. Consistency is over-rated, and we already have inconsistency here: file iterators behave differently from list iterators, because they can be closed:

    py> f = open('/proc/mdstat', 'r')
    py> a = list(f)
    py> b = list(f)
    py> len(a), len(b)
    (20, 0)
    py> f.close()
    py> c = list(f)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: I/O operation on closed file.

We don't need to add a close() to list iterators just so they are consistent with files. Just let __iterclose__ be a no-op.
Almost. Code like this will behave exactly the same as it currently does:

    for x in it:
        process(x)
    y = list(it)

If it is a file object, the second call to list() will raise ValueError; if it is a list_iterator, or generator, etc., y will be an empty list. That part (I think) shouldn't change.

What *will* change is code that partially processes the iterator in two different places. A simple example:

    py> it = iter([1, 2, 3, 4, 5, 6])
    py> for x in it:
    ...     if x == 4: break
    ...
    py> for x in it:
    ...     print(x)
    ...
    5
    6

This *may* change. With this proposal, the first loop will "close" the iterator when you exit from the loop. For a list, there's no finaliser, no __del__ to call, so we can keep the current behaviour and nobody will notice any difference. But if `it` is a file iterator instead of a list iterator, the file will be closed when you exit the first for-loop, and the second loop will raise ValueError. That will be different.

The fix here is simple: protect the first call from closing:

    for x in itertools.preserve(it):  # preserve, protect, whatever
        ...

Or, if `it` is your own class, give it a __iterclose__ method that does nothing.

This is a backwards-incompatible change, so I think we would need to do this:

(1) In Python 3.7, we introduce a __future__ directive:

    from __future__ import iterclose

to enable the new behaviour. (Remember, future directives apply on a module-by-module basis.)

(2) Without the directive, we keep the old behaviour, except that warnings are raised if something will change.

(3) Then in 3.8 iterclose becomes the default, the warnings go away, and the new behaviour just happens.

If that's too fast for people, we could slow it down:

(1) Add the future directive to Python 3.7;

(2) but no warnings by default (you have to opt-in to the warnings with an environment variable, or command-line switch).

(3) Then in 3.8 the warnings are on by default;

(4) And the iterclose behaviour doesn't become standard until 3.9.

That means if this change worries you, you can ignore it until you migrate to 3.8 (which won't be production-ready until about 2020 or so), and don't have to migrate your code until 3.9, which will be a year or two later. But early adopters can start targeting the new functionality from 3.7 if they like.

I don't think there's any need for a __future__ directive for aiterclose, since there's not enough backwards-incompatibility to care about. (I think, but don't mind if people disagree.) That can happen starting in 3.7, and when people complain that their synchronous generators don't have deterministic garbage collection like their asynchronous ones do, we can point them at the future directive.

Bottom line is: at first I thought this was a scary change that would break too much code. But now I think it won't break much, and we can ease into it really slowly over two or three releases. So I think that the cost is probably low. I'm still not sure on how great the benefit will be, but I'm leaning towards a +1 on this.

-- Steve

On 2016-10-19 12:21, Nathaniel Smith wrote:
To me this makes the change too hard to swallow. Although the issues you describe are real, it doesn't seem worth it to me to change the entire semantics of for loops just for these cases. There are lots of for loops that are not async and/or do not rely on resource cleanup. This will change how all of them work, just to fix something that sometimes is a problem for some resource-wrapping iterators. Moreover, even when the iterator does wrap a resource, sometimes I want to be able to stop and resume iteration. It's not uncommon, for instance, to have code using the csv module that reads some rows, pauses to make a decision (e.g., to parse differently depending what header columns are present, or skip some number of rows), and then resumes. This would increase the burden of updating code to adapt to the new breakage (since in this case the programmer would likely have to, or at least want to, think about what is going on rather than just blindly wrapping everything with protect() ). -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On 19 October 2016 at 20:21, Nathaniel Smith <njs@pobox.com> wrote:
I missed that you propose phasing this in, but it doesn't really alter much, I think the current behaviour is valuable and common, and I'm -1 on breaking it. It's just too much of a fundamental change to how loops and iterators interact for me to be comfortable with it - particularly as it's only needed for a very specific use case (none of my programs ever use async - why should I have to rewrite my loops with a clumsy extra call just to cater for a problem that only occurs in async code?) IMO, and I'm sorry if this is controversial, there's a *lot* of new language complexity that's been introduced for the async use case, and it's only the fact that it can be pretty much ignored by people who don't need or use async features that makes it acceptable (the "you don't pay for what you don't use" principle). The problem with this proposal is that it doesn't conform to that principle - it has a direct, negative impact on users who have no interest in async. Paul

On Wed, Oct 19, 2016 at 3:07 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Oh, goodness, no -- like Yury said, the use cases here are not specific to async at all. I mean, none of the examples are async even :-). The motivation here is that prompt (non-GC-dependent) cleanup is a good thing for a variety of reasons: determinism, portability across Python implementations, proper exception propagation, etc. async does add yet another entry to this list, but I don't think the basic principle is controversial. 'with' blocks are a whole chunk of extra syntax that were added to the language just for this use case. In fact 'with' blocks weren't even needed for the functionality -- we already had 'try/finally', they just weren't ergonomic enough. This use case is so important that it's had multiple rounds of syntax directed at it before async/await was even a glimmer in C#'s eye :-). BUT, currently, 'with' and 'try/finally' have a gap: if you use them inside a generator (async or not, doesn't matter), then they often fail at accomplishing their core purpose. Sure, they'll execute their cleanup code whenever the generator is cleaned up, but there's no ergonomic way to clean up the generator. Oops. I mean, you *could* respond by saying "you should never use 'with' or 'try/finally' inside a generator" and maybe add that as a rule to your style manual and linter -- and some people in this thread have suggested more-or-less that -- but that seems like a step backwards. This proposal instead tries to solve the problem of making 'with'/'try/finally' work and be ergonomic in general, and it should be evaluated on that basis, not on the async/await stuff. The reason I'm emphasizing async generators is that they affect the timeline, not the motivation: - PEP 525 actually does add async-only complexity to the language (the new GC hooks). It doesn't affect non-async users, but it is still complexity. And it's possible that if we have iterclose, then we don't need the new GC hooks (though this is still an open discussion :-)). If this is true, then now is the time to act, while reverting the GC hooks change is still a possibility; otherwise, we risk the situation where we add iterclose later, decide that the GC hooks no longer provide enough additional value to justify their complexity... but we're stuck with them anyway. - For synchronous iteration, the need for a transition period means that the iterclose proposal will take a few years to provide benefits. For asynchronous iteration, it could potentially start providing benefits much sooner -- but there's a very narrow window for that, before people start using async generators and backwards compatibility constraints kick in. If we delay a few months then we'll probably have to delay a few years. ...that said, I guess there is one way that async/await directly affected my motivation here, though it's not what you think :-). async/await have gotten me experimenting with writing network servers, and let me tell you, there is nothing that focuses the mind on correctness and simplicity like trying to write a public-facing asynchronous network server. You might think "oh well if you're trying to do some fancy rocket science and this is a feature for rocket scientists then that's irrelevant to me", but that's actually not what I mean at all. The rocket science part is like, trying to run through all possible execution orders of the different callbacks in your head, or to mentally simulate what happens if a client shows up that writes at 1 byte/second. 
When I'm trying to do that, then the last thing I want is to be distracted by also trying to figure out boring mechanical stuff like whether or not the language is actually going to execute my 'finally' block -- yet right now that's a question that actually cannot be answered without auditing my whole source code! And that boring mechanical stuff is still boring mechanical stuff when writing less terrifying code -- it's just that I'm so used to wasting a trickle of cognitive energy on this kind of thing that normally I don't notice it so much. And, also, regarding the "clumsy extra call": the preserve() call isn't just arbitrary clumsiness -- it's a signal that hey, you're turning off a safety feature. Now the language won't take care of this cleanup for you, so it's your responsibility. Maybe you should think about how you want to handle that. Of course your decision could be "whatever, this is a one-off script, the GC is good enough". But it's probably worth the ~0.5 seconds of thought to make that an active, conscious decision, because they aren't all one-off scripts. -n -- Nathaniel J. Smith -- https://vorpus.org
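To make the gap concrete, here is a small example that runs on today's Python: the 'finally' inside the generator stands in for any cleanup, and it only runs promptly if the *caller* remembers to close the generator (or wraps it in contextlib.closing); breaking out of the loop alone leaves the cleanup to the garbage collector:

    def numbers():
        try:
            yield 1
            yield 2
        finally:
            print("cleanup")   # stands in for closing a file, socket, etc.

    gen = numbers()
    for x in gen:
        break        # generator left suspended; "cleanup" has not run yet
    gen.close()      # today this is the caller's job; __iterclose__ would automate it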

On Thu, Oct 20, 2016 at 11:03:11PM -0700, Nathaniel Smith wrote:
Perhaps it should be. The very first thing you say is "determinism". Hmmm. As we (or at least, some of us) move towards more async code, more threads or multi- processing, even another attempt to remove the GIL from CPython which will allow people to use threads with less cost, how much should we really value determinism? That's not a rhetorical question -- I don't know the answer. Portability across Pythons... if all Pythons performed exactly the same, why would we need multiple implementations? The way I see it, non-deterministic cleanup is the cost you pay for a non-reference counting implementation, for those who care about the garbage collection implementation. (And yes, ref counting is garbage collection.) [...]
How often is this *actually* a problem in practice? On my system, I can open 1000+ files as a regular user. I can't even comprehend opening a tenth of that as an ordinary application, although I can imagine that if I were writing a server application things would be different. But then I don't expect to write server applications in quite the same way as I do quick scripts or regular user applications. So it seems to me that a leaked file handle or two normally shouldn't be a problem in practice. They'll be freed when the script or application closes, and in the meantime, you have hundreds more available. 90% of the time, using `with file` does exactly what we want, and the times it doesn't (because we're writing a generator that isn't closed promptly) 90% of those times it doesn't matter. So (it seems to me) that you're talking about changing the behaviour of for-loops to suit only a small proportion of cases: maybe 10% of 10%. It is not uncommon to pass an iterator (such as a generator) through a series of filters, each processing only part of the iterator: it = generator() header = collect_header(it) body = collect_body(it) tail = collect_tail(it) Is it worth disrupting this standard idiom? I don't think so. -- Steve
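For reference, the partial-consumption idiom above is easy to reproduce with plain itertools, and it is exactly this kind of code that would need a preserve()/protect()-style opt-out under the proposal:

    import itertools

    it = iter(range(10))
    header = list(itertools.islice(it, 3))   # consumes 0, 1, 2
    body = list(itertools.islice(it, 5))     # consumes 3..7
    tail = list(it)                          # consumes 8, 9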

On Fri, Oct 21, 2016 at 12:12 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Hmm -- and yet "with" was added, and I can't imagine that its largest use-case isn't with ( ;-) ) open: with open(filename, mode) as my_file: .... .... And yet for years I happily counted on reference counting to close my files, and was particularly happy with: data = open(filename, mode).read() I really liked that that file got opened, read, and closed and cleaned up right off the bat. And then context managers were introduced. And it seems to me there is a consensus in the Python community that we all should be using them when working on files, and I myself have finally started routinely using them, and teaching newbies to use them -- which is kind of a pain, 'cause I want to have them do basic file reading stuff before I explain what a "context manager" is. Anyway, my point is that the broader Python community really has been pretty consistent about making it easy to write code that will work the same way (maybe not with the same performance) across Python implementations. And specifically with deterministic resource management. On my system, I can open 1000+ files as a regular user. I can't even
well, what you can imagine isn't really the point -- I've bumped into that darn open file limit in my work, which was not a server application (though it was some pretty serious number crunching...). And I'm sure I'm not alone. OK, to be fair that was a poorly designed library, not an issue with determinism of resource management (though designing the lib well WOULD depend on that). But then I don't expect to write server applications in
quite the same way as I do quick scripts or regular user applications.
Though data analysts DO write "quick scripts" that might need to do things like access 100s of files...
that was the case with "with file" from the beginning -- particularly on cPython. And yet we all thought it was a great idea.
I don't see what the big overhead is here. for loops would get a new feature, but it would only be used by the objects that chose to implement it. So no huge change. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
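As an informal model of that point (this is a sketch of the proposed semantics, not actual spec text), the new for-loop behaviour amounts to something like the function below, which only does extra work for objects that choose to define __iterclose__:

    def for_with_iterclose(iterable, body):
        """Approximate 'for x in iterable: body(x)' under the proposal."""
        it = iter(iterable)
        try:
            for x in it:
                body(x)
        finally:
            iterclose = getattr(type(it), "__iterclose__", None)
            if iterclose is not None:
                iterclose(it)   # lists etc. skip this; resource-holding iterators clean up here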

On 21 October 2016 at 21:59, Chris Barker <chris.barker@noaa.gov> wrote:
But the point is that the feature *would* affect people who don't need it. That's what I'm struggling to understand. I keep hearing "most code won't be affected", but then there are discussions about how we ensure that people are warned of where they need to add preserve() to their existing code to get the behaviour they already have. (And, of course, they need to add an "if we're on an older Python, define a no-op version of preserve()" backward-compatibility wrapper if they want their code to work cross-version.) I genuinely expect preserve() to pretty much instantly appear on people's lists of "Python warts", and that bothers me. But I'm reaching the point where I'm just saying the same things over and over, so I'll bow out of this discussion now. I remain confused, but I'm going to have to trust that the people who have got a handle on the issue have understood the point I'm making, and have it covered. Paul

On 22 October 2016 at 06:59, Chris Barker <chris.barker@noaa.gov> wrote:
This is actually a case where style guidelines would ideally differ between scripting use cases (let the GC handle it whenever, since your process will be terminating soon anyway) and library(/framework/application) development use cases (promptly clean up after yourself, since you don't necessarily know your context of use). However, that script/library distinction isn't well-defined in computing instruction in general, and most published style guides are written by library/framework/application developers, so students and folks doing ad hoc scripting tend to be the recipients of a lot of well-meaning advice that isn't actually appropriate for them :( Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 23 October 2016 at 02:17, Nick Coghlan <ncoghlan@gmail.com> wrote:
Pondering this overnight, I realised there's a case where folks using Python primarily as a scripting language can still run into many of the resource management problems that arise in larger applications: IPython notebooks, where the persistent kernel can keep resources alive for a surprisingly long time in the absence of a reference counting GC. Yes, they have the option of just restarting the kernel (which many applications don't have), but it's still a nicer user experience if we can help them avoid having those problems arise in the first place. This is likely mitigated in practice *today* by IPython users mostly being on CPython for access to the Scientific Python stack, but we can easily foresee a future where the PyPy community have worked out enough of their NumPy compatibility and runtime redistribution challenges that it becomes significantly more common to be using notebooks against Python kernels that don't use automatic reference counting. I'm significantly more amenable to that as a rationale for pursuing non-syntactic approaches to local resource management than I am the notion of pursuing it for the sake of high performance application development code. Chris, would you be open to trying a thought experiment with some of your students looking at ways to introduce function-scoped deterministic resource management *before* introducing with statements? Specifically, I'm thinking of a progression along the following lines: # Cleaned up whenever the interpreter gets around to cleaning up the function locals def readlines_with_default_resource_management(fname): return open(fname).readlines() # Cleaned up on function exit, even if the locals are still referenced from an exception traceback # or the interpreter implementation doesn't use a reference counting GC from local_resources import function_resource def readlines_with_declarative_cleanup(fname): return function_resource(open(fname)).readlines() # Cleaned up at the end of the with statement def readlines_with_imperative_cleanup(fname): with open(fname) as f: return f.readlines() The idea here is to change the requirement for new developers from "telling the interpreter what to *do*" (which is the situation we have for context managers) to "telling the interpreter what we *want*" (which is for it to link a managed resource with the lifecycle of the currently running function call, regardless of interpreter implementation details) Under that model, Inada-san's recent buffer snapshotting proposal would effectively be an optimised version of the one liner: def snapshot(data, limit, offset=0): return bytes(function_resource(memoryview(data))[offset:limit]) The big refactoring benefit that this feature would offer over with statements is that it doesn't require a structural change to the code - it's just wrapping an existing expression in a new function call that says "clean this up promptly when the function terminates, even if it's still part of a reference cycle, or we're not using a reference counting GC". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Oct 22, 2016 at 8:22 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
This is likely mitigated in practice *today* by IPython users mostly being on CPython for access to the Scientific Python stack,
sure -- though there is no reason that Jupyter notebooks aren't really useful for all sorts of non-data-crunching tasks. It's just that that's the community it was born in. I can imagine they would be great for database exploration/management, for instance. Chris, would you be open to trying a thought experiment with some of your students looking at ways to introduce function-scoped
deterministic resource management *before* introducing with statements?
At first thought, talking about this seems like it would just confuse newbies even MORE. Most of my students really want simple examples they can copy and then change for their specific use case. But I do have some pretty experienced developers (new to Python, but not programming) in my classes, too, that I might be able to bring this up with. # Cleaned up whenever the interpreter gets around to cleaning up
I can see that, but I'm not sure newbies will -- in either case, you have to think about what you want -- which is the complexity I'm trying to avoid at this stage. Until much later, when I get into weak references, I can pretty much tell people that Python will take care of itself with regard to resource management. That's what context managers are for, in fact. YOU can use: with open(...) as infile: ..... Without needing to know what actually has to be "cleaned up" about a file. In the case of files, it's a close() call, simple enough (in the absence of Exceptions...), but with a database connection or something, it could be a lot more complex, and it's nice to know that it will simply be taken care of for you by the context manager. The big refactoring benefit that this feature would offer over with
hmm -- that would be simpler in one sense, but wouldn't it require a new function to be defined for everything you might want to do this with? rather than the same "with" syntax for everything? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Chris Barker wrote:
Nick Coghlan wrote:
I'm with Chris, I think: this seems inappropriate to me. A student has to be rather sophisticated to understand resource management at all in Python. Eg, generators and closures can hang on to resources between calls, yet there's no syntactic marker at the call site.
I think this attempt at a distinction is spurious. On the syntactic side, with open("file") as f: results = read_and_process_lines(f) the with statement effectively links management of the file resource to the lifecycle of read_and_process_lines. (Yes, I know what you mean by "link" -- will "new developers"?) On the semantic side, constructs like closures and generators (which they may be cargo-culting!) mean that it's harder to link resource management to (syntactic) function calls than a new developer might think. (Isn't that Nathaniel's motivation for the OP?) And then there's the problem of a loop that may not fully consume an iterator: that must be explicitly decided -- the question for language designers is which of "close generators on loop exit" or "leave generators open on loop exit" should be marked with explicit syntax -- and what if you've got two generators involved, and want different decisions for both? Chris:
Indeed.
I hope you phrase that very carefully. Python takes care of itself, but does not take care of the use case. That's the programmer's responsibility. In a very large number of use cases, including the novice developer's role in a large project, that is a distinction that makes no difference. But the "close generators on loop exit" (or maybe not!) use case makes it clear that in general the developer must explicitly manage resources.
But somebody has to write that context manager. I suppose in the organizational context imagined here, it was written for the project by the resource management wonk in the group, and the new developer just cargo-cults it at first.
Even if it can be done with a single "ensure_cleanup" function, Python isn't Haskell. I think context management deserves syntax to mark it. After all, from the "open and read one file" scripting standpoint, there's really not a difference between f = open("file") process(f) and with open("file") as f: process(f) (see "taking care of Python ~= taking care of use case" above). But the with statement and indentation clearly mark the call to process as receiving special treatment. As Chris says, the developer doesn't need to know anything but that the object returned by the with expression participates "appropriately" in the context manager protocol (which she may think of as the "with protocol"!, ie, *magic*) and gets the "special treatment" it needs. So (for me) this is full circle: "with" context management is what we need, but it interacts poorly with stateful "function" calls -- and that's what Nathaniel proposes to deal with.

On 25 October 2016 at 11:59, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
This is my read of Nathaniel's motivation as well, and hence my proposal: rather than trying to auto-magically guess when a developer intended for their resource management to be linked to the currently executing frame (which requires fundamentally changing how iteration works in a way that breaks the world, and still doesn't solve the problem in general), I'm starting to think that we instead need a way to let them easily say "This resource, the one I just created or have otherwise gained access to? Link its management to the lifecycle of the currently running function or frame, so it gets cleaned up when it finishes running". Precisely *how* a particular implementation handled that resource management would be up to the particular Python implementation, but one relatively straightforward way would be to use contextlib.ExitStack under the covers, and then when the frame finishes execution have a check that goes: - did the lazily instantiated ExitStack instance get created during frame execution? - if yes, close it immediately, thus reclaiming all the registered resources The spelling of the *surface* API though is something I'd need help from educators in designing - my problem is that I already know all the moving parts and how they fit together (hence my confidence that something like this would be relatively easy to implement, at least in CPython, if we decided we wanted to do it), but I *don't* know what kinds of terms could be used in the API if we wanted to make it approachable to relative beginners. My initial thought would be to offer: from local_resources import function_resource and: from local_resources import frame_resource Where the only difference between the two is that the first one would complain if you tried to use it outside a normal function body, while the second would be usable anywhere (function, class, module, generator, coroutine). Both would accept and automatically enter context managers as input, as if you'd wrapped the rest of the frame body in a with statement. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
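To make that concrete: local_resources and function_resource don't exist anywhere yet, but the behaviour can be roughly approximated today with an explicit decorator plus contextlib.ExitStack. The names and the decorator below are purely illustrative assumptions; a real implementation would presumably hang the stack off the frame object rather than a module-level list:

    import contextlib
    import functools

    _stacks = []   # the innermost decorated call's ExitStack lives at the end

    def uses_local_resources(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with contextlib.ExitStack() as stack:
                _stacks.append(stack)
                try:
                    return func(*args, **kwargs)
                finally:
                    _stacks.pop()
        return wrapper

    def function_resource(cm):
        """Enter cm and tie its cleanup to the innermost decorated function."""
        return _stacks[-1].enter_context(cm)

    @uses_local_resources
    def readlines_with_declarative_cleanup(fname):
        return function_resource(open(fname)).readlines()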

On 2016-10-25 4:33 AM, Nick Coghlan wrote:
But how would it help with a partial iteration over generators with a "with" statement inside? def it(): with open(file) as f: for line in f: yield line Nathaniel's proposal addresses this by fixing "for" statements, so that the outer loop that iterates over "it" would close the generator once the iteration is stopped. With your proposal you want to attach the opened file to the frame, but you'd need to attach it to the frame of the *caller* of "it", right? Yury

On 26 October 2016 at 01:59, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Every frame in the stack would still need to opt in to deterministic cleanup of its resources, but the difference is that it becomes an inline operation within the expression creating the iterator, rather than a complete restructuring of the function: def iter_consumer(fname): for line in function_resource(open(fname)): ... It doesn't matter *where* the iterator is being used (or even if you received it as a parameter), you get an easy way to say "When this function exits, however that happens, clean this up". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 25 October 2016 at 03:32, Chris Barker <chris.barker@noaa.gov> wrote:
Nope, hence the references to contextlib.ExitStack: https://docs.python.org/3/library/contextlib.html#contextlib.ExitStack That's a tool for dynamic manipulation of context managers, so even today you can already write code like this:
The setup code to support it is just a few lines of code:
Plus the example context manager definition:
So the gist of my proposal (from an implementation perspective) is that if we give frame objects an ExitStack instance (or an operational equivalent) that can be created on demand and will be cleaned up when the frame exits (regardless of how that happens), then we can define an API for adding "at frame termination" callbacks (including making it easy to dynamically add context managers to that stack) without needing to define your own scaffolding for that feature - it would just be a natural part of the way frame objects work. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
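For readers who haven't used it, contextlib.ExitStack already supports this kind of dynamic resource registration today -- the difference is that it has to be created and entered explicitly rather than living implicitly on the frame:

    from contextlib import ExitStack

    def first_lines(filenames):
        with ExitStack() as stack:
            files = [stack.enter_context(open(name)) for name in filenames]
            # every file is closed when the with block exits, however it exits
            return [f.readline() for f in files]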

On Sat, Oct 22, 2016 at 9:17 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Hmm -- interesting idea -- and I recall Guido bringing something like this up on one of these lists not too long ago -- "scripting" use cases really are different than "systems programming" However, that script/library distinction isn't well-defined in
computing instruction in general,
no it's not -- except in the case of "scripting languages" vs. "systems languages" -- you can go back to the classic Ousterhout paper: https://www.tcl.tk/doc/scripting.html But Python really is suitable for both use cases, so it's tricky to know how to teach. And my classes, at least, have folks with a broad range of use-cases in mind, so I can't choose one way or another. And, indeed, there is no small amount of code (and coders) that starts out as a quick script, but ends up embedded in a larger system down the road. And (another and?) one of the great things ABOUT Python is that it IS suitable for such a broad range of use-cases. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 25 October 2016 at 03:16, Chris Barker <chris.barker@noaa.gov> wrote:
Steven Lott was pondering the same question a few years back (regarding his preference for teaching procedural programming before any other paradigms), so I had a go at articulating the general idea: http://www.curiousefficiency.org/posts/2011/08/scripting-languages-and-suita... The main paragraph is still pretty unhelpful though, since I handwave away the core of the problem as "the art of software design": """A key part of the art of software design is learning how to choose an appropriate level of complexity for the problem at hand - when a problem calls for a simple script, throwing an entire custom application at it would be overkill. On the other hand, trying to write complex applications using only scripts and no higher level constructs will typically lead to an unmaintainable mess.""" Cheers, Nick. P.S. I'm going to stop now since we're getting somewhat off-topic, but I wanted to highlight this excellent recent article on the challenges of determining the level of "suitable complexity" for any given software engineering problem: https://hackernoon.com/how-to-accept-over-engineering-for-what-it-really-is-... -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 21 October 2016 at 07:03, Nathaniel Smith <njs@pobox.com> wrote:
Ah I follow now. Sorry for the misunderstanding, I'd skimmed a bit more than I realised I had. However, it still feels to me that the code I currently write doesn't need this feature, and I'm therefore unclear as to why it's sufficiently important to warrant a backward compatibility break. It's quite possible that I've never analysed my code well enough to *notice* that there's a problem. Or that I rely on CPython's GC behaviour without realising it. Also, it's honestly very rare that I need deterministic cleanup, as opposed to guaranteed cleanup - running out of file handles, for example, isn't really a problem I encounter. But it's also possible that it's a code design difference. You use the example (from memory, sorry if this is slightly different to what you wrote): def filegen(filename): with open(filename) as f: for line in f: yield line # caller for line in filegen(name): ... I wouldn't normally write a function like that - I'd factor it differently, with the generator taking an open file (or file-like object) and the caller opening the file: def filegen(fd): for line in fd: yield line # caller with open(filename) as fd: for line in filegen(fd): ... With that pattern, there's no issue. And the filegen function is more generic, as it can be used with *any* file-like object (a StringIO, for testing, for example).
Well, if preserve() did mean just that, then that would be OK. I'd never use it, as I don't care about deterministic cleanup, so it makes no difference to me if it's on or off. But that's not the case - in fact, preserve() means "give me the old Python 3.5 behaviour", and (because deterministic cleanup isn't important to me) that's a vague and unclear distinction. So I don't know whether my code is affected by the behaviour change and I have to guess at whether I need preserve(). What I think is needed here is a clear explanation of how this proposal affects existing code that *doesn't* need or care about cleanup. The example that's been mentioned is with open(filename) as f: for line in f: if is_end_of_header(line): break process_header(line) for line in f: process_body(line) and similar code that relies on being able to part-process an iterator in a for loop, and then have a later loop pick up where the first left off. Most users of iterators and generators probably have little understanding of GeneratorExit, closing generators, etc. And that's a good thing - it's why iterators in Python are so useful. So the proposal needs to explain how it impacts that sort of user, in terms that they understand. It's a real pity that the explanation isn't "you can ignore all of this, as you aren't affected by the problem it's trying to solve" - that's what I was getting at. At the moment, the take home message for such users feels like it's "you might need to scatter preserve() around your code, to avoid the behaviour change described above, which you glazed over because it talked about all that coroutiney stuff you don't understand" :-) Paul Paul

On Fri, Oct 21, 2016 at 11:03:51AM +0100, Paul Moore wrote:
I now believe that's not necessarily the case. I think that the message should be: - If your iterator class has a __del__ or close method, then you need to read up on __(a)iterclose__. - If you iterate over open files twice, then all you need to remember is that the file will be closed when you exit the first loop. To avoid that auto-closing behaviour, use itertools.preserve(). - Iterating over lists, strings, tuples, dicts, etc. won't change, since they don't have __del__ or close() methods. I think that covers all the cases the average Python code will care about. -- Steve

On 21 October 2016 at 12:23, Steven D'Aprano <steve@pearwood.info> wrote:
OK, that's certainly a lot less scary. Some thoughts, remain, though: 1. You mention files. Presumably (otherwise what would be the point of the change?) there will be other iterables that change similarly. There's no easy way to know in advance. 2. Cleanup protocols for iterators are pretty messy now - __del__, close, __iterclose__, __aiterclose__. What's the chance 3rd party implementers get something wrong? 3. What about generators? If you write your own generator, you don't control the cleanup code. The example: def mygen(name): with open(name) as f: for line in f: yield line is a good example - don't users of this generator need to use preserve() in order to be able to do partial iteration? And yet how would the writer of the generator know to document this? And if it isn't documented, how does the user of the generator know preserve is needed? My feeling is that this proposal is a relatively significant amount of language churn, to solve a relatively niche problem, and furthermore one that is actually only a problem to non-CPython implementations[1]. My instincts are that we need to back off on the level of such change, to give users a chance to catch their breath. We're not at the level of where we need something like the language change moratorium (PEP 3003) but I don't think it would do any harm to give users a chance to catch their breath after the wave of recent big changes (async, typing, path protocol, f-strings, funky unpacking, Windows build and installer changes, ...). To put this change in perspective - we've lived without it for many years now, can we not wait a little while longer?
And yet, it still seems to me that it's going to force me to change (maybe not much, but some of) my existing code, for absolutely zero direct benefit, as I don't personally use or support PyPy or any other non-CPython implementations. Don't forget that PyPy still doesn't even implement Python 3.5 - so no-one benefits from this change until PyPy supports Python 3.8, or whatever version this becomes the default in. It's very easy to misuse an argument like this to block *any* sort of change, and that's not my intention here - but I am trying to understand what the real-world issue is here, and how (and when!) this proposal would allow people to write code to fix that problem. At the moment, it feels like: * The problem is file handle leaks in code running under PyPy * The ability to fix this will come in around 4 years (random guess as to when PyPy implements Python 3.8, plus an assumption that the code needing to be fixed can immediately abandon support for all earlier versions of PyPy). Any other cases seem to me to be theoretical at the moment. Am I being unfair in this assessment? (It feels like I might be, but I can't be sure how). Paul [1] As I understand it. CPython's refcounting GC makes this a non-issue, correct?

Le 21/10/16 à 14:35, Paul Moore a écrit :
[1] As I understand it. CPython's refcounting GC makes this a non-issue, correct?
Wrong. Any guarantee that you think the CPython GC provides goes out of the window as soon as you have a reference cycle. Refcounting does not actually make GC deterministic, it merely hides the problem away from view. For instance, on CPython 3.5, running this code:

    #%%%%%%%%%
    class some_resource:
        def __enter__(self):
            print("Open resource")
            return 42
        def __exit__(self, *args):
            print("Close resource")

    def some_iterator():
        with some_resource() as s:
            yield s

    def main():
        it = some_iterator()
        for i in it:
            if i == 42:
                print("The answer is", i)
                break
        print("End loop")
        # later ...
        try:
            1/0
        except ZeroDivisionError as e:
            exc = e

    main()
    print("Exit")
    #%%%%%%%%%%

produces:

    Open resource
    The answer is 42
    End loop
    Exit
    Close resource

What happens is that 'exc' holds a cyclic reference back to the main() frame, which prevents it from being destroyed when the function exits, and that frame, in turn, holds a reference to the iterator, via the local variable 'it'. And so, the iterator remains alive, and the resource unclosed, until the next garbage collection.
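(Reusing the definitions from the example above: one way to confirm that it is the cyclic collector rather than refcounting that eventually runs the cleanup is to force a collection before the final print, at which point "Close resource" appears before "Exit":)

    import gc

    main()
    gc.collect()     # breaks the exception -> frame -> iterator cycle
    print("Exit")    # "Close resource" is now printed before this line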

On Wed, Oct 19, 2016 at 2:38 PM, Paul Moore <p.f.moore@gmail.com> wrote:
I may very well be misunderstanding the purpose of the proposal, but that is not how I saw it being used. I thought of it being used to clean up things that happened in the loop, rather than clean up the iterator itself. This would allow the iterator to manage events that occurred in the body of the loop. So it would be more like this scenario:
In this case, objiterer would do some cleanup related to obj1 and obj2 in the first loop and some cleanup related to obj3 and obj4 in the second loop. There would be no backwards-compatibility break, the method would be purely opt-in and most typical iterators wouldn't need it. However, in this case perhaps it might be better to have some method that is called after every loop, no matter how the loop is terminated (break, continue, return). This would allow the cleanup to be done every loop rather than just at the end.

On Wed, Oct 19, 2016 at 11:13 AM, Chris Angelico <rosuav@gmail.com> wrote:
Oh good point -- 'yield from' definitely needs a mention. Fortunately, I think it's pretty easy: the only way the child generator in a 'yield from' can be aborted early is if the parent generator is aborted early, so the semantics you'd want are that iff the parent generator is closed, then the child generator is also closed. 'yield from' already implements those semantics :-). So the only remaining issue is what to do if the child iterator completes normally, and in this case I guess 'yield from' probably should call '__iterclose__' at that point, like the equivalent for loop would.
The iterator is closed if someone explicitly closes it, either by calling the method by hand, or by passing it to a construct that calls that method -- a 'for' loop without preserve(...), etc. Obviously any given iterator's __next__ method could decide to do whatever it wants when it's exhausted normally, including executing its 'close' logic, but there's no magic that causes __iterclose__ to be called here. The distinction between exhausted and exhausted+closed is useful: consider some sort of file-wrapping iterator that implements __iterclose__ as closing the file. Then this exhausts the iterator and then closes the file: for line in file_wrapping_iter: ... and this also exhausts the iterator, but since __iterclose__ is not called, it doesn't close the file, allowing it to be re-used: for line in preserve(file_wrapping_iter): ... OTOH there is one important limitation to this, which is that if you're implementing your iterator by using a generator, then generators in particular don't provide any way to distinguish between exhausted and exhausted+closed (this is just how generators already work, nothing to do with this proposal). Once a generator has been exhausted, its close() method becomes a no-op.
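A sketch of the kind of file-wrapping iterator described above, assuming the proposed __iterclose__ slot (nothing calls that method today, so on current Pythons it is inert):

    class FileWrappingIter:
        """Yields lines; exhaustion and closing are deliberately separate."""
        def __init__(self, path):
            self._file = open(path)
        def __iter__(self):
            return self
        def __next__(self):
            line = self._file.readline()
            if not line:
                raise StopIteration   # exhausted, but the file is still open
            return line
        def __iterclose__(self):
            self._file.close()        # called by a closing 'for' loop, not by exhaustion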
BTW, it's probably easier to read this way :-): def every_second(it): for i, value in enumerate(it): if i % 2 == 1: yield value
Right. If the proposal is accepted then a lot (I suspect the vast majority) of iterator consumers will automatically DTRT because they're already using 'for' loops or whatever; for those that don't, they'll do whatever they're written to do, and that might or might not match what users have come to expect. Hence the transition period, ResourceWarnings and DeprecationWarnings, etc. I think the benefits are worth it, but there certainly is a transition cost. -n -- Nathaniel J. Smith -- https://vorpus.org

Hi Yury, Thanks for the detailed comments! Replies inline below. On Wed, Oct 19, 2016 at 8:51 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Maybe I'm misunderstanding, but I think those 100s of other cases where you need deterministic cleanup are why 'with' blocks were invented, and in my experience they work great for that. Once you get in the habit, it's very easy and idiomatic to attach a 'with' to each file handle, socket, etc., at the point where you create it. So from where I stand, it seems like those 100s of unresolved cases actually are resolved? The problem is that 'with' blocks are great, and generators are great, but when you put them together into the same language there's this weird interaction that emerges, where 'with' blocks inside generators don't really work for their intended purpose unless you're very careful and willing to write boilerplate. Adding deterministic cleanup to generators plugs this gap. Beyond that, I do think it's a nice bonus that other iterables can take advantage of the feature, but this isn't just a random "hey let's smush two constructs together to save a line of code" thing -- iteration is special because it's where generator call stacks and regular call stacks meet.
When you say "make writing iterators significantly harder", is it fair to say that you're thinking mostly of what I'm calling "iterator wrappers"? For most day-to-day iterators, it's pretty trivial to either add a close method or not; the tricky cases are when you're trying to manage a collection of sub-iterators. itertools.chain is a great challenge / test case here, because I think it's about as hard as this gets :-). It took me a bit to wrap my head around, but I think I've got it, and that it's not so bad actually. Right now, chain's semantics are: # copied directly from the docs def chain(*iterables): for it in iterables: for element in it: yield element In a post-__iterclose__ world, the inner for loop there will already handle closing each iterators as its finished being consumed, and if the generator is closed early then the inner for loop will also close the current iterator. What we need to add is that if the generator is closed early, we should also close all the unprocessed iterators. The first change is to replace the outer for loop with a while/pop loop, so that if an exception occurs we'll know which iterables remain to be processed: def chain(*iterables): try: while iterables: for element in iterables.pop(0): yield element ... Now, what do we do if an exception does occur? We need to call iterclose on all of the remaining iterables, but the tricky bit is that this might itself raise new exceptions. If this happens, we don't want to abort early; instead, we want to continue until we've closed all the iterables, and then raise a chained exception. Basically what we want is: def chain(*iterables): try: while iterables: for element in iterables.pop(0): yield element finally: try: operators.iterclose(iter(iterables[0])) finally: try: operators.iterclose(iter(iterables[1])) finally: try: operators.iterclose(iter(iterables[2])) finally: ... but of course that's not valid syntax. Fortunately, it's not too hard to rewrite that into real Python -- but it's a little dense: def chain(*iterables): try: while iterables: for element in iterables.pop(0): yield element # This is equivalent to the nested-finally chain above: except BaseException as last_exc: for iterable in iterables: try: operators.iterclose(iter(iterable)) except BaseException as new_exc: if new_exc.__context__ is None: new_exc.__context__ = last_exc last_exc = new_exc raise last_exc It's probably worth wrapping that bottom part into an iterclose_all() helper, since the pattern probably occurs in other cases as well. (Actually, now that I think about it, the map() example in the text should be doing this instead of what it's currently doing... I'll fix that.) This doesn't strike me as fundamentally complicated, really -- the exception chaining logic makes it look scary, but basically it's just the current chain() plus a cleanup loop. I believe that this handles all the corner cases correctly. Am I missing something? And again, this strikes me as one of the worst cases -- the vast majority of iterators out there are not doing anything nearly this complicated with subiterators.
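One possible shape for that iterclose_all() helper -- hypothetical, since it assumes the proposed __iterclose__ protocol exists; the getattr dance below stands in for whatever the real operators.iterclose would do:

    def iterclose_all(iterables, last_exc=None):
        """Close every iterable, chaining any exceptions raised while doing so."""
        for iterable in iterables:
            try:
                it = iter(iterable)
                iterclose = getattr(type(it), "__iterclose__", None)
                if iterclose is not None:
                    iterclose(it)
            except BaseException as new_exc:
                if new_exc.__context__ is None:
                    new_exc.__context__ = last_exc
                last_exc = new_exc
        if last_exc is not None:
            raise last_exc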
Adding support to itertools, toolz.itertoolz, and generators (which are the most common way to implement iterator wrappers) will probably take care of 95% of uses, but yeah, there's definitely a long tail that will take time to shake out. The (extremely tentative) transition plan has __iterclose__ as opt-in until 3.9, so that's about 3.5 years from now. __aiterclose__ is a different matter of course, since there are very very few async iterator wrappers in the wild, and in general I think most people writing async iterators are watching async/await-related language developments very closely.
It's true that it's non-obvious to existing users, but that's true of literally every change that we could ever make :-). That's why we have release notes, deprecation warnings, enthusiastic blog posts, etc. For newcomers... well, it's always difficult for those of us with more experience to put ourselves back in the mindset, but I don't see why this would be particularly difficult to explain? for loops consume their iterator; if you don't want that then here's how you avoid it. That's no more difficult to explain than what an iterator is in the first place, I don't think, and for me at least it's a lot easier to wrap my head around than the semantics of else blocks on for loops :-). (I always forget how those work.)
True. If you're doing manual iteration, then you are still responsible for manual cleanup (if that's what you want), just like today. This seems fine to me -- I'm not sure why it's an objection to this proposal :-).
There is no law that says that the interpreter always shuts down after the event loop exits. We're talking about a fundamental language feature here, it shouldn't be dependent on the details of libraries and application shutdown tendencies :-(.
No exception will pass silently in the current PEP 525 implementation.
Exceptions that occur inside a garbage-collected iterator will be printed to the console, or possibly logged according to whatever the event loop does with unhandled exceptions. And sure, that's better than nothing, if someone remembers to look at the console/logs. But they *won't* be propagated out to the containing frame, they can't be caught, etc. That's a really big difference.
And if some AG isn't properly finalized a warning will be issued.
This actually isn't true of the code currently in asyncio master -- if the loop is already closed (either manually by the user or by its __del__ being called) when the AG finalizer executes, then the AG is silently discarded: https://github.com/python/asyncio/blob/e3fed68754002000be665ad1a379a747ad924... This isn't really an argument against the mechanism though, just a bug you should probably fix :-). I guess it does point to my main dissatisfaction with the whole GC hook machinery, though. At this point I have spent many, many hours tracing through the details of this catching edge cases -- first during the initial PEP process, where there were a few rounds of revision, then again the last few days when I first thought I found a bunch of bugs that turned out to be spurious because I'd missed one line in the PEP, plus one real bug that you already know about (the finalizer-called-from-wrong-thread issue), and then I spent another hour carefully reading through the code again with PEP 442 open alongside once I realized how subtle the resurrection and cyclic reference issues are here, and now here's another minor bug for you. At this point I'm about 85% confident that it does actually function as described, or that we'll at least be able to shake out any remaining weird edge cases over the next 6-12 months as people use it. But -- and I realize this is an aesthetic reaction as much as anything else -- this all feels *really* unpythonic to me. Looking at the Zen, the phrases that come to mind are "complicated", and "If the implementation is hard to explain, ...". The __(a)iterclose__ proposal definitely has its complexity as well, but it's a very different kind. The core is incredibly straightforward: "there is this method, for loops always call it". That's it. When you look at a for loop, you can be extremely confident about what's going to happen and when. Of course then there's the question of defining this method on all the diverse iterators that we have floating around -- I'm not saying it's trivial. But you can take them one at a time, and each individual case is pretty straightforward.
Like I said in the text, I don't find this very persuasive, since if you're manually iterating then you can just as well take manual responsibility for cleaning things up. But I could live with both mechanisms co-existing.
I certainly don't want to delay 3.6. I'm not as convinced as you that the async-generator code alone is so complicated that it would force a delay, but if it is then 3.6.1 is also an option worth considering.
The goal isn't to "fully solve the problem of non-deterministic GC of iterators". That would require magic :-). The goal is to provide tools so that when users run into this problem, they have viable options to solve it. Right now, we don't have those tools, as evidenced by the fact that I've basically never seen code that does this "correctly". We can tell people that they should be using explicit 'with' on every generator that might contain cleanup code, but they don't and they won't, and as a result their code quality is suffering on several axes (portability across Python implementations, 'with' blocks inside generators that don't actually do anything except spuriously hide ResourceWarnings, etc.). Adding __(a)iterclose__ to (async) for loops makes it easy and convenient to do the right thing in common cases; and in the less-usual case where you want to do manual iteration, then you can and should use a manual 'with' block too. The proposal is not trying to replace 'with' blocks :-). As for implicitness, eh. If 'for' is defined to mean 'iterate and then close', then that's what 'for' means. If we make the change then there won't be anything more implicit about 'for' calling __iterclose__ than there is about 'for' calling __iter__ or __next__. Definitely this will take some adjustment for those who are used to the old system, but sometimes that's the price of progress ;-). -n -- Nathaniel J. Smith -- https://vorpus.org

Nathaniel, On 2016-10-19 5:02 PM, Nathaniel Smith wrote:
Hi Yury,
Thanks for the detailed comments! Replies inline below.
NP!
Not all code can be written with 'with' statements, see my example with 'self = None' in asyncio. Python code can be quite complex, involving classes with __del__ that do some cleanups etc. Fundamentally, you cannot make GC of such objects deterministic. IOW I'm not convinced that if we implement your proposal we'll fix 90% (or even 30%) of cases where non-deterministic and postponed cleanup is harmful.
Yes, I understand that your proposal really improves some things. OTOH it undeniably complicates the iteration protocol and requires a long period of deprecations, teaching users and library authors new semantics, etc. We only now begin to see Python 3 gaining traction. I don't want us to harm that by introducing another set of things to Python 3 that are significantly different from Python 2. DeprecationWarnings/future imports don't excite users either.
Yes, mainly iterator wrappers. You'll also need to educate users to refactor (more on that below) their __del__ methods to __(a)iterclose__ in 3.6.
Now imagine that being applied throughout the stdlib, plus some of it will have to be implemented in C. I'm not saying it's impossible, I'm saying that it will require additional effort for CPython and ecosystem. [..]
We don't often change the behavior of basic statements like 'for', if ever.
A lot of code that you find on stackoverflow etc will be broken. Porting code from Python2/<3.6 will be challenging. People are still struggling to understand 'dict.keys()'-like views in Python 3.
Right now we can implement the __del__ method to clean up iterators. And it works for both partial iteration and cases where people forgot to close the iterator explicitly. With your proposal, to achieve the same (and make the code compatible with new for-loop semantics), users will have to implement both __iterclose__ and __del__.
It's not about shutting down the interpreter or exiting the process. The majority of async applications just run the loop until they exit. The point of PEP 525 and how the finalization is handled in asyncio is that AGs will be properly cleaned up for the absolute majority of time (while the loop is running). [..]
I don't think it's a bug. When the loop is closed, the hook will do nothing, so the asynchronous generator will be cleaned up by the interpreter. If it has an 'await' expression in its 'finally' statement, the interpreter will issue a warning. I'll add a comment explaining this.
Yes, I agree it's not an easy thing to digest. Good thing is that asyncio has a reference implementation of PEP 525 support, so people can learn from it. I'll definitely add more comments to make the code easier to read.
The __(a)iterclose__ semantics is clear. What's not clear is how much harm changing the semantics of for-loops will do (and how to quantify the amount of good :)) [..]
Perhaps we should focus on teaching people that using 'with' statements inside (async-) generators is a bad idea. What you should do instead is to have a 'with' statement wrapping the code that uses the generator. Yury
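A concrete version of that suggestion, runnable today (the file name is just illustrative): the generator never owns the resource, and the 'with' wraps the code that consumes the generator instead:

    def numbered_lines(f):
        for i, line in enumerate(f):
            yield i, line

    with open("example.txt") as f:       # resource managed by the caller
        for i, line in numbered_lines(f):
            if i >= 10:
                break
    # the file is closed here, no matter how far the generator got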

On Wed, Oct 19, 2016 at 05:52:34PM -0400, Yury Selivanov wrote:
Just because something doesn't solve ALL problems doesn't mean it isn't worth doing. Reference counting doesn't solve the problem of cycles, but Python worked really well for many years even though cycles weren't automatically broken. Then a second GC was added, but it didn't solve the problem of cycles with __del__ finalizers. And recently (a year or two ago) there was an improvement that made the GC better able to deal with such cases -- but I expect that there are still edge cases where objects aren't collected. Had people said "garbage collection doesn't solve all the edge cases, therefore its not worth doing" where would we be? I don't know how big a problem the current lack of deterministic GC of resources opened in generators actually is. I guess that users of CPython will have *no idea*, because most of the time the ref counter will cleanup quite early. But not all Pythons are CPython, and despite my earlier post, I now think I've changed my mind and support this proposal. One reason for this is that I thought hard about my own code where I use the double-for-loop idiom: for x in iterator: if cond: break ... # later for y in iterator: # same iterator ... and I realised: (1) I don't do this *that* often; (2) when I do, it really wouldn't be that big a problem for me to guard against auto-closing: for x in protect(iterator): if cond: break ... (3) if I need to write hybrid code that runs over multiple versions, that's easy too: try: from itertools import protect except ImportError: def protect(it): return it
Couldn't __(a)iterclose__ automatically call __del__ if it exists? Seems like a reasonable thing to inherit from object.
A lot of code that you find on stackoverflow etc will be broken.
"A lot"? Or a little? Are you guessing, or did you actually count it? If we are worried about code like this: it = iter([1, 2, 3]) a = list(it) # currently b will be [], with this proposal it will raise RuntimeError b = list(it) we can soften the proposal's recommendation that iterators raise RuntimeError on calling next() when they are closed. I've suggested that "whatever exception makes sense" should be the rule. Iterators with no resources to close can simply raise StopIteration instead. That will preserve the current behaviour.
I spend a lot of time on the tutor and python-list mailing lists, and a little bit of time on Reddit /python, and I don't think I've ever seen anyone struggle with those. I'm sure it happens, but I don't think it happens often. After all, for the most common use-case, there's no real difference between Python 2 and 3: for key, value in mydict.items(): ... [...]
As I ask above, couldn't we just inherit a default __(a)iterclose__ from object that looks like this?

    def __iterclose__(self):
        finalizer = getattr(type(self), '__del__', None)
        if finalizer:
            finalizer(self)

I know it looks a bit funny for non-iterables to have an iterclose method, but they'll never actually be called. [...]
The "easy" way to find out (easy for those who aren't volunteering to do the work) is to fork Python, make the change, and see what breaks. I suspect not much, and most of the breakage will be easy to fix. As for the amount of good, this proposal originally came from PyPy. I expect that CPython users won't appreciate it as much as PyPy users, and Jython/IronPython users when they eventually support Python 3.x. -- Steve

On 2016-10-21 6:29 AM, Steven D'Aprano wrote:
No, we can't call __del__ from __iterclose__. Otherwise we'd break even more code than this proposal already breaks:

    for i in iter:
        ...
    iter.something()  # <- this would be called after iter.__del__()

[..]
AFAIK the proposal came "for" PyPy, not "from" it. And the issues Nathaniel tries to solve also exist in CPython. It's only a question of whether changing the 'for' statement and the iteration protocol is worth the trouble. Yury

Personally, I hadn't realised we had this problem in asyncio until now. Does this problem happen in asyncio at all? Or does asyncio somehow work around it by making sure to always explicitly destroy the frames of all coroutine objects, as long as someone waits on each task? On 21 October 2016 at 16:08, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
-- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert

On 2016-10-21 11:19 AM, Gustavo Carneiro wrote:
No, I think asyncio code is free of the problem this proposal is trying to address. We might have some "problem" in 3.6 when people start using async generators more often. But I think it's important for us to teach people to manage the associated resources from the outside of the generator (i.e. don't put 'async with' or 'with' inside the generator's body; instead, wrap the code that uses the generator with 'async with' or 'with'). Yury
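For illustration, the "manage it from the outside" pattern described here looks something like the sketch below. It uses contextlib.aclosing, which only appeared in Python 3.10; at the time of this thread you would write the equivalent small async context manager by hand::

    import asyncio
    from contextlib import aclosing  # Python 3.10+

    async def ticks(n):
        try:
            for i in range(n):
                yield i
                await asyncio.sleep(0)
        finally:
            # Cleanup runs under the caller's event loop, not the GC.
            print("closing")

    async def main():
        # Wrap the *use* of the generator, not the generator's body.
        async with aclosing(ticks(10)) as agen:
            async for i in agen:
                if i == 3:
                    break  # aclose() is awaited here, deterministically

    asyncio.run(main())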

On Fri, Oct 21, 2016 at 3:29 AM, Steven D'Aprano <steve@pearwood.info> wrote:
As for the amount of good, this proposal originally came from PyPy.
Just to be clear, I'm not a PyPy dev, and the PyPy devs' contribution here was mostly to look over a draft I circulated and to agree that it seemed like something that'd be useful to them. -n -- Nathaniel J. Smith -- https://vorpus.org

On 20 October 2016 at 07:02, Nathaniel Smith <njs@pobox.com> wrote:
At this point your code is starting to look a whole lot like the code in contextlib.ExitStack.__exit__ :) Accordingly, I'm going to suggest that while I agree the problem you describe is one that genuinely emerges in large production applications and other complex systems, this particular solution is simply far too intrusive to be accepted as a language change for Python - you're talking about a fundamental change to the meaning of iteration for the sake of the relatively small portion of the community that either works on such complex services, or insists on writing their code as if it might become part of such a service, even when it currently isn't. Given that simple applications vastly outnumber complex ones, and always will, I think making such a change would be a bad trade-off that didn't come close to justifying the costs imposed on the rest of the ecosystem to adjust to it. A potentially more fruitful direction of research to pursue for 3.7 would be the notion of "frame local resources", where each Python-level execution frame implicitly provided a lazily instantiated ExitStack instance (or an equivalent) for resource management. Assuming that it offered an "enter_frame_context" function that mapped to "contextlib.ExitStack.enter_context", such a system would let us do things like:

    from frame_resources import enter_frame_context

    def readlines_1(fname):
        return enter_frame_context(open(fname)).readlines()

    def readlines_2(fname):
        return [*enter_frame_context(open(fname))]

    def readlines_3(fname):
        return [line for line in enter_frame_context(open(fname))]

    def iterlines_1(fname):
        yield from enter_frame_context(open(fname))

    def iterlines_2(fname):
        for line in enter_frame_context(open(fname)):
            yield line

    def iterlines_3(fname):
        f = enter_frame_context(open(fname))
        while True:
            try:
                yield next(f)
            except StopIteration:
                break

to indicate "clean up this file handle when this frame terminates, regardless of the GC implementation used by the interpreter". Such a feature already gets you a long way towards the determinism you want, as frames are already likely to be cleaned up deterministically even in Python implementations that don't use automatic reference counting - the bit that's non-deterministic is cleaning up the local variables referenced *from* those frames. And then further down the track, once such a system had proven its utility, *then* we could talk about expanding the iteration protocol to allow for implicit registration of iterable cleanup functions as frame local resources. With the cleanup functions not firing until the *frame* exits, the backwards compatibility break would be substantially reduced (for __main__ module code there'd essentially be no compatibility break at all, and similarly for CPython local variables), and the level of impact on language implementations would also be much lower (reduced to supporting the registration of cleanup functions with frame objects, and executing those cleanup functions when the frame terminates). Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Oct 22, 2016 at 9:02 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
One of the versions I tried but didn't include in my email used ExitStack :-). It turns out not to work here: the problem is that we effectively need to enter *all* the contexts before unwinding, even if trying to enter one of them fails. ExitStack is nested like (try (try (try ... finally) finally) finally), and we need (try finally (try finally (try finally ...))). But this is just a small side-point anyway, since most code is not implementing complicated meta-iterators; I'll address your real proposal below.
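As a rough illustration of the unwinding shape that second nesting gives you -- attempt every cleanup even when an earlier one raises, and surface the first failure afterwards -- here is a sketch in terms of the hypothetical __iterclose__ slot::

    def close_all(iterators):
        # Try to close *every* iterator, even if closing one of them raises;
        # re-raise the first exception once all of them have been attempted.
        first_exc = None
        for it in iterators:
            closer = getattr(it, '__iterclose__', None)
            if closer is None:
                continue
            try:
                closer()
            except BaseException as exc:
                if first_exc is None:
                    first_exc = exc
        if first_exc is not None:
            raise first_exc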
So basically a 'with expression', that gives up the block syntax -- taking its scope from the current function instead -- in return for being usable in expression context? That's really interesting, and I see the intuition that it might be less disruptive if our implicit iterclose calls are scoped to the function rather than the 'for' loop. But having thought about it and investigated some... I don't think function-scoping addresses my problem, and I don't see evidence that it's meaningfully less disruptive to existing code. First, "my problem": Obviously, Python's a language that should be usable for folks doing one-off scripts, and for paranoid folks trying to write robust complex systems, and for everyone in between -- these are all really important constituencies. And unfortunately, there is a trade-off here, where the changes we're discussing affect these constituencies differently. But it's not just a matter of shifting around a fixed amount of pain; the *quality* of the pain really changes under the different proposals.

In the status quo:

- for one-off scripts: you can just let the GC worry about generator and file handle cleanup, re-use iterators, whatever, it's cool

- for robust systems: because it's the *caller's* responsibility to ensure that iterators are cleaned up, you... kinda can't really use generators without -- pick one -- (a) draconian style guides (like forbidding 'with' inside generators or forbidding bare 'for' loops entirely), (b) lots of auditing (every time you write a 'for' loop, go read the source to the generator you're iterating over -- no modularity for you and let's hope the answer doesn't change!), or (c) introducing really subtle bugs. Or all of the above. It's true that a lot of the time you can ignore this problem and get away with it one way or another, but if you're trying to write robust code then this doesn't really help -- it's like saying the footgun only has 1 bullet in the chamber. Not as reassuring as you'd think. It's like if every time you called a function, you had to explicitly say whether you wanted exception handling to be enabled inside that function, and if you forgot then the interpreter might just skip the 'finally' blocks while unwinding. There just *isn't* a good solution available.

In my proposal (for-scoped-iterclose):

- for robust systems: life is great -- you're still stopping to think a little about cleanup every time you use an iterator (because that's what it means to write robust code!), but since the iterators now know when they need cleanup and regular 'for' loops know how to invoke it, then 99% of the time (i.e., whenever you don't intend to re-use an iterator) you can be confident that just writing 'for' will do exactly the right thing, and the other 1% of the time (when you do want to re-use an iterator), you already *know* you're doing something clever. So the cognitive overhead on each for-loop is really low.

- for one-off scripts: ~99% of the time (actual measurement, see below) everything just works, except maybe a little bit better. 1% of the time, you deploy the clever trick of re-using an iterator with multiple for loops, and it breaks, so this is some pain. Here's what you see:

    gen_obj = ...
    for first_line in gen_obj:
        break
    for lines in gen_obj:
        ...

    Traceback (most recent call last):
      File "/tmp/foo.py", line 5, in <module>
        for lines in gen_obj:
    AlreadyClosedIteratorError: this iterator was already closed,
        possibly by a previous 'for' loop. (Maybe you want itertools.preserve?)
(We could even have a PYTHONDEBUG flag that when enabled makes that error message include the file:line of the previous 'for' loop that called __iterclose__.) So this is pain! But the pain is (a) rare, not pervasive, (b) immediately obvious (an exception, the code doesn't work at all), not subtle and delayed, (c) easily googleable, (d) easy to fix and the fix is reliable. It's a totally different type of pain than the pain that we currently impose on folks who want to write robust code.

Now compare to the new proposal (function-scoped-iterclose):

- For those who want robust cleanup: Usually, I only need an iterator for as long as I'm iterating over it; that may or may not correspond to the end of the function (often won't). When these don't coincide, it can cause problems. E.g., consider the original example from my proposal:

    def read_newline_separated_json(path):
        with open(path) as f:
            for line in f:
                yield json.loads(line)

but now suppose that I'm a Data Scientist (tm) so instead of having 1 file full of newline-separated JSON, I have 100 gigabytes' worth of the stuff stored in lots of files in a directory tree. Well, that's no problem, I'll just wrap that generator:

    def read_newline_separated_json_tree(tree):
        for root, _, paths in os.walk(tree):
            for path in paths:
                for document in read_newline_separated_json(join(root, path)):
                    yield document

And then I'll run it on PyPy, because that's what you do when you have 100 GB of string processing, and... it'll crash, because the call to read_newline_separated_json_tree ends up doing thousands of calls to read_newline_separated_json, but never cleans any of them up until the function exits, so eventually we run out of file descriptors. A similar situation arises in the main loop of something like an HTTP server:

    while True:
        request = read_request(sock)
        for response_chunk in application_handler(request):
            send_response_chunk(sock)

Here we'll accumulate arbitrary numbers of un-closed application_handler generators attached to the stack frame, which is no good at all. And this has the interesting failure mode that you'll probably miss it in testing, because most clients will only re-use a connection a small number of times. So what this means is that every time I write a for loop, I can't just do a quick "am I going to break out of the for-loop and then re-use this iterator?" check -- I have to stop and think about whether this for-loop is nested inside some other loop, etc. And, again, if I get it wrong, then it's a subtle bug that will bite me later. It's true that with the status quo, we need to wrap X% of for-loops with 'with' blocks, and with this proposal that number would drop to, I don't know, (X/5)% or something. But that's not the most important cost: the most important cost is the cognitive overhead of figuring out which for-loops need the special treatment, and in this proposal that checking is actually *more* complicated than in the status quo.

- For those who just want to write a quick script and not think about it: here's a script that does repeated partial for-loops over a generator object: https://github.com/python/cpython/blob/553a84c4c9d6476518e2319acda6ba29b8588... (and note that the generator object even has an ineffective 'with open(...)' block inside it!) With function-scoped-iterclose, this script would continue to work as it does now. Excellent. But, suppose that I decide that that main() function is really complicated and that it would be better to refactor some of those loops out into helper functions.
(Probably actually true in this example.) So I do that and... suddenly the code breaks. And in a rather confusing way, because it has to do with this complicated long-distance interaction between two different 'for' loops *and* where they're placed with respect to the original function versus the helper function. If I were an intermediate-level Python student (and I'm pretty sure anyone who is starting to get clever with re-using iterators counts as "intermediate level"), then I'm pretty sure I'd actually prefer the immediate obvious feedback from the for-scoped-iterclose. This would actually be a good time to teach folks about this aspect of resource handling, actually -- it's certainly an important thing to learn eventually on your way to Python mastery, even if it isn't needed for every script. In the pypy-dev thread about this proposal, there's some very distressed emails from someone who's been writing Python for a long time but only just realized that generator cleanup relies on the garbage collector: https://mail.python.org/pipermail/pypy-dev/2016-October/014709.html https://mail.python.org/pipermail/pypy-dev/2016-October/014720.html It's unpleasant to have the rug pulled out from under you like this and suddenly realize that you might have to go re-evaluate all the code you've ever written, and making for loops safe-by-default and fail-fast-when-unsafe avoids that. Anyway, in summary: function-scoped-iterclose doesn't seem to accomplish my goal of getting rid of the *type* of pain involved when you have to run a background thread in your brain that's doing constant paranoid checking every time you write a for loop. Instead it arguably takes that type of pain and spreads it around both the experts and the novices :-/. ------------- Now, let's look at some evidence about how disruptive the two proposals are for real code: As mentioned else-thread, I wrote a stupid little CPython hack [1] to report when the same iterator object gets passed to multiple 'for' loops, and ran the CPython and Django testsuites with it [2]. Looking just at generator objects [3], across these two large codebases there are exactly 4 places where this happens. (Rough idea of prevalence: these 4 places together account for a total of 8 'for' loops; this is out of a total of 11,503 'for' loops total, of which 665 involve generator objects.) The 4 places are: 1) CPython's Lib/test/test_collections.py:1135, Lib/_collections_abc.py:378 This appears to be a bug in the CPython test suite -- the little MySet class does 'def __init__(self, itr): self.contents = itr', which assumes that itr is a container that can be repeatedly iterated. But a bunch of the methods on collections.abc.Set like to pass in a generator object here instead, which breaks everything. If repeated 'for' loops on generators raised an error then this bug would have been caught much sooner. 2) CPython's Tools/scripts/gprof2html.py lines 45, 54, 59, 75 Discussed above -- as written, for-scoped-iterclose would break this script, but function-scoped-iterclose would not, so here function-scoped-iterclose wins. 3) Django django/utils/regex_helper.py:236 This code is very similar to the previous example in its general outline, except that the 'for' loops *have* been factored out into utility functions. So in this case for-scoped-iterclose and function-scoped-iterclose are equally disruptive. 4) CPython's Lib/test/test_generators.py:723 I have to admit I cannot figure out what this code is doing, besides showing off :-). 
But the different 'for' loops are in different stack frames, so I'm pretty sure that for-scoped-iterclose and function-scoped-iterclose would be equally disruptive. Obviously there's a bias here in that these are still relatively "serious" libraries; I don't have a big corpus of one-off scripts that are just a big __main__, though gprof2html.py isn't far from that. (If anyone knows where to find such a thing let me know...) But still, the tally here is that out of 4 examples, we have 1 subtle bug that iterclose might have caught, 2 cases where for-scoped-iterclose and function-scoped-iterclose are equally disruptive, and only 1 where function-scoped-iterclose is less disruptive -- and in that case it's arguably just avoiding an obvious error now in favor of a more confusing error later. If this reduced the backwards-incompatible cases by a factor of, like, 10x or 100x, then that would be a pretty strong argument in its favor. But it seems to be more like... 1.5x. -n [1] https://github.com/njsmith/cpython/commit/2b9d60e1c1b89f0f1ac30cbf0a5dceee83... [2] CPython: revision b0a272709b from the github mirror; Django: revision 90c3b11e87 [3] I also looked at "all iterators" and "all iterators with .close methods", but this email is long enough... basically the pattern is the same: there are another 13 'for' loops that involve repeated iteration over non-generator objects, and they're roughly equally split between spurious effects due to bugs in the CPython test-suite or my instrumentation, cases where for-scoped-iterclose and function-scoped-iterclose both cause the same problems, and cases where function-scoped-iterclose is less disruptive. -n -- Nathaniel J. Smith -- https://vorpus.org

...Doh. I spent all that time evaluating the function-scoped-cleanup proposal from the high-level design perspective, and then immediately after hitting send, I suddenly realized that I'd missed a much more straightforward technical problem. One thing that 'with' blocks / for-scoped-iterclose do is that they put an upper bound on the lifetime of generator objects. That's important if you're using a non-refcounting-GC, or if there might be reference cycles. But it's not all they do: they also arrange to make sure that any cleanup code is executed in the context of the code that's using the generator. This is *also* really important: if you have an exception in your cleanup code, and the GC runs your cleanup code, then that exception will just disappear into nothingness (well, it'll get printed to the console, but that's hardly better). So you don't want to let the GC run your cleanup code. If you have an async generator, you want to run the cleanup code under supervision of the calling functions coroutine runner, and ideally block the running coroutine while you do it; doing this from the GC is difficult-to-impossible (depending on how picky you are -- PEP 525 does part of it, but not all). Again, letting the GC get involved is bad. So for the function-scoped-iterclose proposal: does this implicit ExitStack-like object take a strong reference to iterators, or just a weak one? If it takes a strong reference, then suddenly we're pinning all iterators in memory until the end of the enclosing function, which will often look like a memory leak. I think this would break a *lot* more existing code than the for-scoped-iterclose proposal does, and in more obscure ways that are harder to detect and warn about ahead of time. So that's out. If it takes a weak reference, ... then there's a good chance that iterators will get garbage collected before the ExitStack has a chance to clean them up properly. So we still have no guarantee that the cleanup will happen in the right context, that exceptions will not be lost, and so forth. In fact, it becomes literally non-deterministic: you might see an exception propagate properly on one run, and not on the next, depending on exactly when the garbage collector happened to run. IMHO that's *way* too spooky to be allowed, but I can't see any way to fix it within the function-scoping framework :-( -n On Tue, Oct 25, 2016 at 3:25 PM, Nathaniel Smith <njs@pobox.com> wrote:
-- Nathaniel J. Smith -- https://vorpus.org

On 26 October 2016 at 08:48, Nathaniel Smith <njs@pobox.com> wrote:
It would take a strong reference, which is another reason why close_resources() would be an essential part of the explicit API (since it would drop the references in addition to calling the __exit__() and close() methods of the declared resources), and also yet another reason why you've convinced me that the only implicit API that would ever make sense is one that was scoped specifically to the iteration process. However, I still think the explicit-API-only suggestion is a much better path to pursue than any implicit proposal - it will give folks that see it for the first time something to Google, and it's a general-purpose technique rather than being restricted specifically to the cases where the resource to be managed and the iterator being iterated over are one and the same object. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 26 October 2016 at 08:25, Nathaniel Smith <njs@pobox.com> wrote:
Regardless of any other outcome from this thread, it may be useful to have a "contextlib.ResourceSet" as an abstraction for collective management of resources. As you say, the main difference is that the invocation of the cleanup functions wouldn't be nested at all and could be called in an arbitrary order (if that's not sufficient for a particular use case, then you'd need to define an ExitStack for the items where the order of cleanup matters, and then register *that* with the ResourceSet).
(Note: I've changed my preferred API name from "function_resource" + "frame_resource" to the general purpose "scoped_resource" - while it's somewhat jargony, which I consider unfortunate, the goal is to make the runtime scope of the resource match the lexical scope of the reference as closely as is feasible, and if folks are going to understand how Python manages references and resources, they're going to need to learn the basics of Python's scope management at some point) Given your points below, the defensive coding recommendation here would be to - always wrap your iterators in scoped_resource() to tell Python to clean them up when the function is done - explicitly call close_resources() after the affected for loops to clean the resources up early You'd still be vulnerable to resource leaks in libraries you didn't write, but would have decent control over your own code without having to make overly draconian changes to your style guide - you'd only need one new rule, which is "Whenever you're iterating over something, pass it through scoped_resource first". To simplify this from a forwards compatibility perspective (i.e. so it can implicitly adjust when an existing type gains a cleanup method), we'd make scoped_resource() quite permissive, accepting arbitrary objects with the following behaviours: - if it's a context manager, enter it, and register the exit callback - if it's not a context manager, but has a close() method, register the close method - otherwise, pass it straight through without taking any other action This would allow folks to always declare something as a scoped resource without impeding their ability to handle objects that aren't resources at all. The long term question would then become whether it made sense to have certain language constructs implicitly mark their targets as scoped resources *by default*, and clean them up selectively after the loop rather than using the blunt instrument of cleaning up all previously registered resources. If we did start seriously considering such a change, then there would be potential utility in an "unmanaged_iter()" wrapper which forwarded *only* the iterator protocol methods, thus hiding any __exit__() or close() methods from scoped_resource(). However, the time to consider such a change in default behaviour would be *after* we had some experience with explicit declarations and management of scoped resources - plenty of folks are writing plenty of software today in garbage collected languages (including Python), and coping with external resource management problems as they arise, so we don't need to do anything hasty here. I personally think an explicit solution is likely to be sufficient (given the caveat of adding a "gc.collect()" counterpart), with an API like `scoped_resource` being adopted over time in libraries, frameworks and applications based on actual defects found in running production systems as well as the defensive coding style, and your example below makes me even more firmly convinced that that's a better way to go.
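A rough sketch of what such a permissive scoped_resource()/close_resources() pair might look like (the names follow the suggestion above; the per-frame storage is simplified here to a module-level registry, so this only illustrates the registration rules, not the proposed scoping)::

    import contextlib

    _resources = contextlib.ExitStack()  # stand-in for per-frame storage

    def scoped_resource(obj):
        if hasattr(type(obj), '__exit__'):
            # Context manager: enter it and register the exit callback.
            return _resources.enter_context(obj)
        close = getattr(obj, 'close', None)
        if callable(close):
            # Not a context manager, but has close(): register that.
            _resources.callback(close)
            return obj
        # Not a resource at all: pass it straight through.
        return obj

    def close_resources():
        global _resources
        stack, _resources = _resources, contextlib.ExitStack()
        stack.close()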
In mine, if your style guide says "Use scoped_resource() and an explicit close_resources() call when iterating", you'd add it (or your automated linter would complain that it was missing). So the cognitive overhead is higher, but it would remain where it belongs (i.e. on professional developers being paid to write robust code).
And it's completely unnecessary - with explicit scoped_resource() calls absolutely nothing changes for the scripting use case, and even with implicit ones, re-use *within the same scope* would still be fine (you'd only get into trouble if the resource escaped the scope where it was first marked as a scoped resource).
If you're being paid to write robust code and are using Python 3.7+, then you'd add scoped_resource() around the read_newline_separated_json() call and then add a close_resources() call after that loop. That'd be part of your job, and just another point in the long list of reasons why developing software as a profession isn't the same thing as doing it as a hobby. We'd design scoped_resource() in such a way that it could be harmlessly wrapped around "paths" as well, even though we know that's technically not necessary (since it's just a list of strings). As noted above, I'm also open to the notion of some day making all for loops implicitly declare the iterators they operate on as scoped resources, but I don't think we should do that without gaining some experience with the explicit form first (where we can be confident that any unexpected negative consequences will be encountered by folks already well equipped to deal with them).
And we'll go "Oops", and refactor our code to better control the scope of our resources, either by adding a with statement around the innermost loop or using the new scoped resources API (if such a thing gets added). The *whole point* of iterative development is to solve the problems you know you have, not the problems you or someone else might potentially have at some point in the indeterminate future.
And the fixed code (given the revised API proposal above) looks like this:

    while True:
        request = read_request(sock)
        for response_chunk in scoped_resource(application_handler(request)):
            send_response_chunk(sock)
        close_resources()

This pattern has the advantage of also working if the resources you want to manage aren't precisely what you're iterating over, or if you're iterating over them in a while loop rather than a for loop.
Or you unconditionally add the scoped_resource/close_resources calls to force non-reference-counted implementations to behave a bit more like CPython and don't worry about it further.
As it would with the explicit scoped_resource/close_resources API.
I do agree the fact that it would break common code refactoring patterns is a good counter-argument against the idea of ever calling scoped_resource() implicitly.
Does the addition of the explicit close_resources() API mitigate your concern?
The standard library and a web framework are in no way typical of Python application and scripting code.
But explicitly scoped resource management leaves it alone.
And explicitly scoped resource management again leaves it alone.
The explicit-API-only aspect of the proposal eliminates 100% of the backwards incompatibilities :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tuesday, October 25, 2016 at 6:26:17 PM UTC-4, Nathaniel Smith wrote:
I still don't understand why you can't write it like this:

    def read_newline_separated_json_tree(tree):
        for root, _, paths in os.walk(tree):
            for path in paths:
                with read_newline_separated_json(join(root, path)) as iterable:
                    yield from iterable

Zero extra lines. Works today. Does everything you want.
Same thing:

    while True:
        request = read_request(sock)
        with application_handler(request) as iterable:
            for response_chunk in iterable:
                send_response_chunk(sock)

I'll stop posting about this, but I don't see the motivation behind this proposal except replacing one explicit context management line with a hidden "line" of cognitive overhead. I think the solution is to stop returning an iterable when you have state needing a cleanup. Instead, return a context manager and force the caller to open it to get at the iterable. Best, Neil
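For reference, a sketch of the shape Neil is suggesting, reusing the read_newline_separated_json example from earlier in the thread (a hypothetical rewrite, not the version proposed there)::

    import json
    from contextlib import contextmanager

    @contextmanager
    def read_newline_separated_json(path):
        # The resource-holding function is a context manager; entering it
        # hands back the iterable, so the caller must manage the lifetime.
        with open(path) as f:
            yield (json.loads(line) for line in f)

    # Usage ("data.jsonl" is a made-up path):
    #     with read_newline_separated_json("data.jsonl") as documents:
    #         for document in documents:
    #             ...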

Hey Nathaniel - I like the intent here, but I think perhaps it would be better if the problem were approached differently. It seems to me that making *generators* have a special 'you are done now' interface is special-casing, which usually makes things harder to learn and predict; and that the net effect is that all loop constructs will need to learn about that special case, whether looping over a list, a generator, or whatever. Generators already have a well-defined lifecycle - but as you say it's not defined consistently across Python VMs. The language has no guarantees about when finalisation will occur :(. The PEP 525 aclose is a bit awkward itself in this way - but unlike regular generators it does have a reason, which is that the language doesn't define an event loop context as a built-in thing - so finalisation can't reliably summon one up. So rather than adding a special case to finalise objects used in one particular iteration - which will play havoc with break statements - can we instead look at making escape analysis a required part of the compiler? The borrow checker in Rust is getting pretty good at managing a very similar problem :). I haven't fleshed out exactly what would be entailed, so consider this a 'what if' and YMMV :). -Rob On 19 October 2016 at 17:38, Nathaniel Smith <njs@pobox.com> wrote:

On 10/19/2016 12:38 AM, Nathaniel Smith wrote:
I'd like to propose that Python's iterator protocol be enhanced to add a first-class notion of completion / cleanup.
With respect to the standard iterator protocol, a very solid -1 from me. (I leave commenting specifically on __aiterclose__ to Yury.)

1. I consider the introduction of iterables and the new iterator protocol in 2.2 and their gradual replacement of lists in many situations to be the greatest enhancement to Python since 1.3 (my first version). They are, to me, one of Python's greatest features, and the minimal nature of the protocol is an essential part of what makes them great.

2. I think you greatly underestimate the negative impact, just as we did with changing str is bytes to str is unicode. The change itself, embodied in for loops, will break most non-trivial programs. You yourself note that there will have to be pervasive changes in the stdlib just to begin fixing the breakage.

3. Though perhaps common for what you do, the need for the change is extremely rare in the overall Python world. Iterators depending on an external resource are rare (< 1%, I would think). Incomplete iteration is also rare (also < 1%, I think). And resources do not always need to be released immediately.

4. Previous proposals to officially augment the iterator protocol, even with optional methods, have been rejected, and I think this one should be too. a. Add .__len__ as an option. We added __length_hint__, which an iterator may implement, but which is not part of the iterator protocol. It is also ignored by bool(). b., c. Add __bool__ and/or peek(). I posted a LookAhead wrapper class that implements both for most any iterable. I suspect that it is rarely used.
One problem with passing paths around is that it makes the receiving function hard to test. I think functions should at least optionally take an iterable of lines, and make the open part optional. But then closing should also be conditional. If the combination of 'with', 'for', and 'yield' do not work together, then do something else, rather than changing the meaning of 'for'. Moving responsibility for closing the file from 'with' to 'for', makes 'with' pretty useless, while overloading 'for' with something that is rarely needed. This does not strike me as the right solution to the problem.
for document in read_newline_separated_json(path): # <-- outer for loop ...
If the outer loop determines when the file should be closed, then why not open it there? What fails with

    try:
        lines = open(path)
        gen = read_newline_separated_json(lines)
        for doc in gen:
            do_something(doc)
    finally:
        lines.close()  # and/or gen.throw(...) to stop the generator.

-- Terry Jan Reedy

On Wed, Oct 19, 2016 at 7:07 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Minimalism for its own sake isn't really a core Python value, and in any case the minimalism ship has kinda sailed -- we effectively already have send/throw/close as optional parts of the protocol (they're most strongly associated with generators, but you're free to add them to your own iterators and e.g. yield from will happily work with that). This proposal is basically "we formalize and start automatically calling the 'close' methods that are already there".
The long-ish list of stdlib changes is about enabling the feature everywhere, not about fixing backwards incompatibilities. It's an important question though what programs will break and how badly. To try and get a better handle on it I've been playing a bit with an instrumented version of CPython that logs whenever the same iterator is passed to multiple 'for' loops. I'll write up the results in more detail, but the summary so far is that there seem to be ~8 places in the stdlib that would need preserve() calls added, and ~3 in django. Maybe 2-3 hours and 1 hour of work respectively to fix? It's not a perfect measure, and the cost certainly isn't zero, but it's at a completely different order of magnitude than the str changes. Among other things, this is a transition that allows for gradual opt-in via a __future__, and fine-grained warnings pointing you at what you need to fix, neither of which were possible for str->unicode.
This could equally well be an argument that the change is fine -- e.g. if you're always doing complete iteration, or just iterating over lists and stuff, then it literally doesn't affect you at all either way...
Sure, that's all true, but this is the problem with tiny documentation examples :-). The point here was to explain the surprising interaction between generators and with blocks in the simplest way, not to demonstrate the ideal solution to the problem of reading newline-separated JSON. Everything you want is still doable in a post-__iterclose__ world -- in particular, if you do for doc in read_newline_separated_json(lines_generator()): ... then both iterators will be closed when the for loop exits. But if you want to re-use the lines_generator, just write: it = lines_generator() for doc in read_newline_separated_json(preserve(it)): ... for more_lines in it: ...
Sure, that works in this trivial case, but they aren't all trivial :-). See the example from my first email about a WSGI-like interface where response handlers are generators: in that use case, your suggestion that we avoid all resource management inside generators would translate to: "webapps can't open files". (Or database connections, proxy requests, ... or at least, can't hold them open while streaming out response data.) Or sticking to concrete examples, here's a toy-but-plausible generator where the put-the-with-block-outside strategy seems rather difficult to implement: # Yields all lines in all files in 'directory' that contain the substring 'needle' def recursive_grep(directory, needle): for dirpath, _, filenames in os.walk(directory): for filename in filenames: with open(os.path.join(dirpath, filename)) as file_handle: for line in file_handle: if needle in line: yield line -n -- Nathaniel J. Smith -- https://vorpus.org

NOTE: This is my first post to this mailing list, I'm not really sure how to post a message, so I'm attempting a reply-all. I like Nathaniel's idea for __iterclose__. I suggest the following changes to deal with a few of the complex issues he discussed. 1. Missing __iterclose__, or a value of None, works as before, no changes. 2. An iterator can be used in one of three ways: A. 'for' loop, which will call __iterclose__ when it exits. B. User controlled, in which case the user is responsible for using the iterator inside a with statement. C. Old style. The user is responsible for calling __iterclose__. 3. An iterator keeps track of __iter__ calls; this allows it to know when to clean up. The two key additions, above, are: #2B. The user can use the iterator with __enter__ & __exit__ cleanly. #3. By tracking __iter__ calls, it makes complex use cases easier to handle.

Specification
=============

An iterator may implement the following method: __iterclose__. A missing method, or a value of None, is allowed. When the user wants to control the iterator, the user is expected to use the iterator with a with clause. The core proposal is the change in behavior of ``for`` loops. Given this Python code:

    for VAR in ITERABLE:
        LOOP-BODY
    else:
        ELSE-BODY

we desugar to the equivalent of:

    _iter = iter(ITERABLE)
    _iterclose = getattr(_iter, '__iterclose__', None)
    if _iterclose is None:
        traditional-for VAR in _iter:
            LOOP-BODY
        else:
            ELSE-BODY
    else:
        _stop_exception_seen = False
        try:
            traditional-for VAR in _iter:
                LOOP-BODY
            else:
                _stop_exception_seen = True
                ELSE-BODY
        finally:
            if not _stop_exception_seen:
                _iterclose(_iter)

The test for None allows us to skip the setup of a try/finally clause. Also, we don't bother to call __iterclose__ if the iterator threw StopIteration at us.

Modifications to basic iterator types
=====================================

An iterator will implement something like the following:

    _cleanup - Private function, does the following:
        _enter_count = _iter_count = -1
        Do any necessary cleanup, release resources, etc.
        NOTE: Is also called internally by the iterator, before throwing StopIteration.

    _iter_count - Private value, starts at 0.

    _enter_count - Private value, starts at 0.

    __iter__ -
        if _iter_count >= 0:
            _iter_count += 1
        return self

    __iterclose__ -
        if _iter_count == 0:
            if _enter_count == 0:
                _cleanup()
        elif _iter_count > 0:
            _iter_count -= 1

    __enter__ -
        if _enter_count >= 0:
            _enter_count += 1
        Return itself.

    __exit__ -
        if _enter_count > 0:
            _enter_count -= 1
            if _enter_count == _iter_count == 0:
                _cleanup()

The suggestions on _iter_count & _enter_count are just an example; internal details can differ (and better error handling).

Examples:
=========

NOTE: Examples are given using range() or [1, 2, 3, 4, 5, 6, 7] for simplicity. For real use, the iterator would have resources such as open files it needs to close on cleanup.

1. Simple example:

    for v in range(7):
        print(v)

Creates an iterator with an _iter_count of 0. The iterator exits normally (by throwing StopIteration); we don't bother to call __iterclose__.

2. Break example:

    for v in [1, 2, 3, 4, 5, 6, 7]:
        print(v)
        if v == 3:
            break

Creates an iterator with an _iter_count of 0. The iterator exits after generating 4 numbers; we then call __iterclose__ and the iterator does any necessary cleanup.

3. Convert example #2 to print the next value:

    with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
        for v in seven:
            print(v)
            if v == 3:
                break
        print('Next value is: ', next(seven))

This will print:

    1
    2
    3
    Next value is: 4

How this works: 1. We create an iterator named seven (by calling list.__iter__). 2. We call seven.__enter__. 3. The for loop calls next(seven) 3 times, and then calls seven.__iterclose__. Since the _enter_count is 1, the iterator does not do cleanup yet. 4. We call next(seven). 5. We call seven.__exit__. The iterator does its cleanup now.

4. More complicated example:

    with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
        for v in seven:
            print(v)
            if v == 1:
                for v in seven:
                    print('stolen: ', v)
                    if v == 3:
                        break
            if v == 5:
                break
        for v in seven:
            print(v * v)

This will print:

    1
    stolen: 2
    stolen: 3
    4
    5
    36
    49

How this works: 1. Same as #3 above; cleanup is done by the __exit__.

5. Alternate way of doing #4:

    seven = iter([1, 2, 3, 4, 5, 6, 7])
    for v in seven:
        print(v)
        if v == 1:
            for v in seven:
                print('stolen: ', v)
                if v == 3:
                    break
        if v == 5:
            break
    for v in seven:
        print(v * v)
        break           # Different from #4
    seven.__iterclose__()

This will print:

    1
    stolen: 2
    stolen: 3
    4
    5
    36

How this works: 1. We create an iterator named seven. 2. The for loops all call seven.__iter__, causing _iter_count to increment. 3. The for loops all call seven.__iterclose__ on exit, decrementing _iter_count. 4. The user calls the final __iterclose__, which closes the iterator. NOTE: Method #5 is NOT recommended; the 'with' syntax is better. However, something like itertools.zip could call __iterclose__ during cleanup.

Change to iterators
===================

All Python iterators would need to add __iterclose__ (possibly with a value of None), __enter__, & __exit__. Third party iterators that do not implement __iterclose__ cannot be used in a with clause. A new function could be added to itertools, something like:

    with itertools.with_wrapper(third_party_iterator) as x:
        ...

The 'with_wrapper' would attempt to call __iterclose__ when its __exit__ function is called. On Wed, Oct 19, 2016 at 12:38 AM, Nathaniel Smith <njs@pobox.com> wrote:

On 10/21/2016 03:48 PM, Amit Green wrote:
NOTE: This is my first post to this mailing list, I'm not really sure how to post a message, so I'm attempting a reply-all.
Seems to have worked! :)
Your examples are interesting, but they don't seem to address the issue of closing down for loops that are using generators when those loops exit early:

    -----------------------------
    def some_work():
        with some_resource() as resource:
            for widget in resource:
                yield widget

    for pane in some_work():
        break  # what happens here?
    -----------------------------

How does your solution deal with that situation? Or are you saying that this would be closed with your modifications, and if I didn't want the generator to be closed I would have to do:

    -----------------------------
    with some_work() as temp_gen:
        for pane in temp_gen:
            break
        for another_pane in temp_gen:
            # temp_gen is still alive here
            ...
    -----------------------------

In other words, instead of using the preserve() function, we would use a with statement? -- ~Ethan~

On Fri, Oct 21, 2016 at 3:48 PM, Amit Green <amit.mixie@gmail.com> wrote:
These are interesting ideas! A few general comments: - I don't think we want the "don't bother to call __iterclose__ on exhaustion" functionality --it's actually useful to be able to distinguish between # closes file_handle for line in file_handle: ... and # leaves file_handle open for line in preserve(file_handle): ... To be able to distinguish these cases, it's important that the 'for' loop always call __iterclose__ (which preserve() might then cancel out). - I think it'd be practically difficult and maybe too much magic to add __enter__/__exit__/nesting-depth counts to every iterator implementation. But, the idea of using a context manager for repeated partial iteration is a great idea :-). How's this for a simplified version that still covers the main use cases? @contextmanager def reuse_then_close(it): # TODO: come up with a better name it = iter(it) try: yield preserve(it) finally: iterclose(it) with itertools.reuse_then_close(some_generator(...)) as it: for obj in it: ... # still open here, because our reference to the iterator is wrapped in preserve(...) for obj in it: ... # but then closed here, by the 'with' block -n -- Nathaniel J. Smith -- https://vorpus.org

This is a very interesting proposal. I just wanted to share something I found in my quick search: http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-f... Could you explain why the accepted answer there doesn't address this issue? class Parse(object): """A generator that iterates through a file""" def __init__(self, path): self.path = path def __iter__(self): with open(self.path) as f: yield from f Best, Neil On Wednesday, October 19, 2016 at 12:39:34 AM UTC-4, Nathaniel Smith wrote:

On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
I think the difference is that this new approach guarantees cleanup the exact moment the loop ends, no matter how it ends. If I understand correctly, your approach will do cleanup when the loop ends only if the iterator is exhausted. But if someone zips it with a shorter iterator, uses itertools.islice or something similar, breaks the loop, returns inside the loop, or in some other way ends the loop before the iterator is exhausted, the cleanup won't happen when the iterator is garbage collected. And for non-reference-counting python implementations, when this happens is completely unpredictable.

On Wed, Oct 19, 2016 at 11:08 AM Todd <toddrjen@gmail.com> wrote:
I don't see that. The "cleanup" will happen when collection is interrupted by an exception. This has nothing to do with garbage collection either since the cleanup happens deterministically when the block is ended. If this is the only example, then I would say this behavior is already provided and does not need to be added.

On Wed, Oct 19, 2016 at 10:08 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
BTW it may make this easier to read if we notice that it's essentially a verbose way of writing: def parse(path): with open(path) as f: yield from f
I think there might be a misunderstanding here. Consider code like this, that breaks out from the middle of the for loop: def use_that_generator(): for line in parse(...): if found_the_line_we_want(line): break # -- mark -- do_something_with_that_line(line) With current Python, what will happen is that when we reach the marked line, then the for loop has finished and will drop its reference to the generator object. At this point, the garbage collector comes into play. On CPython, with its reference counting collector, the garbage collector will immediately collect the generator object, and then the generator object's __del__ method will restart 'parse' by having the last 'yield' raise a GeneratorExit, and *that* exception will trigger the 'with' block's cleanup. But in order to get there, we're absolutely depending on the garbage collector to inject that GeneratorExit. And on an implementation like PyPy that doesn't use reference counting, the generator object will become collect*ible* at the marked line, but might not actually be collect*ed* for an arbitrarily long time afterwards. And until it's collected, the file will remain open. 'with' blocks guarantee that the resources they hold will be cleaned up promptly when the enclosing stack frame gets cleaned up, but for a 'with' block inside a generator then you still need something to guarantee that the enclosing stack frame gets cleaned up promptly! This proposal is about providing that thing -- with __(a)iterclose__, the end of the for loop immediately closes the generator object, so the garbage collector doesn't need to get involved. Essentially the same thing happens if we replace the 'break' with a 'raise'. Though with exceptions, things can actually get even messier, even on CPython. Here's a similar example except that (a) it exits early due to an exception (which then gets caught elsewhere), and (b) the invocation of the generator function ended up being kind of long, so I split the for loop into two lines with a temporary variable: def use_that_generator2(): it = parse("/a/really/really/really/really/really/really/really/long/path") for line in it: if not valid_format(line): raise ValueError() def catch_the_exception(): try: use_that_generator2() except ValueError: # -- mark -- ... Here the ValueError() is raised from use_that_generator2(), and then caught in catch_the_exception(). At the marked line, use_that_generator2's stack frame is still pinned in memory by the exception's traceback. And that means that all the local variables are also pinned in memory, including our temporary 'it'. Which means that parse's stack frame is also pinned in memory, and the file is not closed. With the __(a)iterclose__ proposal, when the exception is thrown then the 'for' loop in use_that_generator2() immediately closes the generator object, which in turn triggers parse's 'with' block, and that closes the file handle. And then after the file handle is closed, the exception continues propagating. So at the marked line, it's still the case that 'it' will be pinned in memory, but now 'it' is a closed generator object that has already relinquished its resources. -n -- Nathaniel J. Smith -- https://vorpus.org
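A small self-contained demonstration of the behaviour described above (the timing of the "cleanup" line is CPython-specific; on a non-refcounting implementation it may be arbitrarily delayed)::

    import gc

    def parse():
        try:
            yield 1
            yield 2
        finally:
            print("cleanup")  # stands in for closing a file

    def use_it():
        for x in parse():
            break  # on CPython, "cleanup" prints here via the generator's __del__

    use_it()
    gc.collect()  # on PyPy and friends, cleanup typically waits for a collection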

On Wed, Oct 19, 2016 at 2:11 PM Nathaniel Smith <njs@pobox.com> wrote:
Yes, I understand that. Maybe this is clearer. This class adds an iterclose to any iterator so that when iteration ends, iterclose is automatically called: def my_iterclose(): print("Closing!") class AddIterclose: def __init__(self, iterable, iterclose): self.iterable = iterable self.iterclose = iterclose def __iter__(self): try: for x in self.iterable: yield x finally: self.iterclose() try: for x in AddIterclose(range(10), my_iterclose): print(x) if x == 5: raise ValueError except: pass

Ohhh, sorry, you want __iterclose__ to happen when iteration is terminated by a break statement as well? Okay, I understand, and that's fair. However, I would rather that people be explicit about when they're iterating (use the iteration protocol) and when they're managing a resource (use a context manager). Trying to figure out where the context manager should go automatically (which is what it sounds like the proposal amounts to) is too difficult to get right, and when you get it wrong you close too early, and then what's the user supposed to do? Suppress the early close with an even more convoluted notation? If there is a problem with people iterating over things without a generator, my suggestion is to force them to use the generator. For example, don't make your object iterable: make the value yielded by the context manager iterable. Best, Neil (On preview, Re: Chris Angelico's refactoring of my code, nice!!) On Wednesday, October 19, 2016 at 4:14:32 PM UTC-4, Neil Girdhar wrote:

Thanks Nathaniel for this great proposal. As I went through your mail, I realized all the comments I wanted to make were already covered in later paragraphs. And I don't think there's a single point I disagree with. I don't have a strong opinion about the synchronous part of the proposal. I actually wouldn't mind the disparity between asynchronous and synchronous iterators if '__aiterclose__' were to be accepted and '__iterclose__' rejected. However, I would like very much to see the asynchronous part happening in python 3.6. I can add another example for the reference: aioreactive (a fresh implementation of Rx for asyncio) is planning to handle subscriptions to a producer using a context manager: https://github.com/dbrattli/aioreactive#subscriptions-are-async-iterables async with listen(xs) as ys: async for x in ys: do_something(x) Like the proposal points out, this happens in the *user* code. With '__aiterclose__', the former example could be simplified as: async for x in listen(xs): do_something(x) Or even better: async for x in xs: do_something(x) Cheers, /Vincent On 10/19/2016 06:38 AM, Nathaniel Smith wrote:

On 17 October 2016 at 09:08, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
Hi Nathaniel. I'm just reposting what I wrote on pypy-dev (as requested) but under the assumption that you didn't substantially alter your draft - I apologise if some of the quoted text below has already been edited.
I suggested this and I still think that it is the best idea.
I haven't written the kind of code that you're describing so I can't say exactly how I would do it. I imagine though that helpers could be used to solve some of the problems that you're referring to though. Here's a case I do know where the above suggestion is awkward: def concat(filenames): for filename in filenames: with open(filename) as inputfile: yield from inputfile for line in concat(filenames): ... It's still possible to safely handle this use case by creating a helper though. fileinput.input almost does what you want: with fileinput.input(filenames) as lines: for line in lines: ... Unfortunately if filenames is empty this will default to sys.stdin so it's not perfect but really I think introducing useful helpers for common cases (rather than core language changes) should be considered as the obvious solution here. Generally it would have been better if the discussion for PEP 525 has focussed more on helping people to debug/fix dependence on __del__ rather than trying to magically fix broken code.
It would be much simpler to reverse this suggestion and say let's introduce a helper that selectively *enables* the new behaviour you're proposing, i.e.:

    for line in itertools.closeafter(open(...)):
        ...
        if not line.startswith('#'):
            break  # <--------------- file gets closed here

Then we can leave (async) for loops as they are and there are no backward compatibility problems etc. -- Oscar
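For concreteness, a naive version of such a helper might look like the sketch below (the name closeafter follows the suggestion above; and, as the follow-up message notes, closing on an early break here still ends up depending on this wrapper generator being finalised)::

    def closeafter(iterable):
        it = iter(iterable)
        try:
            yield from it
        finally:
            close = getattr(it, 'close', None)
            if close is not None:
                close()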

On 19 October 2016 at 12:33, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Looking more closely at this I realise that there is no way to implement closeafter like this without depending on closeafter.__del__ to do the closing. So actually this is not a solution to the problem at all. Sorry for the noise there! -- Oscar

I'm -1 on the idea. Here's why: 1. Python is a very dynamic language with GC and that is one of its fundamental properties. This proposal might make GC of iterators more deterministic, but that is only one case. For instance, in some places in asyncio source code we have statements like this: "self = None". Why? When an exception occurs and we want to save it (for instance to log it), it holds a reference to the Traceback object. Which in turn references frame objects. Which means that a lot of objects in those frames will be alive while the exception object is alive. So in asyncio we go to great lengths to avoid unnecessary runs of GC, but this is an exception! Most of Python code out there today doesn't do this sorts of tricks. And this is just one example of how you can have cycles that require a run of GC. It is not possible to have deterministic GC in real life Python applications. This proposal addresses only *one* use case, leaving 100s of others unresolved. IMO, while GC-related issues can be annoying to debug sometimes, it's not worth it to change the behaviour of iteration in Python only to slightly improve on this. 2. This proposal will make writing iterators significantly harder. Consider 'itertools.chain'. We will have to rewrite it to add the proposed __iterclose__ method. The Chain iterator object will have to track all of its iterators, call __iterclose__ on them when it's necessary (there are a few corner cases). Given that this object is implemented in C, it's quite a bit of work. And we'll have a lot of objects to fix. We can probably update all iterators in standard library (in 3.7), but what about third-party code? It will take many years until you can say with certainty that most of Python code supports __iterclose__ / __aiterclose__. 3. This proposal changes the behaviour of 'for' and 'async for' statements significantly. To do partial iteration you will have to use a special builtin function to guard the iterator from being closed. This is completely non-obvious to any existing Python user and will be hard to explain to newcomers. 4. This proposal only addresses iteration with 'for' and 'async for' statements. If you iterate using a 'while' loop and 'next()' function, this proposal wouldn't help you. Also see the point #2 about third-party code. 5. Asynchronous generators (AG) introduced by PEP 525 are finalized in a very similar fashion to synchronous generators. There is an API to help Python to call event loop to finalize AGs. asyncio in 3.6 (and other event loops in the near future) already uses this API to ensure that *all AGs in a long-running program are properly finalized* while it is being run. There is an extra loop method (`loop.shutdown_asyncgens`) that should be called right before stopping the loop (exiting the program) to make sure that all AGs are finalized, but if you forget to call it the world won't end. The process will end and the interpreter will shutdown, maybe issuing a couple of ResourceWarnings. No exception will pass silently in the current PEP 525 implementation. And if some AG isn't properly finalized a warning will be issued. The current AG finalization mechanism must stay even if this proposal gets accepted, as it ensures that even manually iterated AGs are properly finalized. 6. If this proposal gets accepted, I think we shouldn't introduce it in any form in 3.6. It's too late to implement it for both sync- and async-generators. Implementing it only for async-generators will only add cognitive overhead. 
Even implementing this only for async-generators will (and should!) delay the 3.6 release significantly. 7. To conclude: I'm not convinced that this proposal fully solves the issue of non-deterministic GC of iterators. It cripples iteration protocols to partially solve the problem for 'for' and 'async for' statements, leaving manual iteration unresolved. It will make it harder to write *correct* (async-) iterators. It introduces some *implicit* context management to 'for' and 'async for' statements -- something that IMO should be done by the user with an explicit 'with' or 'async with'. Yury

On 2016-10-19 12:38 PM, Random832 wrote:
I understand, but both topics are closely tied together. Cleanup code can be implemented in some __del__ method of some non-iterator object. This proposal doesn't address such cases, it focuses only on iterators. My point is that it's not worth it to *significantly* change iteration (protocols and statements) in Python to only *partially* address the issue. Yury

On Thu, Oct 20, 2016 at 3:38 AM, Random832 <random832@fastmail.com> wrote:
Currently, iterators get passed around casually - you can build on them, derive from them, etc, etc, etc. If you change the 'for' loop to explicitly close an iterator, will you also change 'yield from'? What about other forms of iteration? Will the iterator be closed when it runs out normally? This proposal is to iterators what 'with' is to open files and other resources. I can build on top of an open file fairly easily: @contextlib.contextmanager def file_with_header(fn): with open(fn, "w") as f: f.write("Header Row") yield f def main(): with file_with_header("asdf") as f: """do stuff""" I create a context manager based on another context manager, and I have a guarantee that the end of the main() 'with' block is going to properly close the file. Now, what happens if I do something similar with an iterator? def every_second(it): try: next(it) except StopIteration: return for value in it: yield value try: next(it) except StopIteration: break This will work, because it's built on a 'for' loop. What if it's built on a 'while' loop instead? def every_second_broken(it): try: while True: next(it) yield next(it) except StopIteration: pass Now it *won't* correctly call the end-of-iteration function, because there's no 'for' loop. This is going to either (a) require that EVERY consumer of an iterator follow this new protocol, or (b) introduce a ton of edge cases. ChrisA
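To make Chris's point concrete: under the proposal, only 'for' loops (and constructs built on them) would invoke the new hook, so a consumer written around 'while'/next() would have to do the closing itself. A minimal sketch of what that might look like, assuming the __iterclose__ slot from the proposal (the helper below is illustrative, not part of the proposal)::

    def every_second_manual(it):
        it = iter(it)
        try:
            while True:
                try:
                    next(it)            # skip one element
                    yield next(it)      # yield the following element
                except StopIteration:
                    break
        finally:
            # Manually forward the close, since no 'for' loop will do it for us.
            close = getattr(it, '__iterclose__', None)
            if close is not None:
                close()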

On 19 October 2016 at 19:13, Chris Angelico <rosuav@gmail.com> wrote:
Also, unless I'm misunderstanding the proposal, there's a fairly major compatibility break. At present we have:
With the proposed behaviour, if I understand it, "it" would be closed after the first loop, so resuming "it" for the second loop wouldn't work. Am I right in that? I know there's a proposed itertools function to bring back the old behaviour, but it's still a compatibility break. And code like this, that partially consumes an iterator, is not uncommon. Paul
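The quoted example did not survive in this copy of the thread; the pattern Paul is describing is partial consumption followed by resumption, roughly (a sketch of today's behaviour)::

    it = iter(range(10))
    for x in it:
        if x == 3:
            break          # today this leaves 'it' open...
    for x in it:
        print(x)           # ...so this resumes at 4 and prints 4 through 9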

On 10/19/2016 11:38 AM, Paul Moore wrote:
Agreed. I like the idea in general, but this particular break feels like a deal-breaker. I'd be okay with not having break close the iterator, and either introducing a 'break_and_close' type of keyword or some other way of signalling that we will not be using the iterator any more so go ahead and close it. Does that invalidate, or take away most of the value of, the proposal? -- ~Ethan~

On Wed, Oct 19, 2016 at 11:38 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Right -- did you reach the "transition plan" section? (I know it's wayyy down there.) The proposal is to hide this behind a __future__ at first + a mechanism during the transition period to catch code that depends on the old behavior and issue deprecation warnings. But it is a compatibility break, yes. -n -- Nathaniel J. Smith -- https://vorpus.org

On Wed, Oct 19, 2016 at 12:21 PM, Nathaniel Smith <njs@pobox.com> wrote:
I should also say, regarding your specific example, I guess it's an open question whether we would want list_iterator.__iterclose__ to actually do anything. It could flip the iterator to a state where it always raises StopIteration, or RuntimeError, or it could just be a no-op that allows iteration to continue normally afterwards. list_iterator doesn't have a close method right now, and it certainly can't "close" the underlying list (whatever that would even mean), so I don't think there's a strong expectation that it should do anything in particular. The __iterclose__ contract is that you're not supposed to call __next__ afterwards, so there's no real rule about what happens if you do. And there aren't strong conventions right now about what happens when you try to iterate an explicitly closed iterator -- files raise an error, generators just act like they were exhausted. So there's a few options that all seem more-or-less reasonable and I don't know that it's very important which one we pick. -n -- Nathaniel J. Smith -- https://vorpus.org
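For concreteness, here is a toy sketch of the "act exhausted" option on a list-like iterator, assuming the proposed __iterclose__ slot (the other options mentioned are a plain no-op, or raising RuntimeError from __next__ after close)::

    class CloseableListIter:
        def __init__(self, data):
            self._it = iter(data)
            self._closed = False
        def __iter__(self):
            return self
        def __next__(self):
            if self._closed:
                raise StopIteration   # behave as if exhausted once closed
            return next(self._it)
        def __iterclose__(self):      # hypothetical slot from the proposal
            self._closed = True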

On 2016-10-19 3:33 PM, Nathaniel Smith wrote:
Making the 'for' loop behave differently for built-in containers (i.e. making __iterclose__ a no-op for them) will only make this whole thing even more confusing. It has to be consistent: if you partially iterate over *anything* without wrapping it with `preserve()`, it should always close the iterator. Yury

On Wed, Oct 19, 2016 at 1:33 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
You're probably right. My gut is leaning the same way, I'm just hesitant to commit because I haven't thought about it for long. But I do stand by the claim that this is probably not *that* important either way :-). -n -- Nathaniel J. Smith -- https://vorpus.org

You know, I'm actually starting to lean towards this proposal and away from my earlier objections... On Wed, Oct 19, 2016 at 12:33:57PM -0700, Nathaniel Smith wrote:
That seems like the most obvious. [...]
If I recall correctly, in your proposal you use language like "behaviour is undefined". I don't like that language, because it sounds like undefined behaviour in C, which is something to be avoided like the plague. I hope I don't need to explain why, but for those who may not understand the dangers of "undefined behaviour" as per the C standard, you can start here: https://randomascii.wordpress.com/2014/05/19/undefined-behavior-can-format-y... So let's make it clear that what we actually mean is not C-ish undefined behaviour, where the compiler is free to open a portal to the Dungeon Dimensions or use Guido's time machine to erase code that executes before the undefined code: https://blogs.msdn.microsoft.com/oldnewthing/20140627-00/?p=633/ but rather ordinary, standard "implementation-dependent behaviour". If you call next() on a closed iterator, you'll get whatever the iterator happens to do when it is closed. That will be *recommended* to raise whatever error is appropriate to the iterator, but not enforced. That makes it just like the part of the iterator protocol that says that once an iterator raises StopIteration, it should always raise StopIteration. Those that don't are officially called "broken", but they are allowed and you can write one if you want to. Shorter version: - calling next() on a closed iterator is expected to be an error of some sort, often a RuntimeError, but the iterator is free to use a different error if that makes sense (e.g. closed files); - if your own iterator classes break that convention, they will be called "broken", but nobody will stop you from writing such "broken" iterators. -- Steve

On 21 October 2016 at 10:53, Steven D'Aprano <steve@pearwood.info> wrote:
So - does this mean "unless you understand what preserve() does, you're OK to not use it and your code will continue to work as before"? If so, then I'd be happy with this. But I genuinely don't know (without going rummaging through docs) what that statement means in any practical sense. Paul

On Fri, Oct 21, 2016 at 11:07:46AM +0100, Paul Moore wrote:
I've changed my mind -- I think maybe it should do nothing, and preserve the current behaviour of lists. I'm now more concerned with keeping current behaviour as much as possible than creating some sort of consistent error condition for all iterators. Consistency is over-rated, and we already have inconsistency here: file iterators behave differently from list iterators, because they can be closed: py> f = open('/proc/mdstat', 'r') py> a = list(f) py> b = list(f) py> len(a), len(b) (20, 0) py> f.close() py> c = list(f) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: I/O operation on closed file. We don't need to add a close() to list iterators just so they are consistent with files. Just let __iterclose__ be a no-op.
Almost. Code like this will behave exactly the same as it currently does: for x in it: process(x) y = list(it) If it is a file object, the second call to list() will raise ValueError; if it is a list_iterator, or generator, etc., y will be an empty list. That part (I think) shouldn't change. What *will* change is code that partially processes the iterator in two different places. A simple example: py> it = iter([1, 2, 3, 4, 5, 6]) py> for x in it: ... if x == 4: break ... py> for x in it: ... print(x) ... 5 6 This *may* change. With this proposal, the first loop will "close" the iterator when you exit from the loop. For a list, there's no finaliser, no __del__ to call, so we can keep the current behaviour and nobody will notice any difference. But if `it` is a file iterator instead of a list iterator, the file will be closed when you exit the first for-loop, and the second loop will raise ValueError. That will be different. The fix here is simple: protect the first call from closing: for x in itertools.preserve(it): # preserve, protect, whatever ... Or, if `it` is your own class, give it an __iterclose__ method that does nothing. This is a backwards-incompatible change, so I think we would need to do this: (1) In Python 3.7, we introduce a __future__ directive: from __future__ import iterclose to enable the new behaviour. (Remember, future directives apply on a module-by-module basis.) (2) Without the directive, we keep the old behaviour, except that warnings are raised if something will change. (3) Then in 3.8 iterclose becomes the default, the warnings go away, and the new behaviour just happens. If that's too fast for people, we could slow it down: (1) Add the future directive to Python 3.7; (2) but no warnings by default (you have to opt-in to the warnings with an environment variable, or command-line switch). (3) Then in 3.8 the warnings are on by default; (4) And the iterclose behaviour doesn't become standard until 3.9. That means if this change worries you, you can ignore it until you migrate to 3.8 (which won't be production-ready until about 2020 or so), and don't have to migrate your code until 3.9, which will be a year or two later. But early adopters can start targeting the new functionality from 3.7 if they like. I don't think there's any need for a __future__ directive for aiterclose, since there's not enough backwards-incompatibility to care about. (I think, but don't mind if people disagree.) That can happen starting in 3.7, and when people complain that their synchronous generators don't have deterministic garbage collection like their asynchronous ones do, we can point them at the future directive. Bottom line is: at first I thought this was a scary change that would break too much code. But now I think it won't break much, and we can ease into it really slowly over two or three releases. So I think that the cost is probably low. I'm still not sure on how great the benefit will be, but I'm leaning towards a +1 on this. -- Steve
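For reference, the preserve() guard discussed here could plausibly be implemented as a thin wrapper whose close is a no-op; this is a sketch under the proposal's assumptions (the name itertools.preserve and the __iterclose__ slot are the proposal's, the body is illustrative)::

    class preserve:
        """Wrap an iterator so that a 'for' loop's implicit close is a no-op."""
        def __init__(self, iterable):
            self._it = iter(iterable)
        def __iter__(self):
            return self
        def __next__(self):
            return next(self._it)
        def __iterclose__(self):
            # Deliberately do NOT forward the close to the underlying iterator,
            # so the caller can keep iterating it after the loop exits.
            pass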

On 2016-10-19 12:21, Nathaniel Smith wrote:
To me this makes the change too hard to swallow. Although the issues you describe are real, it doesn't seem worth it to me to change the entire semantics of for loops just for these cases. There are lots of for loops that are not async and/or do not rely on resource cleanup. This will change how all of them work, just to fix something that sometimes is a problem for some resource-wrapping iterators. Moreover, even when the iterator does wrap a resource, sometimes I want to be able to stop and resume iteration. It's not uncommon, for instance, to have code using the csv module that reads some rows, pauses to make a decision (e.g., to parse differently depending what header columns are present, or skip some number of rows), and then resumes. This would increase the burden of updating code to adapt to the new breakage (since in this case the programmer would likely have to, or at least want to, think about what is going on rather than just blindly wrapping everything with protect() ). -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On 19 October 2016 at 20:21, Nathaniel Smith <njs@pobox.com> wrote:
I missed that you propose phasing this in, but it doesn't really alter much, I think the current behaviour is valuable and common, and I'm -1 on breaking it. It's just too much of a fundamental change to how loops and iterators interact for me to be comfortable with it - particularly as it's only needed for a very specific use case (none of my programs ever use async - why should I have to rewrite my loops with a clumsy extra call just to cater for a problem that only occurs in async code?) IMO, and I'm sorry if this is controversial, there's a *lot* of new language complexity that's been introduced for the async use case, and it's only the fact that it can be pretty much ignored by people who don't need or use async features that makes it acceptable (the "you don't pay for what you don't use" principle). The problem with this proposal is that it doesn't conform to that principle - it has a direct, negative impact on users who have no interest in async. Paul

On Wed, Oct 19, 2016 at 3:07 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Oh, goodness, no -- like Yury said, the use cases here are not specific to async at all. I mean, none of the examples are async even :-). The motivation here is that prompt (non-GC-dependent) cleanup is a good thing for a variety of reasons: determinism, portability across Python implementations, proper exception propagation, etc. async does add yet another entry to this list, but I don't think the basic principle is controversial. 'with' blocks are a whole chunk of extra syntax that were added to the language just for this use case. In fact 'with' blocks weren't even needed for the functionality -- we already had 'try/finally', they just weren't ergonomic enough. This use case is so important that it's had multiple rounds of syntax directed at it before async/await was even a glimmer in C#'s eye :-). BUT, currently, 'with' and 'try/finally' have a gap: if you use them inside a generator (async or not, doesn't matter), then they often fail at accomplishing their core purpose. Sure, they'll execute their cleanup code whenever the generator is cleaned up, but there's no ergonomic way to clean up the generator. Oops. I mean, you *could* respond by saying "you should never use 'with' or 'try/finally' inside a generator" and maybe add that as a rule to your style manual and linter -- and some people in this thread have suggested more-or-less that -- but that seems like a step backwards. This proposal instead tries to solve the problem of making 'with'/'try/finally' work and be ergonomic in general, and it should be evaluated on that basis, not on the async/await stuff. The reason I'm emphasizing async generators is that they affect the timeline, not the motivation: - PEP 525 actually does add async-only complexity to the language (the new GC hooks). It doesn't affect non-async users, but it is still complexity. And it's possible that if we have iterclose, then we don't need the new GC hooks (though this is still an open discussion :-)). If this is true, then now is the time to act, while reverting the GC hooks change is still a possibility; otherwise, we risk the situation where we add iterclose later, decide that the GC hooks no longer provide enough additional value to justify their complexity... but we're stuck with them anyway. - For synchronous iteration, the need for a transition period means that the iterclose proposal will take a few years to provide benefits. For asynchronous iteration, it could potentially start providing benefits much sooner -- but there's a very narrow window for that, before people start using async generators and backwards compatibility constraints kick in. If we delay a few months then we'll probably have to delay a few years. ...that said, I guess there is one way that async/await directly affected my motivation here, though it's not what you think :-). async/await have gotten me experimenting with writing network servers, and let me tell you, there is nothing that focuses the mind on correctness and simplicity like trying to write a public-facing asynchronous network server. You might think "oh well if you're trying to do some fancy rocket science and this is a feature for rocket scientists then that's irrelevant to me", but that's actually not what I mean at all. The rocket science part is like, trying to run through all possible execution orders of the different callbacks in your head, or to mentally simulate what happens if a client shows up that writes at 1 byte/second. 
When I'm trying to do that, then the last thing I want is to be distracted by also trying to figure out boring mechanical stuff like whether or not the language is actually going to execute my 'finally' block -- yet right now that's a question that actually cannot be answered without auditing my whole source code! And that boring mechanical stuff is still boring mechanical stuff when writing less terrifying code -- it's just that I'm so used to wasting a trickle of cognitive energy on this kind of thing that normally I don't notice it so much. And, also, regarding the "clumsy extra call": the preserve() call isn't just arbitrary clumsiness -- it's a signal that hey, you're turning off a safety feature. Now the language won't take care of this cleanup for you, so it's your responsibility. Maybe you should think about how you want to handle that. Of course your decision could be "whatever, this is a one-off script, the GC is good enough". But it's probably worth the ~0.5 seconds of thought to make that an active, conscious decision, because they aren't all one-off scripts. -n -- Nathaniel J. Smith -- https://vorpus.org

On Thu, Oct 20, 2016 at 11:03:11PM -0700, Nathaniel Smith wrote:
Perhaps it should be. The very first thing you say is "determinism". Hmmm. As we (or at least, some of us) move towards more async code, more threads or multi-processing, even another attempt to remove the GIL from CPython which will allow people to use threads with less cost, how much should we really value determinism? That's not a rhetorical question -- I don't know the answer. Portability across Pythons... if all Pythons performed exactly the same, why would we need multiple implementations? The way I see it, non-deterministic cleanup is the cost you pay for a non-reference counting implementation, for those who care about the garbage collection implementation. (And yes, ref counting is garbage collection.) [...]
How often is this *actually* a problem in practice? On my system, I can open 1000+ files as a regular user. I can't even comprehend opening a tenth of that as an ordinary application, although I can imagine that if I were writing a server application things would be different. But then I don't expect to write server applications in quite the same way as I do quick scripts or regular user applications. So it seems to me that a leaked file handle or two normally shouldn't be a problem in practice. They'll be freed when the script or application closes, and in the meantime, you have hundreds more available. 90% of the time, using `with file` does exactly what we want, and the times it doesn't (because we're writing a generator that isn't closed promptly) 90% of those times it doesn't matter. So (it seems to me) that you're talking about changing the behaviour of for-loops to suit only a small proportion of cases: maybe 10% of 10%. It is not uncommon to pass an iterator (such as a generator) through a series of filters, each processing only part of the iterator: it = generator() header = collect_header(it) body = collect_body(it) tail = collect_tail(it) Is it worth disrupting this standard idiom? I don't think so. -- Steve
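Under the proposal, each helper's internal 'for' loop would close the shared iterator when it breaks out, so this idiom would need the guard; a runnable sketch (the preserve() stand-in below is illustrative, since no guard is needed today)::

    def preserve(it):
        # Stand-in: today no guard is needed; under the proposal this would be
        # something like itertools.preserve().
        return it

    def collect_header(lines):
        header = []
        for line in lines:
            if not line.strip():
                break                 # a blank line ends the header
            header.append(line)
        return header

    lines = iter(["Subject: hi", "From: me", "", "body line 1", "body line 2"])
    header = collect_header(preserve(lines))  # guard the shared iterator
    body = list(lines)                        # picks up after the blank line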

On Fri, Oct 21, 2016 at 12:12 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Hmm -- and yet "with" was added, and I an't imageine that its largest use-case is with ( ;-) ) open: with open(filename, mode) as my_file: .... .... And yet for years I happily counted on reference counting to close my files, and was particularly happy with: data = open(filename, mode).read() I really liked that that file got opened, read, and closed and cleaned up right off the bat. And then context managers were introduced. And it seems to be there is a consensus in the Python community that we all should be using them when working on files, and I myself have finally started routinely using them, and teaching newbies to use them -- which is kind of a pain, 'cause I want to have them do basic file reading stuff before I explain what a "context manager" is. Anyway, my point is that the broader Python community really has been pretty consistent about making it easy to write code that will work the same way (maybe not with the same performance) across python implementations. Ans specifically with deterministic resource management. On my system, I can open 1000+ files as a regular user. I can't even
well, what you can imagine isn't really the point -- I've bumped into that darn open file limit in my work, which was not a server application (though it was some pretty serious number crunching...). And I'm sure I'm not alone. OK, to be fair that was a poorly designed library, not an issue with determinism of resource management (though designing the lib well WOULD depend on that) But then I don't expect to write server applications in
quite the same way as I do quick scripts or regular user applications.
Though data analysts DO write "quick scripts" that might need to do things like access 100s of files...
that was the case with "with file" from the beginning -- particularly on cPython. And yet we all thought it was a great idea.
I don't see what the big overhead is here. for loops would get a new feature, but it would only be used by the objects that chose to implement it. So no huge change. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 21 October 2016 at 21:59, Chris Barker <chris.barker@noaa.gov> wrote:
But the point is that the feature *would* affect people who don't need it. That's what I'm struggling to understand. I keep hearing "most code won't be affected", but then there are discussions about how we ensure that people are warned of where they need to add preserve() to their existing code to get the behaviour they already have. (And, of course, they need to add an "if we're on an older Python, define a no-op version of preserve()" backward-compatibility wrapper if they want their code to work cross-version.) I genuinely expect preserve() to pretty much instantly appear on people's lists of "python warts", and that bothers me. But I'm reaching the point where I'm just saying the same things over and over, so I'll bow out of this discussion now. I remain confused, but I'm going to have to trust that the people who have got a handle on the issue have understood the point I'm making, and have it covered. Paul

On 22 October 2016 at 06:59, Chris Barker <chris.barker@noaa.gov> wrote:
This is actually a case where style guidelines would ideally differ between scripting use cases (let the GC handle it whenever, since your process will be terminating soon anyway) and library(/framework/application) development use cases (promptly clean up after yourself, since you don't necessarily know your context of use). However, that script/library distinction isn't well-defined in computing instruction in general, and most published style guides are written by library/framework/application developers, so students and folks doing ad hoc scripting tend to be the recipients of a lot of well-meaning advice that isn't actually appropriate for them :( Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 23 October 2016 at 02:17, Nick Coghlan <ncoghlan@gmail.com> wrote:
Pondering this overnight, I realised there's a case where folks using Python primarily as a scripting language can still run into many of the resource management problems that arise in larger applications: IPython notebooks, where the persistent kernel can keep resources alive for a surprisingly long time in the absence of a reference counting GC. Yes, they have the option of just restarting the kernel (which many applications don't have), but it's still a nicer user experience if we can help them avoid having those problems arise in the first place. This is likely mitigated in practice *today* by IPython users mostly being on CPython for access to the Scientific Python stack, but we can easily foresee a future where the PyPy community have worked out enough of their NumPy compatibility and runtime redistribution challenges that it becomes significantly more common to be using notebooks against Python kernels that don't use automatic reference counting. I'm significantly more amenable to that as a rationale for pursuing non-syntactic approaches to local resource management than I am the notion of pursuing it for the sake of high performance application development code. Chris, would you be open to trying a thought experiment with some of your students looking at ways to introduce function-scoped deterministic resource management *before* introducing with statements? Specifically, I'm thinking of a progression along the following lines: # Cleaned up whenever the interpreter gets around to cleaning up the function locals def readlines_with_default_resource_management(fname): return open(fname).readlines() # Cleaned up on function exit, even if the locals are still referenced from an exception traceback # or the interpreter implementation doesn't use a reference counting GC from local_resources import function_resource def readlines_with_declarative_cleanup(fname): return function_resource(open(fname)).readlines() # Cleaned up at the end of the with statement def readlines_with_imperative_cleanup(fname): with open(fname) as f: return f.readlines() The idea here is to change the requirement for new developers from "telling the interpreter what to *do*" (which is the situation we have for context managers) to "telling the interpreter what we *want*" (which is for it to link a managed resource with the lifecycle of the currently running function call, regardless of interpreter implementation details) Under that model, Inada-san's recent buffer snapshotting proposal would effectively be an optimised version of the one liner: def snapshot(data, limit, offset=0): return bytes(function_resource(memoryview(data))[offset:limit]) The big refactoring benefit that this feature would offer over with statements is that it doesn't require a structural change to the code - it's just wrapping an existing expression in a new function call that says "clean this up promptly when the function terminates, even if it's still part of a reference cycle, or we're not using a reference counting GC". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
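local_resources does not exist today; a rough approximation of the middle step in that progression, using contextlib.ExitStack explicitly so the cleanup point is the 'with' block rather than the frame itself (a sketch, not Nick's actual design)::

    from contextlib import ExitStack

    def readlines_with_declarative_cleanup(fname):
        # The file is closed when the with block exits, regardless of how the
        # function terminates and regardless of the GC implementation.
        with ExitStack() as resources:
            f = resources.enter_context(open(fname))
            return f.readlines()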

On Sat, Oct 22, 2016 at 8:22 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
This is likely mitigated in practice *today* by IPython users mostly being on CPython for access to the Scientific Python stack,
sure -- though there is no reason that Jupyter notebooks aren't really useful to all sorts of non-data-crunching tasks. It's just that that's the community it was born in. I can imagine they would be great for database exploration/management, for instance. Chris, would you be open to trying a thought experiment with some of your students looking at ways to introduce function-scoped
deterministic resource management *before* introducing with statements?
At first thought, talking about this seems like it would just confuse newbies even MORE. Most of my students really want simple examples they can copy and then change for their specific use case. But I do have some pretty experienced developers (new to Python, but not programming) in my classes, too, that I might be able to bring this up with. # Cleaned up whenever the interpreter gets around to cleaning up
I can see that, but I'm not sure newbies will -- in either case, you have to think about what you want -- which is the complexity I'm trying to avoid at this stage. Until much later, when I get into weak references, I can pretty much tell people that python will take care of itself with regards to resource management. That's what context managers are for, in fact. YOU can use: with open(...) as infile: ..... Without needing to know what actually has to be "cleaned up" about a file. In the case of files, it's a close() call, simple enough (in the absence of Exceptions...), but with a database connection or something, it could be a lot more complex, and it's nice to know that it will simply be taken care of for you by the context manager. The big refactoring benefit that this feature would offer over with
hmm -- that would be simpler in one sense, but wouldn't it require a new function to be defined for everything you might want to do this with? rather than the same "with" syntax for everything? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Chris Barker wrote:
Nick Coghlan wrote:
I'm with Chris, I think: this seems inappropriate to me. A student has to be rather sophisticated to understand resource management at all in Python. Eg, generators and closures can hang on to resources between calls, yet there's no syntactic marker at the call site.
I think this attempt at a distinction is spurious. On the syntactic side, with open("file") as f: results = read_and_process_lines(f) the with statement effectively links management of the file resource to the lifecycle of read_and_process_lines. (Yes, I know what you mean by "link" -- will "new developers"?) On the semantic side, constructs like closures and generators (which they may be cargo-culting!) mean that it's harder to link resource management to (syntactic) function calls than a new developer might think. (Isn't that Nathaniel's motivation for the OP?) And then there's the loop-that-may-not-fully-consume-an-iterator problem: that must be explicitly decided -- the question for language designers is which of "close generators on loop exit" or "leave generators open on loop exit" should be marked with explicit syntax -- and what if you've got two generators involved, and want different decisions for both? Chris:
Indeed.
I hope you phrase that very carefully. Python takes care of itself, but does not take care of the use case. That's the programmer's responsibility. In a very large number of use cases, including the novice developer's role in a large project, that is a distinction that makes no difference. But the "close generators on loop exit" (or maybe not!) use case makes it clear that in general the developer must explicitly manage resources.
But somebody has to write that context manager. I suppose in the organizational context imagined here, it was written for the project by the resource management wonk in the group, and the new developer just cargo-cults it at first.
Even if it can be done with a single "ensure_cleanup" function, Python isn't Haskell. I think context management deserves syntax to mark it. After all, from the "open and read one file" scripting standpoint, there's really not a difference between f = open("file") process(f) and with open("file") as f: process(f) (see "taking care of Python ~= taking care of use case" above). But the with statement and indentation clearly mark the call to process as receiving special treatment. As Chris says, the developer doesn't need to know anything but that the object returned by the with expression participates "appropriately" in the context manager protocol (which she may think of as the "with protocol"!, ie, *magic*) and gets the "special treatment" it needs. So (for me) this is full circle: "with" context management is what we need, but it interacts poorly with stateful "function" calls -- and that's what Nathaniel proposes to deal with.

On 25 October 2016 at 11:59, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
This is my read of Nathaniel's motivation as well, and hence my proposal: rather than trying to auto-magically guess when a developer intended for their resource management to be linked to the currently executing frame (which requires fundamentally changing how iteration works in a way that breaks the world, and still doesn't solve the problem in general), I'm starting to think that we instead need a way to let them easily say "This resource, the one I just created or have otherwise gained access to? Link its management to the lifecycle of the currently running function or frame, so it gets cleaned up when it finishes running". Precisely *how* a particular implementation did that resource management would be up to the particular Python implementation, but one relatively straightforward way would be to use contextlib.ExitStack under the covers, and then when the frame finishes execution have a check that goes: - did the lazily instantiated ExitStack instance get created during frame execution? - if yes, close it immediately, thus reclaiming all the registered resources The spelling of the *surface* API though is something I'd need help from educators in designing - my problem is that I already know all the moving parts and how they fit together (hence my confidence that something like this would be relatively easy to implement, at least in CPython, if we decided we wanted to do it), but I *don't* know what kinds of terms could be used in the API if we wanted to make it approachable to relative beginners. My initial thought would be to offer: from local_resources import function_resource and: from local_resources import frame_resource Where the only difference between the two is that the first one would complain if you tried to use it outside a normal function body, while the second would be usable anywhere (function, class, module, generator, coroutine). Both would accept and automatically enter context managers as input, as if you'd wrapped the rest of the frame body in a with statement. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
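A sketch of how that surface API could be emulated today with a decorator plus contextlib.ExitStack, standing in for real frame-level support (with_frame_resources and function_resource here are illustrative, not real APIs, and the sketch is not thread-safe)::

    import functools
    from contextlib import ExitStack

    _stacks = []   # ExitStacks for currently executing decorated calls

    def with_frame_resources(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with ExitStack() as stack:   # closed when the call finishes, however it exits
                _stacks.append(stack)
                try:
                    return func(*args, **kwargs)
                finally:
                    _stacks.pop()
        return wrapper

    def function_resource(cm):
        """Enter a context manager and tie its cleanup to the current decorated call."""
        return _stacks[-1].enter_context(cm)

    @with_frame_resources
    def readlines(fname):
        return function_resource(open(fname)).readlines()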

On 2016-10-25 4:33 AM, Nick Coghlan wrote:
But how would it help with a partial iteration over generators with a "with" statement inside? def it(): with open(file) as f: for line in f: yield line Nathaniel's proposal addresses this by fixing "for" statements, so that the outer loop that iterates over "it" would close the generator once the iteration is stopped. With your proposal you want to attach the opened file to the frame, but you'd need to attach it to the frame of the *caller* of "it", right? Yury

On 26 October 2016 at 01:59, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Every frame in the stack would still need to opt in to deterministic cleanup of its resources, but the difference is that it becomes an inline operation within the expression creating the iterator, rather than a complete restructuring of the function: def iter_consumer(fname): for line in function_resource(open(fname)): ... It doesn't matter *where* the iterator is being used (or even if you received it as a parameter), you get an easy way to say "When this function exits, however that happens, clean this up". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 25 October 2016 at 03:32, Chris Barker <chris.barker@noaa.gov> wrote:
Nope, hence the references to contextlib.ExitStack: https://docs.python.org/3/library/contextlib.html#contextlib.ExitStack That's a tool for dynamic manipulation of context managers, so even today you can already write code like this:
The setup code to support it is just a few lines of code:
Plus the example context manager definition:
So the gist of my proposal (from an implementation perspective) is that if we give frame objects an ExitStack instance (or an operational equivalent) that can be created on demand and will be cleaned up when the frame exits (regardless of how that happens), then we can define an API for adding "at frame termination" callbacks (including making it easy to dynamically add context managers to that stack) without needing to define your own scaffolding for that feature - it would just be a natural part of the way frame objects work. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
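The snippets Nick refers to above were quoted inline in the original message and did not survive here; the kind of thing ExitStack already supports today looks roughly like this (names are illustrative)::

    from contextlib import ExitStack, contextmanager

    @contextmanager
    def tracked_resource(name):
        print("acquiring", name)
        try:
            yield name
        finally:
            print("releasing", name)

    def process(names):
        with ExitStack() as stack:
            # Enter an arbitrary number of context managers dynamically;
            # all of them are cleaned up when the with block exits.
            resources = [stack.enter_context(tracked_resource(n)) for n in names]
            return [r.upper() for r in resources]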

On Sat, Oct 22, 2016 at 9:17 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Hmm -- interesting idea -- and I recall Guido bringing something like this up on one of these lists not too long ago -- "scripting" use cases really are different than "systems programming" However, that script/library distinction isn't well-defined in
computing instruction in general,
no it's not -- except in the case of "scripting languages" vs. "systems languages" -- you can go back to the classic Ousterhout paper: https://www.tcl.tk/doc/scripting.html But Python really is suitable for both use cases, so tricky to know how to teach. And my classes, at least, have folks with a broad range of use-cases in mind, so I can't choose one way or another. And, indeed, there is no small amount of code (and coder) that starts out as a quicky script, but ends up embedded in a larger system down the road. And (another and?) one of the great things ABOUT Python is that it IS suitable for such a broad range of use-cases. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 25 October 2016 at 03:16, Chris Barker <chris.barker@noaa.gov> wrote:
Steven Lott was pondering the same question a few years back (regarding his preference for teaching procedural programming before any other paradigms), so I had a go at articulating the general idea: http://www.curiousefficiency.org/posts/2011/08/scripting-languages-and-suita... The main paragraph is still pretty unhelpful though, since I handwave away the core of the problem as "the art of software design": """A key part of the art of software design is learning how to choose an appropriate level of complexity for the problem at hand - when a problem calls for a simple script, throwing an entire custom application at it would be overkill. On the other hand, trying to write complex applications using only scripts and no higher level constructs will typically lead to an unmaintainable mess.""" Cheers, Nick. P.S. I'm going to stop now since we're getting somewhat off-topic, but I wanted to highlight this excellent recent article on the challenges of determining the level of "suitable complexity" for any given software engineering problem: https://hackernoon.com/how-to-accept-over-engineering-for-what-it-really-is-... -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 21 October 2016 at 07:03, Nathaniel Smith <njs@pobox.com> wrote:
Ah I follow now. Sorry for the misunderstanding, I'd skimmed a bit more than I realised I had. However, it still feels to me that the code I currently write doesn't need this feature, and I'm therefore unclear as to why it's sufficiently important to warrant a backward compatibility break. It's quite possible that I've never analysed my code well enough to *notice* that there's a problem. Or that I rely on CPython's GC behaviour without realising it. Also, it's honestly very rare that I need deterministic cleanup, as opposed to guaranteed cleanup - running out of file handles, for example, isn't really a problem I encounter. But it's also possible that it's a code design difference. You use the example (from memory, sorry if this is slightly different to what you wrote): def filegen(filename): with open(filename) as f: for line in f: yield line # caller for line in filegen(name): ... I wouldn't normally write a function like that - I'd factor it differently, with the generator taking an open file (or file-like object) and the caller opening the file: def filegen(fd): for line in fd: yield line # caller with open(filename) as fd: for line in filegen(fd): ... With that pattern, there's no issue. And the filegen function is more generic, as it can be used with *any* file-like object (a StringIO, for testing, for example).
Well, if preserve() did mean just that, then that would be OK. I'd never use it, as I don't care about deterministic cleanup, so it makes no difference to me if it's on or off. But that's not the case - in fact, preserve() means "give me the old Python 3.5 behaviour", and (because deterministic cleanup isn't important to me) that's a vague and unclear distinction. So I don't know whether my code is affected by the behaviour change and I have to guess at whether I need preserve(). What I think is needed here is a clear explanation of how this proposal affects existing code that *doesn't* need or care about cleanup. The example that's been mentioned is with open(filename) as f: for line in f: if is_end_of_header(line): break process_header(line) for line in f: process_body(line) and similar code that relies on being able to part-process an iterator in a for loop, and then have a later loop pick up where the first left off. Most users of iterators and generators probably have little understanding of GeneratorExit, closing generators, etc. And that's a good thing - it's why iterators in Python are so useful. So the proposal needs to explain how it impacts that sort of user, in terms that they understand. It's a real pity that the explanation isn't "you can ignore all of this, as you aren't affected by the problem it's trying to solve" - that's what I was getting at. At the moment, the take home message for such users feels like it's "you might need to scatter preserve() around your code, to avoid the behaviour change described above, which you glazed over because it talked about all that coroutiney stuff you don't understand" :-) Paul

On Fri, Oct 21, 2016 at 11:03:51AM +0100, Paul Moore wrote:
I now believe that's not necessarily the case. I think that the message should be: - If your iterator class has a __del__ or close method, then you need to read up on __(a)iterclose__. - If you iterate over open files twice, then all you need to remember is that the file will be closed when you exit the first loop. To avoid that auto-closing behaviour, use itertools.preserve(). - Iterating over lists, strings, tuples, dicts, etc. won't change, since they don't have __del__ or close() methods. I think that covers all the cases the average Python code will care about. -- Steve

On 21 October 2016 at 12:23, Steven D'Aprano <steve@pearwood.info> wrote:
OK, that's certainly a lot less scary. Some thoughts, remain, though: 1. You mention files. Presumably (otherwise what would be the point of the change?) there will be other iterables that change similarly. There's no easy way to know in advance. 2. Cleanup protocols for iterators are pretty messy now - __del__, close, __iterclose__, __aiterclose__. What's the chance 3rd party implementers get something wrong? 3. What about generators? If you write your own generator, you don't control the cleanup code. The example: def mygen(name): with open(name) as f: for line in f: yield line is a good example - don't users of this generator need to use preserve() in order to be able to do partial iteration? And yet how would the writer of the generator know to document this? And if it isn't documented, how does the user of the generator know preserve is needed? My feeling is that this proposal is a relatively significant amount of language churn, to solve a relatively niche problem, and furthermore one that is actually only a problem to non-CPython implementations[1]. My instincts are that we need to back off on the level of such change, to give users a chance to catch their breath. We're not at the level of where we need something like the language change moratorium (PEP 3003) but I don't think it would do any harm to give users a chance to catch their breath after the wave of recent big changes (async, typing, path protocol, f-strings, funky unpacking, Windows build and installer changes, ...). To put this change in perspective - we've lived without it for many years now, can we not wait a little while longer?
And yet, it still seems to me that it's going to force me to change (maybe not much, but some of) my existing code, for absolutely zero direct benefit, as I don't personally use or support PyPy or any other non-CPython implementations. Don't forget that PyPy still doesn't even implement Python 3.5 - so no-one benefits from this change until PyPy supports Python 3.8, or whatever version this becomes the default in. It's very easy to misuse an argument like this to block *any* sort of change, and that's not my intention here - but I am trying to understand what the real-world issue is here, and how (and when!) this proposal would allow people to write code to fix that problem. At the moment, it feels like: * The problem is file handle leaks in code running under PyPy * The ability to fix this will come in around 4 years (random guess as to when PyPy implements Python 3.8, plus an assumption that the code needing to be fixed can immediately abandon support for all earlier versions of PyPy). Any other cases seem to me to be theoretical at the moment. Am I being unfair in this assessment? (It feels like I might be, but I can't be sure how). Paul [1] As I understand it. CPython's refcounting GC makes this a non-issue, correct?

On 21/10/16 at 14:35, Paul Moore wrote:
[1] As I understand it. CPython's refcounting GC makes this a non-issue, correct?
Wrong. Any guarantee that you think the CPython GC provides goes out of the window as soon as you have a reference cycle. Refcounting does not actually make GC deterministic, it merely hides the problem away from view. For instance, on CPython 3.5, running this code: #%%%%%%%%% class some_resource: def __enter__(self): print("Open resource") return 42 def __exit__(self, *args): print("Close resource") def some_iterator(): with some_resource() as s: yield s def main(): it = some_iterator() for i in it: if i == 42: print("The answer is", i) break print("End loop") # later ... try: 1/0 except ZeroDivisionError as e: exc = e main() print("Exit") #%%%%%%%%%% produces: Open resource The answer is 42 End loop Exit Close resource What happens is that 'exc' holds a cyclic reference back to the main() frame, which prevents it from being destroyed when the function exits, and that frame, in turn, holds a reference to the iterator, via the local variable 'it'. And so, the iterator remains alive, and the resource unclosed, until the next garbage collection.

On Wed, Oct 19, 2016 at 2:38 PM, Paul Moore <p.f.moore@gmail.com> wrote:
I may very well be misunderstanding the purpose of the proposal, but that is not how I saw it being used. I thought of it being used to clean up things that happened in the loop, rather than clean up the iterator itself. This would allow the iterator to manage events that occurred in the body of the loop. So it would be more like this scenario:
In this case, objiterer would do some cleanup related to obj1 and obj2 in the first loop and some cleanup related to obj3 and obj4 in the second loop. There would be no backwards-compatibility break, the method would be purely opt-in and most typical iterators wouldn't need it. However, in this case perhaps it might be better to have some method that is called after every loop, no matter how the loop is terminated (break, continue, return). This would allow the cleanup to be done every loop rather than just at the end.
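The quoted scenario did not survive in this copy of the thread; a hedged reconstruction of the opt-in usage being described might look like this, with the iterator using the hypothetical hook to clean up the objects it handed out during that loop (objiterer is the name from the message, the bodies are guesses)::

    class ManagedThing:
        def cleanup(self):
            print("cleaning up", id(self))

    class objiterer:
        def __init__(self, objs):
            self._objs = list(objs)
            self._handed_out = []
        def __iter__(self):
            return self
        def __next__(self):
            if not self._objs:
                raise StopIteration
            obj = self._objs.pop(0)
            self._handed_out.append(obj)
            return obj
        def __iterclose__(self):          # hypothetical slot from the proposal
            for obj in self._handed_out:  # clean up whatever this loop produced
                obj.cleanup()
            self._handed_out.clear()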

On Wed, Oct 19, 2016 at 11:13 AM, Chris Angelico <rosuav@gmail.com> wrote:
Oh good point -- 'yield from' definitely needs a mention. Fortunately, I think it's pretty easy: the only way the child generator in a 'yield from' can be aborted early is if the parent generator is aborted early, so the semantics you'd want are that iff the parent generator is closed, then the child generator is also closed. 'yield from' already implements those semantics :-). So the only remaining issue is what to do if the child iterator completes normally, and in this case I guess 'yield from' probably should call '__iterclose__' at that point, like the equivalent for loop would.
The iterator is closed if someone explicitly closes it, either by calling the method by hand, or by passing it to a construct that calls that method -- a 'for' loop without preserve(...), etc. Obviously any given iterator's __next__ method could decide to do whatever it wants when it's exhausted normally, including executing its 'close' logic, but there's no magic that causes __iterclose__ to be called here. The distinction between exhausted and exhausted+closed is useful: consider some sort of file-wrapping iterator that implements __iterclose__ as closing the file. Then this exhausts the iterator and then closes the file: for line in file_wrapping_iter: ... and this also exhausts the iterator, but since __iterclose__ is not called, it doesn't close the file, allowing it to be re-used: for line in preserve(file_wrapping_iter): ... OTOH there is one important limitation to this, which is that if you're implementing your iterator by using a generator, then generators in particular don't provide any way to distinguish between exhausted and exhausted+closed (this is just how generators already work, nothing to do with this proposal). Once a generator has been exhausted, its close() method becomes a no-op.
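For concreteness, a sketch of the kind of file-wrapping iterator described here, where the hypothetical __iterclose__ closes the file but normal exhaustion does not::

    class file_wrapping_iter:
        def __init__(self, path):
            self._file = open(path)
        def __iter__(self):
            return self
        def __next__(self):
            line = self._file.readline()
            if not line:
                raise StopIteration   # exhausted, but the file stays open
            return line
        def __iterclose__(self):
            self._file.close()        # called by 'for' loops under the proposal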
BTW, it's probably easier to read this way :-): def every_second(it): for i, value in enumerate(it): if i % 2 == 1: yield value
Right. If the proposal is accepted then a lot (I suspect the vast majority) of iterator consumers will automatically DTRT because they're already using 'for' loops or whatever; for those that don't, they'll do whatever they're written to do, and that might or might not match what users have come to expect. Hence the transition period, ResourceWarnings and DeprecationWarnings, etc. I think the benefits are worth it, but there certainly is a transition cost. -n -- Nathaniel J. Smith -- https://vorpus.org

Hi Yury, Thanks for the detailed comments! Replies inline below. On Wed, Oct 19, 2016 at 8:51 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Maybe I'm misunderstanding, but I think those 100s of other cases where you need deterministic cleanup are why 'with' blocks were invented, and in my experience they work great for that. Once you get in the habit, it's very easy and idiomatic to attach a 'with' to each file handle, socket, etc., at the point where you create it. So from where I stand, it seems like those 100s of unresolved cases actually are resolved? The problem is that 'with' blocks are great, and generators are great, but when you put them together into the same language there's this weird interaction that emerges, where 'with' blocks inside generators don't really work for their intended purpose unless you're very careful and willing to write boilerplate. Adding deterministic cleanup to generators plugs this gap. Beyond that, I do think it's a nice bonus that other iterables can take advantage of the feature, but this isn't just a random "hey let's smush two constructs together to save a line of code" thing -- iteration is special because it's where generator call stacks and regular call stacks meet.
When you say "make writing iterators significantly harder", is it fair to say that you're thinking mostly of what I'm calling "iterator wrappers"? For most day-to-day iterators, it's pretty trivial to either add a close method or not; the tricky cases are when you're trying to manage a collection of sub-iterators. itertools.chain is a great challenge / test case here, because I think it's about as hard as this gets :-). It took me a bit to wrap my head around, but I think I've got it, and that it's not so bad actually. Right now, chain's semantics are: # copied directly from the docs def chain(*iterables): for it in iterables: for element in it: yield element In a post-__iterclose__ world, the inner for loop there will already handle closing each iterators as its finished being consumed, and if the generator is closed early then the inner for loop will also close the current iterator. What we need to add is that if the generator is closed early, we should also close all the unprocessed iterators. The first change is to replace the outer for loop with a while/pop loop, so that if an exception occurs we'll know which iterables remain to be processed: def chain(*iterables): try: while iterables: for element in iterables.pop(0): yield element ... Now, what do we do if an exception does occur? We need to call iterclose on all of the remaining iterables, but the tricky bit is that this might itself raise new exceptions. If this happens, we don't want to abort early; instead, we want to continue until we've closed all the iterables, and then raise a chained exception. Basically what we want is: def chain(*iterables): try: while iterables: for element in iterables.pop(0): yield element finally: try: operators.iterclose(iter(iterables[0])) finally: try: operators.iterclose(iter(iterables[1])) finally: try: operators.iterclose(iter(iterables[2])) finally: ... but of course that's not valid syntax. Fortunately, it's not too hard to rewrite that into real Python -- but it's a little dense: def chain(*iterables): try: while iterables: for element in iterables.pop(0): yield element # This is equivalent to the nested-finally chain above: except BaseException as last_exc: for iterable in iterables: try: operators.iterclose(iter(iterable)) except BaseException as new_exc: if new_exc.__context__ is None: new_exc.__context__ = last_exc last_exc = new_exc raise last_exc It's probably worth wrapping that bottom part into an iterclose_all() helper, since the pattern probably occurs in other cases as well. (Actually, now that I think about it, the map() example in the text should be doing this instead of what it's currently doing... I'll fix that.) This doesn't strike me as fundamentally complicated, really -- the exception chaining logic makes it look scary, but basically it's just the current chain() plus a cleanup loop. I believe that this handles all the corner cases correctly. Am I missing something? And again, this strikes me as one of the worst cases -- the vast majority of iterators out there are not doing anything nearly this complicated with subiterators.
Adding support to itertools, toolz.itertoolz, and generators (which are the most common way to implement iterator wrappers) will probably take care of 95% of uses, but yeah, there's definitely a long tail that will take time to shake out. The (extremely tentative) transition plan has __iterclose__ as opt-in until 3.9, so that's about 3.5 years from now. __aiterclose__ is a different matter of course, since there are very very few async iterator wrappers in the wild, and in general I think most people writing async iterators are watching async/await-related language developments very closely.
It's true that it's non-obvious to existing users, but that's true of literally every change that we could ever make :-). That's why we have release notes, deprecation warnings, enthusiastic blog posts, etc. For newcomers... well, it's always difficult for those of us with more experience to put ourselves back in the mindset, but I don't see why this would be particularly difficult to explain? for loops consume their iterator; if you don't want that then here's how you avoid it. That's no more difficult to explain than what an iterator is in the first place, I don't think, and for me at least it's a lot easier to wrap my head around than the semantics of else blocks on for loops :-). (I always forget how those work.)
True. If you're doing manual iteration, then you are still responsible for manual cleanup (if that's what you want), just like today. This seems fine to me -- I'm not sure why it's an objection to this proposal :-).
There is no law that says that the interpreter always shuts down after the event loop exits. We're talking about a fundamental language feature here, it shouldn't be dependent on the details of libraries and application shutdown tendencies :-(.
No exception will pass silently in the current PEP 525 implementation.
Exceptions that occur inside a garbage-collected iterator will be printed to the console, or possibly logged according to whatever the event loop does with unhandled exceptions. And sure, that's better than nothing, if someone remembers to look at the console/logs. But they *won't* be propagated out to the containing frame, they can't be caught, etc. That's a really big difference.
And if some AG isn't properly finalized a warning will be issued.
This actually isn't true of the code currently in asyncio master -- if the loop is already closed (either manually by the user or by its __del__ being called) when the AG finalizer executes, then the AG is silently discarded: https://github.com/python/asyncio/blob/e3fed68754002000be665ad1a379a747ad924...

This isn't really an argument against the mechanism though, just a bug you should probably fix :-).

I guess it does point to my main dissatisfaction with the whole GC hook machinery, though. At this point I have spent many, many hours tracing through the details of this, catching edge cases -- first during the initial PEP process, where there were a few rounds of revision, then again the last few days when I first thought I found a bunch of bugs that turned out to be spurious because I'd missed one line in the PEP, plus one real bug that you already know about (the finalizer-called-from-wrong-thread issue), and then I spent another hour carefully reading through the code again with PEP 442 open alongside once I realized how subtle the resurrection and cyclic reference issues are here, and now here's another minor bug for you. At this point I'm about 85% confident that it does actually function as described, or that we'll at least be able to shake out any remaining weird edge cases over the next 6-12 months as people use it.

But -- and I realize this is an aesthetic reaction as much as anything else -- this all feels *really* unpythonic to me. Looking at the Zen, the phrases that come to mind are "complicated", and "If the implementation is hard to explain, ...". The __(a)iterclose__ proposal definitely has its complexity as well, but it's a very different kind. The core is incredibly straightforward: "there is this method, for loops always call it". That's it. When you look at a for loop, you can be extremely confident about what's going to happen and when. Of course then there's the question of defining this method on all the diverse iterators that we have floating around -- I'm not saying it's trivial. But you can take them one at a time, and each individual case is pretty straightforward.
Like I said in the text, I don't find this very persuasive, since if you're manually iterating then you can just as well take manual responsibility for cleaning things up. But I could live with both mechanisms co-existing.
I certainly don't want to delay 3.6. I'm not as convinced as you that the async-generator code alone is so complicated that it would force a delay, but if it is then 3.6.1 is also an option worth considering.
The goal isn't to "fully solve the problem of non-deterministic GC of iterators". That would require magic :-). The goal is to provide tools so that when users run into this problem, they have viable options to solve it. Right now, we don't have those tools, as evidenced by the fact that I've basically never seen code that does this "correctly". We can tell people that they should be using explicit 'with' on every generator that might contain cleanup code, but they don't and they won't, and as a result their code quality is suffering on several axes (portability across Python implementations, 'with' blocks inside generators that don't actually do anything except spuriously hide ResourceWarnings, etc.). Adding __(a)iterclose__ to (async) for loops makes it easy and convenient to do the right thing in common cases; and in the less-usual case where you want to do manual iteration, then you can and should use a manual 'with' block too. The proposal is not trying to replace 'with' blocks :-). As for implicitness, eh. If 'for' is defined to mean 'iterate and then close', then that's what 'for' means. If we make the change then there won't be anything more implicit about 'for' calling __iterclose__ than there is about 'for' calling __iter__ or __next__. Definitely this will take some adjustment for those who are used to the old system, but sometimes that's the price of progress ;-). -n -- Nathaniel J. Smith -- https://vorpus.org

Nathaniel, On 2016-10-19 5:02 PM, Nathaniel Smith wrote:
Hi Yury,
Thanks for the detailed comments! Replies inline below.
NP!
Not all code can be written with 'with' statements; see my example with 'self = None' in asyncio. Python code can be quite complex, involving classes with __del__ that do some cleanups etc. Fundamentally, you cannot make GC of such objects deterministic. IOW I'm not convinced that if we implement your proposal we'll fix 90% (or even 30%) of cases where non-deterministic and postponed cleanup is harmful.
Yes, I understand that your proposal really improves some things. OTOH it undeniably complicates the iteration protocol and requires a long period of deprecations, teaching users and library authors new semantics, etc. We're only now beginning to see Python 3 gain traction. I don't want us to harm that by introducing another set of things to Python 3 that are significantly different from Python 2. DeprecationWarnings/future imports don't excite users either.
Yes, mainly iterator wrappers. You'll also need to educate users to refactor their __del__ methods to __(a)iterclose__ in 3.6 (more on that below).
Now imagine that being applied throughout the stdlib, plus some of it will have to be implemented in C. I'm not saying it's impossible, I'm saying that it will require additional effort for CPython and ecosystem. [..]
We don't often change the behavior of basic statements like 'for', if ever.
A lot of code that you find on stackoverflow etc will be broken. Porting code from Python2/<3.6 will be challenging. People are still struggling to understand 'dict.keys()'-like views in Python 3.
Right now we can implement the __del__ method to clean up iterators. And it works for both partial iteration and cases where people forgot to close the iterator explicitly. With your proposal, to achieve the same (and make the code compatible with new for-loop semantics), users will have to implement both __iterclose__ and __del__.
It's not about shutting down the interpreter or exiting the process. The majority of async applications just run the loop until they exit. The point of PEP 525 and how the finalization is handled in asyncio is that AGs will be properly cleaned up for the absolute majority of the time (while the loop is running). [..]
I don't think it's a bug. When the loop is closed, the hook will do nothing, so the asynchronous generator will be cleaned up by the interpreter. If it has an 'await' expression in its 'finally' statement, the interpreter will issue a warning. I'll add a comment explaining this.
Yes, I agree it's not an easy thing to digest. Good thing is that asyncio has a reference implementation of PEP 525 support, so people can learn from it. I'll definitely add more comments to make the code easier to read.
The __(a)iterclose__ semantics is clear. What's not clear is how much harm changing the semantics of for-loops will do (and how to quantify the amount of good :)) [..]
Perhaps we should focus on teaching people that using 'with' statements inside (async-) generators is a bad idea. What you should do instead is to have a 'with' statement wrapping the code that uses the generator. Yury

On Wed, Oct 19, 2016 at 05:52:34PM -0400, Yury Selivanov wrote:
Just because something doesn't solve ALL problems doesn't mean it isn't worth doing. Reference counting doesn't solve the problem of cycles, but Python worked really well for many years even though cycles weren't automatically broken. Then a second GC was added, but it didn't solve the problem of cycles with __del__ finalizers. And recently (a year or two ago) there was an improvement that made the GC better able to deal with such cases -- but I expect that there are still edge cases where objects aren't collected. Had people said "garbage collection doesn't solve all the edge cases, therefore it's not worth doing" where would we be?

I don't know how big a problem the current lack of deterministic GC of resources opened in generators actually is. I guess that users of CPython will have *no idea*, because most of the time the ref counter will clean up quite early. But not all Pythons are CPython, and despite my earlier post, I now think I've changed my mind and support this proposal.

One reason for this is that I thought hard about my own code where I use the double-for-loop idiom:

    for x in iterator:
        if cond:
            break
        ...

    # later
    for y in iterator:  # same iterator
        ...

and I realised:

(1) I don't do this *that* often;

(2) when I do, it really wouldn't be that big a problem for me to guard against auto-closing:

    for x in protect(iterator):
        if cond:
            break
        ...

(3) if I need to write hybrid code that runs over multiple versions, that's easy too:

    try:
        from itertools import protect
    except ImportError:
        def protect(it):
            return it
Couldn't __(a)iterclose__ automatically call __del__ if it exists? Seems like a reasonable thing to inherit from object.
A lot of code that you find on stackoverflow etc will be broken.
"A lot"? Or a little? Are you guessing, or did you actually count it? If we are worried about code like this: it = iter([1, 2, 3]) a = list(it) # currently b will be [], with this proposal it will raise RuntimeError b = list(it) we can soften the proposal's recommendation that iterators raise RuntimeError on calling next() when they are closed. I've suggested that "whatever exception makes sense" should be the rule. Iterators with no resources to close can simply raise StopIteration instead. That will preserve the current behaviour.
I spend a lot of time on the tutor and python-list mailing lists, and a little bit of time on Reddit /python, and I don't think I've ever seen anyone struggle with those. I'm sure it happens, but I don't think it happens often. After all, for the most common use-case, there's no real difference between Python 2 and 3: for key, value in mydict.items(): ... [...]
As I ask above, couldn't we just inherit a default __(a)iterclose__ from object that looks like this?

    def __iterclose__(self):
        finalizer = getattr(type(self), '__del__', None)
        if finalizer:
            finalizer(self)

I know it looks a bit funny for non-iterables to have an iterclose method, but they'll never actually be called. [...]
The "easy" way to find out (easy for those who aren't volunteering to do the work) is to fork Python, make the change, and see what breaks. I suspect not much, and most of the breakage will be easy to fix. As for the amount of good, this proposal originally came from PyPy. I expect that CPython users won't appreciate it as much as PyPy users, and Jython/IronPython users when they eventually support Python 3.x. -- Steve

On 2016-10-21 6:29 AM, Steven D'Aprano wrote:
No, we can't call __del__ from __iterclose__. Otherwise we'd break even more code than this proposal already breaks:

    for i in iter:
        ...
    iter.something()  # <- this would be called after iter.__del__()

[..]
AFAIK the proposal came "for" PyPy, not "from". And the issues Nathaniel tries to solve do also exist in CPython. It's only a question of whether changing the 'for' statement and iteration protocol is worth the trouble. Yury

Personally, I hadn't realised we had this problem in asyncio until now. Does this problem happen in asyncio at all? Or does asyncio somehow work around it by making sure to always explicitly destroy the frames of all coroutine objects, as long as someone waits on each task? On 21 October 2016 at 16:08, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
-- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert

On 2016-10-21 11:19 AM, Gustavo Carneiro wrote:
No, I think asyncio code is free of the problem this proposal is trying to address. We might have some "problem" in 3.6 when people start using async generators more often. But I think it's important for us to teach people to manage the associated resources from the outside of the generator (i.e. don't put 'async with' or 'with' inside the generator's body; instead, wrap the code that uses the generator with 'async with' or 'with'). Yury

On Fri, Oct 21, 2016 at 3:29 AM, Steven D'Aprano <steve@pearwood.info> wrote:
As for the amount of good, this proposal originally came from PyPy.
Just to be clear, I'm not a PyPy dev, and the PyPy devs' contribution here was mostly to look over a draft I circulated and to agree that it seemed like something that'd be useful to them. -n -- Nathaniel J. Smith -- https://vorpus.org

On 20 October 2016 at 07:02, Nathaniel Smith <njs@pobox.com> wrote:
At this point your code is starting to look a whole lot like the code in contextlib.ExitStack.__exit__ :)

Accordingly, I'm going to suggest that while I agree the problem you describe is one that genuinely emerges in large production applications and other complex systems, this particular solution is simply far too intrusive to be accepted as a language change for Python - you're talking a fundamental change to the meaning of iteration for the sake of the relatively small portion of the community that either work on such complex services, or insist on writing their code as if it might become part of such a service, even when it currently isn't. Given that simple applications vastly outnumber complex ones, and always will, I think making such a change would be a bad trade-off that didn't come close to justifying the costs imposed on the rest of the ecosystem to adjust to it.

A potentially more fruitful direction of research to pursue for 3.7 would be the notion of "frame local resources", where each Python level execution frame implicitly provided a lazily instantiated ExitStack instance (or an equivalent) for resource management. Assuming that it offered an "enter_frame_context" function that mapped to "contextlib.ExitStack.enter_context", such a system would let us do things like:

    from frame_resources import enter_frame_context

    def readlines_1(fname):
        return enter_frame_context(open(fname)).readlines()

    def readlines_2(fname):
        return [*enter_frame_context(open(fname))]

    def readlines_3(fname):
        return [line for line in enter_frame_context(open(fname))]

    def iterlines_1(fname):
        yield from enter_frame_context(open(fname))

    def iterlines_2(fname):
        for line in enter_frame_context(open(fname)):
            yield line

    def iterlines_3(fname):
        f = enter_frame_context(open(fname))
        while True:
            try:
                yield next(f)
            except StopIteration:
                break

to indicate "clean up this file handle when this frame terminates, regardless of the GC implementation used by the interpreter". Such a feature already gets you a long way towards the determinism you want, as frames are already likely to be cleaned up deterministically even in Python implementations that don't use automatic reference counting - the bit that's non-deterministic is cleaning up the local variables referenced *from* those frames.

And then further down the track, once such a system had proven its utility, *then* we could talk about expanding the iteration protocol to allow for implicit registration of iterable cleanup functions as frame local resources. With the cleanup functions not firing until the *frame* exits, then the backwards compatibility break would be substantially reduced (for __main__ module code there'd essentially be no compatibility break at all, and similarly for CPython local variables), and the level of impact on language implementations would also be much lower (reduced to supporting the registration of cleanup functions with frame objects, and executing those cleanup functions when the frame terminates).

Regards, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Oct 22, 2016 at 9:02 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
One of the versions I tried but didn't include in my email used ExitStack :-). It turns out not to work here: the problem is that we effectively need to enter *all* the contexts before unwinding, even if trying to enter one of them fails. ExitStack is nested like (try (try (try ... finally) finally) finally), and we need (try finally (try finally (try finally ...))). But this is just a small side-point anyway, since most code is not implementing complicated meta-iterators; I'll address your real proposal below.
So basically a 'with expression', that gives up the block syntax -- taking its scope from the current function instead -- in return for being usable in expression context? That's really interesting, and I see the intuition that it might be less disruptive if our implicit iterclose calls are scoped to the function rather than the 'for' loop. But having thought about it and investigated some... I don't think function-scoping addresses my problem, and I don't see evidence that it's meaningfully less disruptive to existing code.

First, "my problem":

Obviously, Python's a language that should be usable for folks doing one-off scripts, and for paranoid folks trying to write robust complex systems, and for everyone in between -- these are all really important constituencies. And unfortunately, there is a trade-off here, where the changes we're discussing affect these constituencies differently. But it's not just a matter of shifting around a fixed amount of pain; the *quality* of the pain really changes under the different proposals.

In the status quo:

- for one-off scripts: you can just let the GC worry about generator and file handle cleanup, re-use iterators, whatever, it's cool

- for robust systems: because it's the *caller's* responsibility to ensure that iterators are cleaned up, you... kinda can't really use generators without -- pick one -- (a) draconian style guides (like forbidding 'with' inside generators or forbidding bare 'for' loops entirely), (b) lots of auditing (every time you write a 'for' loop, go read the source to the generator you're iterating over -- no modularity for you and let's hope the answer doesn't change!), or (c) introducing really subtle bugs. Or all of the above. It's true that a lot of the time you can ignore this problem and get away with it one way or another, but if you're trying to write robust code then this doesn't really help -- it's like saying the footgun only has 1 bullet in the chamber. Not as reassuring as you'd think. It's like if every time you called a function, you had to explicitly say whether you wanted exception handling to be enabled inside that function, and if you forgot then the interpreter might just skip the 'finally' blocks while unwinding. There just *isn't* a good solution available.

In my proposal (for-scoped-iterclose):

- for robust systems: life is great -- you're still stopping to think a little about cleanup every time you use an iterator (because that's what it means to write robust code!), but since the iterators now know when they need cleanup and regular 'for' loops know how to invoke it, then 99% of the time (i.e., whenever you don't intend to re-use an iterator) you can be confident that just writing 'for' will do exactly the right thing, and the other 1% of the time (when you do want to re-use an iterator), you already *know* you're doing something clever. So the cognitive overhead on each for-loop is really low.

- for one-off scripts: ~99% of the time (actual measurement, see below) everything just works, except maybe a little bit better. 1% of the time, you deploy the clever trick of re-using an iterator with multiple for loops, and it breaks, so this is some pain. Here's what you see:

    gen_obj = ...
    for first_line in gen_obj:
        break
    for lines in gen_obj:
        ...

    Traceback (most recent call last):
      File "/tmp/foo.py", line 5, in <module>
        for lines in gen_obj:
    AlreadyClosedIteratorError: this iterator was already closed,
    possibly by a previous 'for' loop. (Maybe you want itertools.preserve?)
(We could even have a PYTHONDEBUG flag that when enabled makes that error message include the file:line of the previous 'for' loop that called __iterclose__.)

So this is pain! But the pain is (a) rare, not pervasive, (b) immediately obvious (an exception, the code doesn't work at all), not subtle and delayed, (c) easily googleable, (d) easy to fix and the fix is reliable. It's a totally different type of pain than the pain that we currently impose on folks who want to write robust code.

Now compare to the new proposal (function-scoped-iterclose):

- For those who want robust cleanup: Usually, I only need an iterator for as long as I'm iterating over it; that may or may not correspond to the end of the function (often won't). When these don't coincide, it can cause problems. E.g., consider the original example from my proposal:

    def read_newline_separated_json(path):
        with open(path) as f:
            for line in f:
                yield json.loads(line)

but now suppose that I'm a Data Scientist (tm) so instead of having 1 file full of newline-separated JSON, I have 100 gigabytes worth of the stuff stored in lots of files in a directory tree. Well, that's no problem, I'll just wrap that generator:

    def read_newline_separated_json_tree(tree):
        for root, _, paths in os.walk(tree):
            for path in paths:
                for document in read_newline_separated_json(join(root, path)):
                    yield document

And then I'll run it on PyPy, because that's what you do when you have 100 GB of string processing, and... it'll crash, because the call to read_newline_separated_json_tree ends up doing thousands of calls to read_newline_separated_json, but never cleans any of them up until the function exits, so eventually we run out of file descriptors.

A similar situation arises in the main loop of something like an HTTP server:

    while True:
        request = read_request(sock)
        for response_chunk in application_handler(request):
            send_response_chunk(sock)

Here we'll accumulate arbitrary numbers of un-closed application_handler generators attached to the stack frame, which is no good at all. And this has the interesting failure mode that you'll probably miss it in testing, because most clients will only re-use a connection a small number of times.

So what this means is that every time I write a for loop, I can't just do a quick "am I going to break out of the for-loop and then re-use this iterator?" check -- I have to stop and think about whether this for-loop is nested inside some other loop, etc. And, again, if I get it wrong, then it's a subtle bug that will bite me later. It's true that with the status quo, we need to wrap X% of for-loops with 'with' blocks, and with this proposal that number would drop to, I don't know, (X/5)% or something. But that's not the most important cost: the most important cost is the cognitive overhead of figuring out which for-loops need the special treatment, and in this proposal that checking is actually *more* complicated than the status quo.

- For those who just want to write a quick script and not think about it: here's a script that does repeated partial for-loops over a generator object:

    https://github.com/python/cpython/blob/553a84c4c9d6476518e2319acda6ba29b8588...

(and note that the generator object even has an ineffective 'with open(...)' block inside it!) With the function-scoped-iterclose, this script would continue to work as it does now. Excellent.

But, suppose that I decide that that main() function is really complicated and that it would be better to refactor some of those loops out into helper functions.
(Probably actually true in this example.) So I do that and... suddenly the code breaks. And in a rather confusing way, because it has to do with this complicated long-distance interaction between two different 'for' loops *and* where they're placed with respect to the original function versus the helper function.

If I were an intermediate-level Python student (and I'm pretty sure anyone who is starting to get clever with re-using iterators counts as "intermediate level"), then I'm pretty sure I'd actually prefer the immediate obvious feedback from the for-scoped-iterclose. This would actually be a good time to teach folks about this aspect of resource handling -- it's certainly an important thing to learn eventually on your way to Python mastery, even if it isn't needed for every script.

In the pypy-dev thread about this proposal, there are some very distressed emails from someone who's been writing Python for a long time but only just realized that generator cleanup relies on the garbage collector:

    https://mail.python.org/pipermail/pypy-dev/2016-October/014709.html
    https://mail.python.org/pipermail/pypy-dev/2016-October/014720.html

It's unpleasant to have the rug pulled out from under you like this and suddenly realize that you might have to go re-evaluate all the code you've ever written, and making for loops safe-by-default and fail-fast-when-unsafe avoids that.

Anyway, in summary: function-scoped-iterclose doesn't seem to accomplish my goal of getting rid of the *type* of pain involved when you have to run a background thread in your brain that's doing constant paranoid checking every time you write a for loop. Instead it arguably takes that type of pain and spreads it around both the experts and the novices :-/.

-------------

Now, let's look at some evidence about how disruptive the two proposals are for real code:

As mentioned else-thread, I wrote a stupid little CPython hack [1] to report when the same iterator object gets passed to multiple 'for' loops, and ran the CPython and Django testsuites with it [2]. Looking just at generator objects [3], across these two large codebases there are exactly 4 places where this happens. (Rough idea of prevalence: these 4 places together account for a total of 8 'for' loops; this is out of 11,503 'for' loops total, of which 665 involve generator objects.) The 4 places are:

1) CPython's Lib/test/test_collections.py:1135, Lib/_collections_abc.py:378

   This appears to be a bug in the CPython test suite -- the little MySet class does 'def __init__(self, itr): self.contents = itr', which assumes that itr is a container that can be repeatedly iterated. But a bunch of the methods on collections.abc.Set like to pass in a generator object here instead, which breaks everything. If repeated 'for' loops on generators raised an error then this bug would have been caught much sooner.

2) CPython's Tools/scripts/gprof2html.py lines 45, 54, 59, 75

   Discussed above -- as written, for-scoped-iterclose would break this script, but function-scoped-iterclose would not, so here function-scoped-iterclose wins.

3) Django django/utils/regex_helper.py:236

   This code is very similar to the previous example in its general outline, except that the 'for' loops *have* been factored out into utility functions. So in this case for-scoped-iterclose and function-scoped-iterclose are equally disruptive.

4) CPython's Lib/test/test_generators.py:723

   I have to admit I cannot figure out what this code is doing, besides showing off :-).
But the different 'for' loops are in different stack frames, so I'm pretty sure that for-scoped-iterclose and function-scoped-iterclose would be equally disruptive.

Obviously there's a bias here in that these are still relatively "serious" libraries; I don't have a big corpus of one-off scripts that are just a big __main__, though gprof2html.py isn't far from that. (If anyone knows where to find such a thing let me know...) But still, the tally here is that out of 4 examples, we have 1 subtle bug that iterclose might have caught, 2 cases where for-scoped-iterclose and function-scoped-iterclose are equally disruptive, and only 1 where function-scoped-iterclose is less disruptive -- and in that case it's arguably just avoiding an obvious error now in favor of a more confusing error later.

If this reduced the backwards-incompatible cases by a factor of, like, 10x or 100x, then that would be a pretty strong argument in its favor. But it seems to be more like... 1.5x.

-n

[1] https://github.com/njsmith/cpython/commit/2b9d60e1c1b89f0f1ac30cbf0a5dceee83...
[2] CPython: revision b0a272709b from the github mirror; Django: revision 90c3b11e87
[3] I also looked at "all iterators" and "all iterators with .close methods", but this email is long enough... basically the pattern is the same: there are another 13 'for' loops that involve repeated iteration over non-generator objects, and they're roughly equally split between spurious effects due to bugs in the CPython test-suite or my instrumentation, cases where for-scoped-iterclose and function-scoped-iterclose both cause the same problems, and cases where function-scoped-iterclose is less disruptive.

-n
-- Nathaniel J. Smith -- https://vorpus.org

...Doh. I spent all that time evaluating the function-scoped-cleanup proposal from the high-level design perspective, and then immediately after hitting send, I suddenly realized that I'd missed a much more straightforward technical problem.

One thing that 'with' blocks / for-scoped-iterclose do is that they put an upper bound on the lifetime of generator objects. That's important if you're using a non-refcounting-GC, or if there might be reference cycles. But it's not all they do: they also arrange to make sure that any cleanup code is executed in the context of the code that's using the generator. This is *also* really important: if you have an exception in your cleanup code, and the GC runs your cleanup code, then that exception will just disappear into nothingness (well, it'll get printed to the console, but that's hardly better). So you don't want to let the GC run your cleanup code. If you have an async generator, you want to run the cleanup code under supervision of the calling function's coroutine runner, and ideally block the running coroutine while you do it; doing this from the GC is difficult-to-impossible (depending on how picky you are -- PEP 525 does part of it, but not all). Again, letting the GC get involved is bad.

So for the function-scoped-iterclose proposal: does this implicit ExitStack-like object take a strong reference to iterators, or just a weak one?

If it takes a strong reference, then suddenly we're pinning all iterators in memory until the end of the enclosing function, which will often look like a memory leak. I think this would break a *lot* more existing code than the for-scoped-iterclose proposal does, and in more obscure ways that are harder to detect and warn about ahead of time. So that's out.

If it takes a weak reference, ... then there's a good chance that iterators will get garbage collected before the ExitStack has a chance to clean them up properly. So we still have no guarantee that the cleanup will happen in the right context, that exceptions will not be lost, and so forth. In fact, it becomes literally non-deterministic: you might see an exception propagate properly on one run, and not on the next, depending on exactly when the garbage collector happened to run. IMHO that's *way* too spooky to be allowed, but I can't see any way to fix it within the function-scoping framework :-(

-n

On Tue, Oct 25, 2016 at 3:25 PM, Nathaniel Smith <njs@pobox.com> wrote:
-- Nathaniel J. Smith -- https://vorpus.org

On 26 October 2016 at 08:48, Nathaniel Smith <njs@pobox.com> wrote:
It would take a strong reference, which is another reason why close_resources() would be an essential part of the explicit API (since it would drop the references in addition to calling the __exit__() and close() methods of the declared resources), and also yet another reason why you've convinced me that the only implicit API that would ever make sense is one that was scoped specifically to the iteration process. However, I still think the explicit-API-only suggestion is a much better path to pursue than any implicit proposal - it will give folks that see it for the first time something to Google, and it's a general purpose technique rather than being restricted specifically to the cases where the resource to be managed and the iterator being iterated over are one and the same object. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 26 October 2016 at 08:25, Nathaniel Smith <njs@pobox.com> wrote:
Regardless of any other outcome from this thread, it may be useful to have a "contextlib.ResourceSet" as an abstraction for collective management of resources. As you say, the main difference is that the invocation of the cleanup functions wouldn't be nested at all and could be called in an arbitrary order (if that's not sufficient for a particular use case, then you'd need to define an ExitStack for the items where the order of cleanup matters, and then register *that* with the ResourceSet).
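To make that idea concrete, here is one rough sketch of what such a contextlib.ResourceSet might look like (a hypothetical API -- nothing like this exists in contextlib today), with cleanup callbacks invoked in arbitrary order rather than nested:

    class ResourceSet:
        def __init__(self):
            self._callbacks = set()

        def callback(self, func):
            # Register an arbitrary zero-argument cleanup callback.
            self._callbacks.add(func)
            return func

        def enter_context(self, cm):
            # Enter a context manager now and remember its __exit__ for later.
            result = cm.__enter__()
            self._callbacks.add(lambda: cm.__exit__(None, None, None))
            return result

        def close(self):
            # Run every callback in arbitrary order; if any fail, keep going
            # and re-raise the first failure at the end.
            first_error = None
            while self._callbacks:
                cb = self._callbacks.pop()
                try:
                    cb()
                except Exception as exc:
                    if first_error is None:
                        first_error = exc
            if first_error is not None:
                raise first_error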
(Note: I've changed my preferred API name from "function_resource" + "frame_resource" to the general purpose "scoped_resource" - while it's somewhat jargony, which I consider unfortunate, the goal is to make the runtime scope of the resource match the lexical scope of the reference as closely as is feasible, and if folks are going to understand how Python manages references and resources, they're going to need to learn the basics of Python's scope management at some point)

Given your points below, the defensive coding recommendation here would be to:

- always wrap your iterators in scoped_resource() to tell Python to clean them up when the function is done
- explicitly call close_resources() after the affected for loops to clean the resources up early

You'd still be vulnerable to resource leaks in libraries you didn't write, but would have decent control over your own code without having to make overly draconian changes to your style guide - you'd only need one new rule, which is "Whenever you're iterating over something, pass it through scoped_resource first".

To simplify this from a forwards compatibility perspective (i.e. so it can implicitly adjust when an existing type gains a cleanup method), we'd make scoped_resource() quite permissive, accepting arbitrary objects with the following behaviours (see the rough sketch just after this message):

- if it's a context manager, enter it, and register the exit callback
- if it's not a context manager, but has a close() method, register the close method
- otherwise, pass it straight through without taking any other action

This would allow folks to always declare something as a scoped resource without impeding their ability to handle objects that aren't resources at all.

The long term question would then become whether it made sense to have certain language constructs implicitly mark their targets as scoped resources *by default*, and clean them up selectively after the loop rather than using the blunt instrument of cleaning up all previously registered resources. If we did start seriously considering such a change, then there would be potential utility in an "unmanaged_iter()" wrapper which forwarded *only* the iterator protocol methods, thus hiding any __exit__() or close() methods from scoped_resource().

However, the time to consider such a change in default behaviour would be *after* we had some experience with explicit declarations and management of scoped resources - plenty of folks are writing plenty of software today in garbage collected languages (including Python), and coping with external resource management problems as they arise, so we don't need to do anything hasty here. I personally think an explicit solution is likely to be sufficient (given the caveat of adding a "gc.collect()" counterpart), with an API like `scoped_resource` being adopted over time in libraries, frameworks and applications based on actual defects found in running production systems as well as the defensive coding style, and your example below makes me even more firmly convinced that that's a better way to go.
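Purely as a sketch of the behaviour described above -- neither helper exists anywhere yet, and a real implementation would presumably tie the registry to the calling scope rather than a module-level list -- scoped_resource() and close_resources() might look roughly like this:

    _pending_cleanups = []  # simplistic stand-in for real scope-local storage

    def scoped_resource(obj):
        if hasattr(obj, '__enter__') and hasattr(obj, '__exit__'):
            # Context manager: enter it now, clean it up at close_resources().
            result = obj.__enter__()
            _pending_cleanups.append(lambda: obj.__exit__(None, None, None))
            return result
        if hasattr(obj, 'close'):
            # Not a context manager, but closeable: register its close().
            _pending_cleanups.append(obj.close)
            return obj
        # Not a resource at all: pass it straight through.
        return obj

    def close_resources():
        # Run and drop every registered cleanup.
        while _pending_cleanups:
            _pending_cleanups.pop()()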
In mine, if your style guide says "Use scoped_resource() and an explicit close_resources() call when iterating", you'd add it (or your automated linter would complain that it was missing). So the cognitive overhead is higher, but it would remain where it belongs (i.e. on professional developers being paid to write robust code).
And it's completely unnecessary - with explicit scoped_resource() calls absolutely nothing changes for the scripting use case, and even with implicit ones, re-use *within the same scope* would still be fine (you'd only get into trouble if the resource escaped the scope where it was first marked as a scoped resource).
If you're being paid to write robust code and are using Python 3.7+, then you'd add scoped_resource() around the read_newline_separated_json() call and then add a close_resources() call after that loop. That'd be part of your job, and just another point in the long list of reasons why developing software as a profession isn't the same thing as doing it as a hobby. We'd design scoped_resource() in such a way that it could be harmlessly wrapped around "paths" as well, even though we know that's technically not necessary (since it's just a list of strings). As noted above, I'm also open to the notion of some day making all for loops implicitly declare the iterators they operate on as scoped resources, but I don't think we should do that without gaining some experience with the explicit form first (where we can be confident that any unexpected negative consequences will be encountered by folks already well equipped to deal with them).
And we'll go "Oops", and refactor our code to better control the scope of our resources, either by adding a with statement around the innermost loop or using the new scoped resources API (if such a thing gets added). The *whole point* of iterative development is to solve the problems you know you have, not the problems you or someone else might potentially have at some point in the indeterminate future.
And the fixed code (given the revised API proposal above) looks like this:

    while True:
        request = read_request(sock)
        for response_chunk in scoped_resource(application_handler(request)):
            send_response_chunk(sock)
        close_resources()

This pattern has the advantage of also working if the resources you want to manage aren't precisely what you're iterating over, or if you're iterating over them in a while loop rather than a for loop.
Or you unconditionally add the scoped_resource/close_resources calls to force non-reference-counted implementations to behave a bit more like CPython and don't worry about it further.
As it would with the explicit scoped_resource/close_resources API.
I do agree the fact that it would break common code refactoring patterns is a good counter-argument against the idea of ever calling scoped_resource() implicitly.
Does the addition of the explicit close_resources() API mitigate your concern?
The standard library and a web framework are in no way typical of Python application and scripting code.
But explicitly scoped resource management leaves it alone.
And explicitly scoped resource management again leaves it alone.
The explicit-API-only aspect of the proposal eliminates 100% of the backwards incompatibilities :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tuesday, October 25, 2016 at 6:26:17 PM UTC-4, Nathaniel Smith wrote:
I still don't understand why you can't write it like this:

    def read_newline_separated_json_tree(tree):
        for root, _, paths in os.walk(tree):
            for path in paths:
                with read_newline_separated_json(join(root, path)) as iterable:
                    yield from iterable

Zero extra lines. Works today. Does everything you want.
Same thing:

    while True:
        request = read_request(sock)
        with application_handler(request) as iterable:
            for response_chunk in iterable:
                send_response_chunk(sock)

I'll stop posting about this, but I don't see the motivation behind this proposal except replacing one explicit context management line with a hidden "line" of cognitive overhead. I think the solution is to stop returning an iterable when you have state needing a cleanup. Instead, return a context manager and force the caller to open it to get at the iterable.

Best,
Neil
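For example, a rough sketch of how the read_newline_separated_json example could be written in that style -- as a context manager that hands out an iterable -- using contextlib.contextmanager (the file name below is just an illustration):

    from contextlib import contextmanager
    import json

    @contextmanager
    def read_newline_separated_json(path):
        # The caller's 'with' block owns the file handle; what the caller
        # receives is just a plain generator over the already-open file.
        with open(path) as file_handle:
            yield (json.loads(line) for line in file_handle)

    with read_newline_separated_json("records.jsonl") as documents:
        for document in documents:
            ...  # the file is closed as soon as the 'with' block exits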

Hey Nathaniel - I like the intent here, but I think perhaps it would be better if the problem is approached differently.

Seems to me that making *generators* have a special 'you are done now' interface is special casing, which usually makes things harder to learn and predict; and that the net effect is that all loop constructs will need to learn about that special case, whether looping over a list, a generator, or whatever.

Generators already have a well defined lifecycle - but as you say it's not defined consistently across Python VMs. The language has no guarantees about when finalisation will occur :(. The PEP 525 aclose is a bit awkward itself in this way - but unlike regular generators it does have a reason, which is that the language doesn't define an event loop context as a built in thing - so finalisation can't reliably summon one up.

So rather than adding a special case to finalise objects used in one particular iteration - which will play havoc with break statements - can we instead look at making escape analysis a required part of the compiler: the borrow checker in rust is getting pretty good at managing a very similar problem :). I haven't fleshed out exactly what would be entailed, so consider this a 'what if' and YMMV :).

-Rob

On 19 October 2016 at 17:38, Nathaniel Smith <njs@pobox.com> wrote:

On 10/19/2016 12:38 AM, Nathaniel Smith wrote:
I'd like to propose that Python's iterator protocol be enhanced to add a first-class notion of completion / cleanup.
With respect to the standard iterator protocol, a very solid -1 from me. (I leave commenting specifically on __aiterclose__ to Yury.)

1. I consider the introduction of iterables and the new iterator protocol in 2.2 and their gradual replacement of lists in many situations to be the greatest enhancement to Python since 1.3 (my first version). They are, to me, one of Python's greatest features, and the minimal nature of the protocol an essential part of what makes them great.

2. I think you greatly underestimate the negative impact, just as we did with changing str is bytes to str is unicode. The change itself, embodied in for loops, will break most non-trivial programs. You yourself note that there will have to be pervasive changes in the stdlib just to begin fixing the breakage.

3. Though perhaps common for what you do, the need for the change is extremely rare in the overall Python world. Iterators depending on an external resource are rare (< 1%, I would think). Incomplete iteration is also rare (also < 1%, I think). And resources do not always need to be released immediately.

4. Previous proposals to officially augment the iterator protocol, even with optional methods, have been rejected, and I think this one should be too.

   a. Add .__len__ as an option. We added __length_hint__, which an iterator may implement, but which is not part of the iterator protocol. It is also ignored by bool().

   b., c. Add __bool__ and/or peek(). I posted a LookAhead wrapper class that implements both for most any iterable. I suspect that it is rarely used.
One problem with passing paths around is that it makes the receiving function hard to test. I think functions should at least optionally take an iterable of lines, and make the open part optional. But then closing should also be conditional. If the combination of 'with', 'for', and 'yield' does not work together, then do something else, rather than changing the meaning of 'for'. Moving responsibility for closing the file from 'with' to 'for' makes 'with' pretty useless, while overloading 'for' with something that is rarely needed. This does not strike me as the right solution to the problem.
for document in read_newline_separated_json(path): # <-- outer for loop ...
If the outer loop determines when the file should be closed, then why not open it there? What fails with

    try:
        lines = open(path)
        gen = read_newline_separated_json(lines)
        for doc in gen:
            do_something(doc)
    finally:
        lines.close()  # and/or gen.throw(...) to stop the generator.

-- Terry Jan Reedy

On Wed, Oct 19, 2016 at 7:07 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Minimalism for its own sake isn't really a core Python value, and in any case the minimalism ship has kinda sailed -- we effectively already have send/throw/close as optional parts of the protocol (they're most strongly associated with generators, but you're free to add them to your own iterators and e.g. yield from will happily work with that). This proposal is basically "we formalize and start automatically calling the 'close' methods that are already there".
The long-ish list of stdlib changes is about enabling the feature everywhere, not about fixing backwards incompatibilities. It's an important question though what programs will break and how badly. To try and get a better handle on it I've been playing a bit with an instrumented version of CPython that logs whenever the same iterator is passed to multiple 'for' loops. I'll write up the results in more detail, but the summary so far is that there seem to be ~8 places in the stdlib that would need preserve() calls added, and ~3 in django. Maybe 2-3 hours and 1 hour of work respectively to fix? It's not a perfect measure, and the cost certainly isn't zero, but it's at a completely different order of magnitude than the str changes. Among other things, this is a transition that allows for gradual opt-in via a __future__, and fine-grained warnings pointing you at what you need to fix, neither of which were possible for str->unicode.
This could equally well be an argument that the change is fine -- e.g. if you're always doing complete iteration, or just iterating over lists and stuff, then it literally doesn't affect you at all either way...
Sure, that's all true, but this is the problem with tiny documentation examples :-). The point here was to explain the surprising interaction between generators and with blocks in the simplest way, not to demonstrate the ideal solution to the problem of reading newline-separated JSON. Everything you want is still doable in a post-__iterclose__ world -- in particular, if you do

    for doc in read_newline_separated_json(lines_generator()):
        ...

then both iterators will be closed when the for loop exits. But if you want to re-use the lines_generator, just write:

    it = lines_generator()
    for doc in read_newline_separated_json(preserve(it)):
        ...
    for more_lines in it:
        ...
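For reference, a preserve() wrapper along these lines could be as simple as the sketch below (hypothetical, since __iterclose__ itself is only proposed): it forwards the iterator protocol but makes __iterclose__ a no-op, so a 'for' loop over the wrapper leaves the underlying iterator open.

    class preserve:
        def __init__(self, iterable):
            self._it = iter(iterable)

        def __iter__(self):
            return self

        def __next__(self):
            return next(self._it)

        def __iterclose__(self):
            # Deliberately do nothing: the caller keeps ownership of the
            # underlying iterator and decides when (or whether) to close it.
            pass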
Sure, that works in this trivial case, but they aren't all trivial :-). See the example from my first email about a WSGI-like interface where response handlers are generators: in that use case, your suggestion that we avoid all resource management inside generators would translate to: "webapps can't open files". (Or database connections, proxy requests, ... or at least, can't hold them open while streaming out response data.) Or sticking to concrete examples, here's a toy-but-plausible generator where the put-the-with-block-outside strategy seems rather difficult to implement:

    # Yields all lines in all files in 'directory' that contain the substring 'needle'
    def recursive_grep(directory, needle):
        for dirpath, _, filenames in os.walk(directory):
            for filename in filenames:
                with open(os.path.join(dirpath, filename)) as file_handle:
                    for line in file_handle:
                        if needle in line:
                            yield line

-n
-- Nathaniel J. Smith -- https://vorpus.org

NOTE: This is my first post to this mailing list, I'm not really sure how to post a message, so I'm attempting a reply-all.

I like Nathaniel's idea for __iterclose__. I suggest the following changes to deal with a few of the complex issues he discussed.

1. Missing __iterclose__, or a value of None, works as before, no changes.

2. An iterator can be used in one of three ways:

   A. 'for' loop, which will call __iterclose__ when it exits.
   B. User controlled, in which case the user is responsible to use the iterator inside a with statement.
   C. Old style. The user is responsible for calling __iterclose__.

3. An iterator keeps track of __iter__ calls; this allows it to know when to clean up.

The two key additions, above, are:

#2B. User can use iterator with __enter__ & __exit__ cleanly.
#3. By tracking __iter__ calls, it makes complex user cases easier to handle.

Specification
=============

An iterator may implement the following method: __iterclose__. A missing method, or a value of None, is allowed. When the user wants to control the iterator, the user is expected to use the iterator with a with clause.

The core proposal is the change in behavior of ``for`` loops. Given this Python code:

    for VAR in ITERABLE:
        LOOP-BODY
    else:
        ELSE-BODY

we desugar to the equivalent of:

    _iter = iter(ITERABLE)
    _iterclose = getattr(_iter, '__iterclose__', None)
    if _iterclose is None:
        traditional-for VAR in _iter:
            LOOP-BODY
        else:
            ELSE-BODY
    else:
        _stop_exception_seen = False
        try:
            traditional-for VAR in _iter:
                LOOP-BODY
            else:
                _stop_exception_seen = True
                ELSE-BODY
        finally:
            if not _stop_exception_seen:
                _iterclose(_iter)

The test for None allows us to skip the setup of a try/finally clause. Also we don't bother to call __iterclose__ if the iterator threw StopIteration at us.

Modifications to basic iterator types
=====================================

An iterator will implement something like the following:

    _cleanup - Private function, does the following:
                   _enter_count = _iter_count = -1
                   Do any necessary cleanup, release resources, etc.
                   NOTE: Is also called internally by the iterator, before
                   throwing StopIteration.

    _iter_count - Private value, starts at 0.
    _enter_count - Private value, starts at 0.

    __iter__ -
        if _iter_count >= 0:
            _iter_count += 1
        return self

    __iterclose__ -
        if _iter_count is 0:
            if _enter_count is 0:
                _cleanup()
        elif _iter_count > 0:
            _iter_count -= 1

    __enter__ -
        if _enter_count >= 0:
            _enter_count += 1
        Return itself.

    __exit__ -
        if _enter_count is > 0:
            _enter_count -= 1
            if _enter_count is _iter_count is 0:
                _cleanup()

The suggestions on _iter_count & _enter_count are just an example; internal details can differ (and better error handling).

Examples:
=========

NOTE: Examples are given using xrange() or [1, 2, 3, 4, 5, 6, 7] for simplicity. For real use, the iterator would have resources such as open files it needs to close on cleanup.

1. Simple example:

    for v in xrange(7):
        print v

   Creates an iterator with an _iter_count of 0. The iterator exits normally (by throwing StopIteration); we don't bother to call __iterclose__.

2. Break example:

    for v in [1, 2, 3, 4, 5, 6, 7]:
        print v
        if v == 3:
            break

   Creates an iterator with an _iter_count of 0. The iterator exits after generating 3 numbers; we then call __iterclose__ & the iterator does any necessary cleanup.

3. Convert example #2 to print the next value:

    with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
        for v in seven:
            print v
            if v == 3:
                break
        print 'Next value is: ', seven.next()

   This will print:

    1
    2
    3
    Next value is: 4

   How this works:
   1. We create an iterator named seven (by calling list.__iter__).
   2. We call seven.__enter__.
   3. The for loop calls seven.next() 3 times, and then calls seven.__iterclose__. Since the _enter_count is 1, the iterator does not do cleanup yet.
   4. We call seven.next().
   5. We call seven.__exit__. The iterator does its cleanup now.

4. More complicated example:

    with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
        for v in seven:
            print v
            if v == 1:
                for v in seven:
                    print 'stolen: ', v
                    if v == 3:
                        break
            if v == 5:
                break
        for v in seven:
            print v * v

   This will print:

    1
    stolen: 2
    stolen: 3
    4
    5
    36
    49

   How this works:

   1. Same as #3 above; cleanup is done by the __exit__.

5. Alternate way of doing #4:

    seven = iter([1, 2, 3, 4, 5, 6, 7])
    for v in seven:
        print v
        if v == 1:
            for v in seven:
                print 'stolen: ', v
                if v == 3:
                    break
        if v == 5:
            break
    for v in seven:
        print v * v
        break             # Different from #4
    seven.__iterclose__()

   This will print:

    1
    stolen: 2
    stolen: 3
    4
    5
    36

   How this works:

   1. We create an iterator named seven.
   2. The for loops all call seven.__iter__, causing _iter_count to increment.
   3. The for loops all call seven.__iterclose__ on exit, decrementing _iter_count.
   4. The user calls the final __iterclose__, which closes the iterator.

NOTE: Method #5 is NOT recommended; the 'with' syntax is better. However, something like itertools.zip could call __iterclose__ during cleanup.

Change to iterators
===================

All python iterators would need to add __iterclose__ (possibly with a value of None), __enter__, & __exit__.

Third party iterators that do not implement __iterclose__ cannot be used in a with clause. A new function could be added to itertools, something like:

    with itertools.with_wrapper(third_party_iterator) as x:
        ...

The 'with_wrapper' would attempt to call __iterclose__ when its __exit__ function is called.

On Wed, Oct 19, 2016 at 12:38 AM, Nathaniel Smith <njs@pobox.com> wrote:
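A sketch of that with_wrapper helper -- the name comes from the message above; nothing like it exists in itertools today -- could be:

    from contextlib import contextmanager

    @contextmanager
    def with_wrapper(third_party_iterator):
        # Hand the iterator through unchanged; on exit, call __iterclose__
        # if the object happens to define one.
        try:
            yield third_party_iterator
        finally:
            iterclose = getattr(third_party_iterator, '__iterclose__', None)
            if iterclose is not None:
                iterclose()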

On 10/21/2016 03:48 PM, Amit Green wrote:
NOTE: This is my first post to this mailing list, I'm not really sure how to post a message, so I'm attempting a reply-all.
Seems to have worked! :)
Your examples are interesting, but they don't seem to address the issue of closing down for loops that are using generators when those loops exit early:

-----------------------------
def some_work():
    with some_resource() as resource:
        for widget in resource:
            yield widget

for pane in some_work():
    break  # what happens here?
-----------------------------

How does your solution deal with that situation? Or are you saying that this would be closed with your modifications, and if I didn't want the generator to be closed I would have to do:

-----------------------------
with some_work() as temp_gen:
    for pane in temp_gen:
        break
    for another_pane in temp_gen:
        ...  # temp_gen is still alive here
-----------------------------

In other words, instead of using the preserve() function, we would use a with statement?

-- ~Ethan~

On Fri, Oct 21, 2016 at 3:48 PM, Amit Green <amit.mixie@gmail.com> wrote:
These are interesting ideas! A few general comments:

- I don't think we want the "don't bother to call __iterclose__ on exhaustion" functionality -- it's actually useful to be able to distinguish between

      # closes file_handle
      for line in file_handle:
          ...

  and

      # leaves file_handle open
      for line in preserve(file_handle):
          ...

  To be able to distinguish these cases, it's important that the 'for' loop always call __iterclose__ (which preserve() might then cancel out).

- I think it'd be practically difficult and maybe too much magic to add __enter__/__exit__/nesting-depth counts to every iterator implementation. But, the idea of using a context manager for repeated partial iteration is a great idea :-). How's this for a simplified version that still covers the main use cases?

      @contextmanager
      def reuse_then_close(it):  # TODO: come up with a better name
          it = iter(it)
          try:
              yield preserve(it)
          finally:
              iterclose(it)

      with itertools.reuse_then_close(some_generator(...)) as it:
          for obj in it:
              ...
          # still open here, because our reference to the iterator is
          # wrapped in preserve(...)
          for obj in it:
              ...
      # but then closed here, by the 'with' block

-n

-- Nathaniel J. Smith -- https://vorpus.org
participants (20)

- Amit Green
- Brendan Barnwell
- Chris Angelico
- Chris Barker
- Ethan Furman
- Gustavo Carneiro
- Nathaniel Smith
- Neil Girdhar
- Nick Coghlan
- Oscar Benjamin
- Paul Moore
- Random832
- Robert Collins
- Ronan Lamy
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy
- Todd
- Vincent Michel
- Yury Selivanov