<div dir="ltr"><div>NOTE: This is my first post to this mailing list. I'm not really sure<br> how to post a message, so I'm attempting a reply-all.<br><br>I like Nathaniel's idea for __iterclose__.<br><br>I suggest the following changes to deal with a few of the complex issues<br>he discussed.<br><br>1. A missing __iterclose__, or a value of None, works as before:<br>   no changes.<br><br>2. An iterator can be used in one of three ways:<br><br>   A. In a 'for' loop, which will call __iterclose__ when it exits.<br><br>   B. User controlled, in which case the user is responsible for using the<br>      iterator inside a with statement.<br><br>   C. Old style, in which case the user is responsible for calling __iterclose__.<br><br>3. An iterator keeps track of __iter__ calls; this allows it to know<br>   when to clean up.<br><br><br>The two key additions above are:<br><br>   #2B. The user can use an iterator cleanly with __enter__ & __exit__.<br><br>   #3. By tracking __iter__ calls, the iterator makes complex use cases<br>       easier to handle.<br><br>Specification<br>=============<br><br>An iterator may implement the following method: __iterclose__. A missing<br>method, or a value of None, is allowed.<br><br>When the user wants to control the iterator, the user is expected to<br>use the iterator in a with statement.<br><br>The core proposal is the change in behavior of ``for`` loops.
Given this<br>Python code:<br><br>    for VAR in ITERABLE:<br>        LOOP-BODY<br>    else:<br>        ELSE-BODY<br><br>we desugar to the equivalent of:<br><br>    _iter = iter(ITERABLE)<br>    _iterclose = getattr(_iter, '__iterclose__', None)<br><br>    if _iterclose is None:<br>        traditional-for VAR in _iter:<br>            LOOP-BODY<br>        else:<br>            ELSE-BODY<br>    else:<br>        _stop_exception_seen = False<br>        try:<br>            traditional-for VAR in _iter:<br>                LOOP-BODY<br>            else:<br>                _stop_exception_seen = True<br>                ELSE-BODY<br>        finally:<br>            if not _stop_exception_seen:<br>                _iterclose()  # bound method, so no extra argument<br><br>The test for None allows us to skip the setup of a try/finally clause.<br><br>Also, we don't bother to call __iterclose__ if the iterator raised<br>StopIteration.<br><br>Modifications to basic iterator types<br>=====================================<br><br>An iterator will implement something like the following:<br><br>    _cleanup      - Private function, does the following:<br><br>                        _enter_count = _iter_count = -1<br><br>                        Do any necessary cleanup, release resources, etc.<br><br>                    NOTE: Also called internally by the iterator,<br>                          before raising StopIteration.<br><br>    _iter_count   - Private value, starts at 0.<br><br>    _enter_count  - Private value, starts at 0.<br><br>    __iter__      - if _iter_count >= 0:<br>                        _iter_count += 1<br><br>                    return self<br><br>    __iterclose__ - if _iter_count == 0:<br>                        if _enter_count == 0:<br>                            _cleanup()<br>                    elif _iter_count > 0:<br>                        _iter_count -= 1<br><br>    __enter__     - if _enter_count >= 0:<br>                        _enter_count += 1<br><br>                    return self<br><br>    __exit__      - if _enter_count > 0:<br>                        _enter_count -= 1<br><br>                    if _enter_count == _iter_count == 0:<br>                        _cleanup()<br><br>The suggestions for _iter_count & _enter_count are just examples; the internal<br>details can differ (and should include better error handling).<br><br><br>Examples:<br>=========<br><br>NOTE: Examples are given using xrange() or [1, 2, 3, 4, 5, 6, 7] for<br>      simplicity.
For real use, the iterator would have resources, such<br>      as open files, that it needs to close on cleanup.<br><br><br>1. Simple example:<br><br>    for v in xrange(7):<br>        print v<br><br>   Creates an iterator with an _iter_count of 0. The iterator exits<br>   normally (by raising StopIteration), so we don't bother to call<br>   __iterclose__.<br><br><br>2. Break example:<br><br>    for v in [1, 2, 3, 4, 5, 6, 7]:<br>        print v<br><br>        if v == 3:<br>            break<br><br>   Creates an iterator with an _iter_count of 0.<br><br>   The loop exits after the iterator has produced 3 values; we then call<br>   __iterclose__ & the iterator does any necessary cleanup.<br><br>3. Convert example #2 to print the next value:<br><br>    with iter([1, 2, 3, 4, 5, 6, 7]) as seven:<br>        for v in seven:<br>            print v<br><br>            if v == 3:<br>                break<br><br>        print 'Next value is:', seven.next()<br><br>   This will print:<br><br>    1<br>    2<br>    3<br>    Next value is: 4<br><br>   How this works:<br><br>    1. We create an iterator named seven (by calling list.__iter__).<br><br>    2. We call seven.__enter__.<br><br>    3. The for loop calls seven.__iter__, then seven.next() 3 times, and<br>       then seven.__iterclose__, which decrements _iter_count back to 0.<br><br>       Since _enter_count is 1, the iterator does not do<br>       cleanup yet.<br><br>    4. We call seven.next().<br><br>    5. We call seven.__exit__. The iterator does its cleanup now.<br><br>4. More complicated example:<br><br>    with iter([1, 2, 3, 4, 5, 6, 7]) as seven:<br>        for v in seven:<br>            print v<br><br>            if v == 1:<br>                for v in seven:<br>                    print 'stolen:', v<br><br>                    if v == 3:<br>                        break<br><br>            if v == 5:<br>                break<br><br>        for v in seven:<br>            print v * v<br><br>   This will print:<br><br>    1<br>    stolen: 2<br>    stolen: 3<br>    4<br>    5<br>    36<br>    49<br><br>   How this works:<br><br>    1. Same as #3 above; cleanup is done by __exit__.<br><br>5.
Alternate way of doing #4:<br><br>    seven = iter([1, 2, 3, 4, 5, 6, 7])<br><br>    for v in seven:<br>        print v<br><br>        if v == 1:<br>            for v in seven:<br>                print 'stolen:', v<br><br>                if v == 3:<br>                    break<br><br>        if v == 5:<br>            break<br><br>    for v in seven:<br>        print v * v<br>        break  # Different from #4<br><br>    seven.__iterclose__()<br><br>   This will print:<br><br>    1<br>    stolen: 2<br>    stolen: 3<br>    4<br>    5<br>    36<br><br></div>   How this works:<br><br>    1. We create an iterator named seven.<br><br>    2. The for loops all call seven.__iter__, causing _iter_count<br>       to increment.<br><br>    3. The for loops all call seven.__iterclose__ on exit, decrementing<br>       _iter_count.<br><br>    4. The user calls the final __iterclose__, which closes the<br>       iterator.<br><br>   NOTE:<br>    Method #5 is NOT recommended; the 'with' syntax is better.<br><br>    However, library code such as zip() could call __iterclose__ this way<br>    during its own cleanup.<br><br><br>Change to iterators<br>===================<br><br>All Python iterators would need to add __iterclose__ (possibly with a<br>value of None), __enter__, & __exit__.<br><br>Third-party iterators that do not implement this protocol cannot be<br>used in a with statement. A new function could be added to itertools,<br>something like:<br><br>    with itertools.with_wrapper(third_party_iterator) as x:<br>        ...<br><br>The 'with_wrapper' would attempt to call __iterclose__ when its __exit__<br>method is called.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 19, 2016 at 12:38 AM, Nathaniel Smith <span dir="ltr"><<a href="mailto:njs@pobox.com" target="_blank">njs@pobox.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>
<br>
I'd like to propose that Python's iterator protocol be enhanced to add<br>
a first-class notion of completion / cleanup.<br>
<br>
This is mostly motivated by thinking about the issues around async<br>
generators and cleanup. Unfortunately even though PEP 525 was accepted<br>
I found myself unable to stop pondering this, and the more I've<br>
pondered the more convinced I've become that the GC hooks added in PEP<br>
525 are really not enough, and that we'll regret it if we stick with<br>
them, or at least with them alone :-/. The strategy here is pretty<br>
different -- it's an attempt to dig down and make a fundamental<br>
improvement to the language that fixes a number of long-standing rough<br>
spots, including async generators.<br>
<br>
The basic concept is relatively simple: just adding a '__iterclose__'<br>
method that 'for' loops call upon completion, even if that's via break<br>
or exception. But the overall issue is fairly complicated, and iterators<br>
have a large surface area across the language, so the text below is<br>
pretty long. Mostly I wrote it all out to convince myself that there<br>
wasn't some weird showstopper lurking somewhere :-). For a first pass<br>
discussion, it probably makes sense to mainly focus on whether the<br>
basic concept makes sense? The main rationale is at the top, but the<br>
details are there too for those who want them.<br>
<br>
Also, for *right* now I'm hoping -- probably unreasonably -- to try to<br>
get the async iterator parts of the proposal in ASAP, ideally for<br>
3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal<br>
like this, which I apologize for -- though async generators are<br>
provisional in 3.6, so at least in theory changing them is not out of<br>
the question.) So again, it might make sense to focus especially on<br>
the async parts, which are a pretty small and self-contained part, and<br>
treat the rest of the proposal as a longer-term plan provided for<br>
context. The comparison to PEP 525 GC hooks comes right after the<br>
initial rationale.<br>
<br>
Anyway, I'll be interested to hear what you think!<br>
<br>
-n<br>
<br>
------------------<br>
<br>
Abstract<br>
========<br>
<br>
We propose to extend the iterator protocol with a new<br>
``__(a)iterclose__`` slot, which is called automatically on exit from<br>
``(async) for`` loops, regardless of how they exit. This allows for<br>
convenient, deterministic cleanup of resources held by iterators<br>
without reliance on the garbage collector. This is especially valuable<br>
for asynchronous generators.<br>
<br>
<br>
Note on timing<br>
==============<br>
<br>
In practical terms, the proposal here is divided into two separate<br>
parts: the handling of async iterators, which should ideally be<br>
implemented ASAP, and the handling of regular iterators, which is a<br>
larger but more relaxed project that can't start until 3.7 at the<br>
earliest. But since the changes are closely related, and we probably<br>
don't want to end up with async iterators and regular iterators<br>
diverging in the long run, it seems useful to look at them together.<br>
<br>
<br>
Background and motivation<br>
=========================<br>
<br>
Python iterables often hold resources which require cleanup. For<br>
example: ``file`` objects need to be closed; the `WSGI spec<br>
<<a href="https://www.python.org/dev/peps/pep-0333/" rel="noreferrer" target="_blank">https://www.python.org/dev/<wbr>peps/pep-0333/</a>>`_ adds a ``close`` method<br>
on top of the regular iterator protocol and demands that consumers<br>
call it at the appropriate time (though forgetting to do so is a<br>
`frequent source of bugs<br>
<<a href="http://blog.dscpl.com.au/2012/10/obligations-for-calling-close-on.html" rel="noreferrer" target="_blank">http://blog.dscpl.com.au/<wbr>2012/10/obligations-for-<wbr>calling-close-on.html</a>>`_);<br>
and PEP 342 (based on PEP 325) extended generator objects to add a<br>
``close`` method to allow generators to clean up after themselves.<br>
<br>
Generally, objects that need to clean up after themselves also define<br>
a ``__del__`` method to ensure that this cleanup will happen<br>
eventually, when the object is garbage collected. However, relying on<br>
the garbage collector for cleanup like this causes serious problems in<br>
at least two cases:<br>
<br>
- In Python implementations that do not use reference counting (e.g.<br>
PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet<br>
many situations require *prompt* cleanup of resources. Delayed cleanup<br>
produces problems like crashes due to file descriptor exhaustion, or<br>
WSGI timing middleware that collects bogus times.<br>
<br>
- Async generators (PEP 525) can only perform cleanup under the<br>
supervision of the appropriate coroutine runner. ``__del__`` doesn't<br>
have access to the coroutine runner; indeed, the coroutine runner<br>
might be garbage collected before the generator object. So relying on<br>
the garbage collector is effectively impossible without some kind of<br>
language extension. (PEP 525 does provide such an extension, but it<br>
has a number of limitations that this proposal fixes; see the<br>
"alternatives" section below for discussion.)<br>
<br>
Fortunately, Python provides a standard tool for doing resource<br>
cleanup in a more structured way: ``with`` blocks. For example, this<br>
code opens a file but relies on the garbage collector to close it::<br>
<br>
    def read_newline_separated_json(path):<br>
        for line in open(path):<br>
            yield json.loads(line)<br>
<br>
    for document in read_newline_separated_json(path):<br>
        ...<br>
<br>
and recent versions of CPython will point this out by issuing a<br>
``ResourceWarning``, nudging us to fix it by adding a ``with`` block::<br>
<br>
    def read_newline_separated_json(path):<br>
        with open(path) as file_handle:      # <-- with block<br>
            for line in file_handle:<br>
                yield json.loads(line)<br>
<br>
    for document in read_newline_separated_json(path):  # <-- outer for loop<br>
        ...<br>
<br>
But there's a subtlety here, caused by the interaction of ``with``<br>
blocks and generators. ``with`` blocks are Python's main tool for<br>
managing cleanup, and they're a powerful one, because they pin the<br>
lifetime of a resource to the lifetime of a stack frame. But this<br>
assumes that someone will take care of cleaning up the stack frame...<br>
and for generators, this requires that someone ``close`` them.<br>
<br>
In this case, adding the ``with`` block *is* enough to shut up the<br>
``ResourceWarning``, but this is misleading -- the file object cleanup<br>
here is still dependent on the garbage collector. The ``with`` block<br>
will only be unwound when the ``read_newline_separated_json``<br>
generator is closed. If the outer ``for`` loop runs to completion then<br>
the cleanup will happen immediately; but if this loop is terminated<br>
early by a ``break`` or an exception, then the ``with`` block won't<br>
fire until the generator object is garbage collected.<br>
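To make the subtlety concrete, here is a small sketch in today's Python. It uses a list instead of a file, a ``finally`` block as a stand-in for the ``with`` block, and a hypothetical ``read_items`` generator in place of ``read_newline_separated_json``:

```python
# Hypothetical stand-in for read_newline_separated_json: the finally block
# plays the role of the with block, and a list records when it runs.
events = []

def read_items(items):
    try:
        for item in items:
            yield item
    finally:
        events.append("cleanup")  # where the file would be closed

gen = read_items([1, 2, 3])
for item in gen:
    if item == 2:
        break  # early exit: the generator is suspended, not closed

# We still hold a reference to gen, so nothing has run the finally block yet.
assert events == []

# Only an explicit close() (or, eventually, the garbage collector) unwinds it.
gen.close()
assert events == ["cleanup"]
```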
<br>
The correct solution requires that all *users* of this API wrap every<br>
``for`` loop in its own ``with`` block::<br>
<br>
    with closing(read_newline_separated_json(path)) as genobj:<br>
        for document in genobj:<br>
            ...<br>
<br>
This gets even worse if we consider the idiom of decomposing a complex<br>
pipeline into multiple nested generators::<br>
<br>
    def read_users(path):<br>
        with closing(read_newline_separated_json(path)) as gen:<br>
            for document in gen:<br>
                yield User.from_json(document)<br>
<br>
    def users_in_group(path, group):<br>
        with closing(read_users(path)) as gen:<br>
            for user in gen:<br>
                if user.group == group:<br>
                    yield user<br>
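A sketch of why each layer needs its own ``with``: closing the outermost generator raises ``GeneratorExit`` at its ``yield``, which unwinds its ``with closing(...)`` block, which in turn closes the generator below it. The names here are hypothetical, simplified stand-ins for ``read_newline_separated_json`` and ``read_users``:

```python
from contextlib import closing

events = []

def read_documents(lines):
    try:
        for line in lines:
            yield line.upper()
    finally:
        events.append("inner cleanup")  # stand-in for closing the file

def read_users(lines):
    # The with block is what forwards the close to the inner generator.
    with closing(read_documents(lines)) as gen:
        for document in gen:
            yield "user:" + document

outer = read_users(["a", "b", "c"])
assert next(outer) == "user:A"

# GeneratorExit unwinds read_users, whose with block closes read_documents.
outer.close()
assert events == ["inner cleanup"]
```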
<br>
In general if you have N nested generators then you need N+1 ``with``<br>
blocks to clean up 1 file. And good defensive programming would<br>
suggest that any time we use a generator, we should assume the<br>
possibility that there could be at least one ``with`` block somewhere<br>
in its (potentially transitive) call stack, either now or in the<br>
future, and thus always wrap it in a ``with``. But in practice,<br>
basically nobody does this, because programmers would rather write<br>
buggy code than tiresome repetitive code. In simple cases like this<br>
there are some workarounds that good Python developers know (e.g. in<br>
this simple case it would be idiomatic to pass in a file handle<br>
instead of a path and move the resource management to the top level),<br>
but in general we cannot avoid the use of ``with``/``finally`` inside<br>
of generators, and thus must deal with this problem one way or another.<br>
When beauty and correctness fight then beauty tends to win, so it's<br>
important to make correct code beautiful.<br>
<br>
Still, is this worth fixing? Until async generators came along I would<br>
have argued yes, but that it was a low priority, since everyone seems<br>
to be muddling along okay -- but async generators make it much more<br>
urgent. Async generators cannot do cleanup *at all* without some<br>
mechanism for deterministic cleanup that people will actually use, and<br>
async generators are particularly likely to hold resources like file<br>
descriptors. (After all, if they weren't doing I/O, they'd be<br>
generators, not async generators.) So we have to do something, and it<br>
might as well be a comprehensive fix to the underlying problem. And<br>
it's much easier to fix this now when async generators are first<br>
rolling out than it will be to fix it later.<br>
<br>
The proposal itself is simple in concept: add a ``__(a)iterclose__``<br>
method to the iterator protocol, and have (async) ``for`` loops call<br>
it when the loop is exited, even if this occurs via ``break`` or<br>
exception unwinding. Effectively, we're taking the current cumbersome<br>
idiom (``with`` block + ``for`` loop) and merging them together into a<br>
fancier ``for``. This may seem non-orthogonal, but makes sense when<br>
you consider that the existence of generators means that ``with``<br>
blocks actually depend on iterator cleanup to work reliably, plus<br>
experience showing that iterator cleanup is often a desirable feature<br>
in its own right.<br>
<br>
<br>
Alternatives<br>
============<br>
<br>
PEP 525 asyncgen hooks<br>
----------------------<br>
<br>
PEP 525 proposes a `set of global thread-local hooks managed by new<br>
``sys.{get/set}_asyncgen_hooks()`` functions<br>
<<a href="https://www.python.org/dev/peps/pep-0525/#finalization" rel="noreferrer" target="_blank">https://www.python.org/dev/peps/pep-0525/#finalization</a>>`_, which<br>
allow event loops to integrate with the garbage collector to run<br>
cleanup for async generators. In principle, this proposal and PEP 525<br>
are complementary, in the same way that ``with`` blocks and<br>
``__del__`` are complementary: this proposal takes care of ensuring<br>
deterministic cleanup in most cases, while PEP 525's GC hooks clean up<br>
anything that gets missed. But ``__aiterclose__`` provides a number of<br>
advantages over GC hooks alone:<br>
<br>
- The GC hook semantics aren't part of the abstract async iterator<br>
protocol, but are instead restricted `specifically to the async<br>
generator concrete type <XX find and link Yury's email saying this>`_.<br>
If you have an async iterator implemented using a class, like::<br>
<br>
      class MyAsyncIterator:<br>
          async def __anext__(self):<br>
              ...<br>
<br>
then you can't refactor this into an async generator without<br>
changing its semantics, and vice-versa. This seems very unpythonic.<br>
(It also leaves open the question of what exactly class-based async<br>
iterators are supposed to do, given that they face exactly the same<br>
cleanup problems as async generators.) ``__aiterclose__``, on the<br>
other hand, is defined at the protocol level, so it's duck-type<br>
friendly and works for all iterators, not just generators.<br>
<br>
- Code that wants to work on non-CPython implementations like PyPy<br>
cannot in general rely on GC for cleanup. Without ``__aiterclose__``,<br>
it's more or less guaranteed that developers who develop and test on<br>
CPython will produce libraries that leak resources when used on PyPy.<br>
Developers who do want to target alternative implementations will<br>
either have to take the defensive approach of wrapping every ``for``<br>
loop in a ``with`` block, or else carefully audit their code to figure<br>
out which generators might possibly contain cleanup code and add<br>
``with`` blocks around those only. With ``__aiterclose__``, writing<br>
portable code becomes easy and natural.<br>
<br>
- An important part of building robust software is making sure that<br>
exceptions always propagate correctly without being lost. One of the<br>
most exciting things about async/await compared to traditional<br>
callback-based systems is that instead of requiring manual chaining,<br>
the runtime can now do the heavy lifting of propagating errors, making<br>
it *much* easier to write robust code. But, this beautiful new picture<br>
has one major gap: if we rely on the GC for generator cleanup, then<br>
exceptions raised during cleanup are lost. So, again, without<br>
``__aiterclose__``, developers who care about this kind of robustness<br>
will either have to take the defensive approach of wrapping every<br>
``for`` loop in a ``with`` block, or else carefully audit their code<br>
to figure out which generators might possibly contain cleanup code.<br>
``__aiterclose__`` plugs this hole by performing cleanup in the<br>
caller's context, so writing more robust code becomes the path of<br>
least resistance.<br>
<br>
- The WSGI experience suggests that there exist important<br>
iterator-based APIs that need prompt cleanup and cannot rely on the<br>
GC, even in CPython. For example, consider a hypothetical WSGI-like<br>
API based around async/await and async iterators, where a response<br>
handler is an async generator that takes request headers + an async<br>
iterator over the request body, and yields response headers + the<br>
response body. (This is actually the use case that got me interested<br>
in async generators in the first place, i.e. this isn't hypothetical.)<br>
If we follow WSGI in requiring that child iterators must be closed<br>
properly, then without ``__aiterclose__`` the absolute most<br>
minimalistic middleware in our system looks something like::<br>
<br>
      async def noop_middleware(handler, request_header, request_body):<br>
          async with aclosing(handler(request_header, request_body)) as aiter:<br>
              async for response_item in aiter:<br>
                  yield response_item<br>
<br>
Arguably in regular code one can get away with skipping the ``with``<br>
block around ``for`` loops, depending on how confident one is that one<br>
understands the internal implementation of the generator. But here we<br>
have to cope with arbitrary response handlers, so without<br>
``__aiterclose__``, this ``with`` construction is a mandatory part of<br>
every middleware.<br>
<br>
``__aiterclose__`` allows us to eliminate the mandatory boilerplate<br>
and an extra level of indentation from every middleware::<br>
<br>
      async def noop_middleware(handler, request_header, request_body):<br>
          async for response_item in handler(request_header, request_body):<br>
              yield response_item<br>
<br>
So the ``__aiterclose__`` approach provides substantial advantages<br>
over GC hooks.<br>
<br>
This leaves open the question of whether we want a combination of GC<br>
hooks + ``__aiterclose__``, or just ``__aiterclose__`` alone. Since<br>
the vast majority of generators are iterated over using a ``for`` loop<br>
or equivalent, ``__aiterclose__`` handles most situations before the<br>
GC has a chance to get involved. The case where GC hooks provide<br>
additional value is in code that does manual iteration, e.g.::<br>
<br>
    agen = fetch_newline_separated_json_from_url(...)<br>
    while True:<br>
        document = await type(agen).__anext__(agen)<br>
        if document["id"] == needle:<br>
            break<br>
    # doesn't do 'await agen.aclose()'<br>
<br>
If we go with the GC-hooks + ``__aiterclose__`` approach, this<br>
generator will eventually be cleaned up by GC calling the generator<br>
``__del__`` method, which then will use the hooks to call back into<br>
the event loop to run the cleanup code.<br>
<br>
If we go with the no-GC-hooks approach, this generator will eventually<br>
be garbage collected, with the following effects:<br>
<br>
- its ``__del__`` method will issue a warning that the generator was<br>
not closed (similar to the existing "coroutine never awaited"<br>
warning).<br>
<br>
- The underlying resources involved will still be cleaned up, because<br>
the generator frame will still be garbage collected, causing it to<br>
drop references to any file handles or sockets it holds, and then<br>
those objects' ``__del__`` methods will release the actual operating<br>
system resources.<br>
<br>
- But, any cleanup code inside the generator itself (e.g. logging,<br>
buffer flushing) will not get a chance to run.<br>
<br>
The solution here -- as the warning would indicate -- is to fix the<br>
code so that it calls ``__aiterclose__``, e.g. by using a ``with``<br>
block::<br>
<br>
    async with aclosing(fetch_newline_separated_json_from_url(...)) as agen:<br>
        while True:<br>
            document = await type(agen).__anext__(agen)<br>
            if document["id"] == needle:<br>
                break<br>
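For comparison, here is a hedged sketch of the same manual-iteration pattern written to run on current Python 3 (3.7+ for ``asyncio.run``). It uses ``try``/``finally`` with ``aclose()`` in place of the ``aclosing`` helper; ``fetch_documents`` and ``find`` are hypothetical stand-ins:

```python
import asyncio

events = []

async def fetch_documents():
    # Hypothetical stand-in for fetch_newline_separated_json_from_url(...).
    try:
        for i in range(10):
            await asyncio.sleep(0)  # pretend to do I/O
            yield {"id": i}
    finally:
        events.append("cleanup ran")  # e.g. logging or buffer flushing

async def find(needle):
    agen = fetch_documents()
    try:
        while True:
            document = await type(agen).__anext__(agen)
            if document["id"] == needle:
                return document
    finally:
        # Cleanup runs in the caller's context, so errors would propagate
        # here; contextlib.aclosing (Python 3.10+) packages the same idea.
        await agen.aclose()

result = asyncio.run(find(3))
```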
<br>
Basically in this approach, the rule would be that if you want to<br>
manually implement the iterator protocol, then it's your<br>
responsibility to implement all of it, and that now includes<br>
``__(a)iterclose__``.<br>
<br>
GC hooks add non-trivial complexity in the form of (a) new global<br>
interpreter state, (b) a somewhat complicated control flow (e.g.,<br>
async generator GC always involves resurrection, so the details of PEP<br>
442 are important), and (c) a new public API in asyncio (``await<br>
loop.shutdown_asyncgens()``) that users have to remember to call at<br>
the appropriate time. (This last point in particular somewhat<br>
undermines the argument that GC hooks provide a safe backup to<br>
guarantee cleanup, since if ``shutdown_asyncgens()`` isn't called<br>
correctly then I *think* it's possible for generators to be silently<br>
discarded without their cleanup code being called; compare this to the<br>
``__aiterclose__``-only approach where in the worst case we still at<br>
least get a warning printed. This might be fixable.) All this<br>
considered, GC hooks arguably aren't worth it, given that the only<br>
people they help are those who want to manually call ``__anext__`` yet<br>
don't want to manually call ``__aiterclose__``. But Yury disagrees<br>
with me on this :-). And both options are viable.<br>
<br>
<br>
Always inject resources, and do all cleanup at the top level<br>
------------------------------------------------------------<br>
<br>
It was suggested on python-dev (XX find link) that a pattern to avoid<br>
these problems is to always pass resources in from above, e.g.<br>
``read_newline_separated_json`` should take a file object rather than<br>
a path, with cleanup handled at the top level::<br>
<br>
    def read_newline_separated_json(file_handle):<br>
        for line in file_handle:<br>
            yield json.loads(line)<br>
<br>
    def read_users(file_handle):<br>
        for document in read_newline_separated_json(file_handle):<br>
            yield User.from_json(document)<br>
<br>
    with open(path) as file_handle:<br>
        for user in read_users(file_handle):<br>
            ...<br>
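A runnable version of this pattern, assuming ``io.StringIO`` as a stand-in for a real file and a plain name string as a stand-in for the hypothetical ``User.from_json``:

```python
import io
import json

def read_newline_separated_json(file_handle):
    for line in file_handle:
        yield json.loads(line)

def read_users(file_handle):
    for document in read_newline_separated_json(file_handle):
        yield document["name"]  # stand-in for User.from_json(document)

# The single with block at the top level owns the resource, so breaking
# out of the loop early still closes it promptly.
with io.StringIO('{"name": "alice"}\n{"name": "bob"}\n') as file_handle:
    for user in read_users(file_handle):
        if user == "alice":
            break

assert file_handle.closed
```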
<br>
This works well in simple cases; here it lets us avoid the "N+1<br>
``with`` blocks problem". But unfortunately, it breaks down quickly<br>
when things get more complex. Consider if instead of reading from a<br>
file, our generator was reading from a streaming HTTP GET request --<br>
while handling redirects and authentication via OAuth. Then we'd<br>
really want the sockets to be managed down inside our HTTP client<br>
library, not at the top level. Plus there are other cases where<br>
``finally`` blocks embedded inside generators are important in their<br>
own right: db transaction management, emitting logging information<br>
during cleanup (one of the major motivating use cases for WSGI<br>
``close``), and so forth. So this is really a workaround for simple<br>
cases, not a general solution.<br>
<br>
<br>
More complex variants of __(a)iterclose__<br>
-----------------------------------------<br>
<br>
The semantics of ``__(a)iterclose__`` are somewhat inspired by<br>
``with`` blocks, but context managers are more powerful:<br>
``__(a)exit__`` can distinguish between a normal exit versus exception<br>
unwinding, and in the case of an exception it can examine the<br>
exception details and optionally suppress propagation.<br>
``__(a)iterclose__`` as proposed here does not have these powers, but<br>
one can imagine an alternative design where it did.<br>
<br>
However, this seems like unwarranted complexity: experience suggests<br>
that it's common for iterables to have ``close`` methods, and even to<br>
have ``__exit__`` methods that call ``self.close()``, but I'm not<br>
aware of any common cases that make use of ``__exit__``'s full power.<br>
I also can't think of any examples where this would be useful. And it<br>
seems unnecessarily confusing to allow iterators to affect flow<br>
control by swallowing exceptions -- if you're in a situation where you<br>
really want that, then you should probably use a real ``with`` block<br>
anyway.<br>
<br>
<br>
Specification<br>
=============<br>
<br>
This section describes where we want to eventually end up, though<br>
there are some backwards compatibility issues that mean we can't jump<br>
directly here. A later section describes the transition plan.<br>
<br>
<br>
Guiding principles<br>
------------------<br>
<br>
Generally, ``__(a)iterclose__`` implementations should:<br>
<br>
- be idempotent,<br>
- perform any cleanup that is appropriate on the assumption that the<br>
iterator will not be used again after ``__(a)iterclose__`` is called.<br>
In particular, once ``__(a)iterclose__`` has been called then calling<br>
``__(a)next__`` produces undefined behavior.<br>
<br>
And generally, any code which starts iterating through an iterable<br>
with the intention of exhausting it, should arrange to make sure that<br>
``__(a)iterclose__`` is eventually called, whether or not the iterator<br>
is actually exhausted.<br>
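As an illustration of these principles, here is a sketch of a hypothetical class-based iterator (``LineSource``) whose ``__iterclose__`` is idempotent. No current Python calls ``__iterclose__`` automatically, so the example calls it by hand; raising ``ValueError`` after close is just one choice within the "undefined behavior" allowed above:

```python
class LineSource:
    """Hypothetical iterator following the proposed guiding principles."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self.closed = False
        self.close_count = 0  # how many times cleanup actually ran

    def __iter__(self):
        return self

    def __next__(self):
        if self.closed:
            # Undefined by the proposal; raising is one defensive choice.
            raise ValueError("iterator used after __iterclose__")
        return next(self._lines)

    def __iterclose__(self):
        # Idempotent: only the first call does any work.
        if not self.closed:
            self.closed = True
            self.close_count += 1

src = LineSource(["a", "b"])
assert next(src) == "a"
src.__iterclose__()
src.__iterclose__()  # second call is a no-op
```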
<br>
<br>
Changes to iteration<br>
--------------------<br>
<br>
The core proposal is the change in behavior of ``for`` loops. Given<br>
this Python code::<br>
<br>
    for VAR in ITERABLE:<br>
        LOOP-BODY<br>
    else:<br>
        ELSE-BODY<br>
<br>
we desugar to the equivalent of::<br>
<br>
    _iter = iter(ITERABLE)<br>
    _iterclose = getattr(type(_iter), "__iterclose__", lambda it: None)<br>
    try:<br>
        traditional-for VAR in _iter:<br>
            LOOP-BODY<br>
        else:<br>
            ELSE-BODY<br>
    finally:<br>
        _iterclose(_iter)<br>
<br>
where the "traditional-for statement" here is meant as a shorthand for<br>
the classic 3.5-and-earlier ``for`` loop semantics.<br>
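Since no existing Python calls ``__iterclose__``, the desugaring can only be approximated today. Here is a minimal sketch under that assumption, with hypothetical names (``for_with_iterclose``, ``Tracked``) and the loop body passed as a callback that returns True to ``break``. The fallback has to accept the iterator argument, because ``_iterclose`` is always called as ``_iterclose(_iter)``:

```python
def for_with_iterclose(iterable, body):
    """Emulate the proposed for-loop semantics (hypothetical helper).

    body(item) returning True acts like ``break``. Returns True if the
    loop ran to completion, i.e. when the ``else`` clause would run.
    """
    _iter = iter(iterable)
    # Look the slot up on the type; the fallback takes the iterator arg.
    _iterclose = getattr(type(_iter), "__iterclose__", lambda it: None)
    try:
        for item in _iter:
            if body(item):
                break
        else:
            return True
        return False
    finally:
        _iterclose(_iter)  # runs on completion, break, or exception

class Tracked:
    def __init__(self, items):
        self._it = iter(items)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __iterclose__(self):
        self.closed = True

t = Tracked([1, 2, 3])
completed = for_with_iterclose(t, lambda x: x == 2)  # "break" at 2
```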
<br>
Besides the top-level ``for`` statement, Python also contains several<br>
other places where iterators are consumed. For consistency, these<br>
should call ``__iterclose__`` as well using semantics equivalent to<br>
the above. This includes:<br>
<br>
- ``for`` loops inside comprehensions<br>
- ``*`` unpacking<br>
- functions which accept and fully consume iterables, like<br>
``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and<br>
others.<br>
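For the fully consuming functions, here is a hedged sketch of what ``list(it)`` might look like under these semantics (``list_with_iterclose`` and ``Flaky`` are hypothetical names). The point is that the iterator is closed even when iteration fails partway through:

```python
def list_with_iterclose(iterable):
    """Hypothetical sketch of list(it) under the proposed semantics."""
    it = iter(iterable)
    try:
        return [item for item in it]
    finally:
        iterclose = getattr(type(it), "__iterclose__", None)
        if iterclose is not None:
            iterclose(it)

class Flaky:
    """Yields 1, 2, then fails; records whether it was closed."""

    def __init__(self):
        self.n = 0
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        self.n += 1
        if self.n == 3:
            raise RuntimeError("boom")
        return self.n

    def __iterclose__(self):
        self.closed = True

f = Flaky()
try:
    list_with_iterclose(f)
except RuntimeError:
    pass
```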
<br>
<br>
Changes to async iteration<br>
--------------------------<br>
<br>
We also make the analogous changes to async iteration constructs,<br>
except that the new slot is called ``__aiterclose__``, and it's an<br>
async method that gets ``await``\ed.<br>
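A minimal sketch of the async analogue, assuming a hypothetical helper ``async_for_with_aiterclose`` and a toy ``Ticker`` class. The only differences from the synchronous case are the ``__aiter__``/``__anext__`` protocol and that ``__aiterclose__`` is awaited:

```python
import asyncio

async def async_for_with_aiterclose(aiterable, body):
    """Emulate the proposed ``async for`` semantics (hypothetical helper)."""
    _aiter = type(aiterable).__aiter__(aiterable)
    _aiterclose = getattr(type(_aiter), "__aiterclose__", None)
    try:
        async for item in _aiter:
            if body(item):
                break
    finally:
        if _aiterclose is not None:
            await _aiterclose(_aiter)  # awaited, unlike __iterclose__

class Ticker:
    def __init__(self, n):
        self.i = 0
        self.n = n
        self.closed = False

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.i >= self.n:
            raise StopAsyncIteration
        self.i += 1
        return self.i

    async def __aiterclose__(self):
        await asyncio.sleep(0)  # cleanup may itself need to await
        self.closed = True

ticker = Ticker(5)
asyncio.run(async_for_with_aiterclose(ticker, lambda x: x == 2))
```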
<br>
<br>
Modifications to basic iterator types<br>
-------------------------------------<br>
<br>
Generator objects (including those created by generator comprehensions):<br>
<br>
- ``__iterclose__`` calls ``self.close()``<br>
- ``__del__`` calls ``self.close()`` (same as now), and additionally<br>
issues a ``ResourceWarning`` if the generator wasn't exhausted. This<br>
warning is hidden by default, but can be enabled for those who want to<br>
make sure they aren't inadvertently relying on CPython-specific GC<br>
semantics.<br>
<br>
Async generator objects (including those created by async generator<br>
comprehensions):<br>
<br>
- ``__aiterclose__`` calls ``self.aclose()``<br>
- ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been<br>
called, since this probably indicates a latent bug, similar to the<br>
"coroutine never awaited" warning.<br>
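For generators, ``__iterclose__`` would simply delegate to the existing ``close()`` method, which is already a no-op on a generator that has finished. Since real generator objects do not have this slot, here is a sketch using a hypothetical wrapper class:

```python
class IterclosingGenerator:
    """Hypothetical wrapper giving a generator the proposed __iterclose__."""

    def __init__(self, gen):
        self._gen = gen

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._gen)

    def __iterclose__(self):
        self._gen.close()  # generator.close() is safe to call repeatedly

events = []

def numbers():
    try:
        yield 1
        yield 2
    finally:
        events.append("finally")

g = IterclosingGenerator(numbers())
assert next(g) == 1
g.__iterclose__()
g.__iterclose__()  # no second cleanup
```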
<br>
QUESTION: should file objects implement ``__iterclose__`` to close the<br>
file? On the one hand this would make this change more disruptive; on<br>
the other hand people really like writing ``for line in open(...):<br>
...``, and if we get used to iterators taking care of their own<br>
cleanup then it might become very weird if files don't.<br>
<br>
<br>
New convenience functions<br>
-------------------------<br>
<br>
The ``itertools`` module gains a new iterator wrapper that can be used<br>
to selectively disable the new ``__iterclose__`` behavior::<br>
<br>
    # QUESTION: I feel like there might be a better name for this one?<br>
    class preserve:<br>
        def __init__(self, iterable):<br>
            self._it = iter(iterable)<br>
<br>
        def __iter__(self):<br>
            return self<br>
<br>
        def __next__(self):<br>
            return next(self._it)<br>
<br>
        def __iterclose__(self):<br>
            # Swallow __iterclose__ without passing it on<br>
            pass<br>
<br>
Example usage (assuming that file objects implement ``__iterclose__``)::<br>
<br>
    with open(...) as handle:<br>
        # Iterate through the same file twice:<br>
        for line in itertools.preserve(handle):<br>
            ...<br>
        handle.seek(0)<br>
        for line in itertools.preserve(handle):<br>
            ...<br>
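<br>
To see why ``preserve`` matters, here is a small simulation of the proposed ``for`` loop over a file-like object that implements ``__iterclose__``. Everything here is a hypothetical stand-in: ``for_each`` models the proposed loop desugaring, ``ClosingFile`` is an ``io.StringIO`` subclass playing the role of a file, and ``preserve`` repeats the sketch above so the snippet runs on its own.<br>

```python
import io


def iterclose(it):
    # Same lookup rule as the proposed operator.iterclose.
    if hasattr(type(it), "__iterclose__"):
        type(it).__iterclose__(it)


def for_each(iterable, body):
    """Hypothetical model of a for loop under the proposed semantics."""
    it = iter(iterable)
    try:
        for item in it:
            body(item)
    finally:
        iterclose(it)


class ClosingFile(io.StringIO):
    """In-memory stand-in for a file object that implements __iterclose__."""

    def __iterclose__(self):
        self.close()


class preserve:
    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __iterclose__(self):
        # Swallow __iterclose__ without passing it on.
        pass
```

Without the ``preserve`` wrapper, the first simulated loop closes the file, so a following ``seek(0)`` raises ``ValueError``.<br>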
<br>
The ``operator`` module gains two new functions, with semantics<br>
equivalent to the following::<br>
<br>
    def iterclose(it):<br>
        if hasattr(type(it), "__iterclose__"):<br>
            type(it).__iterclose__(it)<br>
<br>
    async def aiterclose(ait):<br>
        if hasattr(type(ait), "__aiterclose__"):<br>
            await type(ait).__aiterclose__(ait)<br>
<br>
These are particularly useful for implementing the changes described in the next section.<br>
<br>
<br>
__iterclose__ implementations for iterator wrappers<br>
---------------------------------------------------<br>
<br>
Python ships a number of iterator types that act as wrappers around<br>
other iterators: ``map``, ``zip``, ``itertools.accumulate``,<br>
``csv.reader``, and others. These iterators should define a<br>
``__iterclose__`` method which calls ``__iterclose__`` in turn on<br>
their underlying iterators. For example, ``map`` could be implemented<br>
as::<br>
<br>
    class map:<br>
        def __init__(self, fn, *iterables):<br>
            self._fn = fn<br>
            self._iters = [iter(iterable) for iterable in iterables]<br>
<br>
        def __iter__(self):<br>
            return self<br>
<br>
        def __next__(self):<br>
            return self._fn(*[next(it) for it in self._iters])<br>
<br>
        def __iterclose__(self):<br>
            for it in self._iters:<br>
                operator.iterclose(it)<br>
<br>
In some cases this requires some subtlety; for example,<br>
`itertools.tee<br>
<<a href="https://docs.python.org/3/library/itertools.html#itertools.tee" rel="noreferrer" target="_blank">https://docs.python.org/3/library/itertools.html#itertools.tee</a>>`_<br>
should not call ``__iterclose__`` on the underlying iterator until it<br>
has been called on *all* of the clone iterators.<br>
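<br>
One way to get that "close only after every clone is closed" behavior is a shared counter. This is a sketch under our own assumptions, not how ``itertools.tee`` is implemented: ``closing_tee``, ``_TeeClone`` and ``_SharedSource`` are hypothetical names, and the clones simply wrap the real ``itertools.tee`` objects.<br>

```python
import itertools


def iterclose(it):
    if hasattr(type(it), "__iterclose__"):
        type(it).__iterclose__(it)


class _SharedSource:
    """Holds the underlying iterator and how many clones are still open."""

    def __init__(self, it, n):
        self.it = it
        self.open_clones = n


class _TeeClone:
    def __init__(self, tee_it, shared):
        self._tee_it = tee_it
        self._shared = shared
        self._closed = False

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._tee_it)

    def __iterclose__(self):
        if self._closed:
            return  # closing the same clone twice must not be double-counted
        self._closed = True
        self._shared.open_clones -= 1
        if self._shared.open_clones == 0:
            # Only the last clone to close forwards to the source.
            iterclose(self._shared.it)


def closing_tee(iterable, n=2):
    it = iter(iterable)
    shared = _SharedSource(it, n)
    return tuple(_TeeClone(t, shared) for t in itertools.tee(it, n))
```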
<br>
<br>
Example / Rationale<br>
-------------------<br>
<br>
The payoff for all this is that we can now write straightforward code like::<br>
<br>
    def read_newline_separated_json(path):<br>
        for line in open(path):<br>
            yield json.loads(line)<br>
<br>
and be confident that the file will receive deterministic cleanup<br>
*without the end-user having to take any special effort*, even in<br>
complex cases. For example, consider this silly pipeline::<br>
<br>
    list(map(lambda key: key.upper(),<br>
             (doc["key"] for doc in read_newline_separated_json(path))))<br>
<br>
If our file contains a document where ``doc["key"]`` turns out to be<br>
an integer, then the following sequence of events will happen:<br>
<br>
1. ``key.upper()`` raises an ``AttributeError``, which propagates out<br>
of the ``map`` and triggers the implicit ``finally`` block inside<br>
``list``.<br>
2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the<br>
map object.<br>
3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator<br>
comprehension object.<br>
4. This injects a ``GeneratorExit`` exception into the generator<br>
comprehension body, which is currently suspended inside the<br>
comprehension's ``for`` loop body.<br>
5. The exception propagates out of the ``for`` loop, triggering the<br>
``for`` loop's implicit ``finally`` block, which calls<br>
``__iterclose__`` on the generator object representing the call to<br>
``read_newline_separated_json``.<br>
6. This injects an inner ``GeneratorExit`` exception into the body of<br>
``read_newline_separated_json``, currently suspended at the ``yield``.<br>
7. The inner ``GeneratorExit`` propagates out of the ``for`` loop,<br>
triggering the ``for`` loop's implicit ``finally`` block, which calls<br>
``__iterclose__()`` on the file object.<br>
8. The file object is closed.<br>
9. The inner ``GeneratorExit`` resumes propagating, hits the boundary<br>
of the generator function, and causes<br>
``read_newline_separated_json``'s ``__iterclose__()`` method to return<br>
successfully.<br>
10. Control returns to the generator comprehension body, and the outer<br>
``GeneratorExit`` continues propagating, allowing the comprehension's<br>
``__iterclose__()`` to return successfully.<br>
11. The rest of the ``__iterclose__()`` calls unwind without incident,<br>
back into the body of ``list``.<br>
12. The original ``AttributeError`` resumes propagating.<br>
<br>
(The details above assume that we implement ``file.__iterclose__``; if<br>
not, then add a ``with`` block to ``read_newline_separated_json`` and<br>
essentially the same logic goes through.)<br>
<br>
Of course, from the user's point of view, this can be simplified down to just:<br>
<br>
1. ``int.upper()`` raises an ``AttributeError``<br>
2. The file object is closed.<br>
3. The ``AttributeError`` propagates out of ``list``<br>
<br>
So we've accomplished our goal of making this "just work" without the<br>
user having to think about it.<br>
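<br>
The cascade above can be emulated in today's Python to check the user-visible outcome. Since ``__iterclose__`` does not exist yet, each stage is written by hand as a generator whose ``try``/``finally`` calls ``close()`` on its input, standing in for the implicit ``__iterclose__`` calls; ``get_keys``, ``upper_all`` and ``run_pipeline`` are hypothetical names, and ``io.StringIO`` stands in for the opened file. One difference from the C-level ``map``: because ``upper_all`` is a generator, the cleanup starts when the ``AttributeError`` leaves its loop rather than in ``list``'s ``finally`` block, but the observable result is the same three steps.<br>

```python
import io
import json


def read_newline_separated_json(f):
    try:
        for line in f:
            yield json.loads(line)
    finally:
        f.close()  # stands in for the loop's implicit file.__iterclose__()


def get_keys(docs):
    # Hand-written version of: (doc["key"] for doc in docs)
    try:
        for doc in docs:
            yield doc["key"]
    finally:
        docs.close()  # stands in for the comprehension's implicit __iterclose__()


def upper_all(keys):
    # Hand-written version of: map(lambda key: key.upper(), keys)
    try:
        for key in keys:
            yield key.upper()
    finally:
        keys.close()  # stands in for map.__iterclose__() forwarding to its input


def run_pipeline(f):
    return list(upper_all(get_keys(read_newline_separated_json(f))))
```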
<br>
<br>
Transition plan<br>
===============<br>
<br>
While the majority of existing ``for`` loops will continue to produce<br>
identical results, the proposed changes will produce<br>
backwards-incompatible behavior in some cases. Example::<br>
<br>
    def read_csv_with_header(lines_iterable):<br>
        lines_iterator = iter(lines_iterable)<br>
        for line in lines_iterator:<br>
            column_names = line.strip().split("\t")<br>
            break<br>
        for line in lines_iterator:<br>
            values = line.strip().split("\t")<br>
            record = dict(zip(column_names, values))<br>
            yield record<br>
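<br>
Code in this shape can also be written so that it behaves the same before and after the change, by reading the header with ``next()`` instead of a ``for``/``break`` loop. This is our own suggestion rather than part of the proposal, and it assumes, as the lists above describe, that only ``for`` loops, unpacking and consuming functions call ``__iterclose__``, while a plain ``next()`` call does not.<br>

```python
def read_csv_with_header(lines_iterable):
    lines_iterator = iter(lines_iterable)
    # next() consumes a single line and never triggers __iterclose__,
    # so the iterator is still usable by the loop below.
    header = next(lines_iterator, None)
    if header is None:
        return  # empty input: no header, no records
    column_names = header.strip().split("\t")
    for line in lines_iterator:
        values = line.strip().split("\t")
        yield dict(zip(column_names, values))
```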
<br>
This code used to be correct, but after this proposal is implemented<br>
will require an ``itertools.preserve`` call added to the first ``for``<br>
loop.<br>
<br>
[QUESTION: currently, if you close a generator and then try to iterate<br>
over it, then it just raises ``Stop(Async)Iteration``, so code that<br>
passes the same generator object to multiple ``for`` loops but forgets<br>
to use ``itertools.preserve`` won't see an obvious error -- the second<br>
``for`` loop will just exit immediately. Perhaps it would be better if<br>
iterating a closed generator raised a ``RuntimeError``? Note that<br>
files don't have this problem -- attempting to iterate a closed file<br>
object already raises ``ValueError``.]<br>
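<br>
The current behavior this question contrasts can be checked directly in today's Python. The snippet uses ``io.StringIO`` as an in-memory stand-in for a real file object; ``numbers`` is just an illustrative generator.<br>

```python
import io


def numbers():
    yield 1
    yield 2


gen = numbers()
gen.close()
# A closed generator simply looks empty: no error is raised.
leftover = list(gen)

f = io.StringIO("line\n")
f.close()
try:
    list(f)  # iterating a closed file-like object fails loudly
    file_error = None
except ValueError as exc:
    file_error = exc
```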
<br>
Specifically, the incompatibility happens when all of these factors<br>
come together:<br>
<br>
- The automatic calling of ``__(a)iterclose__`` is enabled<br>
- The iterable did not previously define ``__(a)iterclose__``<br>
- The iterable does now define ``__(a)iterclose__``<br>
- The iterable is re-used after the ``for`` loop exits<br>
<br>
So the problem is how to manage this transition, and those are the<br>
levers we have to work with.<br>
<br>
First, observe that the only async iterables where we propose to add<br>
``__aiterclose__`` are async generators, and there is currently no<br>
existing code using async generators (though this will start changing<br>
very soon), so the async changes do not produce any backwards<br>
incompatibilities. (There is existing code using async iterators, but<br>
using the new async for loop on an old async iterator is harmless,<br>
because old async iterators don't have ``__aiterclose__``.) In<br>
addition, PEP 525 was accepted on a provisional basis, and async<br>
generators are by far the biggest beneficiary of this PEP's proposed<br>
changes. Therefore, I think we should strongly consider enabling<br>
``__aiterclose__`` for ``async for`` loops and async generators ASAP,<br>
ideally for 3.6.0 or 3.6.1.<br>
<br>
For the non-async world, things are harder, but here's a potential<br>
transition path:<br>
<br>
In 3.7:<br>
<br>
Our goal is that existing unsafe code will start emitting warnings,<br>
while those who want to opt in to the future can do so immediately:<br>
<br>
- We immediately add all the ``__iterclose__`` methods described above.<br>
- If ``from __future__ import iterclose`` is in effect, then ``for``<br>
loops and ``*`` unpacking call ``__iterclose__`` as specified above.<br>
- If the future is *not* enabled, then ``for`` loops and ``*``<br>
unpacking do *not* call ``__iterclose__``. But they do call some other<br>
method instead, e.g. ``__iterclose_warning__``.<br>
- Similarly, functions like ``list`` use stack introspection (!!) to<br>
check whether their direct caller has ``__future__.iterclose``<br>
enabled, and use this to decide whether to call ``__iterclose__`` or<br>
``__iterclose_warning__``.<br>
- For all the wrapper iterators, we also add ``__iterclose_warning__``<br>
methods that forward to the ``__iterclose_warning__`` method of the<br>
underlying iterator or iterators.<br>
- For generators (and files, if we decide to do that),<br>
``__iterclose_warning__`` is defined to set an internal flag, and<br>
other methods on the object are modified to check for this flag. If<br>
they find the flag set, they issue a ``PendingDeprecationWarning`` to<br>
inform the user that in the future this sequence would have led to a<br>
use-after-close situation and the user should use ``preserve()``.<br>
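<br>
A minimal sketch of how the 3.7 warning mechanism could look for a generator, written as a wrapper because generators cannot grow new methods from Python code. ``TransitionalGenerator`` and ``legacy_for_each`` are hypothetical names; the flag check is placed only in ``__next__`` for brevity, whereas the text above says other methods in general would check it.<br>

```python
import warnings


class TransitionalGenerator:
    """Hypothetical wrapper modeling the proposed 3.7 warning behavior."""

    def __init__(self, gen):
        self._gen = gen
        self._would_have_closed = False

    def __iter__(self):
        return self

    def __iterclose_warning__(self):
        # Called where future-enabled code would have called __iterclose__.
        self._would_have_closed = True

    def __next__(self):
        if self._would_have_closed:
            warnings.warn(
                "this iterator would have been closed by an earlier for loop; "
                "wrap it in preserve() to keep using it",
                PendingDeprecationWarning,
                stacklevel=2,
            )
        return next(self._gen)


def legacy_for_each(iterable, body):
    """Model of a for loop without the future: warn-hook instead of close."""
    it = iter(iterable)
    try:
        for item in it:
            body(item)
    finally:
        hook = getattr(type(it), "__iterclose_warning__", None)
        if hook is not None:
            hook(it)
```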
<br>
In 3.8:<br>
<br>
- Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning``<br>
<br>
In 3.9:<br>
<br>
- Enable the ``__future__`` unconditionally and remove all the<br>
``__iterclose_warning__`` stuff.<br>
<br>
I believe that this satisfies the normal requirements for this kind of<br>
transition -- opt-in initially, with warnings targeted precisely to<br>
the cases that will be affected, and a long deprecation cycle.<br>
<br>
Probably the most controversial / risky part of this is the use of<br>
stack introspection to make the iterable-consuming functions sensitive<br>
to a ``__future__`` setting, though I haven't thought of any situation<br>
where it would actually go wrong yet...<br>
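<br>
A rough illustration of the stack-introspection idea, using CPython's ``sys._getframe`` and the feature bits that the compiler records in ``co_flags``. There is no real ``__future__.iterclose``, so the sketch borrows the existing ``__future__.annotations`` feature purely as a stand-in; ``caller_has_future`` and ``consuming_function`` are hypothetical names.<br>

```python
import __future__
import sys


def caller_has_future(feature):
    """Return True if the code two frames up was compiled with `feature`.

    Frame 0 is this helper, frame 1 is the iterable-consuming function,
    and frame 2 is whoever called that function.
    """
    frame = sys._getframe(2)  # CPython implementation detail
    return bool(frame.f_code.co_flags & feature.compiler_flag)


def consuming_function(iterable):
    # Hypothetical stand-in for list(): decide which hook it would call.
    if caller_has_future(__future__.annotations):
        hook_name = "__iterclose__"
    else:
        hook_name = "__iterclose_warning__"
    return list(iterable), hook_name
```

The fixed frame depth assumes the consuming function is called directly from user code; any intermediate Python-level wrapper would shift which frame is inspected, which is one concrete way this approach could misfire.<br>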
<br>
<br>
Acknowledgements<br>
================<br>
<br>
Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for<br>
helpful discussion on earlier versions of this idea.<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Nathaniel J. Smith -- <a href="https://vorpus.org" rel="noreferrer" target="_blank">https://vorpus.org</a><br>
______________________________<wbr>_________________<br>
Python-ideas mailing list<br>
<a href="mailto:Python-ideas@python.org">Python-ideas@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/python-ideas" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/python-ideas</a><br>
Code of Conduct: <a href="http://python.org/psf/codeofconduct/" rel="noreferrer" target="_blank">http://python.org/psf/<wbr>codeofconduct/</a><br>
</font></span></blockquote></div><br></div>