[Python-ideas] Deterministic iterator cleanup
Yury Selivanov
yselivanov.ml at gmail.com
Wed Oct 19 11:51:37 EDT 2016
I'm -1 on the idea. Here's why:
1. Python is a very dynamic language with GC and that is one of its
fundamental properties. This proposal might make GC of iterators more
deterministic, but that is only one case.
For instance, in some places in the asyncio source code we have statements
like this: "self = None". Why? When an exception occurs and we want to
save it (for instance, to log it), it holds a reference to the traceback
object, which in turn references frame objects. This means that a lot
of objects in those frames will stay alive for as long as the exception
object is alive. So in asyncio we go to great lengths to avoid unnecessary
runs of GC, but this is an exception! Most Python code out there today
doesn't do these sorts of tricks.
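
To make the traceback cycle concrete, here is a minimal sketch. It is not
asyncio's code: Resource, failing_call and demo are names made up for
illustration, and the behaviour shown assumes CPython's reference counting.

```python
import gc
import weakref


class Resource:
    """Stand-in for any object held in a frame's local variables."""


def failing_call(break_cycle):
    resource = Resource()
    probe = weakref.ref(resource)   # lets the caller see when it is freed
    saved = None
    try:
        raise ValueError("boom")
    except ValueError as exc:
        saved = exc                 # kept around, e.g. for later logging
    # Now: saved -> traceback -> this frame -> local 'saved' (a cycle).
    if break_cycle:
        saved = None                # same idea as "self = None" in asyncio
    return probe


def demo():
    gc.disable()                    # keep the cyclic collector out of the way
    try:
        leaky = failing_call(break_cycle=False)
        alive_before_gc = leaky() is not None
        gc.collect()
        alive_after_gc = leaky() is not None
        clean = failing_call(break_cycle=True)
        alive_with_fix = clean() is not None
    finally:
        gc.enable()
    return alive_before_gc, alive_after_gc, alive_with_fix
```

Without the explicit "saved = None", the Resource survives until a GC run;
with it, the Resource is freed as soon as the function returns.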
And this is just one example of how you can have cycles that require a
run of GC. It is not possible to have deterministic GC in real-life
Python applications. This proposal addresses only *one* use case,
leaving hundreds of others unresolved.
IMO, while GC-related issues can be annoying to debug sometimes, it's
not worth it to change the behaviour of iteration in Python only to
slightly improve on this.
2. This proposal will make writing iterators significantly harder.
Consider 'itertools.chain'. We will have to rewrite it to add the
proposed __iterclose__ method. The Chain iterator object will have to
track all of its iterators, call __iterclose__ on them when it's
necessary (there are a few corner cases). Given that this object is
implemented in C, it's quite a bit of work. And we'll have a lot of
objects to fix.
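
To give a rough idea of the bookkeeping involved, here is a pure-Python
sketch of a chain-like iterator. It assumes the proposed (not existing)
__iterclose__ method; ClosingChain and Tracked are hypothetical names, and
the real itertools.chain is written in C and does none of this.

```python
class ClosingChain:
    """Sketch of a chain() that honours a hypothetical __iterclose__."""

    def __init__(self, *iterables):
        self._pending = iter(iterables)   # iterables not started yet
        self._current = None              # iterator currently being drained

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if self._current is None:
                # Raises StopIteration once nothing is left to chain.
                self._current = iter(next(self._pending))
            try:
                return next(self._current)
            except StopIteration:
                self._close(self._current)
                self._current = None

    def __iterclose__(self):
        # Close the iterator in progress, if any.
        current, self._current = self._current, None
        if current is not None:
            self._close(current)
        # Corner case left open here: iterables that were never started
        # are simply dropped rather than closed.
        self._pending = iter(())

    @staticmethod
    def _close(iterator):
        closer = getattr(type(iterator), "__iterclose__", None)
        if closer is not None:
            closer(iterator)


class Tracked:
    """Helper for the demo: records its name when it is closed."""

    def __init__(self, items, log, name):
        self._items = iter(items)
        self._log = log
        self._name = name

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._items)

    def __iterclose__(self):
        self._log.append(self._name)
```

Even this simplified version has to decide what to do on exhaustion, on
early close, and with inputs that were never started.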
We can probably update all iterators in the standard library (in 3.7), but
what about third-party code? It will take many years until you can say
with certainty that most Python code supports __iterclose__ /
__aiterclose__.
3. This proposal changes the behaviour of 'for' and 'async for'
statements significantly. To do partial iteration you will have to use
a special builtin function to guard the iterator from being closed.
This is completely non-obvious to any existing Python user and will be
hard to explain to newcomers.
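
A small emulation of what point 3 describes, as I understand the proposal.
Everything here is an assumption for illustration: for_each_closing stands
in for the changed 'for' statement, 'preserve' is a made-up name for the
guard function, and Lines is a toy iterator with the proposed __iterclose__.

```python
def for_each_closing(iterable, body):
    """Emulate the proposed 'for': close the iterator when the loop exits."""
    iterator = iter(iterable)
    try:
        for item in iterator:
            if body(item) is False:       # False plays the role of 'break'
                break
    finally:
        closer = getattr(type(iterator), "__iterclose__", None)
        if closer is not None:
            closer(iterator)


class preserve:
    """Hypothetical guard: forwards iteration but hides __iterclose__."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iterator)


class Lines:
    """Toy iterator that stops producing items once it is closed."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.closed:
            raise StopIteration
        return next(self._lines)

    def __iterclose__(self):
        self.closed = True


def take_until_blank(source):
    """Partial iteration: stop at the first empty line."""
    taken = []

    def body(line):
        if line == "":
            return False
        taken.append(line)

    for_each_closing(source, body)
    return taken
```

Reading a header with take_until_blank(src) closes src, so a second loop
over it yields nothing; only take_until_blank(preserve(src)) leaves the
rest readable. Today's 'for' needs no such guard.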
4. This proposal only addresses iteration with 'for' and 'async for'
statements. If you iterate using a 'while' loop and the 'next()' function,
this proposal wouldn't help you. Also see point #2 about
third-party code.
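
For the manual-iteration case in point 4, the tool that already exists is
an explicit 'with'. A minimal sketch using contextlib.closing; the names
read_records, take and first_two are invented for the example.

```python
from contextlib import closing


def read_records(events):
    events.append("open")
    try:
        yield "r1"
        yield "r2"
        yield "r3"
    finally:
        events.append("close")      # runs on exhaustion or on close()


def take(iterator, n):
    """Manual iteration with 'while' and next(); no 'for' statement."""
    items = []
    while len(items) < n:
        try:
            items.append(next(iterator))
        except StopIteration:
            break
    return items


def first_two(events):
    records = read_records(events)
    with closing(records):          # explicit, deterministic cleanup
        items = take(records, 2)
    return items
```

The 'finally' block in the generator runs when the 'with' block exits,
regardless of how the iterator was driven.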
5. Asynchronous generators (AG) introduced by PEP 525 are finalized in a
very similar fashion to synchronous generators. There is an API to help
Python call the event loop to finalize AGs. asyncio in 3.6 (and other
event loops in the near future) already uses this API to ensure that
*all AGs in a long-running program are properly finalized* while it is
being run.
There is an extra loop method (`loop.shutdown_asyncgens`) that should be
called right before stopping the loop (exiting the program) to make sure
that all AGs are finalized, but if you forget to call it the world won't
end. The process will end and the interpreter will shut down, maybe
issuing a couple of ResourceWarnings.
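
A minimal sketch of that shutdown step, assuming asyncio as the event loop.
ticker, take_first and run are illustrative names; loop.shutdown_asyncgens
is the real method mentioned above.

```python
import asyncio


async def ticker(events):
    try:
        n = 0
        while True:
            yield n
            n += 1
    finally:
        events.append("ticker finalized")


async def take_first(agen):
    # Manual iteration: no 'async for', so nothing closes agen here.
    return await agen.__anext__()


def run():
    events = []
    agen = ticker(events)           # reference kept, so GC won't finalize it
    loop = asyncio.new_event_loop()
    try:
        first = loop.run_until_complete(take_first(agen))
        before = list(events)       # still suspended: nothing logged yet
        # Ask the loop to close every AG it has seen so far.
        loop.run_until_complete(loop.shutdown_asyncgens())
    finally:
        loop.close()
    return first, before, events
```

The AG here is iterated by hand and never exhausted, yet its 'finally'
block still runs once the loop is asked to shut down its AGs.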
No exception will pass silently in the current PEP 525 implementation.
And if some AG isn't properly finalized a warning will be issued.
The current AG finalization mechanism must stay even if this proposal
gets accepted, as it ensures that even manually iterated AGs are
properly finalized.
6. If this proposal gets accepted, I think we shouldn't introduce it in
any form in 3.6. It's too late to implement it for both sync- and
async-generators. Implementing it only for async-generators will only
add cognitive overhead. Even implementing this only for
async-generators will (and should!) delay the 3.6 release significantly.
7. To conclude: I'm not convinced that this proposal fully solves the
issue of non-deterministic GC of iterators. It cripples iteration
protocols to partially solve the problem for 'for' and 'async for'
statements, leaving manual iteration unresolved. It will make it harder
to write *correct* (async-) iterators. It introduces some *implicit*
context management to 'for' and 'async for' statements -- something that
IMO should be done by the user with an explicit 'with' or 'async with'.
Yury