[Python-ideas] Deterministic iterator cleanup

Yury Selivanov yselivanov.ml at gmail.com
Wed Oct 19 11:51:37 EDT 2016


I'm -1 on the idea.  Here's why:


1. Python is a very dynamic language with GC and that is one of its 
fundamental properties.  This proposal might make GC of iterators more 
deterministic, but that is only one case.

For instance, in some places in asyncio source code we have statements 
like this: "self = None".  Why?  When an exception occurs and we want to 
save it (for instance to log it), it holds a reference to the Traceback 
object.  Which in turn references frame objects.  Which means that a lot 
of objects in those frames will be alive while the exception object is 
alive.  So in asyncio we go to great lengths to avoid unnecessary runs 
of GC, but this is an exception!  Most Python code out there today 
doesn't do these sorts of tricks.
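
A minimal sketch of the effect described above, assuming CPython. The 
names `Resource`, `fail` and `capture` are made up for illustration and 
are not taken from asyncio. It shows a saved exception keeping a frame's 
local objects alive through its traceback:

```python
import gc
import weakref


class Resource:
    """Stand-in for any object that lives in a frame's local variables."""


def fail(registry):
    res = Resource()
    # Observe the object through a weak reference, without keeping it alive.
    registry.append(weakref.ref(res))
    raise ValueError("boom")


def capture(registry):
    try:
        fail(registry)
    except ValueError as exc:
        return exc  # the saved exception carries exc.__traceback__


refs = []
saved = capture(refs)

# The traceback references fail()'s frame, whose locals still hold `res`.
alive_while_saved = refs[0]() is not None

# Dropping the traceback releases the frames and everything in them.
saved.__traceback__ = None
gc.collect()  # only matters if a reference cycle was formed; harmless otherwise
alive_after_clear = refs[0]() is not None

print(alive_while_saved, alive_after_clear)
```

Clearing a name such as `self` inside the frame is the same idea applied 
from the other side: the frame stays alive, but it no longer pins the 
object.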

And this is just one example of how you can have cycles that require a 
run of GC.  It is not possible to have deterministic GC in real life 
Python applications.  This proposal addresses only *one* use case, 
leaving 100s of others unresolved.

IMO, while GC-related issues can be annoying to debug sometimes, it's 
not worth it to change the behaviour of iteration in Python only to 
slightly improve on this.


2. This proposal will make writing iterators significantly harder. 
Consider 'itertools.chain'.  We will have to rewrite it to add the 
proposed __iterclose__ method.  The Chain iterator object will have to 
track all of its iterators, call __iterclose__ on them when it's 
necessary (there are a few corner cases).  Given that this object is 
implemented in C, it's quite a bit of work.  And we'll have a lot of 
objects to fix.
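
To give a rough idea of the bookkeeping involved, here is a pure-Python 
sketch of a chain-like iterator with the proposed hook.  `ClosingChain` 
is a hypothetical name, `__iterclose__` exists only in the proposal, and 
falling back to `close()` is an assumption made so the sketch runs on 
today's Python.  It is not how `itertools.chain` is or would be 
implemented:

```python
class ClosingChain:
    """Pure-Python analogue of itertools.chain with a hypothetical __iterclose__."""

    def __init__(self, *iterables):
        self._pending = iter(iterables)  # iterables that have not been started
        self._current = None             # inner iterator currently being drained

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if self._current is None:
                # Raises StopIteration once no iterables remain.
                self._current = iter(next(self._pending))
            try:
                return next(self._current)
            except StopIteration:
                self._current = None

    def __iterclose__(self):
        # Hypothetical hook: forward the close to the active inner iterator.
        # Iterables that were never started are simply dropped here, which
        # is one of the corner cases a real implementation must decide on.
        current, self._current = self._current, None
        self._pending = iter(())
        if current is None:
            return
        close = getattr(current, "__iterclose__", None) or getattr(current, "close", None)
        if close is not None:
            close()


log = []


def source():
    try:
        yield 1
        yield 2
    finally:
        log.append("closed")


chained = ClosingChain(source(), [10, 20])
first = next(chained)
chained.__iterclose__()  # what a 'for' loop would call under the proposal
print(first, log)
```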

We can probably update all iterators in the standard library (in 3.7), 
but what about third-party code?  It will take many years until you can 
say with certainty that most Python code supports __iterclose__ / 
__aiterclose__.


3. This proposal changes the behaviour of 'for' and 'async for' 
statements significantly.  To do partial iteration you will have to use 
a special builtin function to guard the iterator from being closed.  
This is completely non-obvious to any existing Python user and will be 
hard to explain to newcomers.
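
For reference, this is the kind of partial iteration that works today 
without any guard.  `read_header_then_body` is a made-up helper for 
illustration.  Under the proposal, the first loop would close `it` on 
`break` unless the iterator were wrapped in the guard function:

```python
def read_header_then_body(lines):
    it = iter(lines)
    header = []
    for line in it:
        if line == "":
            break  # today this leaves `it` open and positioned after the blank line
        header.append(line)
    # Continues where the first loop stopped.
    body = list(it)
    return header, body


header, body = read_header_then_body(["a: 1", "b: 2", "", "payload"])
print(header, body)
```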


4. This proposal only addresses iteration with 'for' and 'async for' 
statements.  If you iterate using a 'while' loop and 'next()' function, 
this proposal wouldn't help you.  Also see point #2 about 
third-party code.
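
A small sketch of manual iteration, with illustrative names only.  No 
'for' statement is involved, so nothing would call the proposed hook, and 
the generator's cleanup runs only when it is closed explicitly or 
collected:

```python
def numbers(log):
    try:
        yield 1
        yield 2
        yield 3
    finally:
        log.append("closed")


log = []
gen = numbers(log)
seen = []

while True:
    try:
        value = next(gen)
    except StopIteration:
        break
    if value == 2:
        break  # abandon the generator while it is suspended at a yield
    seen.append(value)

# The generator is still referenced and suspended, so its finally block
# has not run yet.
closed_by_loop = "closed" in log

gen.close()  # explicit cleanup
closed_explicitly = "closed" in log

print(seen, closed_by_loop, closed_explicitly)
```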


5. Asynchronous generators (AG) introduced by PEP 525 are finalized in a 
very similar fashion to synchronous generators.  There is an API to help 
Python call the event loop to finalize AGs.  asyncio in 3.6 (and other 
event loops in the near future) already uses this API to ensure that 
*all AGs in a long-running program are properly finalized* while it is 
being run.

There is an extra loop method (`loop.shutdown_asyncgens`) that should be 
called right before stopping the loop (exiting the program) to make sure 
that all AGs are finalized, but if you forget to call it the world won't 
end.  The process will end and the interpreter will shut down, maybe 
issuing a couple of ResourceWarnings.
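
A minimal sketch of that shutdown step, assuming Python 3.6+ and asyncio. 
The names `ticker`, `consume_one` and `holder` are made up for 
illustration; `holder` exists only to keep the generator referenced so 
that it is still pending when the loop shuts down:

```python
import asyncio

events = []
holder = []  # keeps the async generator alive so GC does not finalize it first


async def ticker():
    try:
        while True:
            yield 1
    finally:
        events.append("finalized")


async def consume_one():
    ag = ticker()
    holder.append(ag)
    # Take a single value and leave the generator suspended at its yield.
    return await ag.__anext__()


loop = asyncio.new_event_loop()
try:
    first = loop.run_until_complete(consume_one())
    finalized_before = "finalized" in events
    # Closes every async generator the loop has seen and not yet finalized.
    loop.run_until_complete(loop.shutdown_asyncgens())
    finalized_after = "finalized" in events
finally:
    loop.close()

print(first, finalized_before, finalized_after)
```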

No exception will pass silently in the current PEP 525 implementation.  
And if some AG isn't properly finalized a warning will be issued.

The current AG finalization mechanism must stay even if this proposal 
gets accepted, as it ensures that even manually iterated AGs are 
properly finalized.


6. If this proposal gets accepted, I think we shouldn't introduce it in 
any form in 3.6.  It's too late to implement it for both sync- and 
async-generators.  Implementing it only for async-generators will only 
add cognitive overhead.  Even implementing this only for 
async-generators will (and should!) delay the 3.6 release significantly.


7. To conclude: I'm not convinced that this proposal fully solves the 
issue of non-deterministic GC of iterators.  It cripples iteration 
protocols to partially solve the problem for 'for' and 'async for' 
statements, leaving manual iteration unresolved.  It will make it harder 
to write *correct* (async-) iterators.  It introduces some *implicit* 
context management to 'for' and 'async for' statements -- something that 
IMO should be done by the user with an explicit 'with' or 'async with'.
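
One way the explicit form can look today, as a sketch using 
`contextlib.closing` from the standard library.  The generator `rows` is 
made up for illustration:

```python
from contextlib import closing


def rows(log):
    try:
        for i in range(5):
            yield i
    finally:
        log.append("cleanup")


log = []
with closing(rows(log)) as it:
    for row in it:
        if row == 1:
            break  # leaving the 'with' block calls it.close()

cleaned_up = log == ["cleanup"]
print(cleaned_up)
```

For asynchronous generators, `contextlib.aclosing` (Python 3.10+) plays 
the same role with 'async with'.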


Yury

