[Python-Dev] PEP 525, third round, better finalization
yselivanov.ml at gmail.com
Sat Sep 3 15:13:14 EDT 2016
On 2016-09-02 2:13 AM, Nathaniel Smith wrote:
> On Thu, Sep 1, 2016 at 3:34 PM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
>> I've spent quite a while thinking and experimenting with PEP 525, trying to
>> figure out how to make asynchronous generator (AG) finalization reliable.
>> I tried to replace the GC-time callback with a callback that intercepts the
>> first iteration of AGs. It turns out it's very hard to work with weak-refs
>> and make the asyncio event loop reliably track and shut down all open AGs.
>> My new approach is to replace the "sys.set_asyncgen_finalizer(finalizer)"
>> function with "sys.set_asyncgen_hooks(firstiter=None, finalizer=None)".
> 1) Can/should these hooks be used by other types besides async
> generators? (e.g., async iterators that are not async generators?)
> What would that look like?
Asynchronous iterators (classes implementing __aiter__, __anext__)
should use __del__ for any cleanup purposes.
sys.set_asyncgen_hooks only supports asynchronous generators.
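To illustrate, a plain async iterator (the `Reader` class below is a hypothetical example, not from the PEP) can only do synchronous cleanup from __del__ -- which is exactly why async *generators*, whose cleanup may itself need to await, need the hooks:

```python
import asyncio
import os
import tempfile

class Reader:
    """Hypothetical async iterator over the lines of a file."""

    def __init__(self, path):
        self._f = open(path, "rb")

    def __aiter__(self):
        return self

    async def __anext__(self):
        line = self._f.readline()
        if not line:
            self._f.close()
            raise StopAsyncIteration
        return line

    def __del__(self):
        # __del__ cannot await, so only synchronous cleanup is
        # possible here.  Async generators, in contrast, may need to
        # run async code in their 'finally' blocks, hence the hooks.
        if not self._f.closed:
            self._f.close()

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"hello\nworld\n")

async def first_line(p):
    async for line in Reader(p):
        return line   # abandon the iterator after one item

loop = asyncio.new_event_loop()
line = loop.run_until_complete(first_line(path))
loop.close()
os.unlink(path)
```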
> 2) In the asyncio design it's legal for an event loop to be stopped
> and then started again. Currently (I guess for this reason?) asyncio
> event loops do not forcefully clean up resources associated with them
> on shutdown. For example, if I open a StreamReader, loop.stop() and
> loop.close() will not automatically close it for me. When, concretely,
> are you imagining that asyncio will run these finalizers?
I think we will add another API method to the asyncio event loop, which
users will call before closing the loop. In my reference implementation
I added a synchronous `loop.shutdown()` method.
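The intended usage pattern looks roughly like this sketch. It is written against the `loop.shutdown_asyncgens()` spelling that asyncio implementations of the PEP use (the reference implementation described above calls it `loop.shutdown()` instead); the `ticker` generator is a made-up example:

```python
import asyncio

results = []

async def ticker():
    try:
        while True:
            yield
    finally:
        # Asynchronous cleanup -- impossible from a plain __del__.
        await asyncio.sleep(0)
        results.append("closed")

agen = ticker()

async def main():
    # First iteration: the loop starts tracking `agen` in its weak set.
    await agen.__anext__()
    # `agen` is then abandoned without an explicit aclose().

loop = asyncio.new_event_loop()
loop.run_until_complete(main())
# The explicit pre-close step discussed above: aclose() every async
# generator the loop still tracks, while the loop can still run.
loop.run_until_complete(loop.shutdown_asyncgens())
loop.close()
```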
> 3) Should the cleanup code in the generator be able to distinguish
> between "this iterator has left scope" versus "the event loop is being
> violently shut down"?
This is already handled in the reference implementation. When an AG is
iterated for the first time, the loop starts tracking it by adding it to
a weak set. When the AG is about to be GCed, the loop removes it from
the weak set, and schedules its 'aclose()'.
If 'loop.shutdown()' is called, it means that the loop is being "violently
shut down", so we schedule 'aclose()' for all AGs in the weak set.
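A minimal sketch of that tracking scheme, with a hypothetical `MiniLoop` standing in for the asyncio event loop (a real loop would actually run `agen.aclose()` from its finalizer and shutdown paths, rather than just dropping references as shown here):

```python
import sys
import weakref

class MiniLoop:
    """Hypothetical stand-in for how an event loop tracks AGs."""

    def __init__(self):
        self._asyncgens = weakref.WeakSet()

    def _firstiter(self, agen):
        # Fired when any async generator is iterated the first time.
        self._asyncgens.add(agen)

    def _finalizer(self, agen):
        # Fired when an abandoned AG is about to be destroyed; a real
        # loop would schedule agen.aclose() here.
        self._asyncgens.discard(agen)

    def install(self):
        sys.set_asyncgen_hooks(firstiter=self._firstiter,
                               finalizer=self._finalizer)

    def shutdown(self):
        # "Violent shutdown": a real loop would run aclose() to
        # completion for every AG still in the weak set.
        return list(self._asyncgens)

miniloop = MiniLoop()
miniloop.install()

async def gen():
    yield 1

g = gen()
# Drive one step by hand; the firstiter hook fires here.
try:
    g.__anext__().send(None)
except StopIteration as exc:
    value = exc.value

still_open = miniloop.shutdown()

# Close the generator explicitly, as a real loop's shutdown would.
try:
    g.aclose().send(None)
except StopIteration:
    pass

sys.set_asyncgen_hooks(firstiter=None, finalizer=None)
```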
> 4) More fundamentally -- this revision is definitely an improvement,
> but it doesn't really address the main concern I have. Let me see if I
> can restate it more clearly.
> Let's define 3 levels of cleanup handling:
> Level 0: resources (e.g. file descriptors) cannot be reliably cleaned up.
> Level 1: resources are cleaned up reliably, but at an unpredictable time.
> Level 2: resources are cleaned up both reliably and promptly.
> In Python 3.5, unless you're very anal about writing cumbersome 'async
> with' blocks around every single 'async for', resources owned by async
> iterators land at level 0. (Because the only cleanup method available
> is __del__, and __del__ cannot make async calls, so if you need async
> calls to do clean up then you're just doomed.)
> I think the revised draft does a good job of moving async
> generators from level 0 to level 1 -- the finalizer hook gives a way
> to effectively call back into the event loop from __del__, and the
> shutdown hook gives us a way to guarantee that the cleanup happens
> while the event loop is still running.
Right. It's good to hear that you agree that the latest revision of the
PEP makes AG cleanup reliable (albeit at an unpredictable time; more on
that below).
My goal was exactly this: make the mechanism reliable, with the same
predictability as what we have for __del__.
> But... IIUC, it's now generally agreed that for Python code, level 1
> is simply *not good enough*. (Or to be a little more precise, it's
> good enough for the case where the resource being cleaned up is
> memory, because the garbage collector knows when memory is short, but
> it's not good enough for resources like file descriptors.) The classic
> example of this is code like:
I think this is where I don't agree with you 100%. There are no strict
guarantees that an object will be GCed in a timely manner in CPython or
PyPy. If it's part of a reference cycle, it might not be cleaned up at all.
All in all, in your examples I don't see the exact place where AGs
differ from, let's say, synchronous generators.
> async def read_json_lines_from_server(host, port):
> async for line in asyncio.open_connection(host, port):
> yield json.loads(line)
> You would expect to use this like:
> async for data in read_json_lines_from_server(host, port):
If you rewrite the above code without the 'async' keyword, you'd have a
synchronous generator with *exactly* the same problems.
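For illustration, here is a synchronous analogue (the names are made up for the example): when the consumer abandons the loop early, the generator's 'finally' cleanup runs only on an explicit close() or at GC time -- exactly the "level 1" behavior described above:

```python
import json

closed = []

def read_json_lines(lines):
    # Synchronous analogue of the async generator quoted above: the
    # cleanup in `finally` runs only on close() or garbage collection,
    # not promptly when the consumer stops iterating.
    try:
        for line in lines:
            yield json.loads(line)
    finally:
        closed.append(True)   # e.g. closing the underlying connection

gen = read_json_lines(['{"a": 1}', '{"b": 2}'])
first = next(gen)   # the consumer abandons the loop after one item
gen.close()         # throws GeneratorExit in; the finally block runs
```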
> tl;dr: AFAICT this revision of PEP 525 is enough to make it work
> reliably on CPython, but I have serious concerns that it bakes a
> CPython-specific design into the language. I would prefer a design
> that actually aims for "level 2" cleanup semantics (for example, )
I honestly don't see why PEP 525 can't be implemented in PyPy. The
finalization mechanism is built on top of the existing finalization of
synchronous generators, which is already implemented in PyPy.
The design of PEP 525 doesn't exploit any CPython-specific features
(like ref counting). If an alternative Python interpreter implements
__del__ semantics properly, it shouldn't have any problems implementing
PEP 525.