[Python-Dev] PEP 525, third round, better finalization

Nathaniel Smith njs at pobox.com
Fri Sep 2 05:13:34 EDT 2016


On Thu, Sep 1, 2016 at 3:34 PM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
> Hi,
>
> I've spent quite a while thinking and experimenting with PEP 525 trying to
> figure out how to make asynchronous generators (AG) finalization reliable.
> I've tried to replace the on-GC finalizer callback with a callback that
> intercepts the first iteration of AGs.  It turns out to be very hard to
> work with weak-refs and make the asyncio event loop reliably track and
> shut down all open AGs.
>
> My new approach is to replace the "sys.set_asyncgen_finalizer(finalizer)"
> function with "sys.set_asyncgen_hooks(firstiter=None, finalizer=None)".

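(To make the proposed API concrete, here is a sketch of the bookkeeping
an event loop might install through these hooks. The names below --
alive_asyncgens and so on -- are invented for illustration, not part of
the proposal:)

```python
import sys

# Illustrative per-loop bookkeeping; a real loop would keep this on the
# loop object rather than in a module-level global.
alive_asyncgens = set()

def firstiter(agen):
    # Called synchronously the first time an async generator is
    # iterated, letting the loop learn which generators it may
    # eventually need to shut down.
    alive_asyncgens.add(agen)

def finalizer(agen):
    # Called from the generator's __del__.  A real loop would schedule
    # the async cleanup, e.g. loop.create_task(agen.aclose()).
    alive_asyncgens.discard(agen)

sys.set_asyncgen_hooks(firstiter=firstiter, finalizer=finalizer)
```
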
1) Can/should these hooks be used by other types besides async
generators? (e.g., async iterators that are not async generators?)
What would that look like?

2) In the asyncio design it's legal for an event loop to be stopped
and then started again. Currently (I guess for this reason?) asyncio
event loops do not forcefully clean up resources associated with them
on shutdown. For example, if I open a StreamReader, loop.stop() and
loop.close() will not automatically close it for me. When, concretely,
are you imagining that asyncio will run these finalizers?

3) Should the cleanup code in the generator be able to distinguish
between "this iterator has left scope" versus "the event loop is being
violently shut down"?

4) More fundamentally -- this revision is definitely an improvement,
but it doesn't really address the main concern I have. Let me see if I
can restate it more clearly.

Let's define 3 levels of cleanup handling:

  Level 0: resources (e.g. file descriptors) cannot be reliably cleaned up.

  Level 1: resources are cleaned up reliably, but at an unpredictable time.

  Level 2: resources are cleaned up both reliably and promptly.

In Python 3.5, unless you're scrupulous about writing cumbersome 'async
with' blocks around every single 'async for', resources owned by async
iterators land at level 0. (Because the only cleanup method available
is __del__, and __del__ cannot make async calls, so if you need async
calls to do cleanup then you're just doomed.)
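
(A concrete sketch of the dead end -- AsyncConnection and its methods
are invented for illustration, not a real API:)

```python
class AsyncConnection:
    # Invented example: closing this object genuinely requires
    # awaiting, e.g. to flush buffers or perform a TLS shutdown.
    def __init__(self):
        self.closed = False

    async def aclose(self):
        # pretend this awaits a network round-trip
        self.closed = True

    def __del__(self):
        # __del__ is a plain synchronous method: there is no way to
        # 'await self.aclose()' here, and calling self.aclose() bare
        # would just create and discard a coroutine object.
        pass
```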

I think the revised draft does a good job of moving async
generators from level 0 to level 1 -- the finalizer hook gives a way
to effectively call back into the event loop from __del__, and the
shutdown hook gives us a way to guarantee that the cleanup happens
while the event loop is still running.

But... IIUC, it's now generally agreed that for Python code, level 1
is simply *not good enough*. (Or to be a little more precise, it's
good enough for the case where the resource being cleaned up is
memory, because the garbage collector knows when memory is short, but
it's not good enough for resources like file descriptors.) The classic
example of this is code like:

 # used to be good, now considered poor style:
 def get_file_contents(path):
      handle = open(path)
      return handle.read()

This works OK on CPython because the reference-counting gc will call
handle.__del__() at the end of the scope (so on CPython it's at level
2), but it famously causes huge problems when porting to PyPy, with
its much faster and more sophisticated gc that only runs when
triggered by memory pressure. (Or for "PyPy" you can substitute
"Jython", "IronPython", whatever.) Technically this code doesn't
actually "leak" file descriptors on PyPy, because handle.__del__()
will get called *eventually* (this code is at level 1, not level 0),
but by the time "eventually" arrives your server process has probably
run out of file descriptors and crashed. Level 1 isn't good enough. So
now we have all learned to instead write

 # good modern Python style:
 def get_file_contents(path):
      with open(path) as handle:
          return handle.read()

and we have fancy tools like the ResourceWarning machinery to help us
catch these bugs.
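
(You can watch that machinery catch the bad version; this demo is
CPython-specific in its timing, since it leans on refcounting plus an
explicit gc.collect():)

```python
import gc
import os
import tempfile
import warnings

def get_file_contents(path):
    # the "poor style" version: the handle is closed only by the gc
    return open(path).read()

fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ResourceWarning)
    contents = get_file_contents(path)
    gc.collect()  # force collection on non-refcounting runtimes

leaked = [w for w in caught if issubclass(w.category, ResourceWarning)]
os.remove(path)
```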

Here's the analogous example for async generators. This is a useful,
realistic async generator, that lets us incrementally read from a TCP
connection that streams newline-separated JSON documents:

  async def read_json_lines_from_server(host, port):
      # open_connection is a coroutine returning a (reader, writer) pair
      reader, writer = await asyncio.open_connection(host, port)
      async for line in reader:
          yield json.loads(line)

You would expect to use this like:

  async for data in read_json_lines_from_server(host, port):
      ...

BUT, with the current PEP 525 proposal, trying to use this generator
in this way is exactly analogous to the open(path).read() case: on
CPython it will work fine -- the generator object will leave scope at
the end of the 'async for' loop, cleanup methods will be called, etc.
But on PyPy, the weakref callback will not be triggered until some
arbitrary time later, you will "leak" file descriptors, and your
server will crash. For correct operation, you have to replace the
simple 'async for' loop with this lovely construct:

  async with aclosing(read_json_lines_from_server(host, port)) as ait:
      async for data in ait:
          ...
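
(aclosing here is just the async analogue of contextlib.closing; it
takes a few lines to write yourself:)

```python
class aclosing:
    # Async analogue of contextlib.closing: guarantees aclose() runs
    # when the block exits, whether normally, via 'break', or via an
    # exception.
    def __init__(self, aiterable):
        self._aiterable = aiterable

    async def __aenter__(self):
        return self._aiterable

    async def __aexit__(self, *exc_info):
        await self._aiterable.aclose()
```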

Of course, you only have to do this on loops whose iterator might
potentially hold resources like file descriptors, either currently or
in the future. So... uh... basically that's all loops, I guess? If you
want to be a good defensive programmer?

Conclusion: if you care about PyPy support then AFAICT the current PEP
525 cleanup design doesn't provide any benefits -- you still have to
write exactly the same cumbersome defensive code as you would if the
finalizer hooks were left out entirely. If anything, the PEP 525
finalizer hooks are actually harmful, because they encourage people to
write CPython-specific code that blows up in hard-to-test-for-ways on
PyPy.

As a practical note, this is particularly concerning since the
impression I got from PyCon this year is that PyPy's big production
use case is running big async network servers. Currently these are on
Twisted, but PyPy landed asyncio/async/await support, like, last week:

  https://pypy35syntax.blogspot.com/

and on top of that they just got at least $250k in funding to further
polish their Python 3 support, so people are going to be actually
running code like my example on PyPy very soon now.

tl;dr: AFAICT this revision of PEP 525 is enough to make it work
reliably on CPython, but I have serious concerns that it bakes a
CPython-specific design into the language. I would prefer a design
that actually aims for "level 2" cleanup semantics (for example, [1]).

-n

[1] https://mail.python.org/pipermail/python-ideas/2016-August/041868.html

-- 
Nathaniel J. Smith -- https://vorpus.org

