[Python-Dev] PEP 525, third round, better finalization

Yury Selivanov yselivanov.ml at gmail.com
Thu Sep 1 18:34:06 EDT 2016


Hi,

I've spent quite a while thinking about and experimenting with PEP 525, 
trying to figure out how to make finalization of asynchronous generators 
(AGs) reliable.  I tried replacing the callback for GCed AGs with a 
callback that intercepts the first iteration of an AG.  It turns out to 
be very hard to work with weak-refs and make the asyncio event loop 
reliably track and shut down all open AGs.

My new approach is to replace the 
"sys.set_asyncgen_finalizer(finalizer)" function with 
"sys.set_asyncgen_hooks(firstiter=None, finalizer=None)".

This design allows us to:

1. intercept first iteration of an AG.  That makes it possible for event 
loops to keep a weak set of all "open" AGs, and to implement a 
"shutdown" method to close the loop and close all AGs *reliably*.

2. intercept garbage collection of AGs.  That makes it possible to call 
"aclose" on GCed AGs to guarantee that 'finally' blocks and 'async with' 
statements are properly exited.

3. in later Python versions we can add more hooks, although I can't 
think of anything else we need to add right now.

I'm posting below only the updated PEP section. The latest PEP revision 
should also be available on python.org shortly.

All new proposed changes are available to play with in my fork of 
CPython here: https://github.com/1st1/cpython/tree/async_gen


Finalization
------------

PEP 492 requires an event loop or a scheduler to run coroutines.
Because asynchronous generators are meant to be used from coroutines,
they also require an event loop to run and finalize them.

Asynchronous generators can have ``try..finally`` blocks, as well as
``async with``.  It is important to provide a guarantee that, even
when partially iterated, and then garbage collected, generators can
be safely finalized.  For example::

     async def square_series(con, to):
         async with con.transaction():
             cursor = con.cursor(
                 'SELECT generate_series(0, $1) AS i', to)
             async for row in cursor:
                 yield row['i'] ** 2

     async for i in square_series(con, 1000):
         if i == 100:
             break

The above code defines an asynchronous generator that uses
``async with`` to iterate over a database cursor in a transaction.
The generator is then iterated over with ``async for``, which interrupts
the iteration at some point.

The ``square_series()`` generator will then be garbage collected,
and without a mechanism to asynchronously close the generator, the
Python interpreter would not be able to do anything.

To solve this problem we propose to do the following:

1. Implement an ``aclose`` method on asynchronous generators
    returning a special *awaitable*.  When awaited it
    throws a ``GeneratorExit`` into the suspended generator and
    iterates over it until either a ``GeneratorExit`` or
    a ``StopAsyncIteration`` occurs.

    This is very similar to what the ``close()`` method does to regular
    Python generators, except that an event loop is required to execute
    ``aclose()``.

2. Raise a ``RuntimeError`` when an asynchronous generator executes
    a ``yield`` expression in its ``finally`` block (using ``await``
    is fine, though)::

         async def gen():
             try:
                 yield
             finally:
                 await asyncio.sleep(1)   # Can use 'await'.

                 yield                    # Cannot use 'yield',
                                          # this line will trigger a
                                          # RuntimeError.

3. Add two new functions to the ``sys`` module:
    ``set_asyncgen_hooks()`` and ``get_asyncgen_hooks()``.
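
Points 1 and 2 can be tried out with a small sketch.  It assumes a
CPython release where asynchronous generators and ``asyncio.run()``
are available (3.7 or later); the generator names ``countdown`` and
``bad`` and the ``log`` list are made up for illustration and do not
come from the PEP.

```python
import asyncio

log = []


async def countdown():
    try:
        yield 3
        yield 2
    finally:
        # Awaiting during cleanup is allowed.
        await asyncio.sleep(0)
        log.append("finally ran")


async def bad():
    try:
        yield 1
    finally:
        # Yielding during cleanup is not allowed once the
        # generator is being closed.
        yield 2


async def main():
    gen = countdown()
    log.append(await gen.__anext__())   # run up to the first yield
    await gen.aclose()                  # throws GeneratorExit into the generator

    gen = bad()
    await gen.__anext__()
    try:
        await gen.aclose()
    except RuntimeError:
        log.append("RuntimeError")


asyncio.run(main())
```

Closing ``countdown()`` explicitly runs its ``finally`` block, including
the ``await``; closing ``bad()`` fails because the generator yields
instead of exiting.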

The idea behind ``sys.set_asyncgen_hooks()`` is to allow event
loops to intercept the iteration and finalization of asynchronous
generators, so that the end user does not need to care about the
finalization problem, and everything just works.

``sys.set_asyncgen_hooks()`` accepts two arguments:

* ``firstiter``: a callable which will be called when an asynchronous
   generator is iterated for the first time.

* ``finalizer``: a callable which will be called when an asynchronous
   generator is about to be GCed.

When an asynchronous generator is iterated for the first time,
it stores a reference to the current finalizer.  If there is none,
a ``RuntimeError`` is raised.  This provides a strong guarantee that
every asynchronous generator object will always have a finalizer
installed by the correct event loop.

When an asynchronous generator is about to be garbage collected,
it calls its cached finalizer.  The assumption is that the finalizer
will schedule an ``aclose()`` call with the loop that was active
when the iteration started.
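
The interplay of the two hooks can be observed with a minimal sketch.
It assumes the hooks behave as they do in CPython 3.7 or later, and it
relies on CPython finalizing an unreferenced generator as soon as its
last reference goes away.  The names ``events``, ``pending``,
``numbers``, ``on_first_iter`` and ``on_finalize`` are illustrative
only.

```python
import asyncio
import gc
import sys

events = []
pending = []


async def numbers():
    try:
        yield 1
        yield 2
    finally:
        events.append("cleanup ran")


async def main():
    loop = asyncio.get_running_loop()

    def on_first_iter(agen):
        # Called once, when the generator is first iterated.
        events.append("firstiter")

    def on_finalize(agen):
        # Called when a still-suspended generator is about to be GCed.
        # Schedule aclose() with the loop that was running the iteration.
        events.append("finalizer")
        pending.append(loop.create_task(agen.aclose()))

    old_hooks = sys.get_asyncgen_hooks()
    sys.set_asyncgen_hooks(firstiter=on_first_iter, finalizer=on_finalize)
    try:
        gen = numbers()
        await gen.__anext__()    # partial iteration only
        del gen                  # drop the last reference
        gc.collect()             # for implementations without refcounting
        await asyncio.gather(*pending)
    finally:
        sys.set_asyncgen_hooks(*old_hooks)


asyncio.run(main())
```

The hooks are restored in a ``finally`` block so that the event loop's
own hooks are back in place once the experiment is over.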

For instance, here is how asyncio is modified to allow safe
finalization of asynchronous generators::

    # asyncio/base_events.py

    class BaseEventLoop:

        def run_forever(self):
            ...
            old_hooks = sys.get_asyncgen_hooks()
            sys.set_asyncgen_hooks(finalizer=self._finalize_asyncgen)
            try:
                ...
            finally:
                sys.set_asyncgen_hooks(*old_hooks)
                ...

        def _finalize_asyncgen(self, gen):
            self.create_task(gen.aclose())

The ``firstiter`` argument allows event loops to maintain a weak set
of asynchronous generators that started iterating under their control.
This makes it possible to implement "shutdown" mechanisms to safely
finalize all open generators and close the event loop.
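
A "shutdown" mechanism of this kind could look roughly like the sketch
below.  ``AsyncGenTracker`` is a hypothetical helper written for this
example; it is not part of asyncio or of the PEP, and a real event loop
would integrate the same idea into its own start and stop logic.

```python
import asyncio
import sys
import weakref


class AsyncGenTracker:
    """Hypothetical helper: tracks open asynchronous generators."""

    def __init__(self):
        # Weak references, so tracking never keeps a generator alive.
        self._open = weakref.WeakSet()
        self._old_hooks = None

    def install(self):
        self._old_hooks = sys.get_asyncgen_hooks()
        # Replace firstiter, pass the existing finalizer through unchanged.
        sys.set_asyncgen_hooks(firstiter=self._open.add,
                               finalizer=self._old_hooks.finalizer)

    def uninstall(self):
        sys.set_asyncgen_hooks(*self._old_hooks)

    async def shutdown(self):
        # Close every generator that has started iterating and is still alive.
        open_gens = list(self._open)
        self._open.clear()
        await asyncio.gather(*(gen.aclose() for gen in open_gens),
                             return_exceptions=True)


cleaned = []


async def stream(name):
    try:
        while True:
            yield name
    finally:
        cleaned.append(name)


async def main():
    tracker = AsyncGenTracker()
    tracker.install()
    try:
        a, b = stream("a"), stream("b")
        await a.__anext__()
        await b.__anext__()
        await tracker.shutdown()   # both generators are still referenced here
    finally:
        tracker.uninstall()


asyncio.run(main())
```

Both generators run their ``finally`` blocks during ``shutdown()``,
even though neither was exhausted or garbage collected.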

``sys.set_asyncgen_hooks()`` is thread-specific, so several event
loops running in parallel threads can use it safely.

``sys.get_asyncgen_hooks()`` returns a namedtuple-like structure
with ``firstiter`` and ``finalizer`` fields.
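
The thread-specific behaviour and the shape of the returned structure
can be checked with a short sketch, assuming CPython 3.6 or later.
``worker_hook``, ``worker`` and ``seen`` are illustrative names.

```python
import sys
import threading


def worker_hook(agen):
    """Placeholder firstiter hook; a real loop would record ``agen``."""


seen = {}


def worker():
    # Hooks set here apply to this thread only.
    sys.set_asyncgen_hooks(firstiter=worker_hook)
    seen["worker"] = sys.get_asyncgen_hooks()


main_before = sys.get_asyncgen_hooks()
thread = threading.Thread(target=worker)
thread.start()
thread.join()
seen["main"] = sys.get_asyncgen_hooks()

# The result supports both attribute access and tuple unpacking.
firstiter, finalizer = seen["worker"]
```

The hook installed in the worker thread is not visible from the main
thread, whose hooks are the same before and after the worker ran.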


