[Python-ideas] PEP 525: Asynchronous Generators

Stefan Behnel stefan_ml at behnel.de
Sat Aug 6 04:29:05 EDT 2016


Yury Selivanov schrieb am 03.08.2016 um 17:32:
> On 2016-08-03 2:45 AM, Stefan Behnel wrote:
>> Yury Selivanov schrieb am 03.08.2016 um 00:31:
>>> PEP 492 requires an event loop or a scheduler to run coroutines.
>>> Because asynchronous generators are meant to be used from coroutines,
>>> they also require an event loop to run and finalize them.
>> Well, or *something* that uses them in the same way as an event loop would.
>> Doesn't have to be an event loop.
> 
> Sure, I'm just using the same terminology PEP 492 was defined with.
> We can say "coroutine runner" instead of "event loop".
> 
>>> 1. Implement an ``aclose`` method on asynchronous generators
>>>     returning a special *awaitable*.  When awaited, it
>>>     throws a ``GeneratorExit`` into the suspended generator and
>>>     iterates over it until either a ``GeneratorExit`` or
>>>     a ``StopAsyncIteration`` occurs.
>>>
>>>     This is very similar to what the ``close()`` method does to regular
>>>     Python generators, except that an event loop is required to execute
>>>     ``aclose()``.
>> I don't see a motivation for adding an "aclose()" method in addition to the
>> normal "close()" method. Similar for send/throw. Could you elaborate on
>> that?
> 
> There will be no "close", "send" and "throw" defined for asynchronous
> generators. Only their asynchronous equivalents.
> [...]
> Since all this is quite different from sync generators' close
> method, it was decided to have a different name for this method
> for async generators: aclose.

Ok, why not. Different names for similar things that behave differently enough.
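
For reference, here is a minimal sketch of how I understand aclose()
would be used under the proposal (the 'ticker' generator and the driver
code are made up for illustration):

    import asyncio

    async def ticker():
        try:
            while True:
                yield 1
                await asyncio.sleep(0)
        finally:
            # async cleanup can happen here -- a plain close() could
            # not await anything at this point
            await asyncio.sleep(0)

    async def main():
        agen = ticker()
        print(await agen.__anext__())   # -> 1
        await agen.aclose()             # throws GeneratorExit into 'ticker'

    asyncio.get_event_loop().run_until_complete(main())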


> The 'agen' generator, at the lowest level of the generator
> implementation, will yield two things -- 'spam', and a wrapped 123
> value.  Because 123 is wrapped, the async generator machinery can
> distinguish async yields from normal yields.

This is actually going to be tricky to backport for Cython (which supports
Py2.6+), since it seems to depend on a globally known, C-implemented
wrapper object type. We'd have to find a way to share that across
different packages and also across different Cython versions (types are
only shared within the same Cython version). I guess we'd have to store a
reference to that type in some well-hidden global place somewhere, and
then never touch its implementation again...

Is that wrapper type going to be exposed anywhere in the Python visible
world, or is it purely internal? (Not that I see a use case for making it
visible to Python code...)
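
To make sure I understand the mechanism, here is a pure-Python model of
the wrapping as I read it (all names here are made up; the real wrapper
would be an internal C-level type):

    class _WrappedValue:
        # stand-in for the internal C-level wrapper type
        def __init__(self, value):
            self.value = value

    def agen_model():
        # conceptually what the 'agen' example compiles down to:
        # plain yields drive the 'await' machinery, wrapped yields
        # carry the values of actual 'yield' expressions
        yield 'spam'                # plain yield, part of an 'await'
        yield _WrappedValue(123)    # async yield, wrapped

    for item in agen_model():
        if isinstance(item, _WrappedValue):
            print('async yield:', item.value)   # -> async yield: 123
        else:
            print('plain yield:', item)         # -> plain yield: spam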

BTW, why wouldn't "async yield from" work if the only distinction point is
whether a yielded object is wrapped or not? That should work at any level
of delegation, shouldn't it?


>>> 3. Add two new functions to the ``sys`` module:
>>>     ``set_asyncgen_finalizer()`` and ``get_asyncgen_finalizer()``.
>>>
>>> The idea behind ``sys.set_asyncgen_finalizer()`` is to allow event
>>> loops to handle generator finalization, so that the end user
>>> does not need to care about the finalization problem, and it just
>>> works.
>>>
>>> When an asynchronous generator is iterated for the first time,
>>> it stores a reference to the current finalizer.  If there is none,
>>> a ``RuntimeError`` is raised.  This provides a strong guarantee that
>>> every asynchronous generator object will always have a finalizer
>>> installed by the correct event loop.
>>>
>>> When an asynchronous generator is about to be garbage collected,
>>> it calls its cached finalizer.  The assumption is that the finalizer
>>> will schedule an ``aclose()`` call with the loop that was active
>>> when the iteration started.
>>>
>>> For instance, here is how asyncio is modified to allow safe
>>> finalization of asynchronous generators::
>>>
>>>     # asyncio/base_events.py
>>>
>>>     class BaseEventLoop:
>>>
>>>         def run_forever(self):
>>>             ...
>>>             old_finalizer = sys.get_asyncgen_finalizer()
>>>             sys.set_asyncgen_finalizer(self._finalize_asyncgen)
>>>             try:
>>>                 ...
>>>             finally:
>>>                 sys.set_asyncgen_finalizer(old_finalizer)
>>>                 ...
>>>
>>>         def _finalize_asyncgen(self, gen):
>>>             self.create_task(gen.aclose())
>>>
>>> ``sys.set_asyncgen_finalizer()`` is thread-specific, so several event
>>> loops running in parallel threads can use it safely.
>> Phew, this adds quite some complexity and magic. That is a problem. For
>> one, this uses a global setup, so There Can Only Be One of these
>> finalizers. ISTM that if special cleanup is required, either the asyncgen
>> itself should know how to do that, or we should provide some explicit API
>> that does something when *initialising* the asyncgen. That seems better
>> than doing something global behind the users' backs. Have you considered
>> providing some kind of factory in asyncio that wraps asyncgens or so?
> 
> set_asyncgen_finalizer is thread-specific, so you can have one
> finalizer set up per thread.
> 
> The reference implementation actually integrates this all into
> asyncio.  The idea is to set up the loop's async generator finalizer
> just before the loop starts, and to reset the finalizer to the
> previous one (usually None) just before it stops.
> 
> The finalizer is attached to a generator when it yields for the
> first time -- this guarantees that every generator will have the
> correct finalizer attached to it.
> 
> It's not right to attach the finalizer (or wrap the generator)
> when the generator is initialized.  Consider this code:
> 
>    async def foo():
>      async with smth():
>        yield
> 
>    async def coro(gen):
>      async for i in gen:
>        ...
> 
>    loop.run_until_complete(coro(foo()))
> 
> ^^ In the above example, when 'foo()' is instantiated, there is
> no loop or finalizer set up at all.  BUT since a loop (or
> coroutine runner) is required to iterate async generators, there
> is a strong guarantee that it *will* be present at the first
> iteration.

Correct. And it also wouldn't help to extend the async-iterator protocol
itself with an aclose() method, because ending an (async) for loop doesn't
mean we are done with the async iterator, so this would just burden users
with unnecessary cleanup handling. That's unfortunate...
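
To spell out what that burden would look like: every consumer would need
an explicit wrapper along these lines (the 'aclosing' helper and the
'consume' driver are hypothetical, not part of the PEP):

    class aclosing:
        # hypothetical helper: guarantees aclose() when leaving the
        # block, even on early exit or error
        def __init__(self, agen):
            self._agen = agen

        async def __aenter__(self):
            return self._agen

        async def __aexit__(self, *exc_info):
            await self._agen.aclose()

    async def consume(make_agen):
        async with aclosing(make_agen()) as agen:
            async for item in agen:
                if item is None:
                    break           # early exit; aclose() still runs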


> Regarding "async gen itself should know how to cleanup" -- that's not
> possible. async gen could just have an async with block and then
> GCed (after being partially consumed).  Users won't expect to do
> anything besides using try..finally or async with, so it's the
> responsibility of the coroutine runner to cleanup async gen.  Hence
> 'aclose' is a coroutine, and hence this set_asyncgen_finalizer API
> for coroutine runners.
> 
> This is indeed the most magical part of the proposal.  Although it's
> important to understand that regular Python users will likely never
> encounter this in their lives -- finalizers will be set up by the
> framework they use (asyncio, Tornado, Twisted, you name it).

I think my main problem is that you keep speaking about event loops (of
which There Can Be Only One, by design), whereas coroutines are a much
more general concept, and I cannot foresee all possible ways of using
them in the future. What I would like to avoid is a situation where we
globally require setting up one finalizer handler (or context), and
thereby prevent users from doing their own cleanup handling in some
module context somewhere. It feels to me like there should be some kind
of stacking for this (which in turn feels like a context manager), in
order to support adapters and the like that need to do their own cleanup
handling (or somehow intercept the global handling), regardless of what
else is running.
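
Something like this, say, as a sketch on top of the sys API proposed
above (those two sys functions exist only in the proposal so far):

    import sys
    from contextlib import contextmanager

    @contextmanager
    def asyncgen_finalizer(finalizer):
        # save/restore the (thread-specific) finalizer, so that nested
        # runners and adapters can temporarily install their own handling
        old = sys.get_asyncgen_finalizer()
        sys.set_asyncgen_finalizer(finalizer)
        try:
            yield
        finally:
            sys.set_asyncgen_finalizer(old)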

But I am having a hard time coming up with an example where the thread
context really doesn't work. Even if an async generator is shared across
multiple threads, those threads would most likely be controlled by the
coroutine runner, which could simply set up a proper finalizer context
for each of those threads that points back to itself, so that it would
not be left to the pure chance of the first iteration to determine who
gets to clean up afterwards.

It's possible that such (contrived?) cases can be handled in one way or
another by changing the finalization itself, instead of changing the
finalizer context. And that could be done by wrapping async iterators in
a delegator that uses try-finally, be it via a decorator or an explicit
wrapper.
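
Something along these lines, for example (a hypothetical wrapper; note
that it is itself an async generator, so it only moves the problem up
one level):

    async def closing_agen(agen):
        # delegate the iteration, but guarantee cleanup via try-finally
        try:
            async for item in agen:
                yield item
        finally:
            await agen.aclose()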

It's just difficult to come to a reasonable conclusion at this level of
uncertainty. That's why I'm bouncing these considerations here, in case
others have an idea of how this problem could be avoided altogether.

Stefan



