So here's an interesting issue I just discovered while experimenting with async generators. It caught me by surprise, anyway. Maybe this was already obvious to everyone else. But I wanted to get some more perspectives. Starting axiom: async functions / async generators need to be prepared for the case where they get garbage collected before being run to completion, because, well... this is a thing that can happen. Currently, the way garbage collection is handled is that their __del__ method calls their .close() method, which does something like: class GeneratorType: ... def close(self): try: self.throw(GeneratorExit) except GeneratorExit, StopIteration: return # it worked, all is good except: raise # double-fault, propagate else: raise RuntimeError("generator ignored GeneratorExit") (see PEP 342). So far, so obvious -- an async function that gets a GeneratorExit has to propagate that exception immediately. This is a regular method, not an async method, because it Dismaying conclusion: inside an async function / async generator, finally: blocks must never yield (and neither can __aexit__ callbacks), because they might be invoked during garbage collection. For async functions this is... arguably a problem but not super urgent, because async functions rarely get garbage collected without being run to completion. For async generators it's a much bigger problem. There's some more discussion, and a first sketch at conventions we might want to use for handling this, here: https://github.com/dabeaz/curio/issues/70 -n -- Nathaniel J. Smith -- https://vorpus.org
On 6 Jul 2016, at 01:56, Nathaniel Smith <njs@pobox.com> wrote:
There's some more discussion, and a first sketch at conventions we might want to use for handling this, here:
This feels like a problem with very few good solutions. Finalizers (e.g. __del__ and weakref callbacks) obviously cannot await for anything (they have no coroutine runner to yield to, and there is currently no Python API for handing off to a coroutine runner to execute your finalizer). That strongly suggests that all cleanup inside coroutines and async generators must be synchronous. This is especially problematic given the existence of “async with”, which nominally promises to do asynchronous cleanup. What’s not entirely clear to me is why we need __aexit__ to *actually* be an async function. The example in curio is socket closure, which seems to be like it absolutely does not need to be awaitable. Why can’t close() just tell the event loop (or curio kernel) that I’m done with the socket, and to clean up that socket on its own time (again, this is what Twisted does). In the worst case you could rule that __aexit__ cannot use coroutines, but of course it may continue to use Futures and other fun things. I know that callbacks aren’t The Asyncio Way, but they have their uses, and this is probably one of them. Allowing Futures/Deferreds allows your __aexit__ to do some actual I/O if that’s required (e.g. send a HTTP/2 GOAWAY frame). The only reason I can think of that __aexit__ needs to be a coroutine is to guarantee that the resource in question is genuinely cleaned up by the time the with block is exited. It is not entirely clear to me what the value of this guarantee is: does anyone have a good use-case for it that doesn’t seem like it violates the spirit of the context manager? That leaves us with finally, and here I have no good solution except that finally inside generators has *always* been a problem. However, in this case, I think we’ve got a good analogy. In synchronous generators, if you yield inside a finally *and* leak your coroutine, the interpreter moans at you. I don’t see any reason not to treat await/yield from inside finally in exactly the same way: if you do it, you eventually explode. Basically, there doesn’t seem to be an obvious way to make garbage collection of coroutines work while allowing them to be coroutines without having some way to register a coroutine runner with the garbage collector. That strikes me as a *terrible* idea (e.g. you may have multiple event loops, which requires that you register a unique coroutine runner per coroutine). Therefore, the only logical thing to do is to have only synchronous functions invoked in cleanup methods (e.g. __aexit__ and finally), and if those need to do some form of asynchronous I/O they need to use a Future-like construct to actually achieve it. Cory
On Jul 5, 2016, at 8:56 PM, Nathaniel Smith <njs@pobox.com> wrote: [..] Starting axiom: async functions / async generators need to be prepared for the case where they get garbage collected before being run to completion, because, well... this is a thing that can happen. [..] Dismaying conclusion: inside an async function / async generator, finally: blocks must never yield (and neither can __aexit__ callbacks), because they might be invoked during garbage collection.
I agree with David here: coroutines are always running under a scheduler which, in one way or another, keeps strong references to them. In curio it’s a global table of all tasks, in asyncio it’s a chain of references to callbacks of Tasks/Futures. The only time a coroutine can be GC’ed in asyncio is when the caller did not use ‘await’ on it. A running coroutine being GC’ed is an exceptional situation, that means that there is a bug in your scheduler. And people actually rely on this thing. It’s not so much about __aexit__ returning an awaitable, it’s about people writing code that awaits in ‘finally’ statements. That’s why I’m big -1 on changing __aexit__. If we change __aexit__, we should also prohibit awaiting in finally blocks, which is not an option.
For async functions this is... arguably a problem but not super urgent, because async functions rarely get garbage collected without being run to completion. For async generators it's a much bigger problem.
I’m not sure I understand why it’d a problem for async generators. Since coroutines shouldn’t ever be GC’ed while running, async generators generally won’t be GC’ed while running too, because they will have a strong ref from the running coroutine. I think to makes things simple, we shouldn’t have a ‘close()’ method on async generators at all. When a running async generator is GC’ed we’ll make the interpreter to issue a warning. We might want to add an ‘aclose()’ coroutine method, which will throw a GeneratorExit exception (or GeneratorAsyncExit) with the following semantics: 1. If the running async generator ignores GeneratorAsyncExit and keeps ‘async yielding’, we throw a RuntimeError. 2. If the running async generator receives a GeneratorAsyncExit exception outside of a try..finally block, the async generator is closed silently. 3. If the running async generator receives a GeneratorAsyncExit exception inside a ‘finally’ block, it will be able to await on any number of coroutines inside that block. Thanks, Yury
On Jul 5, 2016, at 5:56 PM, Nathaniel Smith <njs@pobox.com> wrote:
Starting axiom: async functions / async generators need to be prepared for the case where they get garbage collected before being run to completion, because, well... this is a thing that can happen.
This thread is really confusing, so I'm going to try to just attack this axiom (which isn't really axiomatic, it's a conclusion drawn from a few other properties of the interpreter and the event loop) and see if it holds :-). Yury already pointed out that coroutines run under a scheduler that keeps strong references to them. Backing that up a little bit - by definition, if your coroutine is not strongly referenced by a scheduler, it can't be doing anything interesting; nobody's going to call .send on it. This is why 'foo()' is a no-op whereas 'await foo()' actually causes something to happen. Similarly, this is why you need Task() to do something in parallel, and you can't just toss a coroutine out into the void. Furthermore, Task.cancel()'s documentation holds another clue about asynchronous cleanup: "Unlike Future.cancel(), this does not guarantee that the task will be cancelled: the exception might be caught and acted upon, delaying cancellation of the task or preventing cancellation completely". Deferred.cancel() behaves in much the same way, for the same reason. It is explicitly allowed that an asynchronous task do asynchronous clean-up. Now, if you have a task that was scheduled but has come forcibly detached from its scheduler, you are in a bit of a mess. But this is an inherently unresolvable mess, for the same reason that Thread.kill() is an inherently unresolvable mess: you cannot forcibly terminate a thread of execution and end up with your program in a known state. It's OK to kill a generator (as distinct from a 'coroutine' in the sense that it does not expect .send to be called on it) from the GC because a generator is just yielding values and since it doesn't expect .send it can't expect to keep running, and it can do purely synchronous try/finally. But as soon as you are allowed to invoke asynchronous work, you have to be allowed to let that asynchronous work complete. So: async functions should not and cannot be prepared for the case where they get garbage collected; this is not a thing that can happen unless the coroutine scheduler is damaged beyond repair and it's time to crash your process. -glyph
I tried to implement asyncronious generators based on asyncio recently, you can see result here: https://github.com/germn/aiogen I also faced problem with cleanup. First thing is where to call `.close` (or async `.aclose` for AG since it raises exception inside async function). While regular generator calls it inside `__del__` as I understand there's no guarantee AG's event loop wouldn't be closed at this moment. I think AG can be closed at the moment it's parent task done. Since this is non-async callback we can start task to call `.aclose`: https://github.com/germn/aiogen/blob/master/aiogen/agenerator.py#L35 Nothing awaits for cleanup task so we should sure that is would be finished before event loop is closed. I found nothings better than to decorate event loop's `.close` method: https://github.com/germn/aiogen/blob/master/aiogen/agenerator.py#L52 That all doesn't look like ideal solution, but it works.
participants (5)
-
Cory Benfield
-
Glyph Lefkowitz
-
Nathaniel Smith
-
Yury Selivanov
-
Герасимов Михаил