Cancelling a coroutine from a signal handler?
I am trying to understand some unexpected behavior in asyncio. My goal is to use a custom signal handler to cleanly unwind an asyncio program that has many different tasks running. Here's a simplified test case:

     1  import asyncio, logging, random, signal, sys
     2
     3  logging.basicConfig(level=logging.DEBUG)
     4  logger = logging.getLogger()
     5
     6  async def main():
     7      try:
     8          await asyncio.Event().wait()
     9      except asyncio.CancelledError:
    10          logger.info('cancelled main()')
    11          # cleanup logic
    12
    13  def handle_sigint(signal, frame):
    14      global sigint_count
    15      global main_coro
    16      sigint_count += 1
    17      if sigint_count == 1:
    18          logger.warn('Received interrupt: shutting down...')
    19          main_coro.throw(asyncio.CancelledError())
    20          # missing event loop logic?
    21      else:
    22          logger.warn('Received 2nd interrupt: exiting!')
    23          main_coro.throw(SystemExit(1))
    24
    25  sigint_count = 0
    26  signal.signal(signal.SIGINT, handle_sigint)
    27  loop = asyncio.get_event_loop()
    28  main_coro = main()
    29  try:
    30      loop.run_until_complete(main_coro)
    31  except StopIteration:
    32      logger.info('run_until_complete() finished')

The main() function is a placeholder that represents some long running task, e.g. a server that is waiting for new connections. The handle_sigint() function is supposed to attempt to cancel main() so that it can gracefully exit, but if it receives a second interrupt, then the process exits immediately.

Here's an example running the program and then typing Ctrl+C.

    $ python test.py
    DEBUG:asyncio:Using selector: EpollSelector
    ^CWARNING:root:Received interrupt: shutting down...
    INFO:root:cancelled main()
    INFO:root:run_until_complete() finished

This works as I expect it to. Of course my cleanup logic (line 10) isn't actually doing anything. In a real server, I might want to send goodbye messages to connected clients.
To mock this behavior, I'll modify line 11:

    11          await asyncio.sleep(0)

Surprisingly, now my cleanup code hangs:

    $ python test.py
    DEBUG:asyncio:Using selector: EpollSelector
    ^CWARNING:root:Received interrupt: shutting down...
    INFO:root:cancelled main()
    ^CWARNING:root:Received 2nd interrupt: exiting!

Notice that the program doesn't exit after the first interrupt. It enters the exception handler and appears to hang on the await expression on line 11. I have to interrupt it a second time, which throws SystemExit instead.

I puzzled over this for quite some time until I realized that I can force main() to resume by changing line 20:

    20          main_coro.send(None)

With this change, the interrupt causes the cleanup logic to run to completion and the program exits normally. Of course, if I add a second await expression:

    11          await asyncio.sleep(0); await asyncio.sleep(0)

Then I also have to step twice:

    20          main_coro.send(None); main_coro.send(None)

My mental model of how the event loop works is pretty poor, but I roughly understand that the event loop is responsible for driving coroutines. It appears here that the event loop has stopped driving my main() coroutine, and so the only way to force it to complete is to call send() from my code.

Can somebody explain *why* the event loop is not driving my coroutine? Is this a bug or am I missing something conceptually?

More broadly, handling KeyboardInterrupt in async code seems very tricky, but I also cannot figure out how to make this interrupt approach work. Is one of these better than the other? What is the best practice here? Would it be terrible to add `while True: main_coro.send(None)` to my signal handler?

Thanks,
Mark
On Tue, Apr 24, 2018 at 2:25 PM, Mark E. Haase <mehaase@gmail.com> wrote:
My mental model of how the event loop works is pretty poor, but I roughly understand that the event loop is responsible for driving coroutines. It appears here that the event loop has stopped driving my main() coroutine, and so the only way to force it to complete is to call send() from my code.
It hasn't stopped driving your main() coroutine – as far as it knows, main() is still waiting for the Event.wait() call to complete, and as soon as it does it will start iterating the coroutine again. You really, really, definitely should not be trying to manually iterate a coroutine object associated with a Task.
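To make the point about the Task concrete, here is a minimal sketch (not code from this thread) in which the coroutine is cancelled through its Task rather than by calling throw() or send() on the coroutine object. The names run_demo, log and demo_log are made up for illustration, and the 0.01 second timer merely stands in for whatever would trigger the shutdown. Because the loop knows about the cancellation, it keeps stepping main() through both cleanup awaits on its own.

```python
import asyncio


async def main(log):
    try:
        # Stands in for a long-running job, e.g. a server waiting for clients.
        await asyncio.Event().wait()
    except asyncio.CancelledError:
        log.append("cancelled")
        # Cleanup that awaits: the event loop resumes us after each await,
        # so no manual send(None) calls are needed.
        await asyncio.sleep(0)
        await asyncio.sleep(0)
        log.append("cleanup done")


async def run_demo():
    log = []
    task = asyncio.create_task(main(log))
    # Ask the loop to cancel the Task once main() is parked in Event.wait().
    asyncio.get_running_loop().call_later(0.01, task.cancel)
    await task  # main() swallows the CancelledError, so this returns normally
    return log


demo_log = asyncio.run(run_demo())
print(demo_log)
```

However many awaits the cleanup branch contains, the loop drives all of them, which is the behaviour the manual send() calls were trying to emulate.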
More broadly, handling KeyboardInterrupt in async code seems very tricky, but I also cannot figure out how to make this interrupt approach work. Is one of these better than the other? What is the best practice here? Would it be terrible to add `while True: main_coro.send(None)` to my signal handler?
Yes, it would be terrible :-).

Instead of trying to throw exceptions manually, you should call the cancel() method on the Task object. (Or if you want to abort immediately because the previous control-C was ignored, use something like os._exit() or os.abort().)

The other complication is that doing *anything* from a signal handler is fraught with peril, because of reentrancy issues. I actually don't think there are *any* functions in asyncio that are guaranteed to be safe to call from a signal handler. Looking at the code for Task.cancel, I definitely don't trust that it's safe to call from a signal handler.

The simplest solution would be to use asyncio's native signal handler support instead of the signal module: https://docs.python.org/3/library/asyncio-eventloop.html#unix-signals

However, there are some trade-offs:

- it's not implemented on Windows
- it relies on the event loop running. In particular, if the event loop is stalled (e.g. because some task got stuck in an infinite loop), then your signal handler will never be called, so your "emergency abort" code won't work.

Alternatively, you can define a handler using signal.signal, and then arrange to re-enter the asyncio main loop yourself before calling Task.cancel. I believe that the only guaranteed-to-be-safe way to do this is:

- in your signal handler, spawn a new thread (!)
- from the new thread, call loop.call_soon_threadsafe(your_main_task.cancel)

(Trio's version of call_soon_threadsafe *is* guaranteed to be both thread- and signal-safe, but asyncio's isn't, and in asyncio there are multiple event loop implementations so even if one happens to be signal-safe by chance you don't know about the others... also Trio handles control-C automatically so you don't need to worry about this in the first place. But I don't know how to port Trio's generic solution to asyncio :-(.)

-n

--
Nathaniel J. Smith -- https://vorpus.org
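A rough sketch of the "native signal handler" option, assuming a Unix platform and a program started from the main thread. The helper name install_sigint_handler and the inner on_sigint callback are invented for this example, and the os.kill() line only simulates a single Ctrl+C so that the snippet terminates by itself. Since the callback is run by the event loop rather than inside a raw signal handler, calling Task.cancel() from it is ordinary loop code.

```python
import asyncio
import logging
import os
import signal

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger()


async def main():
    try:
        await asyncio.Event().wait()
    except asyncio.CancelledError:
        logger.info("cancelled main()")
        await asyncio.sleep(0)  # cleanup that awaits
        return "cleaned up"


def install_sigint_handler(loop, task):
    """Hypothetical helper: first Ctrl+C cancels the task, second one exits."""
    count = 0

    def on_sigint():
        nonlocal count
        count += 1
        if count == 1:
            logger.warning("Received interrupt: shutting down...")
            task.cancel()
        else:
            logger.warning("Received 2nd interrupt: exiting!")
            os._exit(1)

    # Unix only; the callback is scheduled by the loop, not run in signal context.
    loop.add_signal_handler(signal.SIGINT, on_sigint)


async def run():
    loop = asyncio.get_running_loop()
    task = asyncio.create_task(main())
    install_sigint_handler(loop, task)
    # Simulated Ctrl+C for the demo; remove this line in a real program.
    loop.call_later(0.05, os.kill, os.getpid(), signal.SIGINT)
    return await task


result = asyncio.run(run())
print(result)
```

Note that the second-interrupt branch inherits the trade-off listed above: it can only run while the loop itself is still running callbacks.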
Perhaps it's good to distinguish between a graceful shutdown signal (cancel all head/logical tasks, or even all tasks, and let finally blocks run) and a hard stop signal.

In the past, in synchronous code, I've used the following paradigm:

    from signal import alarm

    def custom_signal(signum, frame):
        alarm(5)  # SIGALRM's default action terminates the process after 5 s
        raise KeyboardInterrupt()

KeyboardInterrupt was chosen so that manual execution is stopped with ^C in the same way as the server process, which makes testing much easier :) Also, it inherits from BaseException, which is nice.

I think that something similar can be done for your asynchronous case: graceful shutdown using asyncio's built-in signal handling, and hard stop using signal.SIG_DFL and a signal number whose default action is termination.
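One way the graceful-versus-hard-stop split might look with asyncio, as a sketch under several assumptions: Unix only, started from the main thread, and a five-second grace period chosen to mirror alarm(5). The names serve, graceful, GRACE_SECONDS and events are hypothetical, and the os.kill() line simulates one Ctrl+C so the example finishes without input. The first SIGINT cancels the task and arms two hard stops: a SIGALRM deadline, and SIGINT reset to its default disposition so that a second Ctrl+C terminates the process.

```python
import asyncio
import os
import signal

GRACE_SECONDS = 5  # assumed grace period


async def serve(log):
    try:
        await asyncio.Event().wait()
    finally:
        # Runs on cancellation; awaiting here works because the loop
        # is still driving this coroutine.
        await asyncio.sleep(0)
        log.append("finally block ran")


async def run(log):
    loop = asyncio.get_running_loop()
    task = asyncio.create_task(serve(log))

    def graceful():
        # Hard stop 1: SIGALRM is left at its default action (terminate).
        signal.alarm(GRACE_SECONDS)
        # Hard stop 2: give SIGINT back to the OS default for the next Ctrl+C.
        loop.remove_signal_handler(signal.SIGINT)
        signal.signal(signal.SIGINT, signal.SIG_DFL)
        task.cancel()

    loop.add_signal_handler(signal.SIGINT, graceful)
    # Simulated Ctrl+C for the demo; remove this line in a real program.
    loop.call_later(0.05, os.kill, os.getpid(), signal.SIGINT)
    try:
        await task
    except asyncio.CancelledError:
        log.append("shut down gracefully")
    finally:
        signal.alarm(0)  # finished in time, so disarm the deadline


events = []
asyncio.run(run(events))
print(events)
```

Because both hard stops rely on default signal dispositions handled by the operating system, they still take effect if a task is hogging the loop after the first interrupt.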
On Tue, Apr 24, 2018 at 9:54 PM, Nathaniel Smith <njs@pobox.com> wrote:
The simplest solution would be to use asyncio's native signal handler support instead of the signal module: https://docs.python.org/3/library/asyncio-eventloop.html#unix-signals
Ahh, wow, I don't know how I missed this. I've been obsessing over coroutines and event loops for hours, now I realize that I misunderstood the voodoo in the signal module. Thank you for pointing me in this direction!
Alternatively, you can define a handler using signal.signal, and then arrange to re-enter the asyncio main loop yourself before calling Task.cancel. I believe that the only guaranteed-to-be-safe way to do this is:
This is also an interesting approach that I will experiment with. I guess this solves problem #1 (works on Windows) but not #2 (task stuck in loop), right? (The latter is a feature of all cooperative multitasking systems, yeah?)
Great blog post today! I really enjoy your writing style and Trio is really exciting.
Cheers,
Mark
On Wed, Apr 25, 2018, 06:34 Mark E. Haase <mehaase@gmail.com> wrote:
This is also an interesting approach that I will experiment with. I guess this solves problem #1 (works on Windows) but not #2 (task stuck in loop), right? (The latter is a feature of all cooperative multitasking systems, yeah?)
If a task is hogging the loop, then you won't be able to shut down politely using Task.cancel or similar. But if you're using signal.signal directly then it would mean that your signal handler would still *run* while the loop was blocked, so you'd at least have the option of escalating to os._exit or similar. I'm not sure I *really* advocate spawning a thread from your signal handler just to call one loop method, but, hey, at least you know your options :-).
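For completeness, a sketch of the signal.signal-plus-thread option with the escalation path described above. It assumes it runs in the main thread (signal.signal requires that) on a platform where os.kill() can deliver SIGINT to the current process; install_handler and outcome are illustrative names, and the os.kill() line simulates one Ctrl+C. The handler itself never touches asyncio: it only starts a thread, and that thread passes task.cancel to the loop through call_soon_threadsafe.

```python
import asyncio
import os
import signal
import threading


def install_handler(loop, task):
    """Hypothetical helper; returns the previously installed SIGINT handler."""
    count = 0

    def handle_sigint(signum, frame):
        nonlocal count
        count += 1
        if count == 1:
            # A fresh thread hands the cancel request to the event loop.
            threading.Thread(
                target=loop.call_soon_threadsafe,
                args=(task.cancel,),
                daemon=True,
            ).start()
        else:
            # This handler runs outside the loop's own scheduling, so the
            # escalation path does not depend on the loop processing callbacks.
            os._exit(1)

    return signal.signal(signal.SIGINT, handle_sigint)


async def main():
    try:
        await asyncio.Event().wait()
    except asyncio.CancelledError:
        await asyncio.sleep(0)  # cleanup that awaits
        return "cleaned up"


async def run():
    loop = asyncio.get_running_loop()
    task = asyncio.create_task(main())
    previous = install_handler(loop, task)
    # Simulated Ctrl+C for the demo; remove this line in a real program.
    loop.call_later(0.05, os.kill, os.getpid(), signal.SIGINT)
    try:
        return await task
    finally:
        signal.signal(signal.SIGINT, previous)  # restore the old handler


outcome = asyncio.run(run())
print(outcome)
```

The polite path still needs a responsive loop to process the cancel; only the second-interrupt branch is independent of it.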
Great blog post today! I really enjoy your writing style and Trio is really exciting.
Thanks!
-n
participants (3)
- Dima Tisnek
- Mark E. Haase
- Nathaniel Smith