Protecting finally clauses from interruptions

Hi, I'd like to propose a way to protect `finally` clauses from interruptions (whether by KeyboardInterrupt, by a timeout, or by any other means). I think the frame object could be extended with an `f_in_finally` attribute (or pick a better name). Internally it should probably be implemented as a counter of nested finally clauses, but the interface should probably expose only a boolean attribute. For the `__exit__` method, a flag in `co_flags` should be introduced which says that `f_in_finally` is true for the whole function. Having this attribute, you can then inspect the stack and check whether it's safe to interrupt or not. A coroutine library which interrupts by timeout can then sleep a bit and try again (probably for a finite number of retries). For a signal handler there are also several options for waiting until the thread escapes the finally clause: use another thread, use an alarm signal, use sys.settrace, or exit only inside the main loop. To be clear: I do not propose to change the default SIGINT behavior, only to implement a frame flag, and let library developers experiment with the rest.

-- Paul
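[A minimal sketch of how a library could use this, assuming the proposed `f_in_finally` attribute exists and assuming a hypothetical `interrupt_thread()` primitive for actually raising the exception in the target thread -- neither exists today:

    import sys
    import time

    def safe_to_interrupt(thread_id):
        # Walk the thread's current stack; refuse to interrupt if any frame
        # is inside a finally clause (or an __exit__ marked via co_flags).
        frame = sys._current_frames().get(thread_id)
        while frame is not None:
            if getattr(frame, 'f_in_finally', False):  # proposed attribute
                return False
            frame = frame.f_back
        return True

    def interrupt_with_retries(thread_id, exc=TimeoutError, retries=5, delay=0.01):
        # "Sleep a bit and try again", for a finite number of retries.
        for _ in range(retries):
            if safe_to_interrupt(thread_id):
                interrupt_thread(thread_id, exc)  # hypothetical primitive
                return True
            time.sleep(delay)
        return False
]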

On 2012-04-02, at 3:43 PM, Paul Colomiets wrote:
Paul,

First of all, sorry for not replying to your previous email in the thread. I've been thinking about a mechanism that would be useful both for thread interruption and for the new emerging coroutine libraries, and I think that we need to draft a PEP. Your current approach with only an 'f_in_finally' flag is a half measure, as you will have to somehow monitor frame execution. I think a better solution would be to:

1. Implement a mechanism to throw exceptions in running threads. It should be possible to wake up a thread if it waits on a lock, or on any other syscall.
2. Add an 'f_in_finally' counter, as you proposed.
3. Either add a special base exception that can be thrown into a currently executing frame to interrupt it, or add a special method to the frame object, 'f_interrupt()'. When an attempt is made to interrupt a frame, it checks its 'f_in_finally' counter. If it is 0, throw the exception; if not, wait until it drops back to 0 and then throw the exception immediately.

This approach would give you enough flexibility to cover the following cases:

1. Thread interruption
2. Greenlet-based coroutines (throw an exception in your event hub)
3. Generator-based coroutines

Plus, proper execution of 'finally' statements will be guaranteed by the interpreter.

- Yury
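[A rough sketch of those semantics, written as Python purely for illustration: `f_in_finally`, `f_interrupt()`, and the two helper primitives below are all assumptions of the proposal, not existing APIs:

    class FrameInterrupt(BaseException):
        # A special base exception thrown into a frame to interrupt it.
        pass

    def f_interrupt(frame, exc_type=FrameInterrupt):
        # Proposed behaviour: interrupt immediately only when the frame is
        # not executing a finally block.
        if frame.f_in_finally == 0:               # proposed counter
            throw_into_frame(frame, exc_type())   # hypothetical primitive
        else:
            # Otherwise the interpreter remembers the request and raises the
            # exception as soon as the counter drops back to 0.
            defer_interrupt(frame, exc_type)      # hypothetical primitive
]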

Hi Yury, On Mon, Apr 2, 2012 at 11:37 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
1. Implement a mechanism to throw exceptions in running threads. It should be possible to wake up thread if it waits on a lock, or any other syscall.
It's complex, because if a thread waits on a lock you can't determine whether it was interrupted after the lock was acquired or before. E.g. it's common to write:

    l.lock()
    try:
        ...
    finally:
        l.unlock()

which will break if you are interrupted just after the lock is acquired.
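[Spelled out with a standard threading.Lock (acquire()/release() substituted for the lock()/unlock() above, and do_work() a placeholder), the hazardous window looks like this:

    import threading

    l = threading.Lock()

    l.acquire()
    # An asynchronous exception delivered exactly here -- after acquire()
    # succeeds but before the try block is entered -- skips the finally
    # clause entirely, so the lock is never released.
    try:
        do_work()      # placeholder for the protected section
    finally:
        l.release()
]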
2. Add 'f_in_finally' counter, as you proposed.
Ack
I'm not sure how that's supposed to work. If it's a coroutine, it may yield while in a finally clause, and you want it to be interrupted only when it exits the finally.

-- Paul

On 2012-04-02, at 4:49 PM, Paul Colomiets wrote:
Yes, that's a good question. However, I fail to see how just adding 'f_in_finally' solves the problem.
And what's the problem with that? It should be able to yield in its finally freely.

    @coroutine
    def read_data(connection):
        try:
            yield connection.recv()
        finally:
            yield connection.close()
            print("this shouldn't be printed if a timeout occurs")

    yield read_data(connection).with_timeout(0.1)

In the above example, if 'connection.recv()' takes longer than 0.1s to execute, the scheduler (trampoline) should interrupt the coroutine, the 'connection.close()' line will be executed, and once the connection is closed, it should stop the coroutine immediately. As of now, if you throw an exception while the generator is in its 'try' block, everything will work as I explained: the interpreter will execute the 'finally' block and propagate the exception at the end of it. However, if you throw an exception while the generator is in its 'finally' block (!), then your coroutine will be aborted too early. With your 'f_in_finally' flag, the scheduler simply won't try to interrupt the coroutine, but then the 'print(...)' line will be executed (!!), and it really shouldn't be. So we need to shift the control of when a frame is best interrupted to the interpreter, not the user code.

- Yury
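[For comparison, a minimal sketch of what one trampoline tick could do under the f_in_finally approach being discussed; the attribute is the proposed one and does not exist, and TimeoutError is used as the interruption exception only for illustration:

    def maybe_interrupt(gen):
        # Called by the scheduler when gen's timeout has expired.
        frame = gen.gi_frame
        if frame is not None and getattr(frame, 'f_in_finally', False):
            # The coroutine is inside (or suspended inside) a finally block:
            # postpone the interruption and check again on the next tick.
            return False
        gen.throw(TimeoutError())
        return True
]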

Hi Yury,
Same with open files, and with all other kinds of contexts. I'd go the route of making __enter__ also uninterruptible (and put the timeout inside the lock itself). On Tue, Apr 3, 2012 at 12:15 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
You've probably not explained your proposal well. If I call frame.f_interrupt(), what should it do? Return whatever the generator yields? And how are you supposed to continue iterating the generator in that case? Or are you going to iterate the result of `f_interrupt()`? What should it do if it's not the topmost frame? In all my use cases it doesn't matter whether the "print" is executed, just like it doesn't matter whether the timeout occurred after 1000 ms, or after 1001 or 1010 ms, or even after 1500 ms, as it actually could. So sleeping a bit and trying again is OK. You need to make all __exit__ and finally clauses fast, but that's usually not a problem.

-- Paul

With Python 3.3, you can easily write a context manager disabling interruptions using signal.pthread_sigmask(). If a signal is sent, it stays pending in a queue, and the signal handler is called when the signals are unblocked. (On some OSes, the signal handler is not called immediately.)

pthread_sigmask() only affects the current thread. If you have two threads, and you block all signals in thread A, the C signal handler will be called in thread B. But if I remember correctly, the Python signal handler is always called in the main thread.

pthread_sigmask() is not available on all platforms (e.g. not on Windows), and some OSes have poor support for signals combined with threads (e.g. OpenBSD and old versions of FreeBSD).

Calling pthread_sigmask() twice (on entering and exiting the finally block) has a cost, so I don't think it should be done by default. It may also have unexpected behaviours. I prefer to make it explicit.

You may hack ceval.c to not call the Python signal handler in a finally block, but system calls will still be interrupted (EINTR).

Victor
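[A minimal sketch of such a context manager, assuming Python 3.3+ on a POSIX platform; which signals to block is a policy choice, and only SIGINT is blocked here:

    import signal
    from contextlib import contextmanager

    @contextmanager
    def delay_signals(sigs=frozenset({signal.SIGINT})):
        # Block the given signals in the calling thread; anything delivered
        # meanwhile stays pending and is handled once the old mask is restored.
        old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, sigs)
        try:
            yield
        finally:
            signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)

    # Possible usage: keep the cleanup free from KeyboardInterrupt.
    # try:
    #     ...
    # finally:
    #     with delay_signals():
    #         cleanup()
]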

Hi Victor, On Sat, Apr 7, 2012 at 1:04 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
And then you need to patch every library that happens to use a `finally` statement to make it work. That doesn't seem realistic.
You may hack ceval.c to not call the Python signal handler in a final block, but system calls will still be interrupted (EINTR).
This is not a problem for networking I/O, as it is always prepared for EINTR, and POSIX mutexes never return EINTR. So for the primary use cases it's OK. But I'll at least add this consideration to the PEP.

-- Paul
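[For context, the usual way networking code copes with EINTR -- which in Python 3.3 surfaces as the built-in InterruptedError -- is simply to retry the call; a minimal illustration, with `sock` standing in for any socket object:

    def recv_retry(sock, nbytes):
        # Retry the recv() if it is interrupted by a signal (EINTR).
        while True:
            try:
                return sock.recv(nbytes)
            except InterruptedError:
                continue
]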

Participants (3):
- Paul Colomiets
- Victor Stinner
- Yury Selivanov