[Python-ideas] Importance of noticing new signals

Fri Nov 3 11:22:07 EDT 2017

On Thu, Nov 2, 2017 at 2:57 PM, Koos Zevenhoven <k7hoven at gmail.com> wrote:

> On Thu, Nov 2, 2017 at 2:22 PM, Antoine Pitrou <solipsis at pitrou.net>
> wrote:
>
>> On Wed, 1 Nov 2017 20:29:56 +0200
>> Koos Zevenhoven <k7hoven at gmail.com> wrote:
>> >
>> > From a correctness point of view, that is absolutely great: if
>> > PyErr_CheckSignals() is called, it is guaranteed to notice a new signal
>> > regardless of how small the number of picoseconds after the `is_tripped`
>> > flag has been set.  But is that really important?
>>
>> I was going to answer "no"... but actually yes.  The important case is
>> the event loop type case:
>>
>>     while (1) {
>>         select([some file descriptors]);
>>         if (errno == EINTR) {
>>             PyErr_CheckSignals();
>>             if (PyErr_Occurred()) break;
>>         }
>>         /* continue select()ing... */
>>     }
>>
>> Now say at a given point in time, no fds are actually active (or even
>> waited for), but some signal arrives (SIGINT perhaps).
>> select() is woken up and returns with errno EINTR.  Two things then can
>> happen:
>>
>> - if PyErr_CheckSignals() notices the signal, it will run the relevant
>>   signal handler, which may raise an exception and trigger the select()
>>   loop to exit (e.g. SIGINT would raise KeyboardInterrupt)
>>
>> - if PyErr_CheckSignals() misses the signal, the loop will enter again,
>>   and select() may sleep for an infinite amount of time
>>
>>
> Oh! So that would provide a proper reason for my just-in-case decision to
> name the faster near-equivalent functionality PyErr_PROBE_SIGNALS instead
> of PyErr_CHECK_SIGNALS.
>
> Cross-referencing to that (thread about making Ctrl-C "always" work):
>
> https://mail.python.org/pipermail/python-ideas/2017-November/047631.html
>
>
>> Of course, what we're doing with select() above can already apply for
>> read() or other interruptible syscalls waiting for outside data... and
>> that pattern is present a lot internally, especially since
>> PEP 475 ("Retry system calls failing with EINTR").
>>
>> Now, is the "sequentially consistent" ordering on is_tripped sufficient
>> to guarantee that signals won't be missed on a weak-ordering platform?
>> I *think* so, but an expert would need to check that code (or we
>> cross our fingers and wait for a hypothetical bug report).
>>
>>
> I think the question is: Do we know for sure that is_tripped has been
> stored using sequentially consistent ordering prior to the call to
> PyErr_CheckSignals(), even if an interruptible syscall is involved? I
> suppose so?
>
> (But this is a separate question from the problem I was solving, of
> course. I'm not proposing to remove PyErr_CheckSignals())
>
>
To continue on this: if I understand your question correctly, I'm hesitant
to make strong statements about it. It would be interesting to know what we
can assume about signals that happen at the same time as system calls,
given the various platforms supported. Unfortunately, I don't know that.

Maybe you are concerned about whether some nuances and recent changes to
signal handling could lead to a harmful change in behavior in some
meaningful edge cases? I can at least say that my PyErr_PROBE_SIGNALS()
proposal does not introduce such issues, if the difference is documented
properly:

"""PyErr_PROBE_SIGNALS() is meant for performance-critical code and is not
100% guaranteed to always see the most recent signals. If a signal being
deferred is a concern, use PyErr_CheckSignals() instead."""
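
To make the documented difference concrete, here is a minimal sketch in
C11. It assumes the probe is nothing more than a relaxed load of the flag
that falls through to the full check only when the flag appears set; that
is an assumption for illustration, not the actual macro. All probe_* names
are hypothetical and nothing here is taken from Modules/signalmodule.c.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the is_tripped flag. */
static atomic_int probe_is_tripped;
/* Counts how often the slow path was entered. */
static int probe_slow_calls;

/* Stand-in for trip_signal(): publish the flag with seq_cst ordering. */
static void probe_trip_signal(void)
{
    atomic_store(&probe_is_tripped, 1);
}

/* Stand-in for PyErr_CheckSignals(): seq_cst load, clear, "handle".
   Returns 1 if a signal was pending, else 0. */
static int probe_check_signals(void)
{
    probe_slow_calls++;
    if (!atomic_load(&probe_is_tripped))
        return 0;
    atomic_store(&probe_is_tripped, 0);
    return 1;
}

/* Stand-in for the proposed probe: a relaxed load only. It imposes no
   ordering on surrounding memory operations, so another thread's store
   may become visible to it slightly later than to a seq_cst load. */
static int probe_signals(void)
{
    if (!atomic_load_explicit(&probe_is_tripped, memory_order_relaxed))
        return 0;
    return probe_check_signals();
}

/* Single-threaded usage: the slow path runs only after a trip. */
static void probe_demo(void)
{
    int seen = probe_signals();
    assert(seen == 0 && probe_slow_calls == 0);

    probe_trip_signal();
    seen = probe_signals();
    assert(seen == 1 && probe_slow_calls == 1);
    (void)seen;
}
```

In a single thread the relaxed load always sees the preceding store, so
this only shows the control flow; the "not 100% guaranteed" caveat is about
visibility across threads.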

But more generally, if we could assume that trip_signal() and
PyErr_CheckSignals() always happen in the same "CPU thread", then we
wouldn't need pyatomic.h here at all. The fact that the code currently
assumes that all Python signal handlers should run in the same Python
thread takes care of some of these concerns without needing locks etc.
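
As an illustration of the same-thread case, the following sketch uses only
standard C: when the handler and the polling code are known to run in the
same thread, a plain volatile sig_atomic_t flag is enough and no atomics
with memory ordering are involved. The same_thread_* names are hypothetical
and this is not how signalmodule.c is written.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical flag; plays the role of is_tripped in this sketch. */
static volatile sig_atomic_t same_thread_tripped = 0;

/* Signal handler: only records that a signal arrived. */
static void same_thread_handler(int signum)
{
    (void)signum;
    same_thread_tripped = 1;
}

/* Install the handler for SIGINT; returns 0 on success, -1 on failure. */
static int same_thread_install(void)
{
    return signal(SIGINT, same_thread_handler) == SIG_ERR ? -1 : 0;
}

/* Poll and clear: returns 1 if a signal was pending, else 0. */
static int same_thread_poll(void)
{
    if (!same_thread_tripped)
        return 0;
    same_thread_tripped = 0;
    return 1;
}

/* Usage: raise() delivers the signal to the calling thread. */
static void same_thread_demo(void)
{
    int rc = same_thread_install();
    assert(rc == 0);

    int pending = same_thread_poll();
    assert(pending == 0);

    raise(SIGINT);                 /* handler runs before raise() returns */
    pending = same_thread_poll();
    assert(pending == 1);
    (void)rc;
    (void)pending;
}
```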

Some other concerns I can imagine by looking at some of the code in
Modules/signalmodule.c:

(1) If trip_signal() and PyErr_CheckSignals() are executed concurrently,
trip_signal() might set a new signal flag (using relaxed memory order)
while PyErr_CheckSignals() is still running. If PyErr_CheckSignals() then
sets is_tripped to zero *after* trip_signal() has set it to 1, the new
signal might be deferred until the next time *some* new signal arrives,
which could take an arbitrarily long time, I suppose.

However, it looks like this problem has been solved by always setting
is_tripped to zero (with strict SEQ_CST memory order) *before* handling the
individual signals. So if trip_signal() has already set is_tripped to 1
(with SEQ_CST), PyErr_CheckSignals() cannot set is_tripped to zero (with
SEQ_CST) *without also* handling the signal. If trip_signal() has not yet
finished, and therefore has not set is_tripped to 1 yet, it will cause the
next call to PyErr_CheckSignals() to catch the signal.
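
The ordering argument in (1) can be replayed deterministically. The sketch
below is a simplified single-threaded model, not the code in
signalmodule.c: the order_* names are hypothetical, and the "concurrent"
signal is simulated by tripping it at a fixed point right after the scan.

```c
#include <assert.h>
#include <stdatomic.h>

#define ORDER_NSIG 4

static atomic_int order_is_tripped;        /* global "something pending" */
static atomic_int order_flag[ORDER_NSIG];  /* per-signal flags */
static int order_handled[ORDER_NSIG];      /* handler call counts */

/* Model of trip_signal(): relaxed per-signal flag, then seq_cst global. */
static void order_trip_signal(int sig)
{
    atomic_store_explicit(&order_flag[sig], 1, memory_order_relaxed);
    atomic_store(&order_is_tripped, 1);
}

/* Model of the check. clear_first selects when is_tripped is reset;
   late_signal >= 0 simulates a signal arriving right after the scan. */
static void order_check(int clear_first, int late_signal)
{
    if (!atomic_load(&order_is_tripped))
        return;
    if (clear_first)
        atomic_store(&order_is_tripped, 0);
    for (int i = 0; i < ORDER_NSIG; i++) {
        if (atomic_exchange_explicit(&order_flag[i], 0,
                                     memory_order_relaxed))
            order_handled[i]++;
    }
    if (late_signal >= 0)
        order_trip_signal(late_signal);
    if (!clear_first)
        atomic_store(&order_is_tripped, 0);   /* wipes the late trip */
}

static void order_demo(void)
{
    /* Clearing last: signal 2 arrives after the scan and gets stuck. */
    order_trip_signal(1);
    order_check(0, 2);
    order_check(0, -1);              /* returns early, is_tripped is 0 */
    assert(order_handled[1] == 1 && order_handled[2] == 0);

    /* Clearing first: late signal 1 leaves is_tripped at 1. */
    order_trip_signal(3);            /* also unblocks the stuck signal 2 */
    order_check(1, 1);
    assert(order_handled[2] == 1 && order_handled[3] == 1);
    order_check(1, -1);              /* picks up the late signal 1 */
    assert(order_handled[1] == 2);
}
```

With the flag cleared last, signal 2 is only handled once the unrelated
signal 3 arrives; with the flag cleared first, the late signal is caught by
the very next check.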

(2) Again, if trip_signal() and PyErr_CheckSignals() execute concurrently,
it might happen that PyErr_CheckSignals() handles the signal *before*
trip_signal() sets is_tripped to 1. The next call to PyErr_CheckSignals()
would then think there's an unhandled signal, but would most likely not
find one, because it was already handled on the previous call. That just
effectively means that nothing is done. In fact, there's a comment in the
code that mentions this.
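
Case (2) can be replayed the same way. In this sketch (hypothetical spur_*
names, simplified model, not the real implementation) trip_signal() is
split into its two stores so that a check can run between them.

```c
#include <assert.h>
#include <stdatomic.h>

#define SPUR_NSIG 2

static atomic_int spur_is_tripped;
static atomic_int spur_flag[SPUR_NSIG];

/* First half of the modeled trip_signal(): per-signal flag, relaxed. */
static void spur_trip_step1(int sig)
{
    atomic_store_explicit(&spur_flag[sig], 1, memory_order_relaxed);
}

/* Second half: the global flag, seq_cst. */
static void spur_trip_step2(void)
{
    atomic_store(&spur_is_tripped, 1);
}

/* Returns -1 if the fast path skipped the scan, otherwise the number of
   signals handled during the scan. */
static int spur_check(void)
{
    if (!atomic_load(&spur_is_tripped))
        return -1;
    atomic_store(&spur_is_tripped, 0);
    int n = 0;
    for (int i = 0; i < SPUR_NSIG; i++) {
        if (atomic_exchange_explicit(&spur_flag[i], 0,
                                     memory_order_relaxed))
            n++;
    }
    return n;
}

static void spur_demo(void)
{
    spur_trip_step1(0);
    spur_trip_step2();          /* signal 0 is fully tripped */
    spur_trip_step1(1);         /* signal 1 has set only its own flag */

    int n = spur_check();       /* handles both signals */
    assert(n == 2);

    spur_trip_step2();          /* signal 1's trip_signal() now finishes */
    n = spur_check();           /* scan runs but finds nothing */
    assert(n == 0);

    n = spur_check();           /* back on the fast path */
    assert(n == -1);
    (void)n;
}
```

The second check is wasted work but loses nothing, which matches the
"nothing is done" outcome described above.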

(3, 4, ...) Of course there's more to take care of there, but that's
unrelated to my PyErr_PROBE_SIGNALS() proposal. Anyway, at least (1) and
(2) seem to already have been taken care of, and I assume you are aware of
that.

––Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +