On Thu, Nov 2, 2017 at 2:57 PM, Koos Zevenhoven <k7hoven@gmail.com> wrote:

On Thu, Nov 2, 2017 at 2:22 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 1 Nov 2017 20:29:56 +0200
Koos Zevenhoven <k7hoven@gmail.com> wrote:
>
> From a correctness point of view, that is absolutely great: if
> PyErr_CheckSignals() is called, it is guaranteed to notice a new signal
> regardles of how small the number of picoseconds after the `is_tripped`
> flag has been set. But is that really important?

I was going to answer "no"... but actually yes. The important case is
the event loop type case:

while (1) {
select([some file descriptors]);
if (errno == EINTR) {
PyErr_CheckSignals();
if (PyErr_Occurred()) break;
}
/* continue select()ing... */
}

Now say at a given point in time, no fds are actually active (or even
waited for), but some signal arrives (SIGINT perhaps).
select() is woken up and returns with errno EINTR. Two things then can
happen:

- if PyErr_CheckSignals() notices the signal, it will run the relevant
signal handler, which may raise an exception and trigger the select()
loop to exit (e.g. SIGINT would raise KeyboardInterrupt)

- if PyErr_CheckSignals() misses the signal, the loop will enter again,
and select() may sleep for an infinite amount of time

Oh! So that would provide a proper reason for my just-in-case decision to name the faster near-equivalent functionality PyErr_PROBE_SIGNALS instead of PyErr_CHECK_SIGNALS.

Cross-referencing to that (thread about making Ctrl-C "always" work):

https://mail.python.org/pipermail/python-ideas/2017-November/047631.html

Of course, what we're doing with select() above can already apply for
read() or other interruptible syscalls waiting for outside data... and
that pattern is present a lot internally, especially since

PEP 475 ("Retry system calls failing with EINTR").

Now, is the "sequentially consistent" ordering on is_tripped sufficient
to guarantee that signals won't be missed on a weak-ordering platform?
I *think* so, but an expert would need to check that code (or we
cross our fingers and wait for a hypothetical bug report).

I think the question is: Do we know for sure that is_tripped has been stored using sequentially consistent ordering prior to the call to PyErr_CheckSignals(), even if an interruptible syscall is involved? I suppose so?

(But this is a separate question from the problem I was solving, of course. I'm not proposing to remove PyErr_CheckSignals())

To continue on this: If I understand your question correctly, I'm hesitant to make strong statements about it. It would be interesting to know what we can assume about signals that happen at the same time with system calls, given the various platforms supported. Unfortunately, I don't know that.

Maybe you are concerned about whether some nuances and recent changes to signal handling could lead to harmful change in behavior in some meaningful edge cases? I can at least say that my PyErr_PROBE_SIGNALS() proposal does not introduce such issues, if the difference is documented properly: 

"""PyErr_PROBE_SIGNALS() is meant for performance-critical code and is not 100% guaranteed to always see the most recent signals. If a signal being deferred is a concern, use PyErr_CheckSignals() instead."""

But more generally, if we could assume that trip_signal() and PyErr_CheckSignals() always happen in the same "CPU thread", then we wouldn't need pyatomic.h here at all. The fact that the code currently assumes that all Python signal handlers should run in the same Python thread takes care of some of these concerns without needing locks etc.

Some other concerns I can imagine by looking at some of the code in Modules/signalmodule.c:

(1) If trip_signal() and PyErr_CheckSignals() are executed concurrently, trip_signal() might set a new signal flag (using relaxed memory order) while PyErr_CheckSignals is still running. Then if PyErr_CheckSignals() sets is_tripped to zero *after* trip_signal() sets it to 1, then the new signal might be deferred until the next time *some* new signal arrives, which could take an arbitrarily long amount of time, I suppose.

However, it looks like this problem has been solved by always setting is_tripped to zero (with strict SEQ_CST memory order) *before* handling the individual signals. So if trip_signal() has already set is_tripped to 1 (with SEQ_CST), that prevents PyErr_CheckSignals from setting is_tripped to zero (with SEQ_CST) *and not* handling the signal. If trip_signal() has not yet finished, and therefore not set is_tripped to 1 yet, it will cause the next call to PyErr_CheckSignals to catch the signal.

(2) Again, if trip_signal() and PyErr_CheckSignals() execute concurrently, it might happen that PyErr_CheckSignals() handles the signal *before* trip_signal sets is_tripped to 1. That would cause the next call to PyErr_CheckSignals() to think there's an unhandled signal, but will most likely not find one, because it was already handled on the previous call. But that just effectively means that nothing is done. In fact, there's a comment in the code that mentions this.

(3, 4, ...) Of course there's more to take care of there, but that's unrelated to my PyErr_PROBE_SIGNALS() proposal. Anyway, at least (1) and (2) seem to already have been taken care of, and I assume you are aware of that.

––Koos

+ Koos Zevenhoven + http://twitter.com/k7hoven +