SPOILER ALERT! At the moment, Nick's statement is in fact *always* true in *all* cases (at least when ignoring atypical cases and some inaccuracies in phrasing). Another question is whether the statement *should* be true at all.

PyErr_CheckSignals(), the function that checks for pending signals, now *implicitly* uses the strictest possible memory-order requirement (SEQ_CST) for checking the `is_tripped` flag, a value which can be used to peek whether there are any pending signals. This means that two threads that call PyErr_CheckSignals() can't even *check* the value of that flag at the same time: they have to wait for each other and for whatever the CPU hardware implementation does for cache synchronization.
From a correctness point of view, that is absolutely great: if PyErr_CheckSignals() is called, it is guaranteed to notice a new signal regardless of how few picoseconds have passed since the `is_tripped` flag was set. But is that really important? Do we really need things to be slowed down by this? And do we want people to "optimize" code by avoiding signal checking?
The signals won't be caught until checked anyway, and as we've seen, one solution is to use a counter to determine if enough time has passed that another check for potential signals should happen. That does, however, raise the question of who should own the counter, and whether it's OK for multiple threads to use the same counter. If yes, would we be able to avoid slow atomic decrements (or increments)?

But another solution might be to provide a less strict but almost equivalent functionality with much less overhead. Something like this in a header file:

    static inline int PyErr_PROBE_SIGNALS() {
        static int volatile *flag = (int volatile *) &is_tripped;
        if (*flag) {
            return PyErr_CheckSignals();
        }
        else {
            return 0;
        }
    }

Here, the flag is cast to volatile to force a check to happen each time from memory/cache. However, to make it more standard and to make sure it works with all compilers/platforms, it might be better to use an atomic load with "MEMORY_ORDER_RELAXED" instead of *flag. Another thing is that `is_tripped`, which is currently declared as static inside signalmodule.c [4], should somehow be exposed in the API to make it available for the inline function in headers.

This solution might be fast enough for a lot of cases, although still slightly slower than the counter approach, at least if the counter approach completely avoids per-iteration memory access by requiring each function to own the counter that it uses for this.
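To make the atomic-load variant mentioned above a bit more concrete, here is a minimal sketch using C11 `<stdatomic.h>`. It is not CPython code: `is_tripped_demo`, `check_signals_demo()`, `probe_signals_demo()`, `trip_demo()` and `slow_path_calls` are all made-up stand-ins (for the real `is_tripped` flag, for PyErr_CheckSignals(), and so on), and it assumes a compiler with C11 atomics. CPython has its own atomic wrappers, so an actual patch would probably look different.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the `is_tripped` flag in signalmodule.c. */
static atomic_int is_tripped_demo = 0;

/* Counts how often the slow path runs; only here for demonstration. */
static int slow_path_calls = 0;

/* Hypothetical stand-in for PyErr_CheckSignals(): clears the flag and
   pretends to run the pending handlers. Returns 0 on success. */
static int check_signals_demo(void)
{
    slow_path_calls++;
    atomic_store_explicit(&is_tripped_demo, 0, memory_order_seq_cst);
    return 0;
}

/* Cheap probe: a relaxed load only peeks at the flag and imposes no
   ordering on surrounding memory operations. The full (strict) check
   runs only when the flag looks set. */
static inline int probe_signals_demo(void)
{
    if (atomic_load_explicit(&is_tripped_demo, memory_order_relaxed)) {
        return check_signals_demo();
    }
    return 0;
}

/* What a signal handler would do in this sketch: mark a signal pending. */
static void trip_demo(void)
{
    atomic_store_explicit(&is_tripped_demo, 1, memory_order_seq_cst);
}
```

A hot loop would call `probe_signals_demo()` once per iteration and bail out on a non-zero return. The trade-off is the one described above: a relaxed load may notice a freshly set flag slightly later than a SEQ_CST load would, but since the signal is not handled until someone checks anyway, that delay is probably acceptable.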