[Python-ideas] Membership of infinite iterators

Thu Nov 2 07:27:17 EDT 2017

On Thu, Nov 2, 2017 at 3:39 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 2 November 2017 at 04:29, Koos Zevenhoven <k7hoven at gmail.com> wrote:
>
>> SPOILER ALERT! At the moment, Nick's statement is in fact **always**
>> true in **all** cases (at least when ignoring untypical cases and some
>> inaccuracies in phrasing). Another question is whether the statement
>> **should** be true at all.
>> 
>>
>> PyErr_CheckSignals(), the function that checks for pending signals, now
>> **implicitly** uses the strictest possible memory-order requirement
>> (SEQ_CST) for checking the `is_tripped` flag, a value which can be used to
>> peek whether there are any pending signals. This means that two threads
>> that call PyErr_CheckSignals can't even **check** the value of that flag
>> at the same time, and they'll have to wait for each other and whatever the
>> CPU hardware implementation does for cache synchronzation.
>>
>
> Nice, that's deeper than I went - I just assumed there was an
> interlocked_increment in there somewhere, without checking whether or not
> there were any optimised code paths that managed to avoid that call :)
>
>
>> From a correctness point of view, that is absolutely great: if
>> PyErr_CheckSignals() is called, it is guaranteed to notice a new signal
>> regardles of how small the number of picoseconds after the `is_tripped`
>> flag has been set.  But is that really important? Do we really need things
>> to be slowed down by this? And do we want people to "optimize" code by
>> avoiding signal checking?
>>
>
> It isn't signal handling latency that's the concern here, it's potentially
> missing signals. Consider the case where we have 3 threads: A, B, C. A is
> the main thread that will actually handle signals, B and C both happened to
> receive them. We'll also assume  the two threads receive *different*
> signals (otherwise they'll potentially get collapsed into one notification
> regardless).
>
> The scenario we want to avoid is having one or both of the signals set,
> but is_tripped cleared. With an interlocked check (where both 0->1 and 1->0
> are memory barriers), that clearly can't happen. I suspect you're right
> that this could also be achieved with a less strict memory sync requirement
> on the 0->1 check, but that's much harder to reason about than simply
> limiting the number of iterations we make through an iterator consumption
> loop before checking for signals.
>

Missing signals is a concern, and there's another concern that one
received signal might be interpreted as two separate occurrences. But it
doesn't prevent us from *checking* is_tripped in a more relaxed manner, as
long as the 0->1 check is *verified* using something like SEQ_CST and the
associated signal handling is synchronized properly.

And that's why my PyErr_PROBE_SIGNALS below should work, and wouldn't
prevent multiple threads from handling signals either.

>
>
>> The signals won't be caught until checked anyway, and as we've seen, one
>> solution is to use a counter to determine if enough time has passed that
>> another check for potential signals should happen. That does, however,
>> raise the question of who should own the counter, and if it's OK for
>> multiple threads to use the same counter. If yes, then would we be able to
>> avoid slow atomic decrements (or increments)?
>>
>> But another solution might be to make a less strict but almost equivalent
>> functionality with much less overhead. Something like this in a header file:
>>
>> static inline int PyErr_PROBE_SIGNALS() {
>>     static int volatile *flag = (int volatile *) &is_tripped;
>>     if (*flag) {
>>         return PyErr_CheckSignals();
>>     }
>>     else {
>>         return 0;
>>     }
>> }
>>
>> Here, the flag is casted to volatile to force a check to happen each
>> time from memory/cache. However, to make it more standard and to make sure
>> it works with all compilers/platforms, it might be better to, instead of
>> *flag, use an atomic load with "MEMORY_ORDER_RELAXED". Another thing is
>> that `is_tripped`, which is currently declared as static inside
>> signalmodule.c [4], should somehow be exposed in the API to make it
>> available for the inline function in headers.
>>
>> This solution might be fast enough for a lot of cases, although still
>> slightly slower than the counter approach, at least if the counter approach
>> would completely avoid per-iteration memory access by requiring each
>> function to own the counter that is used for this.
>>
>
> One thing I like about a nested inner loop is that it's easy to understand
> the rationale for it *without* knowing any of the details of how memory
> synchronisation works, as it's just:
>
> - we want to check for signals regularly so the latency in interrupt
> handling isn't too high
> - PyErr_CheckSignals() is too expensive to call on every iteration
> - so we only check every N iterations
>
>
My initial "negative" reaction to that potential solution ("I don't believe
nesting loops is strictly a requirement." –K) was because going for
manually splitting the loops into smaller inner loops sounds like preparing
a nightmare for all C programmers. And I don't think it's possible to write
a preprocessor macro that does that for you. Maybe things like Cython could
in principle do this automatically, though.

Using a counter can have a very small overhead, because things can happen
completely in processor registers.

> Memory synchronisation then only comes up if someone asks why
> "PyErr_CheckSignals" is expensive to call. And while it wouldn't surprise
> at all if you're right and there are ways to make that call cheaper,
> they're still never going to be cheaper than explicitly reducing the
> frequency with which it is called.
>
>
There's a limit to how cheap the call to PyErr_CheckSignals() can be. As I
mentioned earlier, even just the fact that it's a C function call can be
too much.

That's why, in the above, I used a new name PyErr_PROBE_SIGNALS() instead
of optimizing the existing PyErr_CheckSignals() –– turning
PyErr_CheckSignals() into a static inline function would change the ABI. I
also didn't call it PyErr_CHECK_SIGNALS() because the functionality is not
strictly equivalent to the existing function.

––Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20171102/a816849d/attachment-0001.html>