[Python-Dev] Signals, threads, blocking C functions

Tue Sep 5 21:44:50 CEST 2006

Johan Dahlin <jdahlin at async.com.br> wrote:
>
> Are you saying that we should let less commonly used platforms dictate
> features and functionality for the popular ones?
> I mean, who uses HP/UX, SCO and [insert your favorite flavor] as a modern
> desktop system where this particular bug makes a difference?

You haven't been following the thread.  As I posted, this problem
occurs to a greater or lesser degree on all platforms.  This will be
my last posting on the topic, but I shall try to explain.

The first problem is in the hardware and operating system.  A signal
interrupts the thread, and passes control to a handler with a very
partial environment and (usually) information on the environment
when it was interrupted.  If it interrupted the thread in the middle
of a system call or other library routine that uses non-Python
conventions, the registers and other state may be weird.  There ARE
solutions to this, but they are unbelievably foul, and even Linux
on x86 has had trouble with this.  And, on return, everything has to
be reversed entirely transparently!

It is VERY common for there to be bugs in the C run-time system and
not rare for there to be ones in the kernel (that area of Linux has
been rewritten MANY times, for this reason).  In many cases, the
run-time system simply doesn't pretend to handle interrupts in
arbitrary code (which is where vendors fall back on the C standard
calling this undefined behaviour).

The second problem is that what you can do depends both on what you
were doing and how your 'primitive' is implemented.  For example, if
you call something that takes out even a very short term lock or uses
a spin loop to emulate an atomic operation, you had better not use it
if you interrupted code that was doing the same.  Your thread may
hang, crash or otherwise go bananas.  Can you guarantee that even
write is free of such things?  No, and certainly not if you are using
a debugger, a profiling library or even tracing system calls.  I have
often used programs that crashed as soon as I did one of those :-(
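
To make the lock hazard concrete, here is a minimal Python sketch.  It
is only an illustration under assumptions: a Unix platform with SIGUSR1,
Python 3.8 or later for signal.raise_signal, and names (lock, handler,
run_demo) that are hypothetical.  CPython runs Python-level handlers in
the main thread between bytecodes, which is far tamer than a real C
handler, but the shape of the problem is the same: the handler wants a
lock that the interrupted code already holds.

```python
import signal
import threading
import time

lock = threading.Lock()   # non-reentrant, standing in for a simple C mutex
observed = []             # what the handler saw on each delivery


def handler(signum, frame):
    # A blocking acquire() here would hang forever whenever the interrupted
    # code already holds the lock, so this sketch only tries without blocking.
    got_it = lock.acquire(blocking=False)
    observed.append(got_it)
    if got_it:
        lock.release()


def wait_for(count, timeout=1.0):
    # Give the interpreter a chance to run the Python-level handler.
    deadline = time.monotonic() + timeout
    while len(observed) < count and time.monotonic() < deadline:
        time.sleep(0.001)


def run_demo(sig=signal.SIGUSR1):
    observed.clear()
    previous = signal.signal(sig, handler)   # must run in the main thread
    try:
        with lock:                    # the "interrupted code" holds the lock
            signal.raise_signal(sig)
            wait_for(1)
        signal.raise_signal(sig)      # same signal, lock now free
        wait_for(2)
    finally:
        signal.signal(sig, previous)
    return list(observed)
```

Called from the main thread, run_demo() returns [False, True]: the
handler cannot take the lock while the interrupted code holds it, and a
handler that blocked instead of giving up would never return.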

Related to this is that it is EXTREMELY hard to write synchronisation
primitives (mutexes etc.) that are interrupt-safe - MUCH harder than
to write thread-safe ones - and few people are even aware of the
issues.  There was a thread on some Linux kernel mailing list about
this, and even the kernel developers were having headaches thinking
about the issues.
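
One common mitigation (not something proposed in this thread, and not a
complete answer) is to block the signal around the critical section so
that the handler cannot run in the middle of it.  The sketch below
assumes a Unix platform where signal.pthread_sigmask is available and
Python 3.8 or later; run_masked and the event names are hypothetical.

```python
import signal
import time


def run_masked(sig=signal.SIGUSR1):
    events = []

    def handler(signum, frame):
        events.append("handler")

    previous = signal.signal(sig, handler)   # must run in the main thread
    try:
        # Block the signal in this thread; a delivery now stays pending.
        old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
        try:
            events.append("enter")
            signal.raise_signal(sig)         # queued, not delivered yet
            events.append("leave")
        finally:
            # Restoring the old mask lets the pending signal through.
            signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        # Allow the Python-level handler to run if it has not already.
        deadline = time.monotonic() + 1.0
        while "handler" not in events and time.monotonic() < deadline:
            time.sleep(0.001)
    finally:
        signal.signal(sig, previous)
    return events
```

Called from the main thread, run_masked() returns
["enter", "leave", "handler"].  This only defers the handler; it does
nothing for locks taken inside library code that you do not control,
which is the harder case described above.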

Even if write is atomic, there are gotchas.  What if the interrupted
code is doing something to that file at the time?  Are you SURE that
an unexpected operation on it (in the same thread) won't cause the
library function or program to get confused?  And can you be sure
that the write will terminate fast enough to not cause time-critical
code to fail?  And have you studied the exact semantics of blocking
on pipes?  They are truly horrible.
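
As a small illustration of the pipe point, the following Python sketch
fills a pipe whose write end has been made non-blocking.  It assumes a
Unix platform; fill_pipe_nonblocking is a hypothetical helper.  With a
blocking descriptor the same loop would simply hang once the pipe was
full, which is exactly what you cannot afford inside a handler.

```python
import os


def fill_pipe_nonblocking(chunk=b"x" * 4096):
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)   # writes fail instead of waiting
    written = 0
    would_block = False
    try:
        while True:
            # Nobody reads from read_fd, so the pipe eventually fills up.
            written += os.write(write_fd, chunk)
    except BlockingIOError:
        # The kernel buffer is full; a blocking write would stall here.
        would_block = True
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written, would_block
```

The number of bytes accepted before the write would block depends on the
platform's pipe buffer size, so the sketch reports it rather than
assuming a value.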

So this is NOT a matter of platform X is safe and platform Y isn't.
Even Linux x86 isn't entirely safe - or wasn't, the last time I heard.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679