[ python-Bugs-756924 ] SIGSEGV causes hung threads (Linux)
SourceForge.net
noreply at sourceforge.net
Fri May 7 12:50:48 EDT 2004
Bugs item #756924, was opened at 2003-06-18 19:28
Message generated for change (Comment added) made by tim_one
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=756924&group_id=5470
Category: Threads
Group: Python 2.2.2
Status: Open
Resolution: None
Priority: 7
Submitted By: Greg Jones (morngnstar)
>Assigned to: Guido van Rossum (gvanrossum)
Summary: SIGSEGV causes hung threads (Linux)
Initial Comment:
When a segmentation fault happens on Linux in any
thread but the main thread, the program exits, but
zombie threads remain behind.
Steps to reproduce:
1. Download the attached tar file and extract zombie.py
and zombieCmodule.c.
2. Compile and link zombieCmodule.c as a shared library
(or whatever other method you prefer for making a
Python extension module).
3. Put the output from step 2 (zombieC.so) in your
lib/python directory.
4. Run python2.2 zombie.py.
5. After the program exits, run ps.
zombie.py launches several threads that just loop
forever, and one that calls a C function in zombieC. The
latter prints "NULL!" then segfaults intentionally,
printing "Segmentation fault". Then the program exits,
returning control back to the shell.
Expected (and Python 2.1) behavior:
No Python threads appear in the output of ps.
Actual Python 2.2 behavior:
5 Python threads appear in the output of ps. To kill
them, you have to apply kill -9 to each one individually.
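The attachments are not reproduced here, so below is a rough Python 3 re-creation of zombie.py. It is a sketch, and it rests on a few assumptions. `ctypes.string_at(0)`, a NULL-pointer read, stands in for the C function in zombieCmodule.c. Each worker blocks all signals, as Python 2.2's spawned threads did. `signal.valid_signals()` requires Python 3.8+. The names `CHILD` and `run_reproducer` are hypothetical. On a current Linux system with NPTL, the kernel should kill the whole process with SIGSEGV even though the faulting thread blocks it, so no threads should be left in ps. The hang reported here comes from the older LinuxThreads behavior.

```python
import signal
import subprocess
import sys

# Child program: a hypothetical re-creation of zombie.py. ctypes.string_at(0)
# reads through a NULL pointer, standing in for zombieCmodule.c's C function.
CHILD = r"""
import ctypes, signal, threading, time

def block_all():
    # Mimic Python 2.2, which started every new thread with all signals blocked.
    signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())

def spin():
    block_all()
    while True:              # idle threads that loop forever
        time.sleep(0.1)

def crash():
    block_all()
    print("NULL!", flush=True)
    ctypes.string_at(0)      # NULL dereference: SIGSEGV in this thread

for _ in range(5):
    threading.Thread(target=spin, daemon=True).start()
t = threading.Thread(target=crash)
t.start()
t.join()                     # only reached if the fault did not end the process
"""

def run_reproducer(timeout=30):
    """Run the child; return (returncode, stdout). -N means killed by signal N."""
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout

if __name__ == "__main__":
    rc, out = run_reproducer()
    print(out, end="")
    print("killed by SIGSEGV:", rc == -signal.SIGSEGV)
```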
Not only does this bug leave around messy zombie
threads, but the threads left behind hold on to program
resources. For example, if the program binds a socket,
that port cannot be bound again until someone kills the
threads. Of course programs should not generate
segfaults, but if they do they should fail gracefully.
I have identified the cause of this bug. The old Python
2.1 behavior can be restored by removing these lines of
Python/thread_pthread.h:
    sigfillset(&newmask);
    SET_THREAD_SIGMASK(SIG_BLOCK, &newmask, &oldmask);
... and ...
    SET_THREAD_SIGMASK(SIG_SETMASK, &oldmask, NULL);
I guess even SIGSEGV gets blocked by this code, and
somehow that prevents the default behavior of segfaults
from working correctly.
I'm not suggesting that removing this code is a good
way to fix this bug. This is just an example to show that
it seems to be the blocking of signals that causes this
bug.
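The masking pattern from Python/thread_pthread.h quoted above can be sketched in Python 3 with `signal.pthread_sigmask()`. The sketch works on Unix only and needs Python 3.8+, where `signal.valid_signals()` stands in for `sigfillset()`. It is not CPython's code, and the helper names are hypothetical. A new thread inherits its creator's signal mask, so a worker started inside the blocked window reports SIGSEGV as blocked. That is consistent with the submitter's guess.

```python
import signal
import threading

def start_thread_with_signals_blocked(target):
    """Block every signal, start a thread (it inherits the mask), then restore."""
    # Like sigfillset() + SIG_BLOCK; the kernel silently keeps SIGKILL/SIGSTOP deliverable.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
    try:
        t = threading.Thread(target=target)
        t.start()
    finally:
        # Like SET_THREAD_SIGMASK(SIG_SETMASK, &oldmask, NULL).
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return t

def child_mask():
    """Start a worker through the helper and return the signals it has blocked."""
    result = {}

    def worker():
        # SIG_BLOCK with an empty set changes nothing and returns the current mask.
        result["mask"] = signal.pthread_sigmask(signal.SIG_BLOCK, [])

    start_thread_with_signals_blocked(worker).join()
    return result["mask"]

if __name__ == "__main__":
    print("SIGSEGV blocked in worker:", signal.SIGSEGV in child_mask())
    print("SIGSEGV blocked in caller:",
          signal.SIGSEGV in signal.pthread_sigmask(signal.SIG_BLOCK, []))
```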
----------------------------------------------------------------------
>Comment By: Tim Peters (tim_one)
Date: 2004-05-07 12:50
Message:
Logged In: YES
user_id=31435
Assigned to Guido to get an answer to one of the questions
here: Guido, signal_handler() checks getpid() against
main_pid, and has ever since revision 2.3 (when you first
taught signalmodule.c about threads). But on every pthreads
box except for Linux, getpid() should always equal main_pid
(even after a fork). What was the intent? I read the
comments the same as Andrew does here, that the intent
was to check thread identity, not process identity.
----------------------------------------------------------------------
Comment By: Andrew Langmead (langmead)
Date: 2004-05-07 09:59
Message:
Logged In: YES
user_id=119306
mwh wrote: "when there's a modern, actually working implementation of
pthreads, I don't think we actually need to block signals at all."
The bug report that caused the patch to be created was originally
reported on Solaris, which has a more correct pthreads implementation.
I'm now wondering whether that problem was caused not by signals being
handled by the spawned threads, but by the signal handler checking
"if (getpid() == main_pid)" rather than
"(PyThread_get_thread_ident() == main_thread)". On a standards-compliant
pthreads implementation, including Solaris, getpid() will
always equal main_pid.
For the Linux case, we may have a more modern, working threads
implementation now, but since the old LinuxThreads-style behavior was
out and deployed for 8 years or so, it will probably be around for a
while.
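The distinction drawn here between process identity and thread identity can be shown in a small Python 3 sketch. It is an illustration, not CPython's `signal_handler()`. `threading.get_ident()` plays the role of `PyThread_get_thread_ident()`, and the module-level `main_pid` and `main_thread` are stand-ins for the C variables of the same names. On a conforming implementation such as NPTL, a worker sees the same `getpid()` as the main thread but a different thread ident. Only the thread-ident comparison can tell the two apart.

```python
import os
import threading

main_pid = os.getpid()                # analogue of main_pid
main_thread = threading.get_ident()   # analogue of main_thread

def identities_seen_from_worker():
    """Return (getpid(), thread ident) as observed inside a new thread."""
    seen = {}

    def worker():
        seen["pid"] = os.getpid()
        seen["ident"] = threading.get_ident()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen["pid"], seen["ident"]

if __name__ == "__main__":
    pid, ident = identities_seen_from_worker()
    print("getpid() == main_pid:", pid == main_pid)            # expected True on NPTL
    print("ident == main_thread:", ident == main_thread)       # expected False
```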
----------------------------------------------------------------------
Comment By: Andrew Langmead (langmead)
Date: 2004-05-07 09:48
Message:
Logged In: YES
user_id=119306
There are two different thread-related patches that I submitted. I agree that
<http://sourceforge.net/tracker/?func=detail&aid=948614&group_id=5470&atid=305470> is pretty radical.
(It's the one that tests at configure time for LinuxThreads peculiarities
and alters the thread spawning and signal-related activities accordingly.)
A different, related signal patch
<http://sourceforge.net/tracker/?func=detail&aid=949332&group_id=5470&atid=305470> might be more
appealing to you. It only unblocks signals, like segmentation faults, that a thread
synchronously generates for itself and that a pthreads implementation
will always deliver to the faulting thread (whether it blocks them or not).
----------------------------------------------------------------------
Comment By: Anthony Baxter (anthonybaxter)
Date: 2004-05-07 09:06
Message:
Logged In: YES
user_id=29957
Any patches in this area, I'd prefer to see on the trunk,
along with tests to exercise it (and confirm that it's not
breaking something else). We can then give it a solid
testing during the 2.4 release cycle.
I don't want to have to stretch the bugfix release cycle out
to have alphas, betas and the like. This seems like huge
piles of no-fun.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh)
Date: 2004-05-07 08:56
Message:
Logged In: YES
user_id=6656
Note that there is an attempt at a configure test in 948614,
but it seems very LinuxThreads specific.
I agree with Anthony that this area is very scary. The last
thing we want to do a fortnight before release is break
things somewhere they currently work.
On the gripping hand, when there's a modern, actually
working implementation of pthreads, I don't think we
actually need to block signals at all. I certainly don't
have the threads-fu to come up with appropriate
configure/pyport.h magic though. I'm not sure I have the
energy to test a patch on all the testdrive, snake farm and
SF compile farm machines either.
----------------------------------------------------------------------
Comment By: Anthony Baxter (anthonybaxter)
Date: 2004-05-07 08:39
Message:
Logged In: YES
user_id=29957
We're a week out from release-candidate, and this seems (to
me) to be an area that's fraught with risk. The terms
"HP/UX" and "threads" have also cropped up, which, for me,
is a marker of "here be sodding great big dragons".
I don't mind delaying the release if it's necessary, and
there's a definite path to getting a nice clean fix in that
won't break things for some other class of platform. This
stuff looks like being a beast to test for, though.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-05-06 16:05
Message:
Logged In: YES
user_id=31435
Boosting priority, hoping to attract interest before 2.3.4.
Patch 949332 looks relevant.
----------------------------------------------------------------------
Comment By: Kjetil Jacobsen (kjetilja)
Date: 2004-05-05 04:28
Message:
Logged In: YES
user_id=5685
I've experienced similar behaviour with hung threads on
other platforms such as HP/UX, so we should consider letting
through some signals to all threads on all platforms.
For instance, very few apps use signal handlers for SIGILL,
SIGFPE, SIGSEGV, SIGBUS and SIGABRT, so unblocking those
signals should not cause much breakage compared to the
breakage caused by blocking all signals.
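One reading of this suggestion, and of patch 949332 above, is "block everything in new threads except the synchronous fault signals." Below is a hypothetical Python 3 sketch of that mask, assuming Unix and Python 3.8+. `FAULT_SIGNALS` and the helper names are made up for illustration, and this is not the code from either patch.

```python
import signal
import threading

# The fault signals listed in the comment above (hypothetical constant name).
FAULT_SIGNALS = {signal.SIGILL, signal.SIGFPE, signal.SIGSEGV,
                 signal.SIGBUS, signal.SIGABRT}

def start_thread_except_faults(target):
    """Start a thread whose inherited mask blocks every signal but FAULT_SIGNALS."""
    new_mask = signal.valid_signals() - FAULT_SIGNALS
    # SIG_SETMASK replaces the caller's mask and returns the previous one.
    old_mask = signal.pthread_sigmask(signal.SIG_SETMASK, new_mask)
    try:
        t = threading.Thread(target=target)
        t.start()            # the new thread copies the current mask
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return t

def worker_mask():
    """Return the signal mask seen inside a thread started by the helper above."""
    seen = {}

    def worker():
        seen["mask"] = signal.pthread_sigmask(signal.SIG_BLOCK, [])

    start_thread_except_faults(worker).join()
    return seen["mask"]

if __name__ == "__main__":
    mask = worker_mask()
    print("fault signals blocked:", sorted(mask & FAULT_SIGNALS))
    print("SIGUSR1 blocked:", signal.SIGUSR1 in mask)
```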
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-05-04 10:44
Message:
Logged In: YES
user_id=31435
Noting that this has become a semi-frequent topic on the
zope-dev mailing list, most recently in the "Segfault and
Deadlock" thread starting here:
<http://mail.zope.org/pipermail/zope-dev/2004-May/022813.html>
----------------------------------------------------------------------
Comment By: Andrew Langmead (langmead)
Date: 2004-05-04 10:00
Message:
Logged In: YES
user_id=119306
The issue is that the threading implementation in Linux kernels
prior to 2.6 diverged from the pthreads standard for signal
handling. Normally, signals are sent to the process and can be
handled by any thread. In the LinuxThreads implementation of
pthreads, signals are sent to a specific thread. If that thread
blocks signals (which is what happens to all threads spawned in
Python 2.2), then those signals do not get routed to a thread that has
them unblocked (what Python calls the "main thread").
The new threading facility in Linux 2.6, the NPTL, does not have
this signal handling bug.
A simple Python script that shows the problem is included below.
It will hang on Linux kernels before 2.6, or on Red Hat customized
kernels before Red Hat Linux 9.
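#!/usr/bin/python
import signal
import thread
import os

def handle_signals(sig, frame):
    pass  # any handler will do; the test is whether it ever runs

def send_signals():
    # Signal our own process ID from a spawned thread, which
    # Python 2.2 starts with all signals blocked.
    os.kill(os.getpid(), signal.SIGSEGV)

signal.signal(signal.SIGSEGV, handle_signals)
thread.start_new_thread(send_signals, ())
signal.pause()  # returns once the handler runs; hangs under LinuxThreads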
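Here is a Python 3 counterpart of the script above. It is a sketch that assumes a conforming pthreads implementation (for example glibc's NPTL on Linux 2.6 or later) and Python 3.8+. It uses SIGUSR1 instead of SIGSEGV so that no fault handling is involved, and the helper name is hypothetical. The worker blocks every signal, as Python 2.2's spawned threads did, and then signals the process. The handler should still run in the main thread. Under LinuxThreads the signal would stay with the blocking thread, which is the hang described above.

```python
import os
import signal
import threading
import time

def process_signal_reaches_main(timeout=5.0):
    """Return True if SIGUSR1 sent by an all-blocking worker runs the main handler.

    Must be called from the main thread, since signal.signal() requires it.
    """
    fired = threading.Event()

    def on_usr1(signum, frame):
        fired.set()          # Python runs handlers in the main thread

    old_handler = signal.signal(signal.SIGUSR1, on_usr1)

    def worker():
        # Like Python 2.2's spawned threads: block every signal in this thread.
        signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
        # Process-directed signal; a conforming kernel hands it to a thread
        # that has SIGUSR1 unblocked (here, the main thread).
        os.kill(os.getpid(), signal.SIGUSR1)

    try:
        t = threading.Thread(target=worker)
        t.start()
        t.join()
        deadline = time.monotonic() + timeout
        while not fired.is_set() and time.monotonic() < deadline:
            time.sleep(0.01)  # gives the interpreter a chance to run the handler
        return fired.is_set()
    finally:
        signal.signal(signal.SIGUSR1, old_handler)

if __name__ == "__main__":
    print("handler ran in main thread:", process_signal_reaches_main())
```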
----------------------------------------------------------------------
Comment By: Greg Jones (morngnstar)
Date: 2003-06-18 19:54
Message:
Logged In: YES
user_id=554883
Related to Bug #756940.
----------------------------------------------------------------------
More information about the Python-bugs-list
mailing list