[ python-Bugs-756924 ] SIGSEGV causes hung threads (Linux)
SourceForge.net
noreply at sourceforge.net
Fri May 7 09:06:47 EDT 2004
Bugs item #756924, was opened at 2003-06-19 09:28
Message generated for change (Comment added) made by anthonybaxter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=756924&group_id=5470
Category: Threads
Group: Python 2.2.2
Status: Open
Resolution: None
Priority: 7
Submitted By: Greg Jones (morngnstar)
Assigned to: Nobody/Anonymous (nobody)
Summary: SIGSEGV causes hung threads (Linux)
Initial Comment:
When a segmentation fault happens on Linux in any
thread but the main thread, the program exits, but
zombie threads remain behind.
Steps to reproduce:
1. Download attached tar and extract files zombie.py
and zombieCmodule.c.
2. Compile and link zombieCmodule.c as a shared library
(or whatever other method you prefer for making a
Python extension module).
3. Put the output from step 2 (zombieC.so) in your
lib/python directory.
4. Run python2.2 zombie.py.
5. After the program exits, run ps.
zombie.py launches several threads that just loop
forever, and one that calls a C function in zombieC. The
latter prints "NULL!" then segfaults intentionally,
printing "Segmentation fault". Then the program exits,
returning control back to the shell.
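The attached zombie.py and zombieCmodule.c are not reproduced in this archive. The sketch below is a rough, hypothetical stand-in written for a modern Python 3 rather than 2.2. Instead of calling a C extension, it uses ctypes.string_at(0) in a non-main thread to read from address 0, while four daemon threads loop forever. It runs that script in a fresh interpreter and reports how the child exited. On a system where all threads belong to one process (e.g. NPTL), the expectation is that the whole child dies with SIGSEGV and leaves no threads behind.

```python
import subprocess
import sys

# Hypothetical stand-in for the attached zombie.py (not the original file).
CHILD_SCRIPT = r'''
import ctypes
import threading
import time

def spin():
    # Stand-in for the threads that "just loop forever".
    while True:
        time.sleep(0.1)

def crash():
    print("NULL!", flush=True)
    ctypes.string_at(0)  # dereferences address 0; expected to raise SIGSEGV

for _ in range(4):
    threading.Thread(target=spin, daemon=True).start()
threading.Thread(target=crash).start()
time.sleep(60)  # keep the main thread alive; the fault should end things first
'''


def run_reproducer(timeout=30):
    """Run CHILD_SCRIPT in a fresh interpreter and return (returncode, stdout).

    A negative return code -N means the child was killed by signal N.
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stdout
```

After the call returns, running ps should show no leftover interpreter threads. A return code of -11 (i.e. -SIGSEGV) indicates the fault took down the entire process.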
Expected (and Python 2.1) behavior:
No Python threads appear in the output of ps.
Actual Python 2.2 behavior:
5 Python threads appear in the output of ps. To kill
them, you have to apply kill -9 to each one individually.
Not only does this bug leave around messy zombie
threads, but the threads left behind hold on to program
resources. For example, if the program binds a socket,
that port cannot be bound again until someone kills the
threads. Of course programs should not generate
segfaults, but if they do they should fail gracefully.
I have identified the cause of this bug. The old Python
2.1 behavior can be restored by removing these lines of
Python/thread_pthread.h:
sigfillset(&newmask);
SET_THREAD_SIGMASK(SIG_BLOCK, &newmask, &oldmask);
... and ...
SET_THREAD_SIGMASK(SIG_SETMASK, &oldmask, NULL);
I guess even SIGSEGV gets blocked by this code, and
somehow that prevents the default behavior of segfaults
from working correctly.
I'm not suggesting that removing this code is a good
way to fix this bug. This is just an example to show that
it seems to be the blocking of signals that causes this
bug.
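The lines above rely on a POSIX rule: pthread_create() gives the new thread a copy of the creating thread's signal mask. Blocking everything around thread creation and then restoring the old mask therefore leaves the parent unchanged, while the child ends up with every signal blocked, SIGSEGV included. The sketch below mimics that pattern by hand from Python 3.8+ on a POSIX system, using signal.pthread_sigmask and signal.valid_signals. The helper names start_thread_all_blocked and current_mask are made up for illustration.

```python
import signal
import threading


def start_thread_all_blocked(target):
    """Start `target` in a new thread that inherits a fully blocked mask.

    Mirrors the sigfillset + SET_THREAD_SIGMASK(SIG_BLOCK, ...) /
    SET_THREAD_SIGMASK(SIG_SETMASK, &oldmask, NULL) pairing: block everything
    in the calling thread, create the thread, then restore the caller's mask.
    """
    # Block every valid signal in this thread; the previous mask is returned.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
    try:
        thread = threading.Thread(target=target)
        thread.start()  # the new OS thread copies the (fully blocked) mask
    finally:
        # Restore the creator's original mask; the child keeps its own copy.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return thread


def current_mask():
    """Return the calling thread's signal mask without changing it."""
    return signal.pthread_sigmask(signal.SIG_BLOCK, [])
```

A target that calls current_mask() will find SIGSEGV in the returned set. Calling current_mask() in the parent afterwards returns the same mask it had before the thread was started.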
----------------------------------------------------------------------
>Comment By: Anthony Baxter (anthonybaxter)
Date: 2004-05-07 23:06
Message:
Logged In: YES
user_id=29957
Any patches in this area, I'd prefer to see on the trunk,
along with tests to exercise it (and confirm that it's not
breaking something else). We can then give it a solid
testing during the 2.4 release cycle.
I don't want to have to stretch the bugfix release cycle out
to have alphas, betas and the like. This seems like huge
piles of no-fun.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh)
Date: 2004-05-07 22:56
Message:
Logged In: YES
user_id=6656
Note that there is an attempt at a configure test in 948614,
but it seems very LinuxThreads-specific.
I agree with Anthony that this area is very scary. The last
thing we want to do a fortnight before release is break
things somewhere they currently work.
On the gripping hand, when there's a modern, actually
working implementation of pthreads, I don't think we
actually need to block signals at all. I certainly don't
have the threads-fu to come up with appropriate
configure/pyport.h magic though. I'm not sure I have the
energy to test a patch on all the testdrive, snake farm and
SF compile farm machines either.
----------------------------------------------------------------------
Comment By: Anthony Baxter (anthonybaxter)
Date: 2004-05-07 22:39
Message:
Logged In: YES
user_id=29957
We're a week out from release-candidate, and this seems (to
me) to be an area that's fraught with risk. The terms
"HP/UX" and "threads" have also cropped up, which, for me,
is a marker of "here be sodding great big dragons".
I don't mind delaying the release if it's necessary, and
there's a definite path to getting a nice clean fix in that
won't break things for some other class of platform. This
stuff looks like being a beast to test for, though.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-05-07 06:05
Message:
Logged In: YES
user_id=31435
Boosting priority, hoping to attract interest before 2.3.4.
Patch 949332 looks relevant.
----------------------------------------------------------------------
Comment By: Kjetil Jacobsen (kjetilja)
Date: 2004-05-05 18:28
Message:
Logged In: YES
user_id=5685
I've experienced similar behaviour with hung threads on
other platforms such as HP/UX, so we should consider letting
through some signals to all threads on all platforms.
For instance, very few apps use signal handlers for SIGILL,
SIGFPE, SIGSEGV, SIGBUS and SIGABRT, so unblocking those
signals should not cause much breakage compared to the
breakage caused by blocking all signals.
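One way to read this suggestion is sketched below, under the same Python 3.8+/POSIX assumptions. The sketch computes a mask of every valid signal except SIGILL, SIGFPE, SIGSEGV, SIGBUS and SIGABRT, installs it while the new thread is created, and then restores the creator's mask. The names FAULT_SIGNALS and start_thread_faults_unblocked are illustrative only; this is not taken from any patch referenced in this report.

```python
import signal
import threading

# Synchronous fault signals the comment suggests leaving unblocked everywhere.
FAULT_SIGNALS = {signal.SIGILL, signal.SIGFPE, signal.SIGSEGV,
                 signal.SIGBUS, signal.SIGABRT}


def start_thread_faults_unblocked(target):
    """Start `target` with every signal blocked except FAULT_SIGNALS."""
    child_mask = signal.valid_signals() - FAULT_SIGNALS
    # SIG_SETMASK installs exactly child_mask and returns the previous mask.
    old_mask = signal.pthread_sigmask(signal.SIG_SETMASK, child_mask)
    try:
        thread = threading.Thread(target=target)
        thread.start()  # the new thread inherits child_mask
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return thread
```

SIG_SETMASK is used here rather than SIG_BLOCK. The child's mask is then exactly the computed set, so a fault signal stays unblocked even if the creating thread happened to have it blocked.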
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-05-05 00:44
Message:
Logged In: YES
user_id=31435
Noting that this has become a semi-frequent topic on the
zope-dev mailing list, most recently in the "Segfault and
Deadlock" thread starting here:
<http://mail.zope.org/pipermail/zope-dev/2004-May/022813.html>
----------------------------------------------------------------------
Comment By: Andrew Langmead (langmead)
Date: 2004-05-05 00:00
Message:
Logged In: YES
user_id=119306
The issue is that the threading implementation in Linux kernels
prior to 2.6 diverged from the pthreads standard for signal
handling. Normally, signals are sent to the process and can be
handled by any thread. In the LinuxThreads implementation of
pthreads, signals are sent to a specific thread. If that thread
blocks signals (which is what happens to all threads spawned in
Python 2.2), those signals do not get routed to a thread that has
them unblocked (what Python calls the "main thread").
The new threading facility in Linux 2.6, the NPTL, does not have
this signal-handling bug.
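On glibc-based systems, you can check which pthreads implementation is in use through the _CS_GNU_LIBPTHREAD_VERSION configuration string (the shell equivalent is getconf GNU_LIBPTHREAD_VERSION). NPTL systems report something like "NPTL 2.35", and older LinuxThreads systems typically reported a string beginning with "linuxthreads". The sketch below reads that string via os.confstr. The function names are hypothetical, and on non-glibc platforms the lookup may simply yield None.

```python
import os


def pthread_implementation():
    """Return glibc's pthreads implementation string, or None if unavailable."""
    try:
        return os.confstr("CS_GNU_LIBPTHREAD_VERSION")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.confstr at all (e.g. Windows)
        # ValueError: the name is unknown on this platform (e.g. non-glibc libc)
        # OSError: the name is known but unsupported by the running system
        return None


def is_nptl(version):
    """True if a version string such as 'NPTL 2.35' names NPTL."""
    return version is not None and version.upper().startswith("NPTL")
```

For example, is_nptl(pthread_implementation()) should be True on current glibc systems. On a LinuxThreads system it should be False, which is where the hang described above would be expected.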
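A simple Python script that shows the problem is included below.
It will hang on Linux kernels before 2.6, or on Red Hat's
customized kernels before Red Hat 9.
#!/usr/bin/python
import signal
import thread
import os

def handle_signals(sig, frame):
    pass  # no-op handler, so a SIGSEGV sent with kill() is not fatal

def send_signals():
    # Runs in a spawned thread, which has all signals blocked in 2.2.
    os.kill(os.getpid(), signal.SIGSEGV)

signal.signal(signal.SIGSEGV, handle_signals)
thread.start_new_thread(send_signals, ())
# Under LinuxThreads the signal is never routed to the main thread,
# so pause() never returns.
signal.pause()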
----------------------------------------------------------------------
Comment By: Greg Jones (morngnstar)
Date: 2003-06-19 09:54
Message:
Logged In: YES
user_id=554883
Related to Bug #756940.
----------------------------------------------------------------------