[ python-Bugs-756924 ] SIGSEGV causes hung threads (Linux)

SourceForge.net noreply at sourceforge.net
Fri May 7 09:06:47 EDT 2004


Bugs item #756924, was opened at 2003-06-19 09:28
Message generated for change (Comment added) made by anthonybaxter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=756924&group_id=5470

Category: Threads
Group: Python 2.2.2
Status: Open
Resolution: None
Priority: 7
Submitted By: Greg Jones (morngnstar)
Assigned to: Nobody/Anonymous (nobody)
Summary: SIGSEGV causes hung threads (Linux)

Initial Comment:
When a segmentation fault happens on Linux in any 
thread but the main thread, the program exits, but 
zombie threads remain behind.

Steps to reproduce:
1. Download attached tar and extract files zombie.py 
and zombieCmodule.c.
2. Compile and link zombieCmodule.c as a shared library 
(or whatever other method you prefer for making a 
Python extension module).
3. Put the output from step 2 (zombieC.so) in your 
lib/python directory.
4. Run python2.2 zombie.py.
5. After the program exits, run ps.

zombie.py launches several threads that just loop 
forever, and one that calls a C function in zombieC. The 
latter prints "NULL!" then segfaults intentionally, 
printing "Segmentation fault". Then the program exits, 
returning control back to the shell.

Expected, and Python 2.1 behavior:
No Python threads appear in the output of ps.

Actual Python 2.2 behavior:
5 Python threads appear in the output of ps. To kill 
them, you have to apply kill -9 to each one individually.

  Not only does this bug leave around messy zombie 
threads, but the threads left behind hold on to program 
resources. For example, if the program binds a socket, 
that port cannot be bound again until someone kills the 
threads. Of course programs should not generate 
segfaults, but if they do they should fail gracefully.

  I have identified the cause of this bug. The old Python 
2.1 behavior can be restored by removing these lines of 
Python/thread_pthread.h:

sigfillset(&newmask);
SET_THREAD_SIGMASK(SIG_BLOCK, &newmask, 
&oldmask);

... and ...

SET_THREAD_SIGMASK(SIG_SETMASK, &oldmask, NULL);

  I guess even SIGSEGV gets blocked by this code, and 
somehow that prevents the default behavior of segfaults 
from working correctly.

  I'm not suggesting that removing this code is a good 
way to fix this bug. This is just an example to show that 
it seems to be the blocking of signals that causes this 
bug.

----------------------------------------------------------------------

>Comment By: Anthony Baxter (anthonybaxter)
Date: 2004-05-07 23:06

Message:
Logged In: YES 
user_id=29957

Any patches in this area, I'd prefer to see on the trunk,
along with tests to exercise it (and confirm that it's not
breaking something else). We can then give it a solid
testing during the 2.4 release cycle. 

I don't want to have to stretch the bugfix release cycle out
to have alphas, betas and the like. This seems like huge
piles of no-fun.


----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2004-05-07 22:56

Message:
Logged In: YES 
user_id=6656

Note that there is an attempt at a configure test in 948614,
but it seems very LinuxThreads specific.

I agree with Anthony that this area is very scary.  The last
thing we want to do a fortnight before release is break
things somewhere they currently work.

On the gripping hand, when there's a modern, actually
working implementation of pthreads, I don't think we
actually need to block signals at all.  I certainly don't
have the threads-fu to come up with appropriate
configure/pyport.h magic though.  I'm not sure I have the
energy to test a patch on all the testdrive, snake farm and
SF compile farm machines either.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2004-05-07 22:39

Message:
Logged In: YES 
user_id=29957

We're a week out from release-candidate, and this seems (to
me) to be an area that's fraught with risk. The terms
"HP/UX" and "threads" have also cropped up, which, for me,
is a marker of "here be sodding great big dragons". 

I don't mind delaying the release if it's necessary, and
there's a definite path to getting a nice clean fix in that
won't break things for some other class of platform. This
stuff looks like being a beast to test for, though. 


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-05-07 06:05

Message:
Logged In: YES 
user_id=31435

Boosting priority, hoping to attract interest before 2.3.4.  
Patch 949332 looks relevant.

----------------------------------------------------------------------

Comment By: Kjetil Jacobsen (kjetilja)
Date: 2004-05-05 18:28

Message:
Logged In: YES 
user_id=5685

I've experienced similar behaviour with hung threads on
other platforms such as HP/UX, so we should consider letting
through some signals to all threads on all platforms.

For instance, very few apps use signal handlers for SIGILL,
SIGFPE, SIGSEGV, SIGBUS and SIGABRT, so unblocking those
signals should not cause much breakage compared to the
breakage caused by blocking all signals.


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-05-05 00:44

Message:
Logged In: YES 
user_id=31435

Noting that this has become a semi-frequent topic on the 
zope-dev mailing list, most recently in the "Segfault and 
Deadlock" thread starting here:

<http://mail.zope.org/pipermail/zope-dev/2004-
May/022813.html>

----------------------------------------------------------------------

Comment By: Andrew Langmead (langmead)
Date: 2004-05-05 00:00

Message:
Logged In: YES 
user_id=119306

The issue is that the threading implementation in Linux kernels 
previous to 2.6 diverged from the pthreads standard for signal 
handling. Normally signals are sent to the process and can be 
handled by any thread. In the LinuxThreads implementation of 
pthreads, signals are sent to a specific thread. If that thread 
blocks signals (which is what happens to all threads spawned in 
Python 2.2) then those signals do not get routed to a thread with 
them unblocked (what Python calls the "main thread")

The new threading facility in Linux 2.6, the NPTL, does not have 
this signal handling bug.

A simple python script that shows the problem is included below. 
This will hang in Linux kernels before 2.6 or RedHat customized 
kernels before RH9.

#!/usr/bin/python

import signal
import thread
import os

def handle_signals(sig, frame): pass
def send_signals(): os.kill(os.getpid(), signal.SIGSEGV)

signal.signal(signal.SIGSEGV, handle_signals)
thread.start_new_thread(send_signals, ())
signal.pause()


----------------------------------------------------------------------

Comment By: Greg Jones (morngnstar)
Date: 2003-06-19 09:54

Message:
Logged In: YES 
user_id=554883

Related to Bug #756940.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=756924&group_id=5470



More information about the Python-bugs-list mailing list