[Python-bugs-list] [ python-Bugs-662787 ] test_signal hang on some Solaris boxes

SourceForge.net noreply@sourceforge.net
Mon, 06 Jan 2003 05:44:06 -0800


Bugs item #662787, was opened at 2003-01-05 14:59
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=662787&group_id=5470

Category: Python Interpreter Core
Group: Python 2.3
Status: Open
Resolution: None
Priority: 6
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Martin v. Löwis (loewis)
Summary: test_signal hang on some Solaris boxes

Initial Comment:
Martin, I'm assigning this to you because you checked
in the patch which caused this problem.  I think your
input on fixes will also be valuable.

When semaphore support was added to Python/thread_pthread.h
in revision 2.39 (originally from patch 525532), it broke
tests on some Solaris boxes.  I know this affects Solaris 8;
I'm not sure whether any other versions are affected.  I
believe one or more of the following Solaris 8 patches
(108528, 108827) fix the problem:

http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fpatches%2F108528&zone_32=signal+%20hang%20%22Solaris%208%22&wholewords=on

http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fpatches%2F108827&zone_32=signal+%20hang%20%22Solaris%208%22&wholewords=on

Patches can be obtained from here:

http://sunsolve.sun.com/pub-cgi/show.pl?target=patches/patch-access

One way to fix the hang is to add #undef USE_SEMAPHORES
at line 113 of Python/thread_pthread.h (i.e., after the
point where USE_SEMAPHORES may have been set).

I don't know of any other way to fix this problem.  I
don't know whether we can test for this in configure and
set USE_SEMAPHORES appropriately (or whether it's worth
it).  We could disable USE_SEMAPHORES by default and let
users enable it by setting the macro manually.  Or we could
keep the code as is and document the problem.

Suggestions?
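
For comparison only (this postdates the report and did not
exist in Python 2.3): Python 3.3 and later expose the
locking choice made in thread_pthread.h at runtime as
sys.thread_info.lock, which is 'semaphore' when semaphores
are used, 'mutex+cond' otherwise, or None if unknown.  A
minimal sketch; the helper name uses_semaphores is
hypothetical.

```python
import sys


def uses_semaphores():
    """True if this interpreter's thread locks are built on semaphores.

    Assumes Python 3.3+, where sys.thread_info exists; its .lock field
    may be None when the implementation is unknown.
    """
    return sys.thread_info.lock == "semaphore"


if __name__ == "__main__":
    print(sys.thread_info)
    print("uses semaphores:", uses_semaphores())
```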

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-06 08:44

Message:
Logged In: YES 
user_id=33168

I thought I could not duplicate the hang on our Sun, but now
it's happening.  Your test program also hangs.

How do you determine the revision of a patch on Solaris?
I'm using showrev -p | grep patch-#, but I'm not sure that
is correct.  The way I read it, I have 108528-16 and
108827-12.

In the snake farm 
 proton has:  108528-12 and 108827-12
 fafner has:  108528-13 and 108827-19
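
One possible pitfall with grepping showrev -p output: a
patch ID can also appear in another patch's Obsoletes: or
Requires: field, so grep may report lines that are not that
patch at all.  The sketch below only trusts the leading
Patch: field and keeps the highest revision seen.  It
assumes each line starts with "Patch: NNNNNN-RR", which is
an assumption about the showrev format; the function name
and the sample text (including patch 999999-01) are
hypothetical.

```python
import re

# Assumed line shape: "Patch: 108528-16 Obsoletes: ... Requires: ... Packages: ..."
PATCH_LINE = re.compile(r"^Patch:\s*(\d+)-(\d+)\b")


def installed_revisions(showrev_output, patch_ids):
    """Map each requested patch ID to the highest revision in a Patch: field."""
    best = {}
    for line in showrev_output.splitlines():
        m = PATCH_LINE.match(line.strip())
        if m is None:
            continue  # not a patch line
        base, rev = m.group(1), int(m.group(2))
        # IDs mentioned only under Obsoletes:/Requires: are ignored here.
        if base in patch_ids and rev > best.get(base, -1):
            best[base] = rev
    return best


# Hypothetical sample; the third line would also match `grep 108528`.
sample = (
    "Patch: 108528-16 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu\n"
    "Patch: 108827-12 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu\n"
    "Patch: 999999-01 Obsoletes: Requires: 108528-16 Incompatibles: Packages: SUNWxyz\n"
)
print(installed_revisions(sample, {"108528", "108827"}))
# → {'108528': 16, '108827': 12}
```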

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-06 07:29

Message:
Logged In: YES 
user_id=21627

It would be good to find a system that has 108528-17
installed. I see that this fixes

4498831 system timer stops sending signals
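
As a quick cross-check of the 108528 revisions reported in
this thread against 108528-17: the sketch assumes Sun patch
revisions are cumulative, so any revision at or above 17
would include the fix for 4498831.  Whether that fix
actually cures the test_signal hang has not been verified
here.  The box labels are descriptive only.

```python
# Martin notes that 108528-17 fixes bug 4498831; later revisions are
# assumed (not verified) to include that fix as well.
FIXED_IN = ("108528", 17)


def has_timer_fix(patch, fixed_in=FIXED_IN):
    """True if a 'NNNNNN-RR' patch string is at or past the fixing revision."""
    base, rev = patch.split("-")
    fix_base, fix_rev = fixed_in
    return base == fix_base and int(rev) >= fix_rev


# 108528 revisions reported in this thread.
reported = {
    "nnorwitz's Sun": "108528-16",
    "proton": "108528-12",
    "fafner": "108528-13",
    "loewis's system": "108528-16",
}

for box, patch in reported.items():
    print(f"{box:16} {patch}  fix present: {has_timer_fix(patch)}")
```

None of the reported systems reaches -17, which is
consistent with the suggestion to find one that does.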

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-06 07:13

Message:
Logged In: YES 
user_id=21627

I meant revisions of the Sun patches. On one system, I have
108528-16 and 108827-34, and test_signal still hangs.

I found that the hang occurs when test_queue.py is run
first, and have distilled this into the following example:

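import sys
import thread
import time
import signal

# A plain lock, used here as a binary semaphore.
fsema = thread.allocate_lock()

def tfunc():
    time.sleep(.1)
    fsema.release()      # hand the lock back to the main thread

fsema.acquire()
thread.start_new_thread(tfunc,())
fsema.acquire()          # blocks until tfunc releases the lock
fsema.release()

signal.alarm(3)          # request SIGALRM in 3 seconds
signal.pause()           # should return when SIGALRM arrives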

It appears that the alarm is simply lost; the pause call
does not return.
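
For anyone re-running this on a newer interpreter, here is
a sketch that transliterates the example above into
Python 3 (where the thread module is named _thread) and
polls with a deadline instead of calling signal.pause(), so
a lost alarm is reported rather than hanging forever.  The
function name and the 3-second timeout are hypothetical;
on an unaffected system it should return True, and on the
affected Solaris boxes it would presumably return False.
It must be called from the main thread, since only the main
thread may install signal handlers.

```python
import _thread
import signal
import time


def alarm_delivered_after_lock_handoff(timeout=3.0):
    """Return True if SIGALRM arrives after a cross-thread lock handoff."""
    fsema = _thread.allocate_lock()

    def tfunc():
        time.sleep(0.1)
        fsema.release()  # hand the lock back to the main thread

    fsema.acquire()
    _thread.start_new_thread(tfunc, ())
    fsema.acquire()  # blocks until tfunc releases the lock
    fsema.release()

    fired = []

    def on_alarm(signum, frame):
        fired.append(signum)

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    try:
        signal.alarm(1)
        deadline = time.monotonic() + timeout
        # Poll instead of signal.pause() so a lost alarm cannot hang us.
        while not fired and time.monotonic() < deadline:
            time.sleep(0.05)
    finally:
        signal.alarm(0)  # cancel any alarm still pending
        # signal.signal() returns None if the old handler was not set from Python.
        signal.signal(signal.SIGALRM,
                      old_handler if old_handler is not None else signal.SIG_DFL)
    return bool(fired)


if __name__ == "__main__":
    print("alarm delivered:", alarm_delivered_after_lock_handoff())
```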

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-05 18:38

Message:
Logged In: YES 
user_id=33168

What do you mean by revision of the patches?  The CVS
revision of thread_pthread.h?  I believe 2.38 worked and
2.39 broke.  I can test that if you'd like.  Do you want me
to go back to the revision before the patch for all
affected files?  Since 2.39 essentially only added the
semaphores, doing the #undef should have the same effect as
reverting thread_pthread.h to 2.38.

Or are you talking about the Solaris patches?  If so, they
are only a guess; we could compare the patch levels on all
the Solaris boxes we have access to and see which ones work
and which don't.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-05 18:11

Message:
Logged In: YES 
user_id=21627

Could you identify a revision of these patches for which the
problem disappears?

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-05 15:57

Message:
Logged In: YES 
user_id=33168

For (possibly) more info, see:
http://lists.lysator.liu.se/pipermail/snake-farm/2003-January/000617.html

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-05 15:38

Message:
Logged In: YES 
user_id=33168

Attaching the output of truss (the Solaris counterpart of
strace), which doesn't give me any more info.  Hopefully it
will help someone else.  Two files are attached.  One has
the complete output from:

   truss ./python -E -tt ./Lib/test/regrtest.py test_queue test_signal

The other has just the end of that output, which should
correspond to these lines from the test:

signal.alarm(20)                        # Entire test lasts at most 20 sec.
signal.signal(5, handlerA)
signal.signal(2, handlerB)
signal.signal(3, signal.SIG_IGN)
signal.signal(signal.SIGALRM, signal.default_int_handler)
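
The test passes raw signal numbers to signal.signal().  To
read the truss output it helps to know their names; on
Linux, and in the traditional Unix numbering that Solaris
also uses, 2, 3 and 5 are SIGINT, SIGQUIT and SIGTRAP.  A
quick check, assuming Python 3.5+ (where signal.Signals is
an enum); the helper name is hypothetical.

```python
import signal


def signal_names(numbers):
    """Translate raw signal numbers to their symbolic names on this platform."""
    return [signal.Signals(n).name for n in numbers]


# The raw numbers passed to signal.signal() in the quoted test lines.
print(signal_names([5, 2, 3]))
# → ['SIGTRAP', 'SIGINT', 'SIGQUIT'] on Linux
```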


----------------------------------------------------------------------
