[Python-bugs-list] [ python-Bugs-662787 ] test_signal hang on some Solaris boxes

SourceForge.net noreply@sourceforge.net
Mon, 20 Jan 2003 14:51:00 -0800


Bugs item #662787, was opened at 2003-01-05 20:59
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=662787&group_id=5470

Category: Python Interpreter Core
Group: Python 2.3
Status: Open
Resolution: None
Priority: 6
Submitted By: Neal Norwitz (nnorwitz)
>Assigned to: Martin v. Löwis (loewis)
Summary: test_signal hang on some Solaris boxes

Initial Comment:
Martin, I'm assigning this to you because you checked
in the patch which caused this problem.  I think your
input on fixes will also be valuable.

When semaphore support was added to
Python/thread_pthread.h in 2.39 originally from patch
525532, it broke tests on some Solaris boxes.  I know
this affects Solaris 8, not sure if any other versions
are affected.  I believe one or more of the following
Solaris 8 patches (108528, 108827) fix the problem:

http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fpatches%2F108528&zone_32=signal+%20hang%20%22Solaris%208%22&wholewords=on

http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fpatches%2F108827&zone_32=signal+%20hang%20%22Solaris%208%22&wholewords=on

Patches can be obtained from here:

http://sunsolve.sun.com/pub-cgi/show.pl?target=patches/patch-access

One way to fix the hang is to add #undef USE_SEMAPHORES
at line 113 of Python/thread_pthread.h (i.e., after
USE_SEMAPHORES may be set).

I don't know of any other way to fix this problem.  I
don't know if we can test for this in configure and set
USE_SEMAPHORES appropriately (or if it's worth it).  We
can always disable USE_SEMAPHORES and allow the user to
use it by manually setting the macro.  Or we can keep the code
as is and document the problem.

Suggestions?

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-20 23:28

Message:
Logged In: YES 
user_id=33168

Martin, how do you want to proceed with this problem?

----------------------------------------------------------------------

Comment By: Inyeol Lee (inyeol)
Date: 2003-01-10 19:15

Message:
Logged In: YES 
user_id=595280

I patched my Solaris 8 box and rebuilt 2.3a1.
'make test' still fails at test_signal. Martin's test
code also fails.

Installed patches are:

Patch: 108528-07
Patch: 108528-13
Patch: 108528-18
Patch: 108827-07
Patch: 108827-15
Patch: 108827-35

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-08 18:21

Message:
Logged In: YES 
user_id=33168

Sorry, that's really what I meant, only for Solaris 8.  I'm
not sure how to do that (figure out that we are on Solaris
8).  I agree it would be nice to know the answer and suspect
you are right that it's fixed.  I don't know how to do that
though.  Hmmm, I think Anthony Baxter may have had Solaris.
I'll assign this to him, in the hopes he can provide some
more info.  Anthony?
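
As a rough illustration of how "are we on Solaris 8?" could be decided, here is a minimal Python sketch. It assumes the usual uname values for that release (system name "SunOS", release "5.8"); the helper name is_solaris8 is hypothetical. An actual fix would have to make the same decision in configure or in Python/thread_pthread.h, not in Python code.

```python
import platform


def is_solaris8(system=None, release=None):
    """Return True if the (system, release) pair looks like Solaris 8.

    Assumption: Solaris 8 reports "SunOS" as its system name and "5.8"
    as its release. Both arguments default to the values for the
    running machine, and can be passed explicitly for testing.
    """
    if system is None:
        system = platform.system()
    if release is None:
        release = platform.release()
    return system == "SunOS" and release == "5.8"


if __name__ == "__main__":
    print("Solaris 8:", is_solaris8())
```

Passing the two values explicitly, e.g. is_solaris8("SunOS", "5.9"), makes it possible to try the check on a machine that is not a Sun.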

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-08 18:15

Message:
Logged In: YES 
user_id=21627

I quite object to #2, although I could live with

5) disable USE_SEMAPHORES for Solaris 8

I would still like to find out whether applying all patches
solves the problem. I'm quite certain that our code is
correct and that there is a bug in Solaris. I'm reasonably
certain that the bug has been fixed by now, so I would not
want to leave USE_SEMAPHORES disabled forever on Solaris.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-08 18:11

Message:
Logged In: YES 
user_id=33168

Martin, how do you think we should proceed with this
problem?  Some alternatives include:
 1) try to build an autoconf test to find the problem and
disable USE_SEMAPHORES
 2) always disable USE_SEMAPHORES for Solaris
 3) try to find a work-around
 4) leave it as-is

At least temporarily, I'd like to see #2.  The problem is
that if we do that, it will probably never get fixed.  #1 is
probably a lot of work.  I can't implement #1 since I don't
have access to a machine that works.

Do you have any other ideas or possible solutions/work-arounds?

----------------------------------------------------------------------

Comment By: Inyeol Lee (inyeol)
Date: 2003-01-07 04:46

Message:
Logged In: YES 
user_id=595280

Neal,
These are the patch versions on my Solaris 8 system that
you requested:

Patch: 108528-07
Patch: 108528-13
Patch: 108827-07
Patch: 108827-15

Inyeol Lee

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-06 15:46

Message:
Logged In: YES 
user_id=21627

showrev -p is good, as is patchadd -p (this also shows the
patches you could backup to).

I notice that my Solaris 9 machine does not experience the
problem, so Sun has fixed something. It would be good if a
Solaris 8 machine could be brought up-to-date with regard to
patches (they recommend -18 and -35 respectively, at the
moment). On that machine, either those two patches
selectively, or an entire patch cluster (8_Recommended.zip)
should be installed.
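
To make the comparison less error-prone, the highest installed revision of each patch can be pulled out of the showrev -p output with a few lines of Python. This is only a sketch: it assumes each installed patch appears as "Patch: NNNNNN-RR" in the output, and the helper names (latest_revisions, outdated) are made up for illustration. The recommended revisions are the -18 and -35 mentioned above and will change over time.

```python
import re

# Revisions recommended at the time of this comment (an assumption that
# goes stale as Sun publishes newer revisions).
RECOMMENDED = {"108528": 18, "108827": 35}

# Matches entries such as "Patch: 108528-07".
PATCH_RE = re.compile(r"Patch:\s*(\d{6})-(\d{2})")


def latest_revisions(showrev_output):
    """Map each patch ID to the highest revision found in the text."""
    latest = {}
    for patch_id, revision in PATCH_RE.findall(showrev_output):
        latest[patch_id] = max(latest.get(patch_id, 0), int(revision))
    return latest


def outdated(showrev_output, recommended=RECOMMENDED):
    """Return {patch_id: installed_revision_or_None} for patches that are
    missing or older than the recommended revision."""
    latest = latest_revisions(showrev_output)
    return {
        patch_id: latest.get(patch_id)
        for patch_id, wanted in recommended.items()
        if latest.get(patch_id, 0) < wanted
    }


if __name__ == "__main__":
    sample = "Patch: 108528-16\nPatch: 108827-12\n"
    print(latest_revisions(sample))
    print(outdated(sample))
```

The text passed in would be the captured output of showrev -p; an empty result from outdated() means both patches are at or beyond the recommended revision.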

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-06 14:44

Message:
Logged In: YES 
user_id=33168

I thought I could not duplicate the hang on our sun, but now
it's happening.  Your test program also hangs.  

How do you determine the revision of a patch on solaris? 
I'm using showrev -p | grep patch-#.  I'm not sure that is
correct.  The way I read it, it says I have 108528-16 and
108827-12.

In the snake farm 
 proton has:  108528-12 and 108827-12
 fafner has:  108528-13 and 108827-19

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-06 13:29

Message:
Logged In: YES 
user_id=21627

It would be good to find a system that has 108528-17
installed. I see that this fixes

4498831 system timer stops sending signals

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-06 13:13

Message:
Logged In: YES 
user_id=21627

I meant revisions of the Sun patches. On one system, I have
108528-16 and 108827-34, and test_signal still hangs.

I found that the blocking occurs when test_queue.py is run
before, and have distilled this into the following example:

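import sys
import thread
import time
import signal

# A lock used as a semaphore between the main thread and a helper thread.
fsema = thread.allocate_lock()

def tfunc():
    # Helper thread: wait briefly, then release the lock held by the main thread.
    time.sleep(.1)
    fsema.release()

fsema.acquire()
thread.start_new_thread(tfunc,())
# Blocks until tfunc releases the lock.
fsema.acquire()
fsema.release()

# Request SIGALRM in 3 seconds and wait for a signal to arrive.
signal.alarm(3)
signal.pause()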

It appears that the alarm is simply lost; the pause call
does not return.
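
For anyone who wants to retry this on a current interpreter without risking a hung terminal, here is a minimal Python 3 sketch of the same sequence. It is an adaptation, not the code above: it uses threading.Lock instead of thread.allocate_lock, installs a no-op SIGALRM handler so that a delivered alarm lets the child exit normally, and runs everything in a child process with a timeout. The helper name alarm_is_delivered and the timeout values are assumptions chosen for illustration.

```python
import subprocess
import sys

# Python 3 adaptation of the sequence: lock handoff between two threads,
# then alarm() followed by pause(). Needs a Unix-like system.
REPRO = r"""
import signal
import threading
import time

lock = threading.Lock()

def tfunc():
    time.sleep(0.1)
    lock.release()

lock.acquire()
threading.Thread(target=tfunc).start()
lock.acquire()      # blocks until tfunc releases the lock
lock.release()

# No-op handler so a delivered SIGALRM does not terminate the process.
signal.signal(signal.SIGALRM, lambda signum, frame: None)
signal.alarm(1)
signal.pause()      # returns only once a signal has been handled
"""


def alarm_is_delivered(timeout=10.0):
    """Run the sequence in a child process.

    Returns True if the child exits cleanly, i.e. the alarm arrived and
    pause() returned. Returns False if the child had to be killed after
    `timeout` seconds or exited with an error.
    """
    try:
        proc = subprocess.run([sys.executable, "-c", REPRO], timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0


if __name__ == "__main__":
    print("alarm delivered:", alarm_is_delivered())
```

On a system showing the behaviour described here, the expectation is that the child never returns from pause() and the helper reports False once the timeout expires.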

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-06 00:38

Message:
Logged In: YES 
user_id=33168

What do you mean by revision of the patches?  CVS revision
of thread_pthread.h?  I believe 2.38 worked and 2.39 broke.
 I can test that if you'd like.  Do you want me to go back
to the revision before the patch for all files affected? 
Since 2.39 only added the semaphores, in essence, by doing
the #undef that should have the same effect as reverting
thread_pthread.h to 2.38.

Or are you talking about the Solaris patches?  If so, they
are only a guess, we could compare the patch level on all
the Solaris boxes we have access to and see which ones work
and which don't.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-06 00:11

Message:
Logged In: YES 
user_id=21627

Could you identify a revision of these patches for which the
problem disappears?

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-05 21:57

Message:
Logged In: YES 
user_id=33168

For (possibly) more info, see:
http://lists.lysator.liu.se/pipermail/snake-farm/2003-January/000617.html

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-05 21:38

Message:
Logged In: YES 
user_id=33168

Attaching output of truss (like strace for Solaris) which
doesn't provide me with any more info.  Hopefully this will
help someone else.  Two files are attached, one with the complete
output from:

   truss ./python -E -tt ./Lib/test/regrtest.py test_queue test_signal

the other is just the end.  It should correspond to these
lines from the test:

signal.alarm(20)                        # Entire test lasts at most 20 sec.
signal.signal(5, handlerA)
signal.signal(2, handlerB)
signal.signal(3, signal.SIG_IGN)
signal.signal(signal.SIGALRM, signal.default_int_handler)


----------------------------------------------------------------------
