
On Thu, Jul 27, 2000 at 01:17:03AM -0400, Tim Peters wrote:
In parallel, could someone summarize the symptoms? Exactly how does it fail? Does it fail the same way across all platforms on which it does fail? Does it fail every time on all platforms on which it fails, or fail only some of the time on all platforms on which it fails, or fails some of the time on some of the platforms on which it fails but fails all of the time on the rest of the platforms on which it fails <wink>?
If there exists a platform on which it fails but it doesn't fail every time on that platform, that would be strong evidence of a timing hole. Those usually require <gasp!> thought to identify and repair. I'll volunteer to eyeball the code and do some thinking, but not unless the symptoms suggest that's worthwhile.
ignorantly y'rs - tim
Here is what I have found:

Machine:

[trentm@molotok ~/main/contrib/python.build/dist/src]$ cat /proc/version
Linux version 2.2.12-20smp (root@porky.devel.redhat.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP Mon Sep 27 10:34:45 EDT 1999

Note that this is an SMP machine.

General symptoms:

test_fork1 did *not* fail for me all the time. In fact it seemed, in this run of testing, to pass fine a number of times in a row, and then some magical switch flipped and now it fails every time. I don't know what the 'switch' is, nor do I know how to flip it on and off. This failing every time is modulo the debugging print statements that I have put in test_fork1.py. This indicates that it is a timing issue.

Instrumented test_fork1.py:

--------------------------------------------------------------------------
import os, sys, time, thread

try:
    os.fork
except AttributeError:
    raise ImportError, "os.fork not defined -- skipping test_fork1"

LONGSLEEP = 2

SHORTSLEEP = 0.5

NUM_THREADS = 4

alive = {}

stop = 0

def f(id):
    while not stop:
        alive[id] = os.getpid()
        print 'thread %s: pid=%s' % (str(id), str(alive[id]))
        try:
            time.sleep(SHORTSLEEP)
        except IOError:
            pass

def main():
    print 'start main'
    for i in range(NUM_THREADS):
        thread.start_new(f, (i,))

    print 'before sleep'
    time.sleep(LONGSLEEP)
    print 'after sleep (threads should be started now)'

    a = alive.keys()
    a.sort()
    assert a == range(NUM_THREADS)

    prefork_lives = alive.copy()

    print 'before fork'
    cpid = os.fork()
    print 'after fork'

    if cpid == 0:
        print 'child: start'
        # Child
        time.sleep(LONGSLEEP)
        n = 0
        for key in alive.keys():
            if alive[key] != prefork_lives[key]:
                n = n+1
        print 'child: done, exit_value=%d' % n
        os._exit(n)
    else:
        print 'parent: start'
        # Parent
        spid, status = os.waitpid(cpid, 0)
        print 'parent: done waiting for child(pid=%d,status=%d)' %\
              (spid, status)
        assert spid == cpid
        assert status == 0, "cause = %d, exit = %d" % (status&0xff, status>>8)
        global stop
        # Tell threads to die
        print 'parent: tell threads to die'
        stop = 1
        time.sleep(2*SHORTSLEEP) # Wait for threads to die
        print 'parent: done (expect threads to be dead by now, hack)'

main()
--------------------------------------------------------------------------

A couple of test runs:

*** This test run passed:

[trentm@molotok ~/main/contrib/python.build/dist/src]$ ./python Lib/test/test_fork1.py
start main
before sleep
thread 0: pid=26416
thread 1: pid=26417
thread 2: pid=26418
thread 3: pid=26419
thread 0: pid=26416
thread 1: pid=26417
thread 2: pid=26418
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 3: pid=26419
thread 1: pid=26417
thread 0: pid=26416
after sleep (threads should be started now)
before fork
after fork
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
parent: start
after fork
child: start
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
child: done, exit_value=0
parent: done waiting for child(pid=26420,status=0)
parent: tell threads to die
parent: done (expect threads to be dead by now, hack)

*** This test run seg faulted but completed:

[trentm@molotok ~/main/contrib/python.build/dist/src]$ ./python Lib/test/test_fork1.py
start main
before sleep
thread 0: pid=26546
thread 1: pid=26547
thread 2: pid=26548
thread 3: pid=26549
thread 1: pid=26547
thread 3: pid=26549
thread 2: pid=26548
thread 0: pid=26546
thread 2: pid=26548
thread 0: pid=26546
thread 1: pid=26547
thread 3: pid=26549
thread 3: pid=26549
thread 1: pid=26547
thread 2: pid=26548
thread 0: pid=26546
after sleep (threads should be started now)
before fork
after fork
parent: start
after fork
child: start
Segmentation fault (core dumped)
[trentm@molotok ~/main/contrib/python.build/dist/src]$ child: done, exit_value=0

[trentm@molotok ~/main/contrib/python.build/dist/src]$

*** This test hung on the last statement:

[trentm@molotok ~/main/contrib/python.build/dist/src]$ ./python Lib/test/test_fork1.py
start main
before sleep
thread 0: pid=26753
thread 1: pid=26754
thread 2: pid=26755
thread 3: pid=26756
thread 2: pid=26755
thread 3: pid=26756
thread 0: pid=26753
thread 1: pid=26754
thread 0: pid=26753
thread 2: pid=26755
thread 3: pid=26756
thread 1: pid=26754
thread 0: pid=26753
thread 3: pid=26756
thread 2: pid=26755
thread 1: pid=26754
after sleep (threads should be started now)
before fork
thread 0: pid=26753
after fork
thread 2: pid=26755
parent: start
thread 3: pid=26756
thread 1: pid=26754
after fork
child: start
thread 0: pid=26753
thread 3: pid=26756
thread 1: pid=26754
thread 2: pid=26755
thread 0: pid=26753
thread 3: pid=26756
thread 1: pid=26754
thread 2: pid=26755
thread 0: pid=26753
thread 3: pid=26756
thread 1: pid=26754
thread 2: pid=26755
thread 0: pid=26753
child: done, exit_value=0
parent: done waiting for child(pid=26757,status=0)

Those are the only three run cases that I get.

Trent

--
Trent Mick
TrentM@ActiveState.com
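
The three outcomes reported here (pass, segfault, hang) could be counted over many runs to put numbers on the "how often does it fail" question. What follows is only a rough sketch, assuming a modern Python 3 interpreter is used to drive the test from outside. The helper names classify_run and tally are hypothetical and are not part of the test suite, and the 30-second default is an arbitrary guess at what should count as a hang.

```python
import subprocess
import sys
from collections import Counter


def classify_run(cmd, timeout):
    """Run cmd once and return 'pass', 'exit N', 'signal N', or 'hang'."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising, so nothing
        # is left behind from the process we started directly.
        return "hang"
    if proc.returncode == 0:
        return "pass"
    if proc.returncode < 0:
        # On POSIX a negative returncode means death by that signal number.
        return "signal %d" % -proc.returncode
    return "exit %d" % proc.returncode


def tally(cmd, runs=20, timeout=30.0):
    """Run cmd `runs` times and count how often each outcome occurs."""
    return Counter(classify_run(cmd, timeout) for _ in range(runs))


if __name__ == "__main__":
    # Hypothetical usage: python3 tally.py ./python Lib/test/test_fork1.py
    if len(sys.argv) > 1:
        for outcome, count in sorted(tally(sys.argv[1:]).items()):
            print("%-12s %d" % (outcome, count))
```

A mix of outcomes on one machine would point to a timing problem, while a single outcome every time would not. Bear in mind that the harness itself changes the timing (output goes to /dev/null rather than a terminal), so, as with the debugging prints, the counts may differ from what is seen interactively.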