[Python-Dev] test_fork1 on SMP? (was Re: [Python Dev] test_fork1 failing --with-threads (for some people)...)

Trent Mick trentm@ActiveState.com
Thu, 27 Jul 2000 15:00:56 -0700


On Thu, Jul 27, 2000 at 01:17:03AM -0400, Tim Peters wrote:
> In parallel, could someone summarize the symptoms?  Exactly how does it
> fail?  Does it fail the same way across all platforms on which it does fail?
> Does it fail every time on all platforms on which it fails, or fail only
> some of the time on all platforms on which it fails, or fails some of the
> time on some of the platforms on which it fails but fails all of the time on
> the rest of the platforms on which it fails <wink>?
> 
> If there exists a platform on which it fails but it doesn't fail every time
> on that platform, that would be strong evidence of a timing hole.  Those
> usually require <gasp!> thought to identify and repair.  I'll volunteer to
> eyeball the code and do some thinking, but not unless the symptoms suggest
> that's worthwhile.
> 
> ignorantly y'rs  - tim

Here is what I have found:

Machine: 
[trentm@molotok ~/main/contrib/python.build/dist/src]$ cat /proc/version
Linux version 2.2.12-20smp (root@porky.devel.redhat.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP Mon Sep 27 10:34:45 EDT 1999

Note that this is an SMP machine.


General symptoms:
test_fork1 did *not* fail for me every time. In this round of testing it
passed a number of times in a row, and then some magical switch seemed to
flip, and now it fails every time. I don't know what that 'switch' is, nor how
to flip it on and off. That "every time" is also modulo the debugging print
statements I have added to test_fork1.py, which change the behavior. All of
this points to a timing issue.
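To put numbers on "sometimes passes, sometimes fails", one option is to run the
script many times and tally how each run ends. Below is a minimal sketch in
modern Python 3. The helper names (classify, run_once, tally) and the default
30-second hang timeout are assumptions for illustration, not part of the test
suite. subprocess reports death by a signal as a negative return code.

```python
import subprocess
from collections import Counter

# Hypothetical helpers (not part of the Python test suite) for tallying
# how repeated runs of a flaky test end.

def classify(returncode):
    """Map a return code (None meaning "timed out") to an outcome label."""
    if returncode is None:
        return "hang"
    if returncode == 0:
        return "pass"
    if returncode < 0:
        # subprocess reports death by signal as the negated signal
        # number, e.g. -11 for SIGSEGV on Linux.
        return "signal %d" % -returncode
    return "exit %d" % returncode


def run_once(cmd, timeout):
    """Run cmd once; return its return code, or None if it timed out."""
    try:
        proc = subprocess.run(cmd,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        # run() has already killed and reaped the direct child here.
        return None
    return proc.returncode


def tally(cmd, runs=50, timeout=30.0):
    """Run cmd `runs` times and count each outcome."""
    return Counter(classify(run_once(cmd, timeout)) for _ in range(runs))
```

From the build directory this would be called as
tally(["./python", "Lib/test/test_fork1.py"]). On a timeout, subprocess.run
kills only the direct child, so a forked grandchild may survive and need
cleaning up by hand.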



Instrumented test_fork1.py:

--------------------------------------------------------------------------
import os, sys, time, thread

try:
    os.fork
except AttributeError:
    raise ImportError, "os.fork not defined -- skipping test_fork1"

LONGSLEEP = 2
SHORTSLEEP = 0.5
NUM_THREADS = 4
alive = {}
stop = 0

def f(id):
    while not stop:
        alive[id] = os.getpid()
        print 'thread %s: pid=%s' % (str(id), str(alive[id]))
        try:
            time.sleep(SHORTSLEEP)
        except IOError:
            pass

def main():
    print 'start main'
    for i in range(NUM_THREADS):
        thread.start_new(f, (i,))

    print 'before sleep'
    time.sleep(LONGSLEEP)
    print 'after sleep (threads should be started now)'

    a = alive.keys()
    a.sort()
    assert a == range(NUM_THREADS)

    prefork_lives = alive.copy()

    print 'before fork'
    cpid = os.fork()
    print 'after fork'

    if cpid == 0:
        print 'child: start'
        # Child
        time.sleep(LONGSLEEP)
        n = 0
        for key in alive.keys():
            if alive[key] != prefork_lives[key]:
                n = n+1
        print 'child: done, exit_value=%d' % n
        os._exit(n)
    else:
        print 'parent: start'
        # Parent
        spid, status = os.waitpid(cpid, 0)
        print 'parent: done waiting for child(pid=%d,status=%d)' %\
            (spid, status)
        assert spid == cpid
        assert status == 0, "cause = %d, exit = %d" % (status&0xff, status>>8)
        global stop
        # Tell threads to die
        print 'parent: tell threads to die' 
        stop = 1
        time.sleep(2*SHORTSLEEP) # Wait for threads to die
        print 'parent: done (expect threads to be dead by now, hack)'

main()
--------------------------------------------------------------------------
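The assertion message in the parent decodes the raw waitpid() status by hand,
reporting status&0xff as the "cause" and status>>8 as the exit code. Here is a
sketch that separates the fields, assuming the traditional Linux layout: low 7
bits hold the terminating signal, bit 0x80 is the core-dump flag, and bits 8-15
hold the exit code. Note that status&0xff therefore also includes that
core-dump bit. decode_wait_status is a hypothetical helper. Portable code
should use os.WIFEXITED(), os.WEXITSTATUS(), os.WIFSIGNALED(), os.WTERMSIG()
and os.WCOREDUMP() instead.

```python
# Hypothetical helper: split a raw wait() status into its fields,
# ASSUMING the traditional Linux/Unix bit layout. Portable code should
# use os.WIFEXITED, os.WEXITSTATUS, os.WIFSIGNALED, os.WTERMSIG and
# os.WCOREDUMP instead.

def decode_wait_status(status):
    sig = status & 0x7f          # low 7 bits: terminating signal (0 = none)
    if sig == 0:
        # Normal exit: the exit code is in bits 8-15.
        return {"exited": True, "exit_code": (status >> 8) & 0xff,
                "signal": None, "core_dumped": False}
    # Killed by a signal. (Stopped children, reported with low bits 0x7f,
    # only appear when waitpid() is passed WUNTRACED, which the test
    # does not do, so they are not handled here.)
    return {"exited": False, "exit_code": None,
            "signal": sig, "core_dumped": bool(status & 0x80)}
```

For instance, decode_wait_status(2 << 8) reports a normal exit with code 2.
In this test, that would mean two entries of alive differed from
prefork_lives in the child.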


A couple of test runs:

*** This test run passed:
[trentm@molotok ~/main/contrib/python.build/dist/src]$ ./python Lib/test/test_fork1.py
start main
before sleep
thread 0: pid=26416
thread 1: pid=26417
thread 2: pid=26418
thread 3: pid=26419
thread 0: pid=26416
thread 1: pid=26417
thread 2: pid=26418
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 3: pid=26419
thread 1: pid=26417
thread 0: pid=26416
after sleep (threads should be started now)
before fork
after fork
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
parent: start
after fork
child: start
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
child: done, exit_value=0
parent: done waiting for child(pid=26420,status=0)
parent: tell threads to die
parent: done (expect threads to be dead by now, hack)



*** This test run segfaulted, though the child still completed:
[trentm@molotok ~/main/contrib/python.build/dist/src]$ ./python Lib/test/test_fork1.py
start main
before sleep
thread 0: pid=26546
thread 1: pid=26547
thread 2: pid=26548
thread 3: pid=26549
thread 1: pid=26547
thread 3: pid=26549
thread 2: pid=26548
thread 0: pid=26546
thread 2: pid=26548
thread 0: pid=26546
thread 1: pid=26547
thread 3: pid=26549
thread 3: pid=26549
thread 1: pid=26547
thread 2: pid=26548
thread 0: pid=26546
after sleep (threads should be started now)
before fork
after fork
parent: start
after fork
child: start
Segmentation fault (core dumped)
[trentm@molotok ~/main/contrib/python.build/dist/src]$ child: done, exit_value=0

[trentm@molotok ~/main/contrib/python.build/dist/src]$



*** This test run hung after the last printed statement:
[trentm@molotok ~/main/contrib/python.build/dist/src]$ ./python Lib/test/test_fork1.py
start main
before sleep
thread 0: pid=26753
thread 1: pid=26754
thread 2: pid=26755
thread 3: pid=26756
thread 2: pid=26755
thread 3: pid=26756
thread 0: pid=26753
thread 1: pid=26754
thread 0: pid=26753
thread 2: pid=26755
thread 3: pid=26756
thread 1: pid=26754
thread 0: pid=26753
thread 3: pid=26756
thread 2: pid=26755
thread 1: pid=26754
after sleep (threads should be started now)
before fork
thread 0: pid=26753
after fork
thread 2: pid=26755
parent: start
thread 3: pid=26756
thread 1: pid=26754
after fork
child: start
thread 0: pid=26753
thread 3: pid=26756
thread 1: pid=26754
thread 2: pid=26755
thread 0: pid=26753
thread 3: pid=26756
thread 1: pid=26754
thread 2: pid=26755
thread 0: pid=26753
thread 3: pid=26756
thread 1: pid=26754
thread 2: pid=26755
thread 0: pid=26753
child: done, exit_value=0
parent: done waiting for child(pid=26757,status=0)




Those are the only three outcomes I have seen.
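The three outcomes differ in how far the parent gets, so one way to triage
captured logs is to find the last parent milestone printed. The function below
is a hypothetical sketch. Its milestone strings are copied from the
instrumented test_fork1.py above. Applied to the runs above, the segfaulting
run stops at "parent: start" and the hung run at "parent: done waiting for
child".

```python
# Milestones printed by the *parent* process, in order, copied from the
# instrumented test_fork1.py. (Hypothetical triage helper, not part of
# the test suite.)
PARENT_MILESTONES = [
    "before fork",
    "parent: start",
    "parent: done waiting for child",
    "parent: tell threads to die",
    "parent: done (expect threads to be dead by now, hack)",
]


def last_parent_milestone(output):
    """Return the furthest consecutive parent milestone found in a log,
    or None if even the first one is missing."""
    lines = output.splitlines()
    reached = None
    for milestone in PARENT_MILESTONES:
        if any(line.startswith(milestone) for line in lines):
            reached = milestone
        else:
            break  # later milestones cannot count if this one is missing
    return reached
```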


Trent

-- 
Trent Mick
TrentM@ActiveState.com