threads duplicated on fork() prevent child from terminating

L.C. deja-google at wotevah.com
Thu Jun 19 19:48:12 EDT 2003


The simple program below starts a thread, then calls fork() and tries to clean
up on exit like the well-behaved process it is. The only problem is that the
child does not want to die: it looks detached, and according to ps it is
sitting in schedule_timeout().

I suspect this is because the thread's state is duplicated on fork(), but the
state structure is not updated to show that the thread is in fact NOT running
in the child. The child is therefore fooled into waiting for the thread to
terminate, which will never happen. If the parent exits without calling
waitpid(), the child hangs there indefinitely, wasting resources.
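One way to test this theory is to fork while a worker thread is still running
and, in the child, compare what the threading module lists with what the
kernel reports. This is a sketch assuming Linux (it reads /proc/self/task) and
Python 3; child_thread_views() is a made-up helper name. If the stale-state
theory holds, the first number will be larger than the second. On recent
CPython 3 releases, which to my knowledge reset the thread list in the child
after fork(), both should come out as 1.

```python
import os
import threading
import time


def child_thread_views():
    """Fork while a worker thread is alive and return a tuple
    (threads the threading module lists in the child,
     threads the kernel reports for the child)."""
    worker = threading.Thread(target=time.sleep, args=(0.5,))
    worker.start()
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Child: the module's bookkeeping versus the kernel's view.
            # /proc/self/task is Linux-specific (assumption).
            module_view = len(threading.enumerate())
            kernel_view = len(os.listdir("/proc/self/task"))
            os.write(w, ("%d %d" % (module_view, kernel_view)).encode())
        finally:
            # Never fall back into the parent's code path.
            os._exit(0)
    os.close(w)
    data = os.read(r, 64).decode()
    os.close(r)
    os.waitpid(pid, 0)
    worker.join()
    module_view, kernel_view = (int(x) for x in data.split())
    return module_view, kernel_view


if __name__ == "__main__":
    print("threading module sees %d, kernel sees %d" % child_thread_views())
```

Python 3.12 and later also emit a DeprecationWarning when fork() is called
while other threads are running; it does not affect the result.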

I can see the following ways to make the child finish normally:

1) The parent join()s all threads before fork(): not always possible.

2) The parent kill()s the child: not nice.

3) The child calls os._exit(): IMHO the best option, but os._exit() is not
described as a clean way to exit (it skips cleanup handlers and does not flush
stdio buffers). It also means that if an exception escapes in the child before
os._exit() is reached, we risk the same problem.
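For option 3, the exception worry can be contained by routing every exit from
the child through os._exit() in a finally clause. This is a minimal sketch
assuming Python 3 on a POSIX system; run_child() and fork_and_wait() are
hypothetical helper names, not part of any library.

```python
import os
import sys
import traceback


def run_child(work):
    """Run work() in a forked child and always leave through os._exit(),
    so an escaped exception cannot send the child into normal interpreter
    shutdown (where it would wait on inherited thread state)."""
    status = 1
    try:
        work()
        status = 0
    except SystemExit as exc:
        # Honour sys.exit(n) inside work(), but still leave via os._exit().
        if exc.code is None:
            status = 0
        elif isinstance(exc.code, int):
            status = exc.code
    except Exception:
        # os._exit() skips the usual reporting, so print the traceback here.
        traceback.print_exc()
    finally:
        try:
            # os._exit() does not flush stdio buffers; do it by hand.
            sys.stdout.flush()
            sys.stderr.flush()
        finally:
            os._exit(status)


def fork_and_wait(work):
    """Fork, run work() in the child via run_child(), return its exit code."""
    pid = os.fork()
    if pid == 0:
        run_child(work)  # never returns
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    print("child exit code:", fork_and_wait(lambda: 1 / 0))
```

Anything else the child needs to tidy up (temporary files, sockets) has to
happen inside work(), since os._exit() runs no cleanup handlers.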

I guess I have already answered 90% of my (unstated) question. At any rate, it
seems to me that this should be handled internally in Python's core. Whether or
not child processes are required to use os._exit() instead of sys.exit() or
letting an exception end the process, I find it somewhat illogical for the
child process to wait for a thread that is not really there. I would expect
os.fork() to clean up the thread list in the child, unless we know (as on
Solaris) that the threads really are there.
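The same idea can be applied to a program's own thread bookkeeping: register a
hook that runs in the child right after fork() and forgets every worker
recorded in the parent. This is a sketch assuming Python 3.7+ on POSIX, where
os.register_at_fork() exists; WorkerRegistry and len_in_child() are
hypothetical names used only for illustration.

```python
import os
import threading
import time


class WorkerRegistry:
    """Hypothetical registry of the worker threads a program starts."""

    def __init__(self):
        self._lock = threading.Lock()
        self._workers = []
        # Run forget_all() in the child after every fork(). The hook keeps
        # a reference to this object, so it suits a long-lived singleton.
        os.register_at_fork(after_in_child=self.forget_all)

    def start(self, target, *args):
        t = threading.Thread(target=target, args=args)
        with self._lock:
            self._workers.append(t)
        t.start()
        return t

    def forget_all(self):
        # Only the forking thread exists in the child, so no recorded worker
        # is really there. The old lock may have been held by a thread that
        # vanished in the fork, so replace it rather than acquire it.
        self._lock = threading.Lock()
        self._workers = []

    def join_all(self):
        with self._lock:
            workers = list(self._workers)
        for t in workers:
            t.join()

    def __len__(self):
        with self._lock:
            return len(self._workers)


def len_in_child(registry):
    """Fork, report len(registry) as seen by the child, and reap the child."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.write(w, str(len(registry)).encode())
        finally:
            os._exit(0)
    os.close(w)
    reported = int(os.read(r, 16).decode())
    os.close(r)
    os.waitpid(pid, 0)
    return reported


if __name__ == "__main__":
    registry = WorkerRegistry()
    registry.start(time.sleep, 0.5)
    print("parent sees", len(registry), "child sees", len_in_child(registry))
    registry.join_all()
```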

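import os
import threading
import time

def log(message):
    print("%d %s" % (os.getpid(), message))


if __name__ == "__main__":
    log("is parent")
    # The worker sleeps so that it is still alive when fork() is called.
    t = threading.Thread(target=time.sleep, args=(1,))
    t.start()
    # t.join()  # OPTION 1: join all threads before fork()

    childPID = os.fork()
    if childPID == 0:
        log("is child")

    # List the threads this process's threading module knows about.
    log("threads: %s" % threading.enumerate())

    if childPID == 0:
        # os._exit(0)  # OPTION 3: exit the child without interpreter cleanup
        pass
    else:
        t.join()
        # os.kill(childPID, 15)  # OPTION 2: send SIGTERM to the child
        log("parent done waiting for %s, now waiting for %d" %
            (t.name, childPID))
        os.waitpid(childPID, 0)

    log("exiting")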




More information about the Python-list mailing list