[Python-Dev] threads duplicated on fork() prevent child from terminating properly

L.C. (Laurentiu C. Badea) <python-dev-20030629@wotevah.com>
Sun, 29 Jun 2003 14:12:19 -0700


Hello,

I am reposting this here as c.l.py didn't produce too many responses...
If anybody has any pointers on how to fix (or where to look), I would appreciate
it. I've tried this on 2.1 (RedHat 7.3), 2.2.1 and 2.2.3 (RedHat 9).

The simple program below launches a thread, then fork()s and attempts to clean
up the mess on exit like the nice process it is. The only problem is that the
child does not want to die: it looks like it has detached and is sitting in
schedule_timeout() (according to ps).

This is, I suspect, because the thread's state is duplicated on fork() but the
state structure is not updated to indicate that the thread is in fact NOT
active in the child, so the child is fooled into waiting for a termination
that will never happen. If the parent terminates without executing waitpid(),
the child will hang there indefinitely, wasting resources.

I can see the following ways to actually have it finish normally:

1) parent join()-s all the threads before fork(): not always possible.

2) parent kill()-s the child: not nice.

3) child does os._exit(): IMHO the best option, but os._exit() is not described
as an elegant exit method. Using it also means that if not all exceptions are
handled in the thread, we risk the same problem.
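
The exception concern in option 3 can be reduced by wrapping the child's work
in try/finally so that os._exit() is always reached. The following is only a
minimal sketch, assuming Python 3 on a POSIX system; run_child_and_exit() and
fork_with_hard_exit() are hypothetical helper names, not part of any library:

```python
import os
import sys
import traceback


def run_child_and_exit(work):
    """Run work() in the child, then leave via os._exit() no matter what.

    Hypothetical helper: the try/finally guarantees that os._exit() is
    reached even if work() raises, so the child never falls through to the
    normal interpreter shutdown.
    """
    status = 1  # assume failure until work() returns normally
    try:
        work()
        status = 0
    except BaseException:
        traceback.print_exc()
    finally:
        try:
            # os._exit() does not flush stdio buffers, so do it by hand.
            sys.stdout.flush()
            sys.stderr.flush()
        finally:
            os._exit(status)


def fork_with_hard_exit(work):
    """Fork, run work() in the child, and return the child's exit code."""
    # Flush before forking so buffered output is not written twice.
    sys.stdout.flush()
    sys.stderr.flush()
    pid = os.fork()
    if pid == 0:
        run_child_and_exit(work)  # never returns
    _, raw_status = os.waitpid(pid, 0)
    if os.WIFEXITED(raw_status):
        return os.WEXITSTATUS(raw_status)
    return -1  # child was terminated by a signal
```

With this shape the parent can tell success from failure by the exit code:
fork_with_hard_exit(lambda: None) yields 0, while a callable that raises
yields 1 after its traceback is printed by the child.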

I guess I have answered 90% of my question (which I have not stated) already. 
At any rate, it seems to me that this matter should be handled internally in
Python's core. Regardless of whether child processes are required to use
os._exit() instead of sys.exit() or other exception handlers, I find it
somewhat illogical to have the child process wait for a thread that is not
really there. I would expect os.fork() to clean up the thread list in the
child unless we know (as on Solaris) that the threads are really there.
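
One way to check what the child's thread list actually contains on a given
interpreter is to have the child report it back over a pipe and then leave
with os._exit() so that it cannot hang. This is only a diagnostic sketch,
assuming Python 3 on a POSIX system; thread_names_seen_by_child() is a
hypothetical helper, and what it returns depends on the Python version and
platform:

```python
import os
import threading


def thread_names_seen_by_child():
    """Fork and return the thread names threading.enumerate() lists in the child."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the thread list, then hard-exit to avoid any
        # shutdown-time wait on threads.
        status = 1
        try:
            os.close(read_fd)
            names = ",".join(t.name for t in threading.enumerate())
            os.write(write_fd, names.encode())
            status = 0
        finally:
            os._exit(status)

    # Parent: read until the child closes its end of the pipe.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode().split(",")
```

Calling it while a worker thread is still running in the parent shows whether
that thread's entry survives in the child's list.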

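import os
import threading

def log(message):
    # Prefix each message with the PID to tell parent and child output apart.
    print("%d %s" % (os.getpid(), message))


if __name__ == "__main__":
    log("is parent")
    t = threading.Thread()  # no target given
    t.start()
    # t.join()  # OPTION 1: parent joins the thread before fork()

    childPID = os.fork()
    if childPID == 0:
        log("is child")

    # Both processes report the threads the threading module knows about.
    log("threads: %s" % str(threading.enumerate()))

    if childPID == 0:
        # os._exit(0)  # OPTION 3: child exits via os._exit()
        pass

    else:
        t.join()
        # os.kill(childPID, 15)  # OPTION 2: parent kills the child (SIGTERM)
        log("parent done waiting for %s, now waiting for %d" %
            (t.getName(), childPID))
        os.waitpid(childPID, 0)

    log("exiting")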

Thank you,
-- 
L.C. (Laurentiu Badea)