os.wait() losing child?

Nick Craig-Wood nick at craig-wood.com
Wed Jul 11 16:30:03 EDT 2007


Jason Zheng <Xin.Zheng at jpl.nasa.gov> wrote:
>  greg wrote:
> > Jason Zheng wrote:
> >> Hate to reply to my own thread, but this is the working program that 
> >> can demonstrate what I posted earlier:
> > 
> > I've figured out what's going on. The Popen class has a
> > __del__ method which does a non-blocking wait of its own.
> > So you need to keep the Popen instance for each subprocess
> > alive until your wait call has cleaned it up.
> > 
> > The following version seems to work okay.
> > 
>  It still doesn't work on my machine. I took a closer look at the Popen 
>  class, and I think the problem is that the __init__ method always calls 
>  a method _cleanup, which polls every existing Popen instance. The poll 
>  method does a nonblocking wait.
> 
>  If one of my child process finishes as I create a new Popen instance, 
>  then the _cleanup method effectively de-zombifies the child process, so 
>  I can no longer expect to see the return of that pid on os.wait()
>  any more.

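The problem is that you are letting Popen do half the job and doing
the other half yourself.  Popen reaps children through its own poll()
calls, so an exit status it has already collected never reaches your
os.wait().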
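
To see the effect in isolation, here is a minimal sketch.  It assumes
Python 3 on a POSIX system (ChildProcessError is Python 3 only; in
older Pythons the same failure shows up as OSError with errno ECHILD),
and the function name reaped_by_popen is made up for illustration.
Once Popen has collected a child's exit status, a direct os.waitpid()
on that pid has nothing left to return.

```python
import os
import subprocess
import sys


def reaped_by_popen():
    """Return True if os.waitpid() can no longer see a child Popen has reaped."""
    # Start a short-lived child; sys.executable avoids depending on a shell.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    # Popen.wait() collects the exit status, as poll() does once the child exits.
    child.wait()
    try:
        # Ask the OS for the same child again, bypassing Popen.
        os.waitpid(child.pid, 0)
    except ChildProcessError:
        # ECHILD: there is no zombie left for us to collect.
        return True
    return False


if __name__ == "__main__":
    print(reaped_by_popen())
```

This is the same race as in the original program, just forced to
happen every time instead of only when a child exits during _cleanup.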

Here is a way which works, done completely with Popen.  Polling the
subprocesses is slightly less efficient than using os.wait(), but it
works.  In practice you will want to poll anyway, for example to check
whether your children have exceeded their time limits.

import os
import time
from subprocess import Popen

processes = []
counts = [0,0,0]

for i in range(3):
    p = Popen('sleep 1', shell=True, cwd='/home', stdout=open(os.devnull, 'w'))
    processes.append(p)
    print("Starting child process %d (%d)" % (i, p.pid))

while True:
    # Look for a child that has exited; poll() reaps it without blocking
    for i, p in enumerate(processes):
        if p is not None and p.poll() is not None:
            break
    else:
        # Nothing has exited yet, so wait a little and poll again
        time.sleep(0.1)
        continue
    counts[i] += 1

    # Retire a child after 10 runs; stop once every child is retired
    if counts[i] == 10:
        print("Child Process %d terminated." % i)
        processes[i] = None  # otherwise poll() would report this exit again
        if all(c >= 10 for c in counts):
            break
        continue

    print("Child Process %d terminated, restarting" % i)
    processes[i] = Popen('sleep 1', shell=True, cwd='/home', stdout=open(os.devnull, 'w'))
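
The time-limit check mentioned above could look something like the
following.  This is a rough sketch assuming Python 3; run_with_limit
and its parameters are hypothetical names, not part of the subprocess
module.  It polls one child and kills it if it runs past its limit,
reaping it through Popen rather than os.wait().

```python
import subprocess
import sys
import time


def run_with_limit(args, limit, interval=0.05):
    """Poll a child until it exits or `limit` seconds pass.

    Returns (returncode, timed_out).
    """
    child = subprocess.Popen(args)
    deadline = time.monotonic() + limit
    while child.poll() is None:
        if time.monotonic() >= deadline:
            child.kill()  # over its limit, so stop it
            child.wait()  # reap it through Popen, not os.wait()
            return child.returncode, True
        time.sleep(interval)
    return child.returncode, False


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    quick = [sys.executable, "-c", "pass"]
    print(run_with_limit(slow, limit=0.5))
    print(run_with_limit(quick, limit=5))
```

The same deadline test can be folded into the loop over several
children by keeping one start time per Popen object.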



-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


