[Python-Dev] popen2.py strangeness

Guido van Rossum guido@python.org
Mon, 02 Jun 2003 13:25:45 -0400


[Sjoerd]
> > > I just noticed the following behavior (current CVS):
> > > 
> > > + python
> > > Python 2.3b1+ (#37, Jun  2 2003, 11:34:36)
> > > [GCC 3.2 20020903 (Red Hat Linux 8.0 3.2-7)] on linux2
> > > Type "help", "copyright", "credits" or "license" for more information.
> > > >>> import popen2
> > > >>> a = popen2.Popen3('sleep 1', 1, -1)
> > > >>> # wait more than one second to give the process time to exit
> > > ...
> > > >>> b = popen2.Popen3('sleep 1', 1, -1)
> > > >>> a.wait()
> > > Traceback (most recent call last):
> > >   File "<stdin>", line 1, in ?
> > >   File "/ufs/sjoerd/src/python/Lib/popen2.py", line 86, in wait
> > >     pid, sts = os.waitpid(self.pid, 0)
> > > OSError: [Errno 10] No child processes
> > > >>>
> > > 
> > > I would expect the a.wait() to return the exit status (or None) of the
> > > first "sleep 1" command, but instead I get a traceback.
> > > 
> > > The reason is clear: in popen2.Popen3.__init__ the very first thing that
> > > happens is a call to _cleanup() which does a poll() call on all active
> > > instances.  If the process has died, the os.waitpid() call that poll()
> > > does removes it from the OS's process table, so that a next os.waitpid()
> > > on the same process id will fail.
> > > 
> > > I understand why the call to _cleanup() happens: defunct processes need
> > > to be cleaned up, and if you only ever use the popen[234] factory
> > > functions, the library needs to call os.waitpid for the created
> > > processes.
> > > 
> > > However, it seems to me the above sequence of events is legitimate and
> > > should be supported.  The question is: how?

[me]
> > Call os.waitpid() with an explicit pid argument?

[Sjoerd]
> popen2 does that, that's not the problem.  The problem is, after the
> second call to popen2.Popen3 or popen2.Popen4 (possibly through the use
> of the factory functions) the call to os.waitpid with explicit pid is
> already done, so there is then no way to get the exit status anymore.
> You can also not call os.waitpid (possibly through the poll() method)
> just before calling popen2.Popen[34] because of race conditions (the
> process might end just after your explicit call and just before the
> implicit call in popen2.Popen[34]).

Hm, it looks like wait() should add an "if self.sts < 0" around most
of its body, like poll().  In particular, this patch solves the
problem for me:

Index: popen2.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/popen2.py,v
retrieving revision 1.25
diff -c -c -r1.25 popen2.py
*** popen2.py	3 Jun 2002 15:58:31 -0000	1.25
--- popen2.py	2 Jun 2003 17:24:53 -0000
***************
*** 83,92 ****
  
      def wait(self):
          """Wait for and return the exit status of the child process."""
!         pid, sts = os.waitpid(self.pid, 0)
!         if pid == self.pid:
!             self.sts = sts
!             _active.remove(self)
          return self.sts
  
  
--- 83,93 ----
  
      def wait(self):
          """Wait for and return the exit status of the child process."""
!         if self.sts < 0:
!             pid, sts = os.waitpid(self.pid, 0)
!             if pid == self.pid:
!                 self.sts = sts
!                 _active.remove(self)
          return self.sts
  
  

Should I check it in, or is there more to it?

--Guido van Rossum (home page: http://www.python.org/~guido/)