[ python-Bugs-1183780 ] Popen4 wait() fails sporadically with threads

SourceForge.net noreply at sourceforge.net
Fri Mar 24 08:40:38 CET 2006

Bugs item #1183780, was opened at 2005-04-15 16:27
Message generated for change (Comment added) made by loewis
You can respond by visiting: 

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Taale Skogan (tskogan)
Assigned to: Neal Norwitz (nnorwitz)
Summary: Popen4 wait() fails sporadically with threads

Initial Comment:
Calling wait() on a popen2.Popen4 object fails 
intermittently with the error

Traceback (most recent call last):
  File "/usr/local/lib/python2.3/popen2.py", line 90, in wait
    pid, sts = os.waitpid(self.pid, 0)
OSError: [Errno 10] No child processes

when using threads. 

The problem seems to be a race condition when a thread 
calls wait() on a popen2.Popen4 object. This also apllies 
to Popen3 objects. 

The constructor of Popen4. calls _cleanup() which calls 
poll() which calls the system call waitpid() for all acitve 
child processes. If another thread calls poll() before the 
current thread calls wait() on it's child process and the 
child process has terminated, the child process is no 
longer waitable and the second call to wait() fails.

Code to replicate this behavoir is attached in popen_bug.

Solution: Popen4 and Popen3 should be threadsafe.

Related modules: A seemingly related error occurs with 
Popen from the new subprocess module. Use the -s 
option in the popen_bug.py script to test this. 

Tested on Linux RedHat Enterprise 3 for Python 2.3.3, 
Python 2.3.5 and Python 2.4.1 and Solaris for Python 2.
4.1. The error did not occur on a RedHat 7.3 machine 
with Python 2.3.5. See the attached file popen_bug.py for 
details on the platforms.


>Comment By: Martin v. Löwis (loewis)
Date: 2006-03-24 08:40

Logged In: YES 

This looks all fine.

As a further issue, I think _cleanup should also clean
processes which already have been waited on. So if waitpid
gives ECHILD (in _cleanup), I think the object should get
removed from _active - otherwise, it would stay there
forever. Of course, care is then need to avoid __del__
adding it back to _active.

Putting these all together, I propose v3 of the patch.

Another aspect that puzzles me is the repeated test that
waitpid() really returns the pid you asked for. How could it
not? If it fails, you get an os.error.


Comment By: Neal Norwitz (nnorwitz)
Date: 2006-03-24 06:17

Logged In: YES 

I agree with your comment about setting self.sts to 0.  That
was the problem I alluded to on python-dev.

Although I dislike __del__, this does seem like an
appropriate place to do the modification of _active.

Note that currently all os.error's are swallowed in poll().
 I'm not sure if that was the best idea, but that's the
current interface.  wait() does *not* catch any exceptions.

I wasn't really sure what to do about threads.  The threads
could always manage their calls into a popen object like you
propose (don't try to handle simultaneous calls to poll and
wait).  Another question I had was if popen should be
deprecated in favor of subprocess?

I've attached a patch which I think implements your
suggestion.  It also seems to fix all the problems.


Comment By: Martin v. Löwis (loewis)
Date: 2006-03-24 01:37

Logged In: YES 

I don't understand why you are setting self.sts to 0 if wait
fails: most likely, there was a simultaneous call to .poll,
which should have set self.sts to the real return value. So
we should return that instead.

I think the whole issue can be avoid if we use resurrection:
If __del__ would put unwaited objects into _active, rather
than __init__, it would not happen that _cleanup polls a pid
which a thread still intends to wait for. In fact, it would
be sufficient to only put the pid into _active (avoiding the
need for resurrection).

If then a thread calls poll explicitly, and another calls
wait, they deserve to lose (with ECHILD). I would claim the
same error exists if part of the application calls
os.wait[3,4](), and another calls .wait later - they also
deserve the exception.

With that approach, I don't think further thread
synchronization would be needed.


Comment By: Neal Norwitz (nnorwitz)
Date: 2006-03-23 09:41

Logged In: YES 

The attached patch fixes the problem for me.  It also
addresses another issue where wait could be called from
outside the popen2 module.  I'm not sure this is the best
solution.  I'm not sure there really is a good solution. 
Perhaps it's best to allow an exception to be raised?


You can respond by visiting: 

More information about the Python-bugs-list mailing list