I have been looking into the (seemingly random) test_popen2 failures today, and found that it fails when the tests are run in the order given in the subject.
Here is what happens: - test_quopri uses os.popen2, which in turn creates a popen2.Popen3 object. It processes stdin/stdout, but never calls wait on that the process (it can't: os.popen2 doesn't return the pid). This is a flaw (the process should be waited for) which I fixed.
- test_wait3 has a loop to wait for child processes, which invokes wait repeatedly until the expected child pid is returned. In the process, it might receive the wait status from processes it didn't start. This is a flaw which I didn't address - I wonder why there is a loop, instead of just waiting once, and expecting the PID of the process that had been created just before.
- test_popen2 invokes _cleanup at the end, which waits for all _active processes. Now, since test_quopri added an active process, and test_wait3 deleted its pid, _cleanup fails with the error that the pid is no longer valid. I made that failure more obvious.
So here is what I did to address these: - I changed test_quopri to use subprocess.Popen().communicate() to invoke quopri as a program. communicate will wait() for the child process.
- I changed test_popen2 to check at the beginning that there are no active processes. To print out the commands that are still active, I added the cmd string to the Popen objects.
It turned out that this broke the Windows builds, which, in turn, resolves as a hidden bug/feature of quopri command line mode. When "python -mquopri <foo.txt" is invoked, the output will contain CRLF even if the input doesn't, this is likely because stdout is in text mode on Windows. Now, subprocess.py is not in text mode (unlike os.popen2), so when reading the results, we see CRLF on Windows, but LF elsewhere.
I fixed this by making the result check line-ending-independent, because I think using CRLF in quoted-unreadable is compliant with the relevant RFC. Alternatively, the quopri.py command line mode could be put into binary mode, so that it produces identical outputs on all systems.
On 3/23/06, "Martin v. Löwis" email@example.com wrote:
I have been looking into the (seemingly random) test_popen2 failures today, and found that it fails when the tests
I played with this some last night and found the same ordering. I have a different patch that also fixes the problem. It also fixes 2-3 bugs I think. Basically the child could be waited on from outside popen (or from 2 threads). The question is what should we do if that happens? I wrapped some calls and handled the exceptions. Threads can be handled fine since we have the return result. But if the child was waited on from outside popen, we don't have the status info. I'm not sure what to do about that.
There is a problem with the patch that I should use self.sts if it's not -1. See bug #1183780 for the details.
Neal Norwitz wrote:
I played with this some last night and found the same ordering. I have a different patch that also fixes the problem. It also fixes 2-3 bugs I think. Basically the child could be waited on from outside popen (or from 2 threads). The question is what should we do if that happens?
As discussed in the patch: If the application calls wait() at some point, or poll() (in the same thread or a different thread), then it is correct if .wait() will raise an os.error (ECHILD) - the pid was already waited on.
OTOH, _cleanup shouldn't mess with pids that the application still might want to wait on; _cleanup should also clean out objects that already have been waited on. So my proposal is that Popen3 objects should be added to _active only in __del__.