
On Fri, 2004-07-23 at 07:07, Andrew Bennetts wrote:
On Fri, Jul 23, 2004 at 09:38:56AM -0400, James Y Knight wrote:
Wait, wait, that causes *hangs*? That seems like a bad thing. It doesn't look like an obviously wrong thing to do to me. Do you know *why* it's hanging?
I'm not sure why it's hanging, and I'd be happy for someone to figure out why. Ideally they'd fix the problem too, if there is one.
My suspicion is that the bug is in that test, not in Twisted, though. The process_pausing.py script itself is far too ugly to have any confidence in. It tries to detect that writes to stdout block by looking at times, which is extremely fragile. Worse, detecting that writing to stdout blocks doesn't necessarily prove anything anyway: the intention (presumably; the test has no comments) is apparently to test that pauseProducing on a transport will cause pipes from a child process to go unread and hence let the buffers fill. But the child process could just as easily be finding that the writes are blocking because it's simply writing more frequently than the parent is reading, e.g. due to scheduling anomalies...
I'm also not aware of any real world reports of problems with the process code hanging, despite the test being pretty prone to intermittent failure, which is also highly suggestive that the test is broken, not the code.
I have a somewhat annoying problem related to the process code, though possibly not caused by it. I have a script that manages large numbers of processes (sometimes hundreds, over time). Occasionally a process manages to exit without Twisted's process code getting the waitpid info for it; instead, waitpid fails with the ECHILD (no child processes) system error. In that case, Twisted keeps trying to reap the process and never figures out that it is gone. This is on a Red Hat 7.2 system using Python 2.3 and a newish version of Twisted.

I don't know why the process seems to get lost, but it would be nice if Twisted would at least notice the ECHILD and signal process termination (or lost, or something). Has anyone else experienced this problem?

thanks,
dave
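
For illustration, here is a minimal sketch of the behaviour asked for above: a non-blocking reap that treats ECHILD as "the process is gone" instead of retrying forever. It is not Twisted's code. The names reap() and demo() are hypothetical, the syntax is modern Python rather than 2.3, and it assumes a POSIX system. The demo only simulates a lost child by collecting its exit status elsewhere first; it says nothing about why the status goes missing in the real case.

```python
import errno
import os
import subprocess
import sys


def reap(pid):
    """Poll one child without blocking (hypothetical helper).

    Returns ("running", None), ("exited", status) or ("lost", None).
    """
    try:
        reaped_pid, status = os.waitpid(pid, os.WNOHANG)
    except OSError as e:
        if e.errno == errno.ECHILD:
            # The kernel no longer knows this child, so its exit status
            # can never be collected. Report it as lost, not as running.
            return ("lost", None)
        raise
    if reaped_pid == 0:
        # The child exists but has not changed state yet.
        return ("running", None)
    return ("exited", status)


def demo():
    # Start a short-lived child, then collect its exit status directly,
    # standing in for whatever makes the status unavailable later.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    os.waitpid(child.pid, 0)
    child.returncode = 0  # mark the Popen object as finished so it does not wait again
    return reap(child.pid)
```

A caller that gets "lost" back could then signal process termination with an unknown exit status, which is roughly the "or lost, or something" outcome suggested above.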