[Twisted-Python] Re: [Twisted-commits] r11107 - Delete fundamentally broken test.
Wait, wait, that causes *hangs*? That seems like a bad thing. It doesn't look like an obviously wrong thing to do to me. Do you know *why* it's hanging?

James

On Jul 23, 2004, at 2:06 AM, Andrew Bennetts wrote:
warner and I agree that this test wasn't testing any useful behaviours that anything could actually depend upon, and it causes far too many intermittent test failures (and even hangs!)
-class PausingProcessProtocol(protocol.ProcessProtocol):
-
-    data = ""
-    elapsed = None
-
-    def connectionMade(self):
-        self.transport.pauseProducing()
-        self.transport.write("a")
-        reactor.callLater(2, self.transport.resumeProducing)
-
-    def outReceived(self, d):
-        self.data += d
-
-    def processEnded(self, reason):
-        self.data = self.data.lstrip("a")
-        if len(self.data) != 5:
-            self.elapsed = ValueError # XXX!
-        else:
-            self.elapsed = float(self.data)
-
[...]
-
-    def testPausing(self):
-        exe = sys.executable
-        scriptPath = util.sibpath(__file__, "process_pausing.py")
-        p = PausingProcessProtocol()
-        reactor.spawnProcess(p, exe, [exe, "-u", scriptPath], env=None)
-        while p.elapsed == None:
-            reactor.iterate(0.01)
-        self.failIfEqual(p.elapsed, ValueError, 'Child process wrote garbage')
-        self.assert_(2.1 > p.elapsed > 1.5) # assert how long process was paused
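The thread below refers repeatedly to process_pausing.py, which is not quoted anywhere in it. Here is a speculative reconstruction, inferred only from the deleted test above (the real script may well have differed): the child floods stdout until the pipe buffer fills and a write blocks, times how long that write stays blocked, and finally reports the duration as a five-character float, which is why the test lstrip()s the "a"s and expects exactly five remaining characters.

    # Speculative reconstruction of process_pausing.py -- inferred from the
    # deleted test above, not taken from the real script.
    import sys
    import time

    while True:
        before = time.time()
        sys.stdout.write("a" * 4096)   # keep filling the pipe buffer
        sys.stdout.flush()
        blocked = time.time() - before
        if blocked > 0.5:
            # The write blocked for a while: the parent must have paused
            # reading. Report how long we were stuck (e.g. "2.003" -- five
            # characters for any duration between 1 and 9.999 seconds).
            sys.stdout.write("%.3f" % blocked)
            sys.stdout.flush()
            break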
On Fri, Jul 23, 2004 at 09:38:56AM -0400, James Y Knight wrote:
Wait, wait, that causes *hangs*? That seems like a bad thing. It doesn't look like an obviously wrong thing to do to me. Do you know *why* it's hanging?
I'm not sure why it's hanging, and I'd be happy for someone to figure out why. Ideally they'd fix the problem too, if there is one.

My suspicion is that the bug is in that test, not in Twisted, though. The process_pausing.py script itself is far too ugly to have any confidence in. It tries to detect that writes to stdout block by looking at times, which is extremely fragile. Worse, detecting that writing to stdout blocks doesn't necessarily prove anything anyway: the intention (presumably; the test has no comments) is apparently to test that pauseProducing on a transport will cause pipes from a child process to go unread and hence let the buffers fill. But the child process could just as easily be finding that its writes block because it is simply writing more frequently than the parent is reading, e.g. due to scheduling anomalies.

I'm also not aware of any real-world reports of the process code hanging, despite the test being quite prone to intermittent failure, which is also highly suggestive that the test is broken, not the code.

I also have difficulty imagining a scenario that would rely upon this behaviour. The OS is free to buffer writes to pipes as much as it wants, even when the other end goes unread, so this isn't something that can portably be relied on to give any useful information. If the parent really wants to signal to the child that it should behave differently, then it ought to tell the child directly: via a message down a pipe, a signal, or some other channel. Relying upon indirect side effects doesn't strike me as good practice.

In short: the test's intention is undocumented, it appears to test something fundamentally unuseful, and it contributes zero additional confidence in the code base while generating a considerable amount of buildbot noise from failures that are more than likely spurious. We're better off without it.

-Andrew.
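A minimal sketch of the direct-signalling approach Andrew describes, using Twisted's real ProcessProtocol API but with an invented "pause"/"resume" stdin message convention (the class name and messages are illustrative, not anything in Twisted):

    from twisted.internet import protocol, reactor

    class ExplicitPauseProtocol(protocol.ProcessProtocol):
        """Tell the child to pause via an explicit stdin message,
        rather than relying on pipe-buffer back-pressure."""

        def connectionMade(self):
            # The child is expected to read control lines from stdin.
            self.transport.write("pause\n")
            # Two seconds later, tell it explicitly to carry on.
            reactor.callLater(2, self.transport.write, "resume\n")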
On Fri, 2004-07-23 at 10:07, Andrew Bennetts wrote:
In short: the test's intention is undocumented, and appears to be testing something fundamentally unuseful,
We need to test that Process transports have pause/resumeProducing, since this is part of the interface and e.g. CGI depends on it. I don't remember if I specifically wrote this test, but we do need *something*.

--
Itamar Shtull-Trauring   http://itamarst.org
On Fri, Jul 23, 2004 at 10:41:51AM -0400, Itamar Shtull-Trauring wrote:
On Fri, 2004-07-23 at 10:07, Andrew Bennetts wrote:
In short: the test's intention is undocumented, and appears to be testing something fundamentally unuseful,
We need to test that Process transports have pause/resumeProducing, since this is part of the interface and e.g. CGI depends on it. I don't remember if I specifically wrote this test, but we do need *something*.
If we want to test that the interface is implemented, then use IProducer.implementedBy and zope.interface.verify.verifyClass. If we want to test the behaviour (and I agree it'd be nice), someone needs to figure out a better test :)

-Andrew.
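A minimal sketch of the interface-only check being suggested, assuming the POSIX transport is twisted.internet.process.Process and that it is meant to declare IProducer; adjust the class and interface to whatever the transport actually provides:

    from zope.interface import verify
    from twisted.internet import interfaces, process
    from twisted.trial import unittest

    class ProcessProducerTests(unittest.TestCase):
        def testDeclaration(self):
            # implementedBy only checks that the class declares the interface.
            self.assert_(interfaces.IProducer.implementedBy(process.Process))

        def testSignatures(self):
            # verifyClass additionally checks that the declared methods exist
            # with plausible signatures; it raises on failure.
            verify.verifyClass(interfaces.IProducer, process.Process)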
Sigh. I probably ought to have read the process_pausing.py script, too. :) I was wrongly assuming it was just doing something nice and simple.

Anyhow, I can see how it could easily cause spurious failures, as it depends on the process scheduling doing *just* what it wants, which is obviously unreliable. That it hangs is still a bad thing, though: even if the timing is wrong, it oughtn't be hanging.

...But I have tracked down the problem: "if len(self.data) != 5: raise ValueError". Apparently the exception raised from inside processEnded just gets eaten somewhere instead of propagating back to the test framework. So if the test takes more than 9.999 seconds to complete, it raises an exception and never sets self.elapsed, and the test then sits in the reactor.iterate loop forever, waiting for self.elapsed to be set.

Okay, I'm satisfied the error was entirely in the test. However, it does now seem that pauseProducing isn't tested at all.

James

On Jul 23, 2004, at 10:07 AM, Andrew Bennetts wrote:
[...]
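One hang-proof way to structure that kind of wait (a sketch only, not what the test suite actually did): have the protocol fire a Deferred from processEnded, so that garbage data becomes an errback rather than a swallowed exception and an endless reactor.iterate loop.

    from twisted.internet import defer, protocol

    class EndNotifyingProtocol(protocol.ProcessProtocol):
        def __init__(self):
            self.data = ""
            self.onEnded = defer.Deferred()

        def outReceived(self, data):
            self.data += data

        def processEnded(self, reason):
            # Deliver success or failure through the Deferred; an exception
            # raised here would otherwise be logged and lost, leaving a
            # polling test to spin forever.
            try:
                self.onEnded.callback(float(self.data.lstrip("a")))
            except ValueError:
                self.onEnded.errback()  # wraps the current exception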
On Jul 23, 2004, at 10:07 AM, Andrew Bennetts wrote:
[...]
I'm not sure why it's hanging, and I'd be happy for someone to figure out why. Ideally they'd fix the problem too, if there is one.
Ugh, you *did* know why it was hanging... just saw the commits where it was already fixed. ;) Sigh, oh well. :)

James
On Fri, Jul 23, 2004 at 11:05:55AM -0400, James Y Knight wrote:
On Jul 23, 2004, at 10:07 AM, Andrew Bennetts wrote:
[...]
Ugh, you *did* know why it was hanging... just saw the commits where it was already fixed. ;) Sigh, oh well. :)
No, I didn't know (you're referring to the "raise ValueError" problem I fixed, I presume). It hung again even after I fixed that!

-Andrew.
On Fri, 2004-07-23 at 07:07, Andrew Bennetts wrote:
On Fri, Jul 23, 2004 at 09:38:56AM -0400, James Y Knight wrote:
Wait, wait, that causes *hangs*? That seems like a bad thing. It doesn't look like an obviously wrong thing to do to me. Do you know *why* it's hanging?
I'm not sure why it's hanging, and I'd be happy for someone to figure out why. Ideally they'd fix the problem too, if there is one.
My suspicion is that the bug is in that test, not in Twisted, though. The process_pausing.py script itself is far too ugly to have any confidence in. It tries to detect that writes to stdout block by looking at times, which is extremely fragile. Worse, detecting that writing to stdout blocks doesn't necessarily prove anything anyway: the intention (presumably, the test has no comments) is apparently to test that pauseProducing on a transport will cause pipes from a child process to be unread and hence let the buffers fill. But the child process could just as easily be finding that the writes are blocking because it's simply writing more frequently than the parent is reading, e.g. due to scheduling anomalies...
I'm also not aware of any real world reports of problems with the process code hanging, despite the test being pretty prone to intermittent failure, which is also highly suggestive that the test is broken, not the code.
I have a somewhat annoying problem related to the process code, though it's possibly not caused by it. I have a script that manages large numbers of processes (sometimes hundreds, over time), and occasionally a process will manage to exit without Twisted's process code ever getting the waitpid info for it; instead, waitpid gives the ECHILD (no such child) system error. In that case, Twisted will keep trying to reap the process and will never figure out that it is gone.

This is on a Redhat 7.2 system using python2.3 and a newish version of Twisted. I don't know why the process seems to get lost, but it would be nice if Twisted would at least notice the ECHILD and signal process termination (or lost, or something). Has anyone else experienced this problem?

thanks,
dave
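What Dave is asking for, roughly. This is a hypothetical sketch, not Twisted's actual reaping code: treat ECHILD from waitpid as "this child is already gone" and report it, instead of retrying forever.

    import errno, os

    def reapProcess(pid):
        """Try to reap pid: return its exit status, None if it is still
        running, or the string "lost" on ECHILD instead of polling forever."""
        try:
            reapedPid, status = os.waitpid(pid, os.WNOHANG)
        except OSError, e:
            if e.errno == errno.ECHILD:
                # Already reaped elsewhere (or never ours): treat the
                # process as terminated/lost rather than retrying.
                return "lost"
            raise
        if reapedPid == 0:
            return None   # still running
        return status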
On Thu, Jul 29, 2004 at 08:39:14PM -0700, Dave Peticolas wrote: [...]
This is on a Redhat 7.2 system using python2.3 and a newish version of twisted. I don't know why the process seems to get lost, but it would be nice if Twisted would at least notice the ECHILD and signal process termination (or lost, or something).
What version of Twisted, exactly? There have been some changes to the reaping code in recent releases (e.g. between 1.2.0 and 1.3.0) which might be relevant.

-Andrew.
On Thu, 2004-07-29 at 20:52, Andrew Bennetts wrote:
On Thu, Jul 29, 2004 at 08:39:14PM -0700, Dave Peticolas wrote: [...]
This is on a Redhat 7.2 system using python2.3 and a newish version of twisted. I don't know why the process seems to get lost, but it would be nice if Twisted would at least notice the ECHILD and signal process termination (or lost, or something).
What version of Twisted, exactly? There's been some changes to the reaping code in recent releases (e.g. between 1.2.0 and 1.3.0), which might be relevant.
I believe it is 1.2.0.

dave
On Thu, Jul 29, 2004 at 09:21:41PM -0700, Dave Peticolas wrote:
On Thu, 2004-07-29 at 20:52, Andrew Bennetts wrote:
[...]
What version of Twisted, exactly? There's been some changes to the reaping code in recent releases (e.g. between 1.2.0 and 1.3.0), which might be relevant.
I believe it is 1.2.0.
Ok, then I think it's likely that 1.3.0 fixes your bug. 1.2.0 had a nasty hack in its child reaping that went away in 1.3.0.

-Andrew.
On Thu, 2004-07-29 at 21:37, Andrew Bennetts wrote:
On Thu, Jul 29, 2004 at 09:21:41PM -0700, Dave Peticolas wrote:
[...]
Ok, then I think it's likely that 1.3.0 fixes your bug. 1.2.0 had a nasty hack in its child reaping that went away in 1.3.0.
Yes, I see the diff. That seems to have fixed it. Thanks for the info!

dave
participants (4)

- Andrew Bennetts
- Dave Peticolas
- Itamar Shtull-Trauring
- James Y Knight