[Twisted-Python] "disconnecting properly" in tests still hangs on macOS

Right, so, I've been trying to get the technique in https://jml.io/pages/how-to-disconnect-in-twisted-really.html to work for me. No hating please, most of my testing in the past has involved hitting a relational database, so there's already a TCP connection flying around, one more won't make any difference. jml's example, exactly as-is on that page, hangs around 30-40% of the time when running on my macOS laptop. From changing the teardown to look like this: def tearDown(self): ds = defer.maybeDeferred(self.serverPort.stopListening) dc = defer.maybeDeferred(self.clientConnection.disconnect) print() ds.addCallback(lambda _: print('serverPort.stopListening')) dc.addCallback(lambda _: print('self.clientConnection.disconnect')) self.clientDisconnected.addCallback(lambda _: print('self.clientDisconnected')) self.serverDisconnected.addCallback(lambda _: print('self.serverDisconnected')) self.serverDisconnected.addErrback(lambda _: print('self.serverDisconnected:', _)) return defer.gatherResults([ds, dc, self.clientDisconnected, self.serverDisconnected]) ...it appears that it's the serverDisconnected deferred that's failing to fire. I can't reproduce this on Linux as of yet, so I'm guessing this is a difference between the SelectReactor used on macOS and the EPollReactor used on Linux. What's the best way to go about debugging a non-firing deferred like this? Anyone know what this might be? cheers, Chris

On Wed, Nov 14, 2018 at 3:50 AM Chris Withers <chris@withers.org> wrote:
Track it backwards towards the earliest/lowest possible source of the event. You've already started this: you noticed that the result of `gatherResults` doesn't fire and you investigate somehow and learned that this is because `serverDisconnected` didn't fire. Now keep going. Why didn't `serverDisconnected` fire? It looks like it should be fired by `ServerProtocol.connectionLost`. Does that method ever get called? If so, you've now identified the break in the chain and you just have to figure out why `connectionLost` doesn't manage to make `serverDisconnected` fire. If not, keep going. What code calls `ServerProtocol.connectionLost` - does that run, etc. Repeat until you find the break.
Anyone know what this might be?
I know that macOS often delivers events in a different order compared to Linux. This has sometimes caused problems, particularly in the test suite. I can't quite see how that would explain this behavior though so maybe it's something else. It might be useful to share your version of Python, Twisted, macOS, and the reactor you're using, in case anyone wants to try to replicate. Jean-Paul

Chris, I played with this for a bit and quickly reproduced the "server side disconnect never seems to happen" behaviour you described, on my system running macOS 10.12.6 and Twisted 18.9.0 using the SelectReactor. Suspecting of an eventual race condition between "server stop listen" and "server disconnect", I tried this variation of tearDown which seems to work reliably: @defer.inlineCallbacks def tearDown(self): self.clientConnection.disconnect() yield defer.gatherResults([self.clientDisconnected, self.serverDisconnected]) yield defer.maybeDeferred(self.serverPort.stopListening) Do I have any motive to suspect such race condition? No, but after ensuring "disconnect first, stop listening later", things work much better here. Also tested the CFReactor and KQueueReactor: CFReactor seems to exhibit a similar behaviour (original code hangs every now and then on cleanup, my code works); KQueueReactor always hangs on cleanup with the original code, and works reliably with my code. My 2c., -- exvito

On Wed, Nov 14, 2018 at 3:50 AM Chris Withers <chris@withers.org> wrote:
Track it backwards towards the earliest/lowest possible source of the event. You've already started this: you noticed that the result of `gatherResults` doesn't fire and you investigate somehow and learned that this is because `serverDisconnected` didn't fire. Now keep going. Why didn't `serverDisconnected` fire? It looks like it should be fired by `ServerProtocol.connectionLost`. Does that method ever get called? If so, you've now identified the break in the chain and you just have to figure out why `connectionLost` doesn't manage to make `serverDisconnected` fire. If not, keep going. What code calls `ServerProtocol.connectionLost` - does that run, etc. Repeat until you find the break.
Anyone know what this might be?
I know that macOS often delivers events in a different order compared to Linux. This has sometimes caused problems, particularly in the test suite. I can't quite see how that would explain this behavior though so maybe it's something else. It might be useful to share your version of Python, Twisted, macOS, and the reactor you're using, in case anyone wants to try to replicate. Jean-Paul

Chris, I played with this for a bit and quickly reproduced the "server side disconnect never seems to happen" behaviour you described, on my system running macOS 10.12.6 and Twisted 18.9.0 using the SelectReactor. Suspecting of an eventual race condition between "server stop listen" and "server disconnect", I tried this variation of tearDown which seems to work reliably: @defer.inlineCallbacks def tearDown(self): self.clientConnection.disconnect() yield defer.gatherResults([self.clientDisconnected, self.serverDisconnected]) yield defer.maybeDeferred(self.serverPort.stopListening) Do I have any motive to suspect such race condition? No, but after ensuring "disconnect first, stop listening later", things work much better here. Also tested the CFReactor and KQueueReactor: CFReactor seems to exhibit a similar behaviour (original code hangs every now and then on cleanup, my code works); KQueueReactor always hangs on cleanup with the original code, and works reliably with my code. My 2c., -- exvito
participants (3)
-
Chris Withers
-
exvito here
-
Jean-Paul Calderone