Re: [Twisted-Python] connectionLost never reached after calling loseConnection: stuck in CLOSE_WAIT forever

Hello Jean-Paul, thanks for tracking this down, you rock! I promise that when I'll have payed all by debts I'll buy one of those posters of Exarkun to hang on my wall! JP wrote:
After a few runs, I managed to reproduce the problem. I instrumented the reactor with some extra logging and test_producer.py with a manhole server. The sequence of events appears to be something like this:
OneA has a producer of OneE OneA has a consumer of OneB At some point OneB gives up and tells OneA to stopProducing (loseConnection) OneA.loseConnection stops the reactor from reading OneA and starts it writing OneA.doWrite happens it finds the send buffer empty it finds a registered producer (OneE) and resumes it OneE never produces any more bytes OneE loses its connection at some point and unregisters itself from OneA OneA takes note that it has no more producer, but does nothing about it
So the bug is likely that FileDescriptor.unregisterProducer doesn't do anything special when disconnecting=True.
You should be able to reproduce this very simply by setting up a transport-producer/consumer pair, calling loseConnection on the transport, then unregistering the producer.
This all sounds somewhat familiar, but I don't see an existing ticket for it, so maybe that's my imagination.
Jean-Paul
Following your indications I attached a minimal example to a new ticket: http://twistedmatrix.com/trac/ticket/4719 Some additional info: * problem occurs only if more than 64KB of data are written to the transport before its consumer calls stopProducing on it * problem occurs only if some time passes before its producer unregisters itself from the transport Thanks again for your help! :) ste

On Sat, 2010-10-30 at 18:48 +0200, Stefano Debenedetti wrote:
Hello Jean-Paul, thanks for tracking this down, you rock!
I promise that when I'll have payed all by debts I'll buy one of those posters of Exarkun to hang on my wall!
You can also get figurines: http://www.toycutter.com/2009/02/exar-kun-three-eye-monster-munnys.html

Il 30/10/2010 18:48, Stefano Debenedetti ha scritto:
I have attached a 2 lines patch that seems to fix the issue. The patch also contains a 50 lines unit test ;) I hope this makes it on time for the 10.2.0 release. BTW there's a small error here: http://twistedmatrix.com/trac/wiki/BasicGuideToContributingCode $ svn diff -u Subcommand 'diff' doesn't accept option '-u [--show-updates]' Type 'svn help diff' for usage. So the correct command for generating a patch should be documented as: svn diff > my-twisted-patch.patch thanks ciao ste

On Oct 31, 2010, at 1:36 PM, Stefano Debenedetti wrote:
Il 30/10/2010 18:48, Stefano Debenedetti ha scritto:
I have attached a 2 lines patch that seems to fix the issue.
The patch also contains a 50 lines unit test ;)
Stefano, this looks like a great contribution: a serious bug fix in a bit of core functionality, with a test. Thanks so much! I'll try to do a complete review soon, but my immediate impression is that test_disconnectEvent needs a better docstring. The word 'correctly' in a docstring is always a clue that the writer gave up and stopped trying to describe what 'correct' behavior is :). if you could try to rephrase it as "When a transport is (...) and then (...), (...) should be called on (...)". (I will have a more concrete suggestion, with fewer "..."s when I have time to go through the ticket history and review the patch in detail.)
I hope this makes it on time for the 10.2.0 release.
BTW there's a small error here:
http://twistedmatrix.com/trac/wiki/BasicGuideToContributingCode
$ svn diff -u Subcommand 'diff' doesn't accept option '-u [--show-updates]' Type 'svn help diff' for usage.
So the correct command for generating a patch should be documented as:
svn diff > my-twisted-patch.patch
thanks ciao ste
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

Hey, Inside an twisted.words.protocol.irc.IRCClient class I want to download via XDCC and then have a callback after that's finished. Sadly the documentation is lacking and I'm not sure what is what. If I override IRCClient.dccDoSend(self, user, address, port, filename, size, data) then it enters when I initiate an XDCC SEND request. But how do I start the download? 1- How can I start downloading the file?2- Can I receive an array of the bytes instead of saving a file?3- How can I be notified when the download is complete? Thanks

Glyph wrote:
Stefano, this looks like a great contribution: a serious bug fix in a bit of core functionality, with a test. Thanks so much!
Thank you! It's a pleasure to have one's code be called by code in the Twisted framework and to get stuff done with it! And it's an honor to receive support from such a wonderful community, already ever so enjoyable and enlightening even when just lurking among it!
I'll try to do a complete review soon, but my immediate impression is that test_disconnectEvent needs a better docstring. The word 'correctly' in a docstring is always a clue that the writer gave up and stopped trying to describe what 'correct' behavior is :).
Good point, I totally agree with you. "correctly" is most often the wrong word and a shortcut which leads to unpleasant consequences once what is "correct" gets better defined later by explicitly broken behavior. BTW that line was actually shamelessly copy-pasted from the previous test's docstring ;P
if you could try to rephrase it as "When a transport is (...) and then (...), (...) should be called on (...)".
(I will have a more concrete suggestion, with fewer "..."s when I have time to go through the ticket history and review the patch in detail.)
OK, take two: This test checks that a protocol gets notified of a lost connection when its transport has a registered producer writing more than 64KB to it and is then asked to stopProducing without any more data being written to it. This ought to prevent that sockets get stuck in CLOSE_WAIT state when these peculiar conditions occur. See #4719. Another thing I'm not too sure about is if TCPClientTestsBuilder ("Builder defining tests relating to L{IReactorTCP.connectTCP}.") is the right class for holding such a test, which is not very related to connectTCP. ciao thanks ste

Il 31/10/2010 18:36, Stefano Debenedetti ha scritto:
Il 30/10/2010 18:48, Stefano Debenedetti ha scritto:
I have attached a 2 lines patch that seems to fix the issue.
err, it seems to fix that issue but it introduces another issue: File "my/code.py", line 759, in loseConnection proto.transport.unregisterProducer() File "/home/lala/lib/python/twisted/internet/abstract.py", line 330, in unregisterProducer self.startWriting() File "/home/lala/lib/python/twisted/internet/abstract.py", line 290, in startWriting self.reactor.addWriter(self) File "/home/lala/lib/python/twisted/internet/epollreactor.py", line 107, in addWriter self._add(writer, self._writes, self._reads, self._selectables, _epoll.OUT, _epoll.IN) File "/home/lala/lib/python/twisted/internet/epollreactor.py", line 88, in _add self._poller._control(cmd, fd, flags) File "_epoll.pyx", line 125, in _epoll.epoll._control I just saw this appearing once in the logs of my app after I applied the patch I suggested: http://twistedmatrix.com/trac/attachment/ticket/4719/avoid_hang_in_CLOSE_WAI... I'll try to replicate and to figure out what does FileDescriptor.unregisterProducer need to do instead of the probably too dumb lines I added there: if self.disconnecting: self.startWriting() ciao ste

Il 01/11/2010 15:34, Stefano Debenedetti ha scritto:
I'll try to replicate and to figure out what does FileDescriptor.unregisterProducer need to do instead of the probably too dumb lines I added there:
if self.disconnecting: self.startWriting()
Alright, I added a test case that replicates the new error introduced by the above change and included it in a new proposed fix: http://twistedmatrix.com/trac/attachment/ticket/4719/avoid_hang_in_CLOSE_WAI... if self.connected and self.disconnecting: self.startWriting() This new patch now passes both new unit tests. Sorry about the noise, if I find any more issues about this I'll post them to the ticket page instead of polluting the mailing list with replies to myself. ciao ste

On 04:29 pm, ste@demaledetti.net wrote:
Il 01/11/2010 15:34, Stefano Debenedetti ha scritto:
I'll try to replicate and to figure out what does FileDescriptor.unregisterProducer need to do instead of the probably too dumb lines I added there:
if self.disconnecting: self.startWriting()
Alright, I added a test case that replicates the new error introduced by the above change and included it in a new proposed fix:
http://twistedmatrix.com/trac/attachment/ticket/4719/avoid_hang_in_CLOSE_WAI...
if self.connected and self.disconnecting: self.startWriting()
This new patch now passes both new unit tests.
Sorry about the noise, if I find any more issues about this I'll post them to the ticket page instead of polluting the mailing list with replies to myself.
This is really awesome. Thank you very much. :) Jean-Paul
participants (5)
-
exarkun@twistedmatrix.com
-
Glyph Lefkowitz
-
Itamar Turner-Trauring
-
Stefano Debenedetti
-
velociraptor Genjix