[Twisted-Python] Unable to write to "stuck" TCP client connections

Hi, I am encountering a weird bug where some TCP client connections get into a state where the server is able to read data sent from the client, but is not able to send any data with transport.write(). With some help from the #twisted IRC channel, I was able to gather the following information regarding the bug. While I'm still unable to provide steps to reproduce this bug, I am able to reliably find clients who are in this state.

I am running 24 instances of the Twisted server (epoll reactor) on Ubuntu, with a peak traffic of >130k users. At any given time, there are < 20 TCP connections stuck in this state. Here is some information about the bug:

1. transport.write() does not send anything down the socket.
2. transport.doWrite() will send all the data that has been buffered up, and then stop sending any new data.
3. transport.writeSomeData() will send data.
4. reactor.getWriters() will return a list of transports that are all stuck in this state, and the writers will remain in this list.
5. Calling reactor.removeWriter(transport) will "unstick" the transport and data gets streamed once again.
6. A small number of clients will receive data for a while, and return to this stuck state. Most return to normal once reactor.removeWriter() is called.
7. Based on the suggestion from IRC user _habnabit, I used strace after removing the writer; here is the output:

epoll_ctl(3, EPOLL_CTL_MOD, 6504, {EPOLLIN, {u32=6504, u64=22205092589476200}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 6504, {EPOLLIN|EPOLLOUT, {u32=6504, u64=22205092589476200}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 6504, {EPOLLIN, {u32=6504, u64=22205092589476200}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 6504, {EPOLLIN|EPOLLOUT, {u32=6504, u64=22205092589476200}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 6504, {EPOLLIN, {u32=6504, u64=22205092589476200}}) = 0

For now, I am using a LoopingCall to check for and remove transports that are stuck in getWriters() (a rough sketch of this is below). I am using Twisted 12.3.0 on Ubuntu 12.04 - 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:42:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux.

Has anyone else experienced this weird problem? I'd love to provide more information regarding this bug.
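For reference, the workaround looks roughly like this. This is only a minimal sketch: the function name, the one-minute interval, and the "seen on two consecutive checks" heuristic are illustrative, and the heuristic could also remove writers that legitimately still have data buffered.

from twisted.internet import reactor, task

# Writers that were already registered at the previous check.
_suspected_stuck = set()

def unstick_stuck_writers():
    """Remove writers that have stayed in reactor.getWriters() across two checks."""
    global _suspected_stuck
    current = set(reactor.getWriters())
    for writer in current & _suspected_stuck:
        # Per observation 5 above, removeWriter() "unsticks" the transport.
        reactor.removeWriter(writer)
    # Anything registered right now becomes a suspect for the next check.
    _suspected_stuck = current

checker = task.LoopingCall(unstick_stuck_writers)
checker.start(60)  # interval in seconds; the value here is arbitrary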

On Feb 24, 2013, at 4:22 PM, Wenxiang Wu <wenxiang@zopim.com> wrote:
> Has anyone else experienced this weird problem? I'd love to provide more information regarding this bug.
I have seen bugs which _might_ be this problem, but I'm not sure. I don't think I've ever been able to reproduce it. Can you test with any reactors other than epoll? Do you have a test case which will reproduce it deterministically? -g
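P.S. In case it's useful: switching reactors is just a matter of installing one before anything else imports the default. A minimal sketch, using the poll reactor as an example:

# Must run before any other twisted.internet imports, otherwise the
# default (epoll) reactor gets installed first.
from twisted.internet import pollreactor
pollreactor.install()

from twisted.internet import reactor  # this is now the poll reactor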

Unfortunately, for performance reasons, I cannot test this with any other reactor. And the only time I have ever been able to reproduce this was when I left my client running overnight. However, due to the amount of traffic we are seeing, I am able to reliably find connections stuck in this state. I get 2-3 connections stuck every hour.

On Mon, Feb 25, 2013 at 3:25 PM, Glyph <glyph@twistedmatrix.com> wrote:
-- Wenxiang Wu VP Partying / Engineering www.zopim.com [US] +1 (408) 680-9345 [SG] +65 9457-5822

On Feb 25, 2013, at 4:17 PM, Wenxiang Wu <wenxiang@zopim.com> wrote:
> Unfortunately, for performance reasons, I cannot test this with any other reactor. And the only time I have ever been able to reproduce this was when I left my client running overnight.
Understandable; I've never managed to reliably reproduce it in a test environment either.
> However, due to the amount of traffic we are seeing, I am able to reliably find connections stuck in this state. I get 2-3 connections stuck every hour.
Do these connections share any obvious attributes? For example, are they to clients on some particular network? -g

From what I can tell, there isn't a specific pattern. But from this small sample size, users that run into this problem are those with higher levels of usage, i.e. connections with more traffic.

Also, this *seems* to affect users in the same LAN more. The 2 groups of affected users I worked closely with regarding this issue were both connected to the same WiFi router.

On Mon, Feb 25, 2013 at 4:36 PM, Glyph <glyph@twistedmatrix.com> wrote:
-- Wenxiang Wu VP Partying / Engineering www.zopim.com [US] +1 (408) 680-9345 [SG] +65 9457-5822

On Feb 25, 2013, at 10:24 PM, Wenxiang Wu <wenxiang@zopim.com> wrote:
> From what I can tell, there isn't a specific pattern. But from this small sample size, users that run into this problem are those with higher levels of usage, i.e. connections with more traffic.
> Also, this *seems* to affect users in the same LAN more. The 2 groups of affected users I worked closely with regarding this issue were both connected to the same WiFi router.
So, it sounds like it's happening with connections that are (A) faster and (B) shipping around more traffic. Smells like a possible race condition - and maybe not in Twisted. Is it possible for you to test with any other OS / kernel variations, to see if it behaves differently on other Linux versions, ideally some with changes to epoll?

I don't know epoll quite well enough to read the strace and tell what's going on. If those are the only calls being traced... there's no gettimeofday or send or recv calls? That suggests it's just thrashing, but if your workaround works, other stuff must be going on. Hmm. I guess I don't have a clear enough picture for any useful conjecture yet :).

Can you open a bug, and attach a more complete strace, assuming that the one with just the epoll calls was filtered?

-glyph

participants (5)
- Gelin Yan
- Glyph
- Itamar Turner-Trauring
- Itamar Turner-Trauring
- Wenxiang Wu