[Twisted-Python] Reactor Spinning?
Hey Twisted folks,

I'm using the twisted.web framework for a high performance HTTP proxy server that very closely resembles the HTTP proxy server example that comes with the twisted package.

Under heavy load, I occasionally run into a problem where the reactor appears to start spinning on an "empty" write. An strace of the process reveals these system calls:

epoll_wait(3, {{EPOLLOUT, {u32=81, u64=13194703840475086929}}}, 269, 8) = 1
write(81, "", 0) = 0
gettimeofday({1272980323, 567613}, NULL) = 0
gettimeofday({1272980323, 567696}, NULL) = 0
epoll_wait(3, {{EPOLLOUT, {u32=81, u64=13194703840475086929}}}, 269, 8) = 1
write(81, "", 0) = 0
gettimeofday({1272980323, 567988}, NULL) = 0
gettimeofday({1272980323, 568071}, NULL) = 0
epoll_wait(3, {{EPOLLOUT, {u32=81, u64=13194703840475086929}}}, 269, 7) = 1
write(81, "", 0) = 0
gettimeofday({1272980323, 568501}, NULL) = 0
gettimeofday({1272980323, 568613}, NULL) = 0
epoll_wait(3, {{EPOLLOUT, {u32=81, u64=13194703840475086929}}}, 269, 7) = 1
write(81, "", 0) = 0
gettimeofday({1272980323, 568978}, NULL) = 0
gettimeofday({1272980323, 569084}, NULL) = 0
epoll_wait(3, {{EPOLLOUT, {u32=81, u64=13194703840475086929}}}, 269, 6) = 1
write(81, "", 0) = 0

...it seems to be waiting for write availability on a FD, trying to write an empty buffer, seeing that it wrote zero bytes, and so waiting for availability again.

I've confirmed that when this occurs, whatever FD it is spinning on does correspond with a network socket connected to the proxy. I've seen this with both Twisted 8.2 and 9.0 running on Linux using both the epoll and select reactors.

If anyone has any thoughts about this problem or has seen it before, I'd appreciate any insight that anyone might have.

Thanks,
- moxie

--
http://www.thoughtcrime.org
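The spin the trace shows is easy to demonstrate outside Twisted. The following is a minimal, hypothetical stand-alone sketch (not code from the proxy or from Twisted itself): a healthy connected socket with room in its send buffer is always writable, so epoll_wait returns immediately, and a zero-byte write does nothing to change that, which is exactly a busy loop.

import os
import select
import socket

# Any connected socket will do; a socketpair keeps the sketch self-contained.
a, b = socket.socketpair()

ep = select.epoll()
ep.register(a.fileno(), select.EPOLLOUT)

for _ in range(5):  # the wedged process effectively does this forever
    events = ep.poll()  # returns immediately: the socket is always writable
    for fd, _event in events:
        n = os.write(fd, b'')  # shows up in strace as write(fd, "", 0) = 0
        print('fd %d writable, wrote %d bytes' % (fd, n))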
Does it spin forever, or does it clear up after some time?

I have occasional 'hiccups' under heavy load, with responses taking several tens of seconds for no apparent reason, after which the server works again as if nothing happened.

Werner

Moxie Marlinspike wrote:
Hey Twisted folks,
I'm using the twisted.web framework for a high performance HTTP proxy server that very closely resembles the HTTP proxy server example that comes with the twisted package.
Under heavy load, I occasionally run into a problem where the reactor appears to start spinning on an "empty" write. An strace of the process reveals these system calls:
[strace output snipped; identical to the trace in the original message above]
...it seems to be waiting for write availability on a FD, trying to write an empty buffer, seeing that it wrote zero bytes, and so waiting for availability again.
I've confirmed that when this occurs, whatever FD it is spinning on does correspond with a network socket connected to the proxy. I've seen this with both Twisted 8.2 and 9.0 running on Linux using both the epoll and select reactors.
If anyone has any thoughts about this problem or has seen it before, I'd appreciate any insight that anyone might have.
Thanks,
- moxie
I've seen it clear up before, but it will normally spin for days. This state doesn't really impact performance at all (requests are still snappy), but it is annoying how much CPU time it eats through before a restart.

- moxie

--
http://www.thoughtcrime.org

Werner Thie wrote:
Does it spin forever or does it clear up after some time?
I have occasional 'hiccups' under heavy load, with responses taking several tens of seconds for no apparent reason, after which the server works again as if nothing happened.
Werner
Moxie Marlinspike wrote:
Hey Twisted folks,
I'm using the twisted.web framework for a high performance HTTP proxy server that very closely resembles the HTTP proxy server example that comes with the twisted package.
Under heavy load, I occasionally run into a problem where the reactor appears to start spinning on an "empty" write. An strace of the process reveals these system calls:
[strace output snipped; identical to the trace in the original message above]
...it seems to be waiting for write availability on a FD, trying to write an empty buffer, seeing that it wrote zero bytes, and so waiting for availability again.
I've confirmed that when this occurs, whatever FD it is spinning on does correspond with a network socket connected to the proxy. I've seen this with both Twisted 8.2 and 9.0 running on Linux using both the epoll and select reactors.
If anyone has any thoughts about this problem or has seen it before, I'd appreciate any insight that anyone might have.
Thanks,
- moxie
Hey Twisted folks,
I'm using the twisted.web framework for a high performance HTTP proxy server that very closely resembles the HTTP proxy server example that comes with the twisted package.
Under heavy load, I occasionally run into a problem where the reactor appears to start spinning on an "empty" write. An strace of the process reveals these system calls:
Can you produce a minimal reproducing example of a program that has this issue? I can imagine a code bug where you're calling transport.resumeProducing() or startWriting() sometime when you shouldn't, except the empty write() does look wrong... a minimal code sample would help.
Hey Itamar, the problem is that this only occurs under heavy load in a production environment (of course!), so I don't really have any way of testing a simplified version. The code itself is fairly simple, however, and is almost identical to the code that ships in twisted.web.proxy. This is to say that I'm simply shuttling data between an HTTPClient object and a Request object. I never call transport.resumeProducing() or startWriting() myself, and I only call transport.write() in exactly one place (when sending POST data via HTTPClient). The rest of the time I'm writing to either the Request object (not its transport) when communicating back to the clients, or availing myself of higher-level methods like HTTPClient.sendHeader() when writing to the server side. The code is available here: http://www.googlesharing.net/server/googleshare-0.9.tar.gz - moxie -- http://www.thoughtcrime.org Itamar Turner-Trauring wrote:
Can you produce a minimal reproducing example of a program that has this issue? I can imagine a code bug where you're calling transport.resumeProducing() or startWriting() sometime when you shouldn't, except the empty write() does look wrong... a minimal code sample would help.
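For readers following along, the shuttle pattern being described is roughly the one in twisted.web.proxy: an HTTPClient subclass relays the origin server's response into the client-side Request. Below is a condensed sketch of that pattern; the class name is hypothetical and the code is simplified from the Twisted source of the era, not the poster's actual code.

from twisted.web.http import HTTPClient

class ShuttlingProxyClient(HTTPClient):
    # 'father' is the client-side Request this response is relayed into.
    def __init__(self, command, rest, version, headers, data, father):
        self.father = father
        self.command = command
        self.rest = rest
        self.headers = headers
        self.data = data

    def connectionMade(self):
        self.sendCommand(self.command, self.rest)
        for header, value in self.headers.items():
            self.sendHeader(header, value)
        self.endHeaders()
        self.transport.write(self.data)  # the one direct transport.write()

    def handleStatus(self, version, code, message):
        self.father.setResponseCode(int(code), message)

    def handleHeader(self, key, value):
        self.father.setHeader(key, value)

    def handleResponsePart(self, buffer):
        self.father.write(buffer)  # write to the Request, not its transport

    def handleResponseEnd(self):
        self.father.finish()
        self.transport.loseConnection()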
On 01:48 pm, moxie@thoughtcrime.org wrote:
Hey Twisted folks,
I'm using the twisted.web framework for a high performance HTTP proxy server that very closely resembles the HTTP proxy server example that comes with the twisted package.
Under heavy load, I occasionally run into a problem where the reactor appears to start spinning on an "empty" write. An strace of the process reveals these system calls:
epoll_wait(3, {{EPOLLOUT, {u32=81, u64=13194703840475086929}}}, 269, 8) = 1
write(81, "", 0) = 0
It might be interesting to learn what's in the reactor that's associated with this file descriptor and what the state of its send buffer is. Some strategically placed log messages might help with that, or a manhole in the process that you can connect to when you notice the problem has started in order to poke around.

Jean-Paul
I've confirmed that, when it's spinning, the FD corresponds with a client connection. What would you suggest logging that might provide some insight?

Given what I understand about manholes, the fact that this only occurs in a production environment which must remain high performance means that might not be an option for me.

- moxie

--
http://www.thoughtcrime.org

exarkun@twistedmatrix.com wrote:
It might be interesting to learn what's in the reactor that's associated with this file descriptor and what the state of its send buffer is. Some strategically placed log messages might help with that, or a manhole in the process that you can connect to when you notice the problem has started in order to poke around.
Jean-Paul
On 05/05/10 15:22, Moxie Marlinspike wrote:
I've confirmed that, when it's spinning, the FD corresponds with a client connection. What would you suggest logging that might provide some insight?
Given what I understand about manholes, the fact that this only occurs in a production environment which must remain high performance means that might not be an option for me.
A manhole is really just a listening connection which spawns a Python interpreter. It doesn't consume any CPU to add a manhole to a Twisted server, and very little CPU to access it (unless you do something *in* the manhole that consumes a lot of CPU).
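As a concrete illustration, wiring a telnet manhole into a Twisted 8/9-era process looks roughly like the sketch below. The port number and namespace contents are hypothetical; adapt them to the application.

from twisted.internet import protocol, reactor
from twisted.conch import manhole, telnet
from twisted.conch.insults import insults

def manholeFactory(namespace):
    factory = protocol.ServerFactory()
    factory.protocol = lambda: telnet.TelnetTransport(
        telnet.TelnetBootstrapProtocol,
        insults.ServerProtocol,
        manhole.Manhole,
        namespace)
    return factory

# Bind to loopback only: anyone who can connect gets a live Python prompt.
reactor.listenTCP(2223, manholeFactory({'reactor': reactor}),
                  interface='127.0.0.1')

Once connected (telnet localhost 2223), you can inspect reactor state interactively, for example looking up the reader/writer objects associated with the spinning file descriptor.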
I've confirmed that, when it's spinning, the FD corresponds with a client connection. What would you suggest logging that might provide some insight?
What is the transport's writeSomeData() method doing? Is it actually trying to write an empty string? Transitions from/to the writable state may also be interesting, along with logging the size of the transport's buffer (so logging in stopWriting/startWriting).
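One hypothetical way to collect exactly that data without touching application code is a temporary monkey-patch of the transport base class. The attribute names below (dataBuffer, _tempDataBuffer) are private details of Twisted 8/9's twisted.internet.abstract.FileDescriptor, so this is a debugging hack under that assumption, not a supported API.

from twisted.internet import abstract
from twisted.python import log

_origStartWriting = abstract.FileDescriptor.startWriting
_origStopWriting = abstract.FileDescriptor.stopWriting

def startWriting(self):
    # How many bytes does the transport believe it still has to send?
    buffered = len(self.dataBuffer) + sum(len(d) for d in self._tempDataBuffer)
    log.msg("startWriting fd=%r buffered=%d" % (self.fileno(), buffered))
    return _origStartWriting(self)

def stopWriting(self):
    log.msg("stopWriting fd=%r" % (self.fileno(),))
    return _origStopWriting(self)

abstract.FileDescriptor.startWriting = startWriting
abstract.FileDescriptor.stopWriting = stopWriting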
Hey everyone,

I've still been trying to track down this spin bug. What I probably should have mentioned before is that these are SSL connections.

Looking through the 10.0 release code, I found the _sendCloseAlert method in tcp.py: http://pastebin.com/gZKxHtN5

I feel like this is almost certainly the culprit. Most suspiciously, there's an explicit empty write on pastebin-line 23:

os.write(self.socket.fileno(), '')

It looks to me like there's a code path where this write "succeeds," the SSL socket reports that the connection is still not shut down, and then the transport is resumed via startWriting and startReading. I assume that this then repeats for as long as that socket is up. This directly corresponds with the system calls I'm seeing (poll, empty write, poll, empty write...).

I still don't have an entirely comprehensive grasp of what's going on here, but while these efforts to "cleanly" shut down the SSL connection by exchanging alerts are heroic, I feel like it's much more common (and probably good enough) to just send an alert and immediately close the underlying socket.

- moxie

--
http://www.thoughtcrime.org

On 05/05/2010 01:01 PM, Itamar Turner-Trauring wrote:
I've confirmed that, when it's spinning, the FD corresponds with a client connection. What would you suggest logging that might provide some insight?

What is the transport's writeSomeData() method doing? Is it actually trying to write an empty string? Transitions from/to the writable state may also be interesting, along with logging the size of the transport's buffer (so logging in stopWriting/startWriting).
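For comparison, the "send an alert and immediately close" approach suggested above is only a few lines against the pyOpenSSL API. This is a hypothetical sketch of that behavior, not a patch to Twisted's _sendCloseAlert.

import socket
from OpenSSL import SSL

def abruptClose(conn):
    # conn is an OpenSSL.SSL.Connection wrapping a real socket.
    try:
        conn.shutdown()  # sends our close_notify; does not wait for the peer's
    except SSL.Error:
        pass  # the peer may already be gone; we are closing regardless
    conn.sock_shutdown(socket.SHUT_RDWR)  # tear down the underlying TCP socket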
On 23 May, 03:42 pm, moxie@thoughtcrime.org wrote:
Hey everyone, I've still been trying to track down this spin bug. What I probably should have mentioned before is that these are SSL connections. Looking through the 10.0 release code, I found the _sendCloseAlert method in tcp.py: http://pastebin.com/gZKxHtN5
I feel like this is almost certainly the culprit. Most suspiciously, there's an explicit empty write on pastebin-line 23:
os.write(self.socket.fileno(), '')
It looks to me like there's a code path where this write "succeeds," the SSL socket reports that the connection is still not shut down, and then the transport is resumed via startWriting and startReading. I assume that this then repeats for as long as that socket is up. This directly corresponds with the system calls I'm seeing (poll, empty write, poll, empty write...)
I still don't have an entirely comprehensive grasp of what's going on here, but while these efforts to "cleanly" shut down the SSL connection by exchanging alerts are heroic, I feel like it's much more common (and probably good enough) to just send an alert and immediately close the underlying socket.
It'd be great if you could attach a minimal example which demonstrates this behavior to a ticket in the issue tracker (threads on the mailing list usually lead to ignored issues). It's probably also worth reporting your version of OpenSSL and pyOpenSSL, since that might be related. I've not seen this kind of behavior before, which may just mean I'm running an older version of OpenSSL than you.

Jean-Paul
participants (5):
- exarkun@twistedmatrix.com
- Itamar Turner-Trauring
- Moxie Marlinspike
- Phil Mayers
- Werner Thie