RE: [Twisted-Python] intermittent problem: not accepting new connections

Sept. 10, 2008

      ...
Do non-error log messages continue to appear in the Twisted log?  ie, is
it clear that the logging system is still working, or could it have failed
in some way, obscuring any exception reports?
Yes, I print the number of objects tracked by the garbage collector every 10 minutes, like this:

print "number of objects tracked by the garbage collector is:", len(gc.get_objects())

and the output appears in the main Twisted log:

2008/09/10 13:21 -0700 [-] number of objects tracked by the garbage collector is: 860021
2008/09/10 13:31 -0700 [-] number of objects tracked by the garbage collector is: 864316
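
The total above only shows that the count is growing between samples. A minimal sketch of a per-type breakdown, which could show what kind of object is accumulating, might look like this. It is written in Python 3 syntax (the server in this thread runs Python 2.4, where `collections.Counter` is not available), and the function names `top_object_types` and `format_gc_report` are hypothetical, not part of Twisted or the original code.

```python
import gc
from collections import Counter


def top_object_types(limit=10):
    """Return the `limit` most common type names among gc-tracked objects."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)


def format_gc_report(limit=10):
    """Build a one-line summary suitable for a periodic log message."""
    total = len(gc.get_objects())
    parts = ["%s=%d" % (name, count) for name, count in top_object_types(limit)]
    return "gc tracked objects: %d (%s)" % (total, ", ".join(parts))


if __name__ == "__main__":
    # In a real server this would be called from the same 10-minute timer
    # that prints the total today.
    print(format_gc_report())
```

Comparing two such reports taken ten minutes apart would show which type names account for the growth.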
...
Any new unhandled errno values should definitely result in an exception
being logged (notice that the `raise' which follows the checks for
various errno values is inside a try/except which logs any exception).
I noticed that raise too... Could it then be EWOULDBLOCK, EAGAIN or EPERM?

                except socket.error, e:
                    if e.args[0] in (EWOULDBLOCK, EAGAIN):
                        self.numberAccepts = i
                        break
                    elif e.args[0] == EPERM:
                        # Netfilter on Linux may have rejected the
                        # connection, but we get told to try to accept()
                        # anyway.
                        continue
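
To make the control flow discussed here easier to follow, the sketch below restates it as a standalone function in Python 3 syntax. It is not Twisted's code: `classify_accept_error` and `accept_batch` are hypothetical names, and treating the resource errors named in this thread (EMFILE, ENOBUFS, ENFILE, ENOMEM, ECONNABORTED) as "log a message and stop this round" is an assumption of the sketch.

```python
import errno

# Errors that mean "nothing more to accept right now".
RETRY_LATER = {errno.EWOULDBLOCK, errno.EAGAIN}
# Resource errors named in the thread. Handling them as "log and stop"
# is an assumption of this sketch.
RESOURCE_ERRORS = {errno.EMFILE, errno.ENOBUFS, errno.ENFILE,
                   errno.ENOMEM, errno.ECONNABORTED}


def classify_accept_error(err):
    """Map an errno value from accept() to an action name."""
    if err in RETRY_LATER:
        return "stop"
    if err == errno.EPERM:
        return "skip"
    if err in RESOURCE_ERRORS:
        return "log_and_stop"
    return "raise"


def accept_batch(accept, max_accepts, log):
    """Call `accept` up to `max_accepts` times; return what was accepted."""
    accepted = []
    for _ in range(max_accepts):
        try:
            accepted.append(accept())
        except OSError as e:
            action = classify_accept_error(e.errno)
            if action == "skip":
                continue
            if action == "log_and_stop":
                name = errno.errorcode.get(e.errno, str(e.errno))
                log("could not accept new connection (%s)" % name)
                break
            if action == "stop":
                break
            # Unknown errno: re-raise so the caller can log the traceback.
            raise
    return accepted
```

Note that the "stop" and "skip" branches are silent by design, so if accept() kept returning EWOULDBLOCK, EAGAIN or EPERM, nothing would show up in the log.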

I am not sure how to debug this problem. I have another Twisted server of a different type on that machine, and while the problematic server stops accepting connections, the second one works just fine, so this is not a machine-wide issue.
What could it be?
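
One way to pin down exactly when the listener stops accepting would be to automate the manual telnet check with a small probe that tries to connect with a short timeout and logs the result. The sketch below is only an illustration in Python 3 syntax; `can_connect` and `probe_line` are hypothetical helper names, and the timeout value is an arbitrary choice.

```python
import socket
import time


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False
    conn.close()
    return True


def probe_line(host, port, timeout=3.0):
    """Format one timestamped log line describing the probe result."""
    ok = can_connect(host, port, timeout)
    stamp = time.strftime("%Y/%m/%d %H:%M:%S")
    return "%s probe %s:%d %s" % (stamp, host, port, "ok" if ok else "FAILED")


if __name__ == "__main__":
    # Port 5229 is the one used in the telnet test in this thread.
    print(probe_line("localhost", 5229))
```

Run from cron or a loop once a minute, the FAILED lines could then be lined up against the Twisted log and system logs for the same timestamps.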
...
-----Original Message-----
From: twisted-python-bounces@twistedmatrix.com [mailto:twisted-python-bounces@twistedmatrix.com] On Behalf Of Jean-Paul Calderone
Sent: Wednesday, September 10, 2008 1:44 PM
To: Twisted general discussion
Subject: Re: [Twisted-Python] intermittent problem: not accepting new connections
...
On Wed, 10 Sep 2008 13:32:06 -0700, Alec Matusis <matusis@yahoo.com> wrote:
I have had a twisted epoll server that was heavily used, such that it
saturated CPU (100% shown by "top", about 5000 connections, intense
message relaying).
I am using twisted 2.5.0 that I patched for an epoll bug.
It was run on python 2.4.4, 2.6.11 kernel on a single core xeon 3.0 GHz
CPU. This server has been on for many months, and it has been rock-stable.
A couple of days ago I migrated that server to a newer machine: same
twisted 2.5.0, same python 2.4.4, newer 2.6.24 kernel and a quad core
xeon L5420 CPU.
CPU usage dropped from 100% to 30%, as expected, with the same rate of
client connections.
However the server now has the following intermittent problem: about
twice a day, it stops accepting new connections for a short period of
5-10 minutes.
telnet times out, I get this:
root@serv2:/proc/net/netfilter# telnet localhost 5229
Trying 127.0.0.1...
Existing connections are not cut, the server receives/delivers messages
to/from them just fine.
These short periods of not accepting connections do not correlate with
increased CPU load or with the overall number of connections to the
server.
I have had a problem with the same symptoms before, when a server process
ran out of its quota of file descriptors.
However, there were clear messages in the twisted log at that time, and
upping the ulimits solved the problem.
This time, there are no errors in ANY logs (twisted log,
/var/log/messages, etc.)
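
Since descriptor exhaustion produced the same symptoms before, it may be worth logging descriptor usage next to the limit so that this cause can be ruled in or out during the next stall. The following is a minimal sketch in Python 3 syntax, assuming Linux (it reads `/proc/self/fd`); `fd_usage` and `fd_report` are hypothetical names and the 90% warning threshold is arbitrary.

```python
import os
import resource


def fd_usage():
    """Return (used, soft_limit, hard_limit) for this process.

    `used` is None when /proc/self/fd is not available.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Approximate: the listing itself briefly holds one descriptor.
        used = len(os.listdir("/proc/self/fd"))
    except OSError:
        used = None
    return used, soft, hard


def fd_report(warn_fraction=0.9):
    """Build a one-line summary suitable for a periodic log message."""
    used, soft, hard = fd_usage()
    if used is None:
        return "fd usage unknown (soft limit %d, hard limit %d)" % (soft, hard)
    line = "fd usage %d of soft limit %d (hard limit %d)" % (used, soft, hard)
    if soft > 0 and used >= warn_fraction * soft:
        line += " WARNING: close to the limit"
    return line


if __name__ == "__main__":
    print(fd_report())
```

The numbers describe the process that runs the code, so the call would have to be made from inside the server process itself, for example from the same timer that prints the garbage collector count.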
Do non-error log messages continue to appear in the Twisted log?  ie, is
it clear that the logging system is still working, or could it have failed
in some way, obscuring any exception reports?
...
I am out of ideas on what this could be, because my setup is exactly the
same as I have been using in the last year, except for a faster CPU and a
newer kernel.
I suspect that there are some new uncaught accept() exceptions in
internet/tcp.py in the part where it's looking for EMFILE, ENOBUFS,
ENFILE, ENOMEM, ECONNABORTED errors.
Any new unhandled errno values should definitely result in an exception
being logged (notice that the `raise' which follows the checks for
various errno values is inside a try/except which logs any exception).
Jean-Paul