[Twisted-Python] carbon-relay eating CPU - EAGAIN (Resource temporarily unavailable)?

- Python 2.7.3
- [twisted, version 13.1.0]
- xen-domU
`atop` shows that `carbon-relay` is eating 80-90% USRCPU. From the `strace`:
```
accept(7, {sa_family=AF_INET, sin_port=htons(60649), sin_addr=inet_addr("192.237.222.81")}, [16]) = 257
accept(7, {sa_family=AF_INET, sin_port=htons(51564), sin_addr=inet_addr("166.78.1.48")}, [16]) = 257
accept(7, 0x7ffff4679550, [16]) = -1 EAGAIN (Resource temporarily unavailable)
accept(7, {sa_family=AF_INET, sin_port=htons(33654), sin_addr=inet_addr("198.61.194.248")}, [16]) = 257
accept(7, {sa_family=AF_INET, sin_port=htons(50037), sin_addr=inet_addr("166.78.181.204")}, [16]) = 257
accept(7, 0x7ffff4679550, [16]) = -1 EAGAIN (Resource temporarily unavailable)
```
The strange thing is: even after restarting the service, it seems stuck at fd 7 every time I run `strace`. Does that mean this fd is not being cleaned up properly?
I have increased the number of open files:
**/proc/2891/limits**
```
Limit                     Soft Limit    Hard Limit    Units
Max cpu time              unlimited     unlimited     seconds
Max file size             unlimited     unlimited     bytes
Max data size             unlimited     unlimited     bytes
Max stack size            8388608       unlimited     bytes
Max core file size        0             unlimited     bytes
Max resident set          unlimited     unlimited     bytes
Max processes             15834         15834         processes
Max open files            16384         16384         files
Max locked memory         65536         65536         bytes
Max address space         unlimited     unlimited     bytes
Max file locks            unlimited     unlimited     locks
Max pending signals       15834         15834         signals
Max msgqueue size         819200        819200        bytes
Max nice priority         0             0
Max realtime priority     0             0
Max realtime timeout      unlimited     unlimited     us
```
After that, CPU usage decreases to ~50%.
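(For reference, the effective limit can also be checked, and the soft limit raised, from inside a Python process with the standard `resource` module; this is just a minimal sketch, not something carbon does itself:)

```python
import resource

# The soft/hard values behind "Max open files" in /proc/<pid>/limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("Max open files: soft=%d hard=%d" % (soft, hard))

# An unprivileged process may raise its soft limit up to the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```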
My problem looks similar to this [thread](http://twistedmatrix.com/pipermail/twisted-python/2008-September/018361.html), but since we have only a few sockets in the TIME_WAIT state, I don't think enabling `tw_recycle` would help. As for `tcp_syncookies`, I don't see any related message in the syslog.
This is what I get when trying to start `carbon-relay` in debug mode:
```
26/11/2013 02:22:14 :: [listener] MetricPickleReceiver connection with 50.56.249.127:48772 lost: Connection to the other side was lost in a non-clean fashion: Connection lost.
26/11/2013 02:22:14 :: [listener] MetricPickleReceiver connection with 198.101.241.101:50672 lost: Connection to the other side was lost in a non-clean fashion: Connection lost.
26/11/2013 02:22:14 :: [listener] MetricPickleReceiver connection with 166.78.2.167:43346 lost: Connection to the other side was lost in a non-clean fashion: Connection lost.
```
This is from `twisted`:
```python
class ConnectionLost(ConnectionClosed):
    """Connection to the other side was lost in a non-clean fashion"""

    def __str__(self):
        s = self.__doc__.strip().splitlines()[0]
        if self.args:
            s = '%s: %s' % (s, ' '.join(self.args))
        s = '%s.' % s
        return s
```
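That `__str__` is exactly where the text in the debug log above comes from; a quick illustrative sketch, assuming the public `twisted.internet.error` module is importable in your environment:

```python
from twisted.internet.error import ConnectionLost

# The first line of the docstring is the message; any args are appended after a colon.
print(str(ConnectionLost()))
# -> Connection to the other side was lost in a non-clean fashion.
print(str(ConnectionLost('Connection lost')))
# -> Connection to the other side was lost in a non-clean fashion: Connection lost.
```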
I have also tried to [debug with `gdb`](https://wiki.python.org/moin/DebuggingWithGdb), but `pystack` returns nothing.

On 26/11/13 04:05, Quan Tong Anh wrote:
- Python 2.7.3
- [twisted, version 13.1.0]
- xen-domU
I assume xen-domU means "Linux" as the guest OS; which kernel version, out of interest?
`atop` shows that `carbon-relay` is eating 80-90% USRCPU. From the
I don't know about anyone else, but I don't know what carbon-relay is. A quick google suggests it's part of that graphite monitoring/RRD replacement thing; which version are you running (Or: what reactor is it using)?
How many open files do you have?
`strace`:
Without timestamps it's difficult to interpret this unambiguously; that could be a microsecond or minute we're seeing ;o)
The strange thing is: even after restarting the service, it seems stuck at fd 7 every time I run `strace`. Does that mean this fd is not being cleaned up properly?
No. fd numbers are allocated starting from 0 and going upwards. The same process running the same startup code every time will tend (not always, since it's order-dependent) to use the same file descriptors for the same things, modulo close/open events.
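A tiny illustration of that behaviour (the exact numbers depend on what the process already has open):

```python
import socket

# fds 0-2 are stdin/stdout/stderr, so the first socket usually gets fd 3.
a = socket.socket()
b = socket.socket()
print(a.fileno(), b.fileno())   # typically 3 and 4

a.close()
c = socket.socket()
print(c.fileno())               # the lowest free descriptor (3) is reused
```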

On Tue, Nov 26, 2013 at 6:51 PM, Phil Mayers <p.mayers@imperial.ac.uk> wrote:
I assume xen-domU means "Linux" as the guest OS; which kernel version, out of interest.
3.2.0-53-virtual
I don't know about anyone else, but I don't know what carbon-relay is. A quick google suggests it's part of that graphite monitoring/RRD replacement thing; which version are you running (Or: what reactor is it using)?
Yes, it's here: https://graphite.readthedocs.org/en/latest/carbon-daemons.html#carbon-relay-...
The version I'm using is 0.9.12.
How many open files do you have?
1664.
`strace`:
Without timestamps it's difficult to interpret this unambiguously; that could be a microsecond or minute we're seeing ;o)
Here you go:
```
13:51:31 accept(7, {sa_family=AF_INET, sin_port=htons(57236), sin_addr=inet_addr("198.61.194.221")}, [16]) = 216
13:51:31 accept(7, {sa_family=AF_INET, sin_port=htons(39755), sin_addr=inet_addr("50.56.175.61")}, [16]) = 216
13:51:31 accept(7, 0x7fff2eebc270, [16]) = -1 EAGAIN (Resource temporarily unavailable)
13:51:31 accept(7, {sa_family=AF_INET, sin_port=htons(49236), sin_addr=inet_addr("198.101.238.239")}, [16]) = 216
13:51:31 accept(7, {sa_family=AF_INET, sin_port=htons(49639), sin_addr=inet_addr("166.78.2.103")}, [16]) = 216
13:51:31 accept(7, 0x7fff2eebc270, [16]) = -1 EAGAIN (Resource temporarily unavailable)
...
```
There are about 20 to 30 connections per second.

Hi Quan Tong Anh.
On Mon, Nov 25, 2013 at 10:05 PM, Quan Tong Anh <anhquankitty@gmail.com> wrote:
- Python 2.7.3
- [twisted, version 13.1.0]
- xen-domU
`atop` shows that `carbon-relay` is eating 80-90% USRCPU. From the `strace`:
```
accept(7, {sa_family=AF_INET, sin_port=htons(60649), sin_addr=inet_addr("192.237.222.81")}, [16]) = 257
accept(7, {sa_family=AF_INET, sin_port=htons(51564), sin_addr=inet_addr("166.78.1.48")}, [16]) = 257
accept(7, 0x7ffff4679550, [16]) = -1 EAGAIN (Resource temporarily unavailable)
accept(7, {sa_family=AF_INET, sin_port=htons(33654), sin_addr=inet_addr("198.61.194.248")}, [16]) = 257
accept(7, {sa_family=AF_INET, sin_port=htons(50037), sin_addr=inet_addr("166.78.181.204")}, [16]) = 257
accept(7, 0x7ffff4679550, [16]) = -1 EAGAIN (Resource temporarily unavailable)
```
The strange thing is: even after restarting the service, it seems stuck at fd 7 every time I run `strace`. Does that mean this fd is not being cleaned up properly?
You have given a lot of data, but haven't actually described a problem. Nothing is "stuck" at FD 7; it's simply the listening socket's file descriptor, and the way listening sockets work is that you call accept() on them repeatedly to get incoming connections.
accept() returning EAGAIN is normal behavior and should basically be ignored. Twisted handles this case appropriately. From the man page:
```
EAGAIN or EWOULDBLOCK
       The socket is marked nonblocking and no connections are present to
       be accepted.  POSIX.1-2001 allows either error to be returned for
       this case, and does not require these constants to have the same
       value, so a portable application should check for both
       possibilities.
```
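In other words, the EAGAIN lines in the strace are just the reactor draining the accept queue until it is empty, which is roughly what any non-blocking server does. A hand-rolled sketch (not Twisted's actual code; the port number is a placeholder):

```python
import errno
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(('0.0.0.0', 2004))   # placeholder port
srv.listen(50)
srv.setblocking(False)

def drain_accept_queue(srv):
    """Accept every pending connection, stopping on the expected EAGAIN."""
    while True:
        try:
            conn, addr = srv.accept()
        except socket.error as e:
            if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                return          # queue drained; nothing is wrong
            raise
        conn.close()            # a real server would hand conn off to a protocol
```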
Is there actually something wrong with your service? Can you explain the behavior you're seeing and how it differs from the behavior you want to see? If the problem is only "carbon uses a lot of CPU", maybe carbon is just inefficient, or you need to scale out to more instances?
participants (3)
- Christopher Armstrong
- Phil Mayers
- Quan Tong Anh