xmlrpc resource file descriptor leak

This is a bit vague, and I wanted to get some feedback before I submit a ticket.

We have a long-running twisted / nevow process that basically has:

    root
     \- RPC2 - a twisted.web.xmlrpc.XMLRPC sub-class
     \- ui - nevow pages

The thing hung up over the weekend with "too many open file descriptors" and before I killed it I did an "lsof"; lots of the files were:

    python25  20163  nsg  31u  REG  253,0  370  3276854  /tmp/tmp5QJivu (deleted)

...and "cat /proc/20163/fd/31" shows:

    <?xml version='1.0'?>
    <methodCall>
      <methodName>classify_maclist</methodName>
      <params>
        <param>
          <value><string>HORPROD</string></value>
        </param>
        <param>
          <value><array><data>
            <value><string>xxxx</string></value>
          </data></array></value>
        </param>
        <param>
          <value><int>-1</int></value>
        </param>
        <param>
          <value><int>5</int></value>
        </param>
      </params>
    </methodCall>

...which is an XMLRPC call from a Zope server on another machine to this process. I presume the t.w.http.Request content is getting written to a tempfile, but I can't understand why - the Content-Length is tiny (<400 bytes).

I can't seem to reproduce this in a sample application though; does anyone have any ideas how I can narrow down the problem?
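One way to narrow a leak like this down is to dump the process's open descriptors from inside the reactor and watch which ones accumulate. A minimal sketch, Linux-specific because it reads /proc; the helper name and the five-minute interval are illustrative, not from the thread:

    import os

    def dump_open_fds(pid=None):
        """Print every open file descriptor of a process by reading /proc.

        Linux-specific; defaults to the current process.
        """
        if pid is None:
            pid = os.getpid()
        fd_dir = '/proc/%d/fd' % pid
        for fd in sorted(os.listdir(fd_dir), key=int):
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed between listdir() and readlink()
            print('%3s -> %s' % (fd, target))

    # Run it periodically from the running reactor, e.g.:
    #   from twisted.internet import task
    #   task.LoopingCall(dump_open_fds).start(300)

Comparing successive dumps against your access log should show whether the "(deleted) /tmp/tmpXXXXXX" entries track particular requests.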

Phil Mayers wrote:
This is a bit vague, and I wanted to get some feedback before I submit a ticket.
We have a long-running twisted / nevow process that basically has:
root
 \- RPC2 - a twisted.web.xmlrpc.XMLRPC sub-class
 \- ui - nevow pages
The thing hung up over the weekend with "too many open file descriptors" and before I killed it I did an "lsof"; lots of the files were:
python25 20163 nsg 31u REG 253,0 370 3276854 /tmp/tmp5QJivu (deleted)
...and "cat /proc/20163/fd/31" shows:
<?xml version='1.0'?> <methodCall> <methodName>classify_maclist</methodName>
<snip>
...which is an XMLRPC call from a Zope server on another machine to this process. I presume the t.w.http.Request content is getting written to a tempfile, but I can't understand why - the Content-Length is tiny (<400 bytes).
Ignore this. The underlying cause seemed to be an unrelated issue; that issue was preventing Python from gc-ing the http.Request objects and thus closing the open file descriptors. I still don't know why the content was written to a temp file when it was so short, but it may be because I'm using HTTPS rather than HTTP. Sorry for the noise.
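For anyone who finds this thread later: the mechanism described here can be reproduced without Twisted at all. A minimal sketch, Linux-only because it counts entries in /proc, with a purely illustrative FakeRequest class standing in for http.Request:

    import os
    import tempfile

    def open_fd_count():
        # Linux-specific: count this process's open descriptors.
        return len(os.listdir('/proc/%d/fd' % os.getpid()))

    class FakeRequest(object):
        """Illustrative stand-in for a request whose body was spooled to disk."""
        def __init__(self, body):
            self.content = tempfile.TemporaryFile()  # unlinked immediately on POSIX
            self.content.write(body)

    print(open_fd_count())

    # Something (a stray reference, a cache, a cycle involving __del__, ...)
    # keeps every request object alive, so each tempfile's fd stays open and
    # shows up in lsof as "/tmp/tmpXXXXXX (deleted)":
    leaked = [FakeRequest(b'<methodCall/>') for _ in range(50)]
    print(open_fd_count())

    # Once the references go away the objects are collected and the fds close:
    del leaked
    print(open_fd_count())

The "(deleted)" files in the lsof output are exactly this: unlinked tempfiles kept alive only by whatever still references the request objects.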

Hi,

Suffering from the same problem a year ago or so, I dug into this by following the call chain, and cgi.py is the source of the 'too many fds' problem. For an explanation read the comment starting at line 417 in cgi.py, which reads:

    The class is subclassable, mostly for the purpose of overriding
    the make_file() method, which is called internally to come up with
    a file open for reading and writing. This makes it possible to
    override the default choice of storing all files in a temporary
    directory and unlinking them as soon as they have been opened.

The trick used here is the fact that an fd hangs around for some time even if the file in question was unlinked. It takes some time for the OS to collect all those unlinked fds, but they will be collected eventually. The number of fds a process needs when using cgi.py (used by twisted) depends on the burst rate of requests, because every request gets a FieldStorage by default and therefore an fd.

The only solution is to raise the number of allowed fds per process / per machine, which depends on the OS:

MS Windows: if the CRT is used, hardcoded to 2048, else limited by memory.

On *nixes use 'ulimit -a' or 'sysctl -a | grep files' to get a printout of the system value, usually something along the lines of kern.maxfiles=10000.

Per machine: /etc/sysctl.conf contains the values preset in the kernel when booting.

Per process: /etc/login.conf usually contains a variable called openfiles-max.

On my OpenBSD production system (avg load 30 req/sec) the values are

    kern.maxfiles=10000
    openfiles-max=8192
    openfiles-cur=8192

which allow smooth operation of two twisted processes on a dual core machine. A minimal sketch of overriding make_file() along these lines appears after the quoted message below.

HTH,
Werner

FYI the output of top:

    load averages:  0.34,  0.31,  0.31                                  08:55:01
    31 processes: 1 running, 29 idle, 1 on processor
    CPU0 states: 10.8% user,  0.0% nice,  2.6% system,  0.0% interrupt, 86.6% idle
    CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    Memory: Real: 325M/608M act/tot  Free: 2913M  Swap: 0K/4096M used/tot

     PID UNAME  PRI NICE  SIZE  RES  STATE    WAIT    TIME    CPU  COMMAND
    4562 www      2    0  125M  97M  sleep/0  poll  242:39  11.82% python2.5
    6506 www      2    0  205M 181M  run/0    -      34:20   2.00% python2.5

Phil Mayers wrote:
This is a bit vague, and I wanted to get some feedback before I submit a ticket.
We have a long-running twisted / nevow process that basically has:
root
 \- RPC2 - a twisted.web.xmlrpc.XMLRPC sub-class
 \- ui - nevow pages
The thing hung up over the weekend with "too many open file descriptors" and before I killed it I did an "lsof"; lots of the files were:
python25 20163 nsg 31u REG 253,0 370 3276854 /tmp/tmp5QJivu (deleted)
...and "cat /proc/20163/fd/31" shows:
<?xml version='1.0'?>
<methodCall>
  <methodName>classify_maclist</methodName>
  <params>
    <param>
      <value><string>HORPROD</string></value>
    </param>
    <param>
      <value><array><data>
        <value><string>xxxx</string></value>
      </data></array></value>
    </param>
    <param>
      <value><int>-1</int></value>
    </param>
    <param>
      <value><int>5</int></value>
    </param>
  </params>
</methodCall>
...which is an XMLRPC call from a Zope server on another machine to this process. I presume the t.w.http.Request content is getting written to a tempfile, but I can't understand why - the Content-Length is tiny (<400 bytes).
I can't seem to reproduce this in a sample application though; does anyone have any ideas how I can narrow down the problem?
_______________________________________________
Twisted-web mailing list
Twisted-web@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web
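The make_file() override Werner points at, as a minimal sketch against the Python 2.5-era cgi module used in this thread; the class name and the 10 KB threshold are illustrative choices, not something from the original messages. The idea is simply to keep small request bodies in memory and only spill genuinely large ones to an unlinked tempfile:

    import cgi
    from StringIO import StringIO

    class MemoryFieldStorage(cgi.FieldStorage):
        """FieldStorage variant that avoids one tempfile (and one fd) per request.

        cgi.FieldStorage's default make_file() returns an unlinked temporary
        file, which keeps a descriptor allocated until the object is collected.
        """
        MAX_IN_MEMORY = 10 * 1024  # illustrative threshold

        def make_file(self, binary=None):
            # self.length is the declared Content-Length, or -1 if unknown.
            if 0 <= self.length <= self.MAX_IN_MEMORY:
                return StringIO()
            return cgi.FieldStorage.make_file(self, binary)

To check how close a process is to its per-process limit, the stdlib resource module reports it directly: resource.getrlimit(resource.RLIMIT_NOFILE) returns the (soft, hard) pair that 'ulimit -n' adjusts.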
participants (2)
- Phil Mayers
- Werner Thie