[Twisted-Python] Epoll error after upgrading from 10.0 to 12.3
I had 60 busy TCP servers that ran on Python 2.6/Twisted 10.0 (or Twisted 9.0) for over two years with no problems. After I upgraded to Twisted 12.3/Python 2.7, I started getting the errors below (no application code changes). It took about two days for the first error to appear on a busy server under heavy load:

[twisted.internet.protocol.ServerFactory] Unhandled Error
Traceback (most recent call last):
  File "/usr/local/encap/Python-2.7.3/lib/python2.7/site-packages/twisted/python/log.py", line 73, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/local/encap/Python-2.7.3/lib/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/encap/Python-2.7.3/lib/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
  File "/usr/local/encap/Python-2.7.3/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 614, in _doReadOrWrite
    why = selectable.doRead()
--- <exception caught here> ---
  File "/usr/local/encap/Python-2.7.3/lib/python2.7/site-packages/twisted/internet/tcp.py", line 1069, in doRead
    transport = self.transport(skt, protocol, addr, self, s, self.reactor)
  File "/usr/local/encap/Python-2.7.3/lib/python2.7/site-packages/twisted/internet/tcp.py", line 786, in __init__
    self.startReading()
  File "/usr/local/encap/Python-2.7.3/lib/python2.7/site-packages/twisted/internet/abstract.py", line 429, in startReading
    self.reactor.addReader(self)
  File "/usr/local/encap/Python-2.7.3/lib/python2.7/site-packages/twisted/internet/epollreactor.py", line 256, in addReader
    _epoll.EPOLLIN, _epoll.EPOLLOUT)
  File "/usr/local/encap/Python-2.7.3/lib/python2.7/site-packages/twisted/internet/epollreactor.py", line 240, in _add
    self._poller.modify(fd, flags)
exceptions.IOError: [Errno 2] No such file or directory

After the error occurs, the server usually locks up and does not accept new connections. One server recovered on its own after a 30-minute outage and started accepting new connections again. Apparently I am not the only one who has encountered this: http://stackoverflow.com/questions/12600137/twisted-internet-epollreactor-py-line-238-in-add

Is there a patch available, or should I roll back to 10.0.0?
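The last frames of the traceback show `_add` calling `self._poller.modify(fd, flags)`, an `EPOLL_CTL_MOD` on an fd that the kernel reports as not registered (ENOENT). One plausible way to reach that state, offered only as a hypothesis and not as the confirmed cause of this report, is fd-number reuse: closing the last descriptor for a socket drops it from the epoll interest list, and if a new socket then receives the same fd number while the reactor's own bookkeeping still lists that number, the reactor calls `modify()` instead of `register()`. This minimal sketch reproduces that mechanism with the standard library's `select.epoll` on Linux (Python 3 syntax); `os.dup2` is used only to force the fd reuse deterministically, and `stale_modify_errno` is a hypothetical name.

```python
import errno
import os
import select
import socket


def stale_modify_errno():
    """Linux only. Register a socket's fd with epoll, let that socket be
    closed and its fd number reused, then call modify() on the number as a
    reactor with stale bookkeeping would. Return the errno raised, or None
    if modify() unexpectedly succeeded."""
    ep = select.epoll()
    old, old_peer = socket.socketpair()
    new, new_peer = socket.socketpair()
    try:
        fd = old.fileno()
        ep.register(fd, select.EPOLLIN)  # bookkeeping now says "fd is registered"
        # dup2 closes the only descriptor for `old` (which drops it from the
        # epoll interest list) and makes the same fd number refer to `new`'s
        # socket: the fd-number reuse this sketch assumes.
        os.dup2(new.fileno(), fd)
        try:
            # Stale bookkeeping: "fd is already registered, so modify it".
            ep.modify(fd, select.EPOLLIN | select.EPOLLOUT)
        except OSError as exc:
            return exc.errno
        return None
    finally:
        for s in (old, old_peer, new, new_peer):
            s.close()
        ep.close()


if __name__ == "__main__":
    code = stale_modify_errno()
    print(code, errno.errorcode.get(code))
```

On Linux this should report ENOENT, the same errno as in the traceback. If this is the mechanism at work, the defect would lie wherever a socket is closed without first being removed from the reactor (`removeReader`/`removeWriter`), rather than in epoll itself.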
On 05:21 am, matusis@yahoo.com wrote:
I had 60 busy TCP servers that ran on Python 2.6/Twisted 10.0 (or Twisted 9.0) for over two years with no problems. After I upgraded to Twisted 12.3/Python 2.7, I started getting the errors below (no application code changes). It took about two days for the first error to appear on a busy server under heavy load:
[snip]
After the error occurs, the server usually locks up and does not accept new connections. One server recovered on its own after a 30-minute outage and started accepting new connections again. Apparently I am not the only one who has encountered this: http://stackoverflow.com/questions/12600137/twisted-internet-epollreactor-py-line-238-in-add
Is there a patch available, or should I roll back to 10.0.0?
Doesn't look familiar to me. You should search the bug tracker. If there is a patch (or an svn branch), that's where you'll find it. If you don't find a ticket for the bug, you should report it. Please be sure to include enough information to reproduce the issue if you file a new ticket (or as much information as you have, at least).

Jean-Paul
On Sun, Mar 3, 2013 at 1:21 PM, Alec Matusis <matusis@yahoo.com> wrote:
[snip]
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Hi all,

I remember that Twisted started using Python's own select.epoll instead of its own implementation after dropping support for Python 2.5. Could that be the reason? A few days ago there was a post asking a similar question.

Regards,
Gelin Yan
If you don't find a ticket for the bug, you should report it. Please be sure to include enough information to reproduce the issue if you file a new ticket (or as much information as you have, at least).
Jean-Paul
I filed http://twistedmatrix.com/trac/ticket/6346

It's hard to do any detailed analysis on a live production system, so I don't know how to reproduce it reliably; all I know is that it spontaneously happened on two more servers last night. The strace output looks more or less the same as on a normal server, but given the volume (each server handles hundreds of connections per second), it's hard to parse. I now believe that after the first error occurs, the server remains functional overall and still accepts new connections, but a growing number of existing connections are affected, and the logs become flooded with this error.

Since this is a production system, I have to revert to 10.0.0 now.

-----Original Message-----
From: twisted-python-bounces@twistedmatrix.com [mailto:twisted-python-bounces@twistedmatrix.com] On Behalf Of exarkun@twistedmatrix.com
Sent: Saturday, March 02, 2013 10:06 PM
To: twisted-python@twistedmatrix.com
Subject: Re: [Twisted-Python] Epoll error after upgrading from 10.0 to 12.3

On 05:21 am, matusis@yahoo.com wrote:
[snip]
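Regarding the difficulty of reading strace output at hundreds of connections per second: one way to make it manageable is to trace only the relevant syscalls (for example `strace -f -p PID -e trace=close,epoll_ctl -o trace.log`) and filter the log afterwards. This sketch is a hypothetical helper, not part of Twisted or strace. It picks out `EPOLL_CTL_MOD` calls that failed with ENOENT and reports the most recent successful `close()` of the same fd number, the pattern one would expect if fd reuse were involved. The line formats it matches are assumptions based on typical strace output (an optional `[pid N]` or bare PID prefix); lines with `-t` timestamps or `<unfinished ...>` splits are not handled.

```python
import re

# Assumed strace line shapes; the optional prefix covers "[pid 1234] " (terminal
# output with -f) and "1234  " (file output with -f -o). Adjust as needed.
_PREFIX = r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
CLOSE_RE = re.compile(_PREFIX + r"close\((\d+)\)\s+=\s+0")
MOD_ENOENT_RE = re.compile(
    _PREFIX + r"epoll_ctl\(\d+,\s*EPOLL_CTL_MOD,\s*(\d+),.*=\s+-1\s+ENOENT"
)


def failed_mods_after_close(lines):
    """Yield (lineno, fd, last_close_lineno) for each EPOLL_CTL_MOD that
    failed with ENOENT. last_close_lineno is the line of the most recent
    successful close() of the same fd number, or None if none was seen."""
    last_close = {}
    for lineno, line in enumerate(lines, 1):
        m = CLOSE_RE.match(line)
        if m:
            last_close[int(m.group(1))] = lineno
            continue
        m = MOD_ENOENT_RE.match(line)
        if m:
            fd = int(m.group(1))
            yield lineno, fd, last_close.get(fd)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:
        for hit in failed_mods_after_close(f):
            print(hit)
```

A hit whose `last_close_lineno` is not None would be consistent with (though not proof of) a closed descriptor whose number was reused while still tracked by the reactor.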
On 02:00 am, matusis@yahoo.com wrote:
If you don't find a ticket for the bug, you should report it. Please be sure to include enough information to reproduce the issue if you file a new ticket (or as much information as you have, at least). Jean-Paul
I filed http://twistedmatrix.com/trac/ticket/6346
It's hard to do any detailed analysis on a live production system, so I don't know how to reproduce it reliably; all I know is that it spontaneously happened on two more servers last night. The strace output looks more or less the same as on a normal server, but given the volume (each server handles hundreds of connections per second), it's hard to parse. I now believe that after the first error occurs, the server remains functional overall and still accepts new connections, but a growing number of existing connections are affected, and the logs become flooded with this error.
Since this is a production system, I have to revert to 10.0.0 now.
Do you have, or can you set up, any kind of staging system and run a synthetic load against it?

Jean-Paul
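For a staging system, a synthetic load resembling the production pattern (many short-lived TCP connections) can be generated with nothing but the standard library. This is a minimal sketch, assuming the server under test accepts plain TCP connections on a known host and port; `churn` is a hypothetical helper. A realistic test would also speak the application's protocol and keep some connections open for a while, since the report says it was existing connections that became affected.

```python
import socket
import threading


def churn(host, port, total, concurrency=8, timeout=5.0):
    """Open and immediately close `total` TCP connections to (host, port)
    from `concurrency` worker threads; return (succeeded, failed)."""
    lock = threading.Lock()
    state = {"left": total, "ok": 0, "failed": 0}

    def worker():
        while True:
            # Claim one connection attempt from the shared budget.
            with lock:
                if state["left"] <= 0:
                    return
                state["left"] -= 1
            try:
                # Connect, then close as soon as the with-block exits.
                with socket.create_connection((host, port), timeout=timeout):
                    pass
                outcome = "ok"
            except OSError:  # refused, reset, timed out, ...
                outcome = "failed"
            with lock:
                state[outcome] += 1

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["ok"], state["failed"]
```

Threads keep the sketch dependency-free; at higher rates, several such processes (or an event-driven client) would be needed to approach hundreds of connections per second.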
On Mar 3, 2013, at 6:00 PM, Alec Matusis <matusis@yahoo.com> wrote:
Since this is a production system I have to revert to 10.0.0 now.
If you were able to attempt an upgrade to 12.3.0, would it be possible for you to upgrade more incrementally (10.1, 10.2, 11.0, and so on) until you find the version that introduced the problem you're having? It would really help us narrow down when this showed up.

-glyph
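Because the first error took about two days to appear under production load, each version tried is expensive. A bisection over the releases between 10.0.0 and 12.3.0, a swapped-in variant of the one-release-at-a-time approach suggested above, reduces the number of trials. This sketch assumes the release list shown (verify it against PyPI), keeps Python fixed at 2.7 so that only Twisted varies (the original upgrade changed both), and treats `is_bad` as a hypothetical callback such as "install this version on staging, run the load for long enough, check the logs for the IOError".

```python
# Assumed Twisted release sequence from the known-good to the known-bad
# version; check PyPI for the authoritative list before relying on it.
RELEASES = ["10.0.0", "10.1.0", "10.2.0", "11.0.0", "11.1.0",
            "12.0.0", "12.1.0", "12.2.0", "12.3.0"]


def first_bad(versions, is_bad):
    """Return the earliest version for which is_bad(version) is True.

    Assumes versions are in release order, versions[0] is good,
    versions[-1] is bad, and every version after the first bad one is
    also bad. is_bad is a hypothetical, expensive check (a staging run).
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

With the nine releases listed, the loop makes exactly three staging trials, compared with up to seven when stepping through the intermediate releases one at a time. A caveat: a "good" verdict only means the error did not appear within the soak period, so each trial should run at least as long as it took the error to show up in production.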
participants (4)
- Alec Matusis
- exarkun@twistedmatrix.com
- Gelin Yan
- Glyph