[Twisted-Python] new epoll error after upgrading to 9.0.0
I upgraded to 9.0.0 and I am now seeing a new error, not present in 8.2.0 or earlier:

2010-02-10 17:38:33-0800 [TagProtocol,9794986,68.126.204.104] Unhandled Error
Traceback (most recent call last):
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/application/app.py", line 348, in runReactorWithLogging
    reactor.run()
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/base.py", line 1166, in run
    self.mainLoop()
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/base.py", line 1178, in mainLoop
    self.doIteration(t)
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/epollreactor.py", line 194, in doPoll
    log.callWithLogger(selectable, _drdw, selectable, fd, event)
--- <exception caught here> ---
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/python/log.py", line 84, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/python/log.py", line 69, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/python/context.py", line 59, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/python/context.py", line 37, in callWithContext
    return func(*args,**kw)
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/epollreactor.py", line 223, in _doReadOrWrite
    self._disconnectSelectable(selectable, why, inRead)
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/posixbase.py", line 188, in _disconnectSelectable
    selectable.readConnectionLost(f)
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/tcp.py", line 508, in readConnectionLost
    self.connectionLost(reason)
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/tcp.py", line 513, in connectionLost
    abstract.FileDescriptor.connectionLost(self, reason)
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/abstract.py", line 67, in connectionLost
    self.stopWriting()
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/abstract.py", line 267, in stopWriting
    self.reactor.removeWriter(self)
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/epollreactor.py", line 145, in removeWriter
    self._remove(writer, self._writes, self._reads, self._selectables, _epoll.OUT, _epoll.IN)
  File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/epollreactor.py", line 131, in _remove
    self._poller._control(cmd, fd, flags)
  File "_epoll.pyx", line 125, in _epoll.epoll._control
exceptions.IOError: [Errno 2] No such file or directory

The error is highly intermittent and occurs only under a high client connection rate. Any idea of what this could be?
On 12:16 am, matusis@yahoo.com wrote:
> I upgraded to 9.0.0 and I am now seeing a new error, not present in 8.2.0 or earlier:
>
> [snip]
> "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/abstract.py", line 267, in stopWriting
>     self.reactor.removeWriter(self)
>   File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/epollreactor.py", line 145, in removeWriter
>     self._remove(writer, self._writes, self._reads, self._selectables, _epoll.OUT, _epoll.IN)
>   File "/usr/local/encap/python-2.6.4/lib/python2.6/site-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/internet/epollreactor.py", line 131, in _remove
>     self._poller._control(cmd, fd, flags)
>   File "_epoll.pyx", line 125, in _epoll.epoll._control
> exceptions.IOError: [Errno 2] No such file or directory
>
> The error is highly intermittent and occurs only under a high client connection rate. Any idea of what this could be?
Translating into English, a descriptor being monitored for writeability is being removed from the reactor, but epoll thinks it isn't being monitored in the first place. It seems likely this is caused by an attempt to double remove something. However, why that would happen will probably take a bit more digging.

There was one direct change to epollreactor.py between 8.2 and 9.0:

http://twistedmatrix.com/trac/changeset/26118#file1

It was to reactor shutdown code, though, so it seems like it probably isn't coming into play in your case. A number of other indirect changes were made, though (e.g. to the epoll reactor's base classes or other support code it uses). It's conceivable one of these introduced the problem. One could also imagine that the problem existed all along, and one of the changes merely nudged some race condition and now it's going badly for your app.

As far as suggestions for how to track this down go... Well, minimizing the example is always nice. ;) Aside from that, one idea that presents itself to me is to instrument the reactor to record addWriter/removeWriter events, and then log the complete stream of them for a particular writer when a double removeWriter is attempted. Initially you might just track that they happen, and use the result to confirm or reject the double removeWriter hypothesis. If it holds up, it might be useful to add stack recording, in order to see why things are happening. It may even be easy to implement this as a tiny reactor wrapper, which would make it easier to deploy and enable/disable. If this doesn't disrupt your production environment overly, it might be worth trying.

Keep us updated. :)

Jean-Paul
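
The reactor wrapper suggested in the reply might look roughly like the sketch below. It is only an illustration under stated assumptions, not code from Twisted or from this thread: the class name WriterEventRecorder, the report callback, and the history and record_stacks parameters are all hypothetical, and how the wrapper is put in place of the real reactor is left open. It assumes the wrapped object exposes addWriter and removeWriter, and that writers are hashable and can be weakly referenced.

```python
import collections
import traceback
import weakref


class WriterEventRecorder:
    """Hypothetical reactor wrapper (not part of Twisted).

    Records addWriter/removeWriter calls per writer and reports the recorded
    history when removeWriter is called for a writer that is not currently
    added through this wrapper.
    """

    def __init__(self, reactor, report, history=20, record_stacks=False):
        self._reactor = reactor
        self._report = report              # callable(writer, events)
        self._history = history            # number of events kept per writer
        self._record_stacks = record_stacks
        # Weak references, so the recorder does not keep dead connections
        # alive. Assumption: writers are hashable and weak-referenceable.
        self._events = weakref.WeakKeyDictionary()
        self._writing = weakref.WeakSet()

    def __getattr__(self, name):
        # Anything not defined here (run, addReader, ...) is passed through.
        return getattr(self._reactor, name)

    def _record(self, writer, kind):
        stack = None
        if self._record_stacks:
            # Optional second stage: remember where the call came from.
            stack = "".join(traceback.format_stack(limit=8))
        log = self._events.get(writer)
        if log is None:
            log = self._events[writer] = collections.deque(maxlen=self._history)
        log.append((kind, stack))
        return log

    def addWriter(self, writer):
        self._record(writer, "addWriter")
        self._writing.add(writer)
        return self._reactor.addWriter(writer)

    def removeWriter(self, writer):
        log = self._record(writer, "removeWriter")
        if writer in self._writing:
            self._writing.discard(writer)
        else:
            # Removal without a matching add: hand over the full event stream.
            self._report(writer, list(log))
        return self._reactor.removeWriter(writer)
```

A first pass could run with record_stacks left off and a report callback that only writes the event names to the log, which is enough to confirm or reject the double removeWriter hypothesis. Stack recording can be switched on afterwards if the hypothesis holds up. Note that some unmatched removeWriter calls may turn out to be harmless, so the reports would still need to be compared against the timestamps of the IOError.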
participants (2)
- Alec Matusis
- exarkun@twistedmatrix.com