[Twisted-Python] doPoll timeout problems
![](https://secure.gravatar.com/avatar/cf223b7cf77583c0a2665bad01f84f11.jpg?s=120&d=mm&r=g)
We are seeing a problem with poll being called very often and think that the problem is in the timeout calculation. The code look like this: def doPoll(self, timeout): """Poll the poller for new events.""" if timeout is not None: timeout = int(timeout * 1000) # convert seconds to milliseconds What this does is round down the timeout. We are experimenting with this: def doPoll(self, timeout): """Poll the poller for new events.""" if timeout is not None: timeout = int(math.ceil(timeout * 1000)) # convert seconds to milliseconds This rounds up the timeout. With this we see that the poll() is called far less often. The reason being that if the timeout is 0.0009 the code uses 0ms timeout. And if your CPU is fast enough it will do many 0ms timeout calls to poll before the timer is removed from the _pendingTimedCalls list. And if the timeout is 0.0019 the time is 1ms and when the _pendingTimedCalls is check its first element has not expired yet. So keep doing calls with 1ms. Typically we see that the first element in the _pendingTimedCalls is a call to _updateLogDateTime in HTTPFactory. We use a HTTPFactory many times and also NevowSite. That causes a callLater for _updateLogDateTime for each factory. These _updateLogDateTime call seems to be a lot of complexity for no benefit. After all time.time() is very fast on Linux, why cache the log time strings? Especially when in our app we never write an access log file. We are working on stubbing out all that code for our app to bring our performance back up compared to old twisted 2.0. Barry
![](https://secure.gravatar.com/avatar/82bbf56e965eb9c44f5f8a40c6958d77.jpg?s=120&d=mm&r=g)
Scott, Barry wrote:
We are seeing a problem with poll being called very often and think that the problem is in the timeout calculation.
I think you're right and that changing that line to use ceil() is a good fix. It does mean that some timeouts might happen up to a millisecond later than they would otherwise, but I think that's better than having the reactor spin-wait to consume time. This probably only happens with the poll() reactor, because all the other APIs seem to have higher resolution timeouts --- select, epoll, kqueue, ppoll all have microsecond or even nanosecond timeout parameters. If there is code that actually needs sub-millisecond timeout resolution it might be possible for Twisted to implement that using setitimer() or similar, but that seems like a lot of work to support an exotic use case that could be handled more efficiently by switching to epoll etc. I checked the history of that line of code and it dates back all the way to the first poll()-based event loop implementation in 2001. Perhaps computers were slow enough back then that a call to poll() would take most of a millisecond :) but in any case the rounding-down behavior doesn't seem to have been explicitly chosen.
Perhaps it is not the time() call but the cost of converting a time value to a string that is being avoided here? Sometimes the calendar/date calculations are expensive. It seems to me that HTTPFactory could be implemented more efficiently by only caching the string on demand, and then setting a timer to clear the cache (reset _logDateTime to None) at the next 1-second mark. On a heavily-loaded server, that would have the same properties as the current code; but on a lightly-loaded server it would avoid running that callback unneccessarily. And overall I don't think it's any more complicated than the current implementation. Wim.
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
I believe it dates back to a time when syscalls were surprisingly expensive, and gettimeofday() or equivalent was still a syscall. I remember looking at profiles Back In The Day where getting the current time dominated. It certainly predates the zero-overhead VDSO method available today.
It seems to me that HTTPFactory could be implemented more efficiently by only caching the string on demand, and then setting a timer to clear the cache (reset _logDateTime to None) at the next 1-second mark. On a heavily-loaded server, that would have the same properties as the current code; but on a lightly-loaded server it would avoid running that callback unneccessarily. And overall I don't think it's any more complicated than the current implementation.
The current algorithm is really just silly. In the absence of a compelling benchmark where it can be shown to hurt performance (we should check with the ones on speed.twistedmatrix.com), I think we could forego the caching entirely, and just do the simple thing where the string is computed as needed. -glyph
![](https://secure.gravatar.com/avatar/82bbf56e965eb9c44f5f8a40c6958d77.jpg?s=120&d=mm&r=g)
On Wed Oct 17 08:56:46 MDT 2018, Barry Scott wrote:
Do you want a pull request for this fix or will you make this small change?
Twisted's contribution process is kind of rigid: every PR needs an accompanying Trac ticket. If you file Trac tickets for these two issues I'll make the PRs (or you can if you like — I think Github even has a way to make small PRs entirely through the web interface now). https://twistedmatrix.com/trac/wiki/TwistedDevelopment#SubmittingaPatch I'd like to hear other users of the poll reactor weigh in on the "ceil" change, though. It seems straightforwardly correct to me, but it also seems like the kind of small change that could have a subtle unexpected downside.
![](https://secure.gravatar.com/avatar/82bbf56e965eb9c44f5f8a40c6958d77.jpg?s=120&d=mm&r=g)
Scott, Barry wrote:
We are seeing a problem with poll being called very often and think that the problem is in the timeout calculation.
I think you're right and that changing that line to use ceil() is a good fix. It does mean that some timeouts might happen up to a millisecond later than they would otherwise, but I think that's better than having the reactor spin-wait to consume time. This probably only happens with the poll() reactor, because all the other APIs seem to have higher resolution timeouts --- select, epoll, kqueue, ppoll all have microsecond or even nanosecond timeout parameters. If there is code that actually needs sub-millisecond timeout resolution it might be possible for Twisted to implement that using setitimer() or similar, but that seems like a lot of work to support an exotic use case that could be handled more efficiently by switching to epoll etc. I checked the history of that line of code and it dates back all the way to the first poll()-based event loop implementation in 2001. Perhaps computers were slow enough back then that a call to poll() would take most of a millisecond :) but in any case the rounding-down behavior doesn't seem to have been explicitly chosen.
Perhaps it is not the time() call but the cost of converting a time value to a string that is being avoided here? Sometimes the calendar/date calculations are expensive. It seems to me that HTTPFactory could be implemented more efficiently by only caching the string on demand, and then setting a timer to clear the cache (reset _logDateTime to None) at the next 1-second mark. On a heavily-loaded server, that would have the same properties as the current code; but on a lightly-loaded server it would avoid running that callback unneccessarily. And overall I don't think it's any more complicated than the current implementation. Wim.
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
I believe it dates back to a time when syscalls were surprisingly expensive, and gettimeofday() or equivalent was still a syscall. I remember looking at profiles Back In The Day where getting the current time dominated. It certainly predates the zero-overhead VDSO method available today.
It seems to me that HTTPFactory could be implemented more efficiently by only caching the string on demand, and then setting a timer to clear the cache (reset _logDateTime to None) at the next 1-second mark. On a heavily-loaded server, that would have the same properties as the current code; but on a lightly-loaded server it would avoid running that callback unneccessarily. And overall I don't think it's any more complicated than the current implementation.
The current algorithm is really just silly. In the absence of a compelling benchmark where it can be shown to hurt performance (we should check with the ones on speed.twistedmatrix.com), I think we could forego the caching entirely, and just do the simple thing where the string is computed as needed. -glyph
![](https://secure.gravatar.com/avatar/82bbf56e965eb9c44f5f8a40c6958d77.jpg?s=120&d=mm&r=g)
On Wed Oct 17 08:56:46 MDT 2018, Barry Scott wrote:
Do you want a pull request for this fix or will you make this small change?
Twisted's contribution process is kind of rigid: every PR needs an accompanying Trac ticket. If you file Trac tickets for these two issues I'll make the PRs (or you can if you like — I think Github even has a way to make small PRs entirely through the web interface now). https://twistedmatrix.com/trac/wiki/TwistedDevelopment#SubmittingaPatch I'd like to hear other users of the poll reactor weigh in on the "ceil" change, though. It seems straightforwardly correct to me, but it also seems like the kind of small change that could have a subtle unexpected downside.
participants (4)
-
Barry Scott
-
Glyph
-
Scott, Barry
-
Wim Lewis