[Twisted-Python] LoopingCall.withCount countCallable called with 0
Hi all, I found an odd behavior using a LoopingCall.withCount with a very small interval (0.02 sec): sometimes the countCallable is called with 0 as argument and it looks like the now - _realLastTime is smaller than the looping interval. Here you can find the very simple code I use to reproduce the "anomaly", with a main loopingCall and another one that prints its internals https://gist.github.com/dvinella/31d94f9dd0e47586658d and here an extract of the logs it produces https://gist.github.com/dvinella/8cfdce0b9f5fdda7b375 The doc for the withCount states that the argument should be normally 1, higher if the reactor blocks for some reason, but it doesn't say anything about 0. The beahavior is quite misleading since the loop seems to anticipate the interval timeout, saying "I've been called, but I should have not been called" I must add that the anomaly doesn't appear right at the start, but after tens of seconds, mostly when the system is under medium/heavy load. So is it only a documentation issue for an exceptional case, due to timers imprecision, that should be handled at higher levels, or is it a real issue? dario
On Mar 24, 2015, at 2:49 AM, Dario Vinella <d.vinella@gmail.com> wrote:
Hi all,
I found an odd behavior using a LoopingCall.withCount with a very small interval (0.02 sec): sometimes the countCallable is called with 0 as argument and it looks like the now - _realLastTime is smaller than the looping interval.
So, this can definitely happen if time goes backwards, but it has to go backwards between the time that the reactor starts executing timed calls and the time that LoopingCall looks at the timestamp to compute what the value passed to withCount should be. If it were smaller than the LoopingCall's time-to-run
Here you can find the very simple code I use to reproduce the "anomaly", with a main loopingCall and another one that prints its internals
https://gist.github.com/dvinella/31d94f9dd0e47586658d <https://gist.github.com/dvinella/31d94f9dd0e47586658d>
An even more straightforward way to figure out if it's a problem with your machine would be this: import time from itertools import count then = time.time() for ever in count(): now = time.time() print("{},{}".format(ever, (now - then) * 10e6)) then = now It would be good to look at how many of these numbers are negative.
and here an extract of the logs it produces
https://gist.github.com/dvinella/8cfdce0b9f5fdda7b375 <https://gist.github.com/dvinella/8cfdce0b9f5fdda7b375>
The doc for the withCount states that the argument should be normally 1, higher if the reactor blocks for some reason, but it doesn't say anything about 0. The beahavior is quite misleading since the loop seems to anticipate the interval timeout, saying "I've been called, but I should have not been called"
That seems right. I think there's a real problem here, since physical clocks do just fail, and sometimes NTP slew has to happen. LoopingCall.withCount has a clear opportunity to notice that time has gone backwards, and can just skip that call. This is assuming that time has gone backwards by a tiny amount for a brief period of time, though, and does not address the more complex issue of "the clock was set some significant distance into the past". LoopingCall still isn't going to be able to deal with large amounts of clock drift, we'd need a monotonic version for that :).
I must add that the anomaly doesn't appear right at the start, but after tens of seconds, mostly when the system is under medium/heavy load.
Even seriously broken clocks don't go backwards a lot. ;-)
So is it only a documentation issue for an exceptional case, due to timers imprecision, that should be handled at higher levels, or is it a real issue?
I think it's a real issue. The point of withCount is supposed to be that it keeps track of how far into a sequence you should advance, for example, into the frames of an animation. If your callback advances by N frames each time it's called, it's supposed to be up-to-date with "real time". Assuming that the reason time is going backwards is that your ntp daemon is correcting your clock, every time withCount's callback gets called with 0, _realLastTime gets updated, which means that the "frame" we should have advanced is simply lost, and it will slowly fall behind real time each time this happens. So while the calls with 0 themselves are a minor annoyance, they are a symptom of a problem that makes it impossible for LoopingCall to do its job properly in some cases, and should definitely be fixed. Thanks for reporting! Can you open a ticket? -glyph
2015-03-24 23:18 GMT+01:00 Glyph Lefkowitz <glyph@twistedmatrix.com>:
It would be good to look at how many of these numbers are negative.
Not a single one, running the script for a couple of hours, under any condition
That seems right. I think there's a real problem here, since physical clocks do just fail, and sometimes NTP slew has to happen. LoopingCall.withCount has a clear opportunity to notice that time has gone backwards, and can just skip that call. This is assuming that time has gone backwards by a tiny amount for a brief period of time, though, and does not address the more complex issue of "the clock was set some significant distance into the past". LoopingCall still isn't going to be able to deal with large amounts of clock drift, we'd need a monotonic version for that :).
I must add that the anomaly doesn't appear right at the start, but after tens of seconds, mostly when the system is under medium/heavy load.
Even seriously broken clocks don't go backwards a *lot*. ;-)
I can reproduce the anomaly with my previous script on my linux machine with or without ntp (systemd-timesyncd) enabled and on a macbook (with kqreactor). The only difference is that since the mac is beefier than the linux box I need to stress it more to trigger the anomaly (a couple of vms + some heavy apps) I think the ntp cause can be excluded since the issue is reproducible without it and furthermore because once triggered the anomaly happens too frequently to think the ntpd applies so many (negative) fixes in such short time. Here it is a log where I print a dot when the calls arg is 1, the value if not 1, each dot is separated by 0.02s https://gist.github.com/dvinella/73e5e45a04c428f037bf
Thanks for reporting! Can you open a ticket?
Sure, I'll open it!
On Mar 26, 2015, at 2:06 AM, Dario Vinella <d.vinella@gmail.com> wrote:
Here it is a log where I print a dot when the calls arg is 1, the value if not 1, each dot is separated by 0.02s
https://gist.github.com/dvinella/73e5e45a04c428f037bf <https://gist.github.com/dvinella/73e5e45a04c428f037bf>
Thanks for reporting! Can you open a ticket?
Sure, I'll open it!
I just wanted to follow up with this - https://twistedmatrix.com/trac/ticket/7836 <https://twistedmatrix.com/trac/ticket/7836> is the ticket in question and it is now fixed on trunk, pending the next release. -glyph
Thank you glyph and moshez for working on this! 2015-05-09 22:04 GMT+02:00 Glyph <glyph@twistedmatrix.com>:
On Mar 26, 2015, at 2:06 AM, Dario Vinella <d.vinella@gmail.com> wrote:
Here it is a log where I print a dot when the calls arg is 1, the value if not 1, each dot is separated by 0.02s
https://gist.github.com/dvinella/73e5e45a04c428f037bf
Thanks for reporting! Can you open a ticket?
Sure, I'll open it!
I just wanted to follow up with this - https://twistedmatrix.com/trac/ticket/7836 is the ticket in question and it is now fixed on trunk, pending the next release.
-glyph
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
participants (3)
-
Dario Vinella
-
Glyph
-
Glyph Lefkowitz