[Twisted-Python] a question about monotonic clock

Hi all, I am quite concerned about ticket 2424 because our system uses reactor.callLater almost everywhere. I know this issue is hard to fix; otherwise it wouldn't have existed for such a long time. Is there any workaround until it is fixed? The only options I can see are to disable NTP, or to stop the process, sync the time manually, and then restart the process. The latter is hard to accept because we have many machines. Regards, gelin yan

On 05:57 am, dynamicgl@gmail.com wrote:
NTP does not cause problems with scheduling. NTP gradually slews the system clock - it does not introduce discontinuities (either forward or backward), it changes the rate at which time passes until the system clock agrees with the external clock. If you have systems configured to have their system clocks jump (`ntpdate` is sometimes used for this), fix them to not be configured this way. #2424 is primarily about user-initiated events, primarily on desktop machines - a user changing the system time, a user suspending the machine (and later unsuspending it). There's little or no reason for problems related to #2424 to ever come up on a properly maintained server. Jean-Paul

On Sun, Oct 28, 2012 at 1:28 PM, <exarkun@twistedmatrix.com> wrote:
I think that for *certain* uncommon types of applications, even the very minor skewing of ntp can cause problems, but I wonder if gelin yan has encountered real problems caused by the ntp skewing in the application under discussion. Gelin, you did not describe what problem you're actually having. If you would, that would benefit the continuation of the discussion. -- Christopher Armstrong http://radix.twistedmatrix.com/ http://planet-if.com/

On Sun, Oct 28, 2012 at 5:22 PM, Christopher Armstrong <radix@twistedmatrix.com> wrote:
I think that for *certain* uncommon types of applications, even the very minor skewing of ntp can cause problems, but I wonder if gelin yan has
I'm having trouble imagining such an application. In particular, if the application is sensitive to such minor fluctuations in the time source, I don't see how it could operate on commodity hardware at all; such fluctuations are present regardless of whether ntp is slewing the clock or not. You would need to use a separate hardware time source that is more reliable, at which point ntp is essentially out of the picture. -- mithrandi, i Ainil en-Balandor, a faer Ambar

On Sun, Oct 28, 2012 at 4:45 PM, Tristan Seligmann <mithrandi@mithrandi.net> wrote:
I'm not speaking from experience, admittedly. How big exactly are the steps in NTP skewing? I'm remembering VOIP applications (or anything else with low-latency streaming or real-time gaming or something like that), where you can have timed intervals of ~20ms, and if you miss one, you drop packets and lower the quality of the audio stream. In a case like that, using a monotonic time source seems like it would be a good decision. That's why Twisted should provide an API for scheduling calls based on one, if possible (that doesn't seem like a contentious point to me; just the general applicability of such a scheduling mechanism). -- Christopher Armstrong http://radix.twistedmatrix.com/ http://planet-if.com/
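As an illustration of the fixed-interval case described above, here is a minimal sketch of scheduling ~20ms ticks against a monotonic clock using only the Python standard library. The function name `run_ticks` and its parameters are hypothetical and not part of Twisted; each deadline is computed from the start time rather than from the previous tick, so one late wakeup does not accumulate into drift.

```python
import time


def run_ticks(interval, count, clock=time.monotonic, sleep=time.sleep):
    """Wait for `count` fixed deadlines; return how late each one fired.

    Hypothetical helper for illustration only. Deadlines are derived from
    the start time, so lateness on one tick is not carried into the next.
    """
    start = clock()
    lateness = []
    for n in range(1, count + 1):
        deadline = start + n * interval
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)
        # A real audio loop would send its packet here.
        lateness.append(clock() - deadline)
    return lateness


if __name__ == "__main__":
    # Five ticks at 20 ms; the lateness values depend on the host.
    print(run_ticks(0.020, 5))
```

Because `time.monotonic` is not moved by a wall-clock step, the deadlines stay evenly spaced even if the system time is reset while the loop runs.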

On Oct 28, 2012, at 9:01 AM, Christopher Armstrong <radix@twistedmatrix.com> wrote:
I'm not speaking from experience, admittedly. How big exactly are the steps in NTP skewing?
There are two things NTP can do: stepping and slewing. (Skewing is not one of them.) If you're stepping, the steps can be arbitrarily large. This is what ntpdate does. If you're slewing, there are no steps. This is what ntpd does. The frequency of your clock is just adjusted up or down by a small (configurable) amount. Generally not enough to affect the pitch or network latency of 20ms sound sampling. In fact, it would generally help, not hurt, because the only reason ntp would be issuing a slew is that your clock is faster or slower than real time anyway. PEP 418 <http://www.python.org/dev/peps/pep-0418/> covers this stuff in a lot of detail; especially the glossary. -glyph
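For readers who want to see how Python itself describes these clocks, the following sketch uses `time.get_clock_info`, which PEP 418 added in Python 3.3. The helper name `describe_clocks` is made up for this example. Note that the `adjustable` flag reports whether the clock can be set (stepped) by an administrator or an NTP daemon; it says nothing about whether the clock's rate is being slewed.

```python
import time


def describe_clocks(names=("time", "monotonic")):
    """Return the properties Python reports for each named clock."""
    report = {}
    for name in names:
        info = time.get_clock_info(name)
        report[name] = {
            "implementation": info.implementation,
            "adjustable": info.adjustable,  # can it be stepped?
            "monotonic": info.monotonic,    # can it go backward?
            "resolution": info.resolution,
        }
    return report


if __name__ == "__main__":
    for name, props in describe_clocks().items():
        print(name, props)
```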

Glyph <glyph@twistedmatrix.com> wrote:
This depends on how you're running ntpd. If you have "-x" on the command line, yes - ntpd will not step. If not, there are circumstances in which it will step - clock diffs in excess of 128ms iirc? Who knows what newer implementations like chrony or openntpd do!

On Mon, Oct 29, 2012 at 12:05 AM, Phil Mayers <p.mayers@imperial.ac.uk> wrote:
If an offset of 128ms occurs at any time other than initial ntpd startup (which will presumably occur at system startup), that means you've either experienced a significant period of time without connectivity to time servers[1], you have a hardware / kernel issue that should be resolved, or some other software on the system is messing with the clock. Aside from network issues, the other possibilities are all serious issues that should be corrected, not tolerated as a normal situation.
Who knows what newer implementations like chrony or openntpd do!
If they're doing something silly, then maybe you shouldn't use them. [1] Well, haha, just kidding; if this happens, then ntpd will remove the servers as being unreachable, and terminate once they have all been removed, thus resulting in the need to restart ntpd... -- mithrandi, i Ainil en-Balandor, a faer Ambar

On 10/28/2012 10:16 PM, Tristan Seligmann wrote:
Sadly, this is not the case. As has already been pointed out, virtual machine clocks can undergo stepping in "normal" operation. A specific example: when a VMware installation performs live migration of a host, we often see: ntpd[1793]: time reset +0.263757 s. This can occur several times a day, as we're running vCenter-controlled auto-migration - a very common setup. This is forward stepping of course, so it is relatively harmless (backward is a pain).
Who knows what newer implementations like chrony or openntpd do!
If they're doing something silly, then maybe you shouldn't use them.
I have no reason to suspect they are doing something silly. I merely point out that there are other implementations than the common ntpd, and that I don't know if they step or not. In fact, a little research suggests that chrony has *better* behaviour w.r.t. stepping than ntpd: http://lists.fedoraproject.org/pipermail/devel/2010-May/135679.html ...which is nice. I sense a bit of defensiveness in this reply, TBH. Maybe I'm imagining it, but if so that's unnecessary. I don't hold a strong position about Twisted having a monotonic clock. When the original ticket was discussed here months ago, I was quite alarmed because the symptoms sounded dire. Further discussion clarified what the issues were, and I decided they weren't significant (for us). Maybe "Twisted doesn't need a monotonic clock" is the right reply, but it would be wrong to base that on the assumption that "ntpd doesn't step" - that's all I was trying to say. Cheers, Phil

On Mon, Oct 29, 2012 at 10:33 AM, Phil Mayers <p.mayers@imperial.ac.uk> wrote:
My original reply was perhaps a bit... exclusionary. But I think the point I was trying to make is still valid; if your VM is suspended for 250ms due to a migration (or CPU throttling, or ...) then it is impossible to maintain a consistent stream of time events because there isn't any code executing at all. Whether the clock stalls, or is jumped forward 250ms, you still have an unavoidable problem as far as time-keeping is concerned.
Who knows what newer implementations like chrony or openntpd do!
If they're doing something silly, then maybe you shouldn't use them.
I didn't intend it that way, but I can see how what I said could be read in a hostile / defensive tone. Perhaps this would be a better phrasing: If chrony or openntpd are doing something sensible, then there shouldn't be any serious issues with using them; if they're doing something that seriously destabilizes the system clock, then you probably shouldn't be using them, at least not if you want reliable timekeeping. I'm not familiar enough with either of them to know which case applies, so I didn't intend to come across as bashing chrony and/or openntpd.
I don't think anyone in this thread is arguing *against* implementing this functionality; I think the point was just that this functionality is only of critical importance under a limited range of circumstances, as opposed to being something of urgent need for every Twisted-using program that needs to schedule timed events. -- mithrandi, i Ainil en-Balandor, a faer Ambar

On Oct 29, 2012, at 2:59 AM, Tristan Seligmann <mithrandi@mithrandi.net> wrote:
Yes. Even if the reasons cited in this thread are not 100% correct, the fact is that many thousands of Twisted servers that depend intimately on timed events are deployed, even on vmware-hosted virtual machine infrastructures, without experiencing difficulties related to #2424. That said, it would be great if someone could implement a fix for that issue, as it's just one more thing that system operators need to be aware of and keep track of. It would be really neat if you could totally screw up your timekeeping but have Twisted applications keep running reliably regardless. -glyph

On 29/10/12 09:59, Tristan Seligmann wrote:
I think this is a pretty good summary; it's rare to need this (though often essential when you do) and it's also hard. TBH I'm not sure there are any sensible semantics for a lot of the cases - the best the framework can do is give the app an option for clocks, and try to give them as much info as possible about how they've advanced or not.
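One way to give an application information about how its clocks have advanced is to compare the wall clock with a monotonic clock between checks. The sketch below is only an illustration with a hypothetical `StepDetector` class; the 0.128 s default threshold simply echoes the ntpd step figure recalled earlier in the thread and is an assumption, not a recommendation. It cannot see a pause during which both clocks stall together, which may happen when a VM is suspended.

```python
import time


class StepDetector:
    """Report apparent wall-clock steps by comparing two clocks.

    Hypothetical helper for illustration; not a Twisted API.
    """

    def __init__(self, threshold=0.128, wall=time.time, mono=time.monotonic):
        self.threshold = threshold
        self._wall = wall
        self._mono = mono
        self._last = (wall(), mono())

    def check(self):
        """Return the wall-clock step since the last check, or 0.0.

        Positive means the wall clock jumped forward relative to the
        monotonic clock; negative means it jumped backward.
        """
        wall_now, mono_now = self._wall(), self._mono()
        last_wall, last_mono = self._last
        self._last = (wall_now, mono_now)
        drift = (wall_now - last_wall) - (mono_now - last_mono)
        return drift if abs(drift) >= self.threshold else 0.0


if __name__ == "__main__":
    detector = StepDetector()
    time.sleep(0.05)
    print("step since start:", detector.check())
```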

On Oct 29, 2012, at 10:52 AM, Phil Mayers <p.mayers@imperial.ac.uk> wrote:
Right; callLater is not quite expressive enough. I wouldn't want to expose the whole mess of clock nonsense to every application, but it would be necessary to split callLater into "callAfter" (delay a certain number of seconds from "now", to within a best guess, regardless of clock changes) and "callAt" (call as close as possible to a certain calendar time, respecting clock changes). -glyph
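The proposed split can be sketched with the standard library alone. Everything below is hypothetical: `SketchScheduler`, `call_after`, `call_at`, and `run_due` are names invented for this example and do not exist in Twisted. The point is only that relative delays are measured on a monotonic clock while calendar deadlines are measured on the wall clock, so a clock change affects one kind of call and not the other.

```python
import heapq
import itertools
import time


class SketchScheduler:
    """Toy scheduler separating relative delays from calendar deadlines."""

    def __init__(self, wall=time.time, mono=time.monotonic):
        self._wall = wall
        self._mono = mono
        self._after = []  # heap of (monotonic deadline, seq, func)
        self._at = []     # heap of (wall-clock deadline, seq, func)
        self._seq = itertools.count()  # tie-breaker so funcs never compare

    def call_after(self, delay, func):
        """Run func `delay` seconds from now, ignoring wall-clock changes."""
        deadline = self._mono() + delay
        heapq.heappush(self._after, (deadline, next(self._seq), func))

    def call_at(self, when, func):
        """Run func once the wall clock reaches the timestamp `when`."""
        heapq.heappush(self._at, (when, next(self._seq), func))

    def run_due(self):
        """Run every call whose deadline has passed; return how many ran."""
        ran = 0
        for heap, clock in ((self._after, self._mono), (self._at, self._wall)):
            while heap and heap[0][0] <= clock():
                _, _, func = heapq.heappop(heap)
                func()
                ran += 1
        return ran


if __name__ == "__main__":
    sched = SketchScheduler()
    sched.call_after(0.05, lambda: print("50 ms elapsed"))
    sched.call_at(time.time() + 0.05, lambda: print("calendar time reached"))
    time.sleep(0.1)
    sched.run_due()
```

If the wall clock is set back an hour after both calls are scheduled, the `call_after` entry still fires once its delay has elapsed, while the `call_at` entry waits until the wall clock actually reaches its timestamp.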

On Sun, Oct 28, 2012 at 11:22 PM, Christopher Armstrong <radix@twistedmatrix.com> wrote:
Hi Christopher, I haven't encountered any problem with Twisted so far, but I did a few years ago when I deployed a system (based on C#/C++) on Windows: sometimes the timer stopped running. Bugs triggered by this issue are not easy to detect. That is why I am concerned.

participants (7)
- Christopher Armstrong
- exarkun@twistedmatrix.com
- gelin yan
- Glyph
- Joshua Bartlett
- Phil Mayers
- Tristan Seligmann