Re: [Numpy-discussion] Proposal: add the timestamp64 type (Noam Yorav-Raphael)

On 11 Nov 2020, at 18:00, numpy-discussion-request@python.org wrote:

> I propose to add a new type called "timestamp64". It will be a pure timestamp, meaning that it represents a moment in time (as seconds/ms/us/ns since the epoch), without any timezone information.

Sorry, but I really don't see the usefulness of another time-stamping format based on POSIX time. Indeed, POSIX time is based on a naive approximation of UTC and is ambiguous across leap seconds. Quoting from Wikipedia <https://en.wikipedia.org/wiki/Unix_time#Leap_seconds>:

> The Unix time number 1483228800 is thus ambiguous: it can refer either to the start of the leap second (2016-12-31 23:59:60) or the end of it, one second later (2017-01-01 00:00:00). In the theoretical case when a negative leap second occurs, no ambiguity is caused, but instead there is a range of Unix time numbers that do not refer to any point in UTC time at all.

Precision time stamping is quite a complex task: you can use UTC, TAI, or GPS, just to mention the most widely used timescales. And how do you deal with timestamps in the past, when timekeeping was based on the Earth's rotation, and not on atomic clocks ticking at (approximately) 1 SI-second intervals? In my opinion time stamping should be application dependent, and I doubt that the new "timestamp64" would be beneficial to the numpy community.

Best regards,
Stefano
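The ambiguity described in the Wikipedia quote can be reproduced with the standard library alone. This is only a minimal sketch: it relies on `calendar.timegm` doing plain arithmetic on the fields of a UTC time tuple, so a seconds field of 60 is accepted and simply carried over into the next minute.

```python
import calendar
from datetime import datetime, timezone

# Unix time of the leap second itself, 2016-12-31 23:59:60 UTC.
# timegm() does not validate the seconds field; it only does arithmetic.
leap_second = calendar.timegm((2016, 12, 31, 23, 59, 60))

# Unix time of the instant one SI second later, 2017-01-01 00:00:00 UTC.
after_leap = calendar.timegm((2017, 1, 1, 0, 0, 0))

# Two distinct UTC instants share a single Unix time number.
print(leap_second == after_leap)

# Converting back can only ever yield one of the two instants.
back = datetime.fromtimestamp(leap_second, tz=timezone.utc)
print(back.isoformat())
```

The reverse conversion always lands on 00:00:00 of the following day, which is why the leap second cannot be recovered from the Unix time number.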

On 11/12/20 6:04 PM, Stefano Miccoli wrote:
In a one-on-one discussion with Noam in a pre-community call (which, ironically, we had time for since we both messed up the meeting time-zone change), we reached the conclusion that the request is to clarify whether NumPy's datetime64 represents TAI time [0] or POSIX time, with a preference for TAI time. The documentation mentions POSIX time [1].

As Stefano points out, there is a difference of tens of seconds (currently 37) between POSIX (or Unix) time and TAI time. In practice NumPy simply stores an int64 value to represent the datetime64, and relies on others to convert it. The leap seconds might be getting lost in the conversions. So it might make sense to clarify exactly how those conversions deal with leap seconds, and to choose which timescale we mean when we use datetime64.

Noam, please correct me if I am mistaken.

Matti

[0] https://en.wikipedia.org/wiki/International_Atomic_Time
[1] https://numpy.org/doc/stable/reference/arrays.datetime.html#datetime-units
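The point about the int64 storage can be checked directly. The sketch below assumes only that NumPy is installed; the two dates are chosen for illustration because a leap second was inserted at the end of 2016-12-31.

```python
import numpy as np

# Two calendar midnights on either side of the 2016-12-31 leap second.
stamps = np.array(
    ["2016-12-31T00:00:00", "2017-01-01T00:00:00"],
    dtype="datetime64[s]",
)

# Reinterpret the same memory as plain integers: counts of seconds
# since 1970-01-01T00:00:00, with no timescale attached.
raw = stamps.view("int64")

# The stored values differ by exactly 86400, although 86401 SI seconds
# elapsed between these two UTC instants.
day_length = int(raw[1] - raw[0])
print(day_length)  # 86400
```

So the integer arithmetic itself behaves like POSIX time: every calendar day counts as 86400 seconds, whatever the documentation chooses to call the timescale.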

Hi Matti and Stefano,

My understanding is that datetime64 was decided to be neither TAI nor posix time, but rather to represent an abstract calendar point, like datetime.datetime without a specified timezone. This can usually be converted into posix time given a timezone (although in the "repeated" hour between DST and winter time there will be ambiguity!). If it is agreed by all users that a datetime64 represents the time in UTC, it is the same as posix time.

I would like to have a type that is defined to be equivalent to posix time. I don't agree with Stefano: I think that posix time is very useful (as I think its ubiquity shows), and I think that a type that is defined to be posix time would also be very useful.

I think that posix time is well suited for the vast majority of use cases. Indeed, there are use cases where you should take leap seconds into account, but those are rare. In practice, a leap second would be presented by the OS as a second that actually takes more than a second. This actually happens all the time without leap seconds: when your computer automatically syncs with ntp, it adjusts the time continuously, so applications will not experience "time bumps". If you want to make sure that the intervals you measure are correct, you should use something like time.monotonic().

So, most users are not interested in very precise time measurements, but rather in knowing what happened before what, and roughly when. For this, posix time is great: it's very simple, and it does the job. In some cases you need to take leap seconds into account, but in those cases, just using the computer clock will not give you the precision you need no matter what, so you'll need specialized software anyway.
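Two of the points above can be sketched with the standard library. The calendar point and the UTC+2 offset are arbitrary values picked for illustration; fixed offsets are used so that the example does not depend on a timezone database.

```python
import time
from datetime import datetime, timedelta, timezone

# A calendar point with no timezone attached, in the spirit of datetime64.
naive = datetime(2020, 11, 12, 18, 0, 0)

# Read as UTC, it maps to one posix timestamp.
as_utc = naive.replace(tzinfo=timezone.utc).timestamp()

# Read at a fixed UTC+2 offset, the same calendar point maps to a
# posix timestamp two hours earlier.
as_plus2 = naive.replace(tzinfo=timezone(timedelta(hours=2))).timestamp()
print(as_utc - as_plus2)  # 7200.0

# For measuring an interval, a monotonic clock is not affected by ntp
# adjustments or other changes to the wall clock.
start = time.monotonic()
time.sleep(0.01)
elapsed = time.monotonic() - start
print(elapsed > 0)  # True
```

The first half shows why a calendar point only becomes a posix timestamp once a timezone is agreed on; the second half is the time.monotonic() pattern for intervals.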
I think that posix time is great, and since it's very easy to make wrong decisions that seem to work until you discover they don't (such as discovering too late that local time won't work when you are not sure of the time zone, or when you switch from DST to winter time), a sane and simple default is important.

Cheers,
Noam

On Thu, Nov 12, 2020 at 6:41 PM Matti Picus <matti.picus@gmail.com> wrote:

On 12/11/2020 17:40, Matti Picus wrote:
Unix time is a representation of the UTC timescale that counts 1-second intervals starting from a defined epoch. It deals with leap seconds either by skipping one interval (which has never happened so far) or by repeating an interval, so that two moments in time that are separated by one second on the UTC timescale (for example 2016-12-31 23:59:59 and 2016-12-31 23:59:60) are represented in the same way, and thus the conversion from Unix time to UTC is ambiguous during this one second. This has happened 27 times since 1972.

This comes with the nice property that minutes, hours and days always have the same duration (in Unix time), so converting from the Unix time representation to a date and hour and vice versa is fairly easy.

The drawbacks are, as seen above, an ambiguity on leap seconds and the fact that the trivial computation of time intervals does not take leap seconds into account and thus may be short by a few seconds (any time interval across 2016-12-31 23:59:59 is off by at least one second if computed by simply subtracting Unix times).

I don't think these two drawbacks are important for NumPy (or any other general-purpose library). As things stand, it is not even possible, in Python, with or without NumPy, to create a datetime or datetime64 object from the time "2016-12-31 23:59:60" (neither accepts the existence of a minute with 61 seconds), so the ambiguity issue is not an issue in practice. The time interval issue may matter for some applications, but the ones affected are aware of the issue and have means to deal with it (the most common one being taking a day off on the days leap seconds are introduced).

I think documenting that datetime64 is a representation of fixed time intervals since a conventional epoch, neglecting leap seconds, is easy to explain and implement, and allows for easy interoperability with the rest of the world.

What advantage would making datetime64 explicitly a representation of TAI bring?
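Both drawbacks can be seen with the standard library `datetime` module alone. This is a minimal sketch rather than a full treatment; the variable names are illustrative.

```python
from datetime import datetime, timezone

# The leap second cannot be constructed: datetime only accepts
# seconds in the range 0..59.
try:
    datetime(2016, 12, 31, 23, 59, 60)
    leap_second_rejected = False
except ValueError:
    leap_second_rejected = True
print(leap_second_rejected)  # True

# Subtracting Unix times across the leap second gives 1 second,
# although 2 SI seconds elapsed on the UTC timescale
# (23:59:59 -> 23:59:60 -> 00:00:00).
before = datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc).timestamp()
after = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc).timestamp()
print(after - before)  # 1.0
```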
One disadvantage would be that `np.datetime64(datetime.now())` would be harder to support, as we would be trying to match a point in time on the UTC timescale to a point in time on the TAI timescale. This is trivial for past times (we just need to adjust by the right offset), but it is impossible to do correctly for dates in the future, because we cannot predict future leap second insertions. This would, for example, make timestamp conversions not reproducible across announcements of leap second insertions.

Cheers,
Dan
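The reproducibility problem can be illustrated with a toy UTC-to-TAI conversion. Everything here is a sketch under stated assumptions: `utc_to_tai` and `LEAP_TABLE` are hypothetical names, the table lists only the three most recent real offsets (a real table goes back to 1972), and the 2035 entry is invented purely to simulate a future announcement.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Partial table: (UTC instant from which the offset applies, TAI - UTC in s).
LEAP_TABLE = [
    (datetime(2012, 7, 1), 35),
    (datetime(2015, 7, 1), 36),
    (datetime(2017, 1, 1), 37),
]

def utc_to_tai(utc, table=LEAP_TABLE):
    """Shift a naive UTC datetime onto the TAI scale using `table`."""
    starts = [start for start, _ in table]
    idx = bisect_right(starts, utc) - 1
    if idx < 0:
        raise ValueError("date precedes the partial table")
    return utc + timedelta(seconds=table[idx][1])

# Past dates are easy: the offset is known and fixed.
past = utc_to_tai(datetime(2016, 6, 1))

# Future dates can only use the last known offset ...
future = datetime(2040, 1, 1)
converted_now = utc_to_tai(future)

# ... so the result changes once a new leap second is announced.
# The 2035 entry below is made up for the sake of the example.
updated = LEAP_TABLE + [(datetime(2035, 1, 1), 38)]
converted_later = utc_to_tai(future, updated)

print(converted_later - converted_now)  # 0:00:01
```

The same input therefore converts to two different TAI values depending on when the conversion is run, which is the non-reproducibility described above.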

participants (4)
- Daniele Nicolodi
- Matti Picus
- Noam Yorav-Raphael
- Stefano Miccoli