Re: NumPy-Discussion Digest, Vol 183, Issue 33
Hey, Stefano! The level of pedantry is absolutely acceptable. I don't question any of your arguments; they are all perfectly valid. Except that I'd rather say it is ~29 seconds if measuring against 1970. Leap seconds were introduced in 1972 and there have been a total of 27 of them since then, but TAI time has been ticking since 1958 and had gained 10 seconds by 1970, so it is approximately 0.83 seconds per year, which gives approx 28.67 seconds between today and 1970.

So 1970 is a bad choice of epoch if you want to introduce a leap-second-aware datetime. In GPS time they chose 1980. In TAI it is 1958, but that is somewhat worse than 1980 because it is not immediately clear how to perform the timestamp<->timedelta conversion between 1958 and 1970. Something like a 'proleptic GPS time' would be needed to estimate the number of leap seconds in the years before 1972, when they were introduced. Or maybe the leap-second timescale could be limited to start at 1972, not accepting any timestamps before that date.

The system that ignores the existence of leap seconds has a right to exist; it just has limited applicability. np.datetime64 keeps time as a delta between a moment in time and a predefined epoch. Which standard does it use to translate this delta into human-readable years, months, and so on? If it is UTC, then it must handle times like 2016-12-31 23:59:60, because that is a valid UTC timestamp.
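The arithmetic above can be spelled out explicitly (the figures are the approximations from this paragraph, not authoritative values):

```python
# Back-of-envelope check of the "~29 seconds" figure, using the
# approximate numbers from the discussion (not authoritative values).
tai_gain_by_1970 = 10                      # seconds TAI gained over 1958..1970
rate = tai_gain_by_1970 / (1970 - 1958)    # ~0.83 s per year
leap_seconds_since_1972 = 27               # leap seconds inserted 1972..2016

# Extrapolate the pre-1972 rate over 1970..1972 and add the leap seconds:
estimate = leap_seconds_since_1972 + rate * (1972 - 1970)
print(round(estimate, 2))  # ~28.67
```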
>>> np.datetime64('2016-12-31 12:59:60')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Seconds out of range in datetime string "2016-12-31 12:59:60"
Datetime also fails (so far) to handle it:
>>> from datetime import datetime as dt
>>> dt(2016, 12, 31, 23, 59, 60)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: second must be in 0..59
But `time` works. Well, at least it doesn't raise an exception:
>>> t = time.struct_time((2016, 12, 31, 12, 59, 60, 0, 0, 0)); t
time.struct_time(tm_year=2016, tm_mon=12, tm_mday=31, tm_hour=12, tm_min=59, tm_sec=60, tm_wday=0, tm_yday=0, tm_isdst=0)
>>> time.asctime(t)
'Mon Dec 31 12:59:60 2016'
>>> time.gmtime(calendar.timegm(t))
time.struct_time(tm_year=2017, tm_mon=1, tm_mday=1, tm_hour=1, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=1, tm_isdst=0)
Imagine a user who decides which library to use to store some (life-critical!) measurements taken every 100 ms. He looks at NumPy datetime64, reads that it is capable of handling attoseconds, and decides it is a perfect fit. Now imagine that on 31 Dec 2022 the World Government decides to inject a leap second. The system receives the announcement from the NTP servers and prepares to replay this second twice. As soon as that moment chimes in, he runs into a ValueError, which he won't notice because he's celebrating the New Year :) And guess whom he'll blame? ;)

Actually, humanity has already got used to replaying timespans twice. It happens every year in the countries that observe daylight saving time, and the solution is to use a more linear scale than local time, namely UTC. But now it turns out that UTC is not linear enough either: it also has certain timespans happening twice. The solution, once again, is to use a _really_ linear time, which is TAI. I think the python 'time' library did the right thing in introducing time.CLOCK_TAI, after all.

Astropy handles the UTC scale properly though:
>>> t = Time('2016-12-31 23:59:60')
So the solution for that particular person with regular intervals of time is to use astropy. I mention it in the article.

I made some corrections to the text. I'd be grateful if you had a look and pointed me to the particular sentences that need improvement.
Best regards,
Lev
On Wed, Dec 29, 2021 at 6:54 PM Stefano Miccoli wrote:
Lev, excuse me if I go into super-pedantic mode, but your answer and the current text of the article miss an important point.
1) The proleptic Gregorian calendar is about leap *year* rules. It tracks days without making any assumption on the length of days. If we agree on using this calendar, dates like -0099-07-12 and 2021-12-29 are defined without ambiguity, and we can easily compute the number of days between these two dates.
2) Posix semantics is about the length of a day, and is based on the (utterly wrong) assumption that a mean solar day is constant and exactly 86400 SI seconds long. (For an authoritative estimate of historical length-of-day variations see http://astro.ukho.gov.uk/nao/lvm/ and the related papers https://doi.org/10.1098/rspa.2016.0404 and https://doi.org/10.1098/rspa.2020.0776.)
Knowing assumption 1) is important when coding dates before 1582-10-15: e.g. 1582-10-04 Julian is 1582-10-14 proleptic Gregorian. Once we agree on the proleptic Gregorian calendar everything works as expected: time deltas expressed in days are correct.
Knowing assumption 2) is important if we intend to compute time deltas for date-time objects with high precision: e.g. how many SI seconds occur between 1582-10-14T12:00:00 and 1582-10-15T12:00:00, with millisecond precision? Here we must first define what T12:00:00 means, say UT1, but most critically we need to know the length of day in 1582. With Posix semantics a day is always 86400.000 SI seconds long; however, the real length of day in 1582 could be about 5 ms less. The problem here is that small errors accumulate, and if we compute the difference between 0000-01-01T12:00:00 and 1900-01-01T12:00:00 the numpy answer may be off by about 10_000 seconds.
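The order of magnitude of that accumulated error can be checked with a quick back-of-envelope computation (the millisecond figure below is an assumed average offset, chosen only to illustrate the scale, not measured data):

```python
# Posix semantics: every day is exactly 86400 SI seconds.  If the mean
# solar day over 0000..1900 differed from 86400 s by several
# milliseconds on average (the value below is an illustrative
# assumption), the error over 1900 years is:
days = 1900 * 365.25            # days between 0000-01-01 and 1900-01-01, roughly
assumed_mean_offset_ms = 14     # assumed average |length of day - 86400 s|
error_s = days * assumed_mean_offset_ms / 1000
print(round(error_s))           # on the order of 10_000 seconds
```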
Fast forward to current times: after 1972, T12:00:00 should be defined as UTC, and the Posix assumption is correct for almost every day, bar when a leap second is added (86401 s) or removed (86399 s, though this has never occurred). Now the numpy-computed timedeltas are correct up to an integral number of seconds that can be derived from a leap second table, if both dates are in the past. If one or both of the dates are in the future, then we must rely on models of Earth rotation and estimate the future introduction of leap seconds. But Earth rotation is quite “unpredictable”, so usually this is not very accurate.
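A sketch of what such a table-based correction could look like on top of numpy (the table below is deliberately abbreviated and the function name is hypothetical; a real implementation would use the full 27-entry table and only past dates):

```python
import numpy as np

# Abbreviated leap-second table: UTC dates at whose start TAI-UTC had
# just been incremented by one second (illustrative subset; the real
# table has 27 entries for 1972..2017).
LEAP_DATES = np.array(['2012-07-01', '2015-07-01', '2017-01-01'],
                      dtype='datetime64[D]')

def elapsed_si_seconds(t0, t1):
    """SI seconds between two past datetime64 instants: the Posix
    (leap-second-free) delta plus the leap seconds inserted in (t0, t1]."""
    posix = (t1 - t0) / np.timedelta64(1, 's')
    leaps = np.count_nonzero((LEAP_DATES > t0) & (LEAP_DATES <= t1))
    return posix + leaps

t0 = np.datetime64('2015-01-01T00:00:00')
t1 = np.datetime64('2017-06-01T00:00:00')
# Two leap seconds (2015-07 and 2017-01) fall in this interval:
print(elapsed_si_seconds(t0, t1) - (t1 - t0) / np.timedelta64(1, 's'))  # 2.0
```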
The main problem with numpy datetime64 is that by using np.int64 for datetimes it gives 1/2**63 precision (about 1e-19). But this apparently very high precision has to be weighed against the relative accuracy of the Posix semantics, which lies at about 1e-7 to 1e-8 if we look at timespans of a couple of centuries. So I agree that the np.datetime64 precision is somewhat misleading.
This all said, proleptic Gregorian + Posix semantics is, in my opinion, the only sensible option in a numerical package like numpy, even though the results can be inaccurate. However, errors are usually small on average (say 10 ms/day, which is about 1e-7). Everything more sophisticated is in the realm of specialised packages like AstroPy, but also Skyfield <https://rhodesmill.org/skyfield/>.
Stefano
On 28 Dec 2021, at 21:35, numpy-discussion-request@python.org wrote:
It is not a matter of formal definitions. Leap seconds are uncompromisingly practical. If you look at the wall clock on 1 Jan 1970 00:00, then look at the same clock today, and measure the difference with an atomic clock, you won't get the time delta that np.timedelta64 reports. There will be a difference of ~37 seconds.
Actually this should be 27s.
One would expect that a library claiming to work with attoseconds would at least count the seconds correctly )
The Astropy library calculates them properly (https://het.as.utexas.edu/HET/Software/Astropy-1.0/api/astropy.time.TimeGPS....): "GPS Time. Seconds from 1980-01-06 00:00:00 UTC. For example, 630720013.0 is midnight on January 1, 2000."
>>> np.datetime64('2000-01-01', 's') - np.datetime64('1980-01-06', 's')
numpy.timedelta64(630720000,'s')
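The 13-second discrepancy between the two results is exactly the number of leap seconds inserted between the two epochs:

```python
import numpy as np

# numpy counts Posix (leap-second-free) seconds: 7300 calendar days
# between 1980-01-06 and 2000-01-01, times 86400 s each.
posix = (np.datetime64('2000-01-01', 's')
         - np.datetime64('1980-01-06', 's')) / np.timedelta64(1, 's')
print(posix)         # 630720000.0

# GPS time counts real SI seconds since its 1980-01-06 epoch; 13 leap
# seconds were inserted in that interval, hence astropy's 630720013.0.
print(posix + 13)    # 630720013.0
```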
Everything should be made as simple as possible but not simpler. Leap seconds are an inherent part of the world we live in.
E.g. this is how people deal with them currently: they have to parse times like 23:59:60.209215 manually:
https://stackoverflow.com/questions/21027639/python-datetime-not-accounting-...
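A minimal sketch of such manual handling (the helper name is made up for illustration; it folds a ':60' leap second onto the start of the next second, which loses the distinction but keeps datetime happy):

```python
from datetime import datetime, timedelta

def parse_maybe_leap(s, fmt='%Y-%m-%d %H:%M:%S'):
    """Parse a timestamp that may contain a ':60' leap second.
    datetime.strptime rejects second == 60, so as a workaround the
    leap second is mapped onto the start of the next second."""
    try:
        return datetime.strptime(s, fmt)
    except ValueError:
        head, sec = s.rsplit(':', 1)
        if sec == '60':
            return datetime.strptime(head + ':59', fmt) + timedelta(seconds=1)
        raise

print(parse_maybe_leap('2016-12-31 23:59:60'))  # 2017-01-01 00:00:00
```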
- calendrical calculations are performed using a proleptic Gregorian calendar (https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar),
- Posix semantics is followed, i.e. each day comprises exactly 86400 SI seconds, thus ignoring the existence of leap seconds.
I would also point out that this choice is consistent with python datetime.
But not consistent with python time ;)

"Unlike the time module, the datetime module does not support leap seconds."

• time.CLOCK_TAI: International Atomic Time. The system must have a current leap second table in order for this to give the correct answer. PTP or NTP software can maintain a leap second table. Availability: Linux. New in version 3.9.
As regards the promised future support for leap seconds, I would not mention it, for now. In fact, leap second support requires a leap second table, which is not available on all platforms supported by numpy. Therefore the leap second table would have to be bundled and updated with every numpy release, with the very undesirable effect that older versions (with outdated tables) would behave differently from newer ones.
The Olson database is much larger, yet it is updated on millions of computers, phones and whatnot without causing extra difficulties (except when a government unexpectedly decides to shift a region from one TZ to another). This way developers have a choice of whether to work with naive datetimes (OK in a single timezone without daylight saving) or with timezone-aware ones (and take care of updating pytz).
This is how astropy deals with updating the table: https://docs.astropy.org/en/stable/api/astropy.utils.iers.LeapSeconds.html Pytz also has this table, both inside the binary tz files and in a text file: https://github.com/stub42/pytz/blob/master/tz/leap-seconds.list which it in turn downloads from NIST: ftp://ftp.nist.gov/pub/time/leap-seconds.list It is in the public domain; NIST updates this file regularly, and it even has an expiration date (presently 28 June 2022). Activation of the 'leap-second-aware mode' could be made dependent on the presence of the pytz module and/or this expiration date.
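For reference, the leap-seconds.list format is simple enough to parse directly. A sketch with a tiny inline sample (two real 1972 entries; the '#@' expiration value here is made up for the example): data lines are '<NTP seconds> <TAI-UTC>', where NTP seconds are counted from 1900-01-01.

```python
from datetime import datetime, timedelta

NTP_EPOCH = datetime(1900, 1, 1)

def parse_leap_list(text):
    """Parse leap-seconds.list text: '#' starts a comment, '#@' marks
    the file's expiration, data lines are '<NTP seconds> <TAI-UTC>'."""
    table, expires = [], None
    for line in text.splitlines():
        if line.startswith('#@'):
            expires = NTP_EPOCH + timedelta(seconds=int(line.split()[1]))
        elif line and not line.startswith('#'):
            ntp, tai_utc = line.split()[:2]
            table.append((NTP_EPOCH + timedelta(seconds=int(ntp)), int(tai_utc)))
    return table, expires

sample = """\
# illustrative excerpt; expiration value made up
#@ 2272060800
2272060800 10 # 1 Jan 1972
2287785600 11 # 1 Jul 1972
"""
table, expires = parse_leap_list(sample)
print(table[0][0], table[0][1])  # first entry: 1972-01-01, TAI-UTC = 10
```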
I don't think having a non-default leap-second-aware mode would hurt anyone, but I also wouldn't consider it a priority. I think when someone needs them he'll make a patch and until that moment it is safe to have them as 'proposed' )
I feel that leap seconds should be mentioned somewhere, in the article or in the docs, because they limit practically precise usage of timedelta64 to the period between 2016 (the last time a leap second was injected) and 2021. A modest timespan for a library claiming to work with years up to 9.2e18 BC ;)
Thank you for your suggestions! I've included them in the article; please have a look at the updated version.
Best regards, Lev
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: lev.maximov@gmail.com
participants (2):
- Lev Maximov
- Stefano Miccoli