Re: NumPy-Discussion Digest, Vol 183, Issue 33
Hey, Stefano! The level of pedantry is absolutely acceptable. I don't question any of your arguments; they are all perfectly valid. Except that I'd rather say it is ~29 seconds if measuring against 1970. Leap seconds were introduced in 1972 and there have been a total of 27 of them since then, but TAI time has been ticking since 1958 and had gained 10 seconds by 1970, so it is approximately 0.83 seconds per year, which gives approx 28.67 seconds between today and 1970.

So 1970 is a bad choice of epoch if you want to introduce a leap-second-aware datetime. In GPS time they chose 1980. In TAI it is 1958, but that is somewhat worse than 1980 because it is not immediately clear how to perform the timestamp<->timedelta conversion between 1958 and 1970. Something like a 'proleptic GPS time' would be needed to estimate the number of leap seconds in the years before 1972, when they were introduced. Or maybe the leap-second timescale could be limited to start at 1972, not accepting any timestamps before that date.

The system that ignores the existence of leap seconds has a right to exist; it just has limited applicability. np.datetime64 keeps time as a delta between a moment in time and a predefined epoch. Which standard does it use to translate this delta into human-readable years, months, and so on? If it is UTC, then it must handle times like 2016-12-31 23:59:60, because that is a valid UTC timestamp.
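The arithmetic above can be spelled out explicitly (the figures are the approximations from this paragraph, not authoritative values):

```python
# Back-of-envelope check of the "~29 seconds" figure, using the
# approximate numbers from the discussion (not authoritative values).
tai_gain_by_1970 = 10                      # seconds TAI gained over 1958..1970
rate = tai_gain_by_1970 / (1970 - 1958)    # ~0.83 s per year
leap_seconds_since_1972 = 27               # leap seconds inserted 1972..2016

# Extrapolate the pre-1972 rate over 1970..1972 and add the leap seconds:
estimate = leap_seconds_since_1972 + rate * (1972 - 1970)
print(round(estimate, 2))  # ~28.67
```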
>>> np.datetime64('2016-12-31 12:59:60')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Seconds out of range in datetime string "2016-12-31 12:59:60"
Datetime also fails (so far) to handle it:
>>> from datetime import datetime as dt
>>> dt(2016, 12, 31, 23, 59, 60)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: second must be in 0..59
But `time` works. Well, at least it doesn't raise an exception:
>>> t = time.struct_time((2016, 12, 31, 12, 59, 60, 0, 0, 0)); t
time.struct_time(tm_year=2016, tm_mon=12, tm_mday=31, tm_hour=12, tm_min=59, tm_sec=60, tm_wday=0, tm_yday=0, tm_isdst=0)
>>> time.asctime(t)
'Mon Dec 31 12:59:60 2016'
>>> time.gmtime(calendar.timegm(t))
time.struct_time(tm_year=2017, tm_mon=1, tm_mday=1, tm_hour=1, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=1, tm_isdst=0)
Imagine a user who decides which library to use to store some (life-critical!) measurements taken every 100 ms. He looks at NumPy datetime64, reads that it is capable of handling attoseconds, and decides it is a perfect fit. Now imagine that on 31 Dec 2022 the World Government decides to inject a leap second. The system receives the announcement from the NTP servers and prepares to replay this second twice. As soon as that moment chimes in, he runs into a ValueError, which he won't notice because he's celebrating the New Year :) And guess whom he'll blame? ;)

Actually, humanity has already got used to replaying timespans twice. It happens every year in the countries that observe daylight saving time, and the solution is to use a more linear scale than local time, namely UTC. But now it turns out that UTC is not linear enough either: it also has certain timespans happening twice. The solution, once again, is to use a _really_ linear time, which is TAI. I think the python 'time' library did the right thing in introducing time.CLOCK_TAI, after all.

Astropy handles the UTC scale properly though:
>>> t = Time('2016-12-31 23:59:60')
So the solution for that particular person with regular intervals of time is to use astropy. I mention it in the article.

I made some corrections to the text. I'd be grateful if you had a look and pointed me to the particular sentences that need improvement.
Best regards,
Lev
On Wed, Dec 29, 2021 at 6:54 PM Stefano Miccoli wrote:
Lev, excuse me if I go into super-pedantic mode, but your answer and the current text of the article miss an important point.
1) The proleptic Gregorian calendar is about leap *year* rules. It tracks days without making any assumption on the length of days. If we agree on using this calendar, dates like -0099-07-12 and 2021-12-29 are defined without ambiguity, and we can easily compute the number of days between these two dates.
2) Posix semantics is about the length of a day, and is based on the (utterly wrong) assumption that a mean solar day is constant and exactly 86400 SI seconds long. (For an authoritative estimate of historical length-of-day variations see http://astro.ukho.gov.uk/nao/lvm/ and the related papers https://doi.org/10.1098/rspa.2016.0404 and https://doi.org/10.1098/rspa.2020.0776.)
Knowing assumption 1) is important when coding dates before 1582-10-15: e.g. 1582-10-04 Julian is 1582-10-14 proleptic Gregorian. Once we agree on the proleptic Gregorian calendar everything works as expected: time deltas expressed in days are correct.
Knowing assumption 2) is important if we intend to compute time deltas for date-time objects with high precision: e.g. how many SI seconds occur between 1582-10-14T12:00:00 and 1582-10-15T12:00:00, with millisecond precision? Here we must first define what T12:00:00 means, say UT1, but most critically we need to know the length of day in 1582. With Posix semantics a day is always 86400.000 SI seconds long; however, the real length of day in 1582 could be about 5 ms less. The problem here is that small errors accumulate, and if we compute the difference between 0000-01-01T12:00:00 and 1900-01-01T12:00:00 the numpy answer may be off by about 10_000 seconds.
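The order of magnitude of that accumulated error can be checked with a quick back-of-envelope computation (the millisecond figure below is an assumed average offset, chosen only to illustrate the scale, not measured data):

```python
# Posix semantics: every day is exactly 86400 SI seconds.  If the mean
# solar day over 0000..1900 differed from 86400 s by several
# milliseconds on average (the value below is an illustrative
# assumption), the error over 1900 years is:
days = 1900 * 365.25            # days between 0000-01-01 and 1900-01-01, roughly
assumed_mean_offset_ms = 14     # assumed average |length of day - 86400 s|
error_s = days * assumed_mean_offset_ms / 1000
print(round(error_s))           # on the order of 10_000 seconds
```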
Fast forward to current times: after 1972, T12:00:00 should be defined as UTC, and the Posix assumption is correct for almost every day, bar when a leap second is added (86401 s) or removed (86399 s, though this has never occurred). Now the numpy-computed timedeltas are correct up to an integral number of seconds that can be derived from a leap second table, if both dates are in the past. If one or both of the dates are in the future, then we must rely on models of Earth rotation and estimate the future introduction of leap seconds. But Earth rotation is quite “unpredictable”, so usually this is not very accurate.
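A sketch of what such a table-based correction could look like on top of numpy (the table below is deliberately abbreviated and the function name is hypothetical; a real implementation would use the full 27-entry table and only past dates):

```python
import numpy as np

# Abbreviated leap-second table: UTC dates at whose start TAI-UTC had
# just been incremented by one second (illustrative subset; the real
# table has 27 entries for 1972..2017).
LEAP_DATES = np.array(['2012-07-01', '2015-07-01', '2017-01-01'],
                      dtype='datetime64[D]')

def elapsed_si_seconds(t0, t1):
    """SI seconds between two past datetime64 instants: the Posix
    (leap-second-free) delta plus the leap seconds inserted in (t0, t1]."""
    posix = (t1 - t0) / np.timedelta64(1, 's')
    leaps = np.count_nonzero((LEAP_DATES > t0) & (LEAP_DATES <= t1))
    return posix + leaps

t0 = np.datetime64('2015-01-01T00:00:00')
t1 = np.datetime64('2017-06-01T00:00:00')
# Two leap seconds (2015-07 and 2017-01) fall in this interval:
print(elapsed_si_seconds(t0, t1) - (t1 - t0) / np.timedelta64(1, 's'))  # 2.0
```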
The main problem with numpy datetime64 is that by using np.int64 for datetimes it gives 1/2**63 precision (about 1e-19). But this apparently very high precision has to be weighed against the relative accuracy of the Posix semantics, which lies at about 1e-7 to 1e-8 if we look at timespans of a couple of centuries. So I agree that the np.datetime64 precision is somewhat misleading.
This all said, proleptic Gregorian + Posix semantics is, in my opinion, the only sensible option in a numerical package like numpy, even though the results can be inaccurate. However, errors are usually small on average (say 10 ms/day, which is about 1e-7). Everything more sophisticated is in the realm of specialised packages like AstroPy, but also Skyfield <https://rhodesmill.org/skyfield/>.
Stefano
On 28 Dec 2021, at 21:35, numpy-discussion-request@python.org wrote:
It is not a matter of formal definitions. Leap seconds are uncompromisingly practical. If you look at the wall clock on 1 Jan 1970 00:00, then look at the same clock today, and measure the difference with an atomic clock, you won't get the time delta that np.timedelta64 reports. There will be a difference of ~37 seconds.
Actually this should be 27s.
One would expect that a library claiming to work with attoseconds would at least count the seconds correctly )
The Astropy library calculates them properly (https://het.as.utexas.edu/HET/Software/Astropy-1.0/api/astropy.time.TimeGPS....): "GPS Time. Seconds from 1980-01-06 00:00:00 UTC. For example, 630720013.0 is midnight on January 1, 2000."
>>> np.datetime64('2000-01-01', 's') - np.datetime64('1980-01-06', 's')
numpy.timedelta64(630720000,'s')
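The 13-second discrepancy between the two results is exactly the number of leap seconds inserted between the two epochs:

```python
import numpy as np

# numpy counts Posix (leap-second-free) seconds: 7300 calendar days
# between 1980-01-06 and 2000-01-01, times 86400 s each.
posix = (np.datetime64('2000-01-01', 's')
         - np.datetime64('1980-01-06', 's')) / np.timedelta64(1, 's')
print(posix)         # 630720000.0

# GPS time counts real SI seconds since its 1980-01-06 epoch; 13 leap
# seconds were inserted in that interval, hence astropy's 630720013.0.
print(posix + 13)    # 630720013.0
```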
Everything should be made as simple as possible but not simpler. Leap seconds are an inherent part of the world we live in.
E.g. this is how people deal with them currently: they have to parse times like 23:59:60.209215 manually:
https://stackoverflow.com/questions/21027639/python-datetime-not-accounting-...
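A minimal sketch of such manual handling (the helper name is made up for illustration; it folds a ':60' leap second onto the start of the next second, which loses the distinction but keeps datetime happy):

```python
from datetime import datetime, timedelta

def parse_maybe_leap(s, fmt='%Y-%m-%d %H:%M:%S'):
    """Parse a timestamp that may contain a ':60' leap second.
    datetime.strptime rejects second == 60, so as a workaround the
    leap second is mapped onto the start of the next second."""
    try:
        return datetime.strptime(s, fmt)
    except ValueError:
        head, sec = s.rsplit(':', 1)
        if sec == '60':
            return datetime.strptime(head + ':59', fmt) + timedelta(seconds=1)
        raise

print(parse_maybe_leap('2016-12-31 23:59:60'))  # 2017-01-01 00:00:00
```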
- calendrical calculations are performed using a proleptic Gregorian calendar (https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar),
- Posix semantics is followed, i.e. each day comprises exactly 86400 SI seconds, thus ignoring the existence of leap seconds.
I would also point out that this choice is consistent with python datetime.
But not consistent with python time ;)

"Unlike the time module, the datetime module does not support leap seconds."

• time.CLOCK_TAI: International Atomic Time. The system must have a current leap second table in order for this to give the correct answer. PTP or NTP software can maintain a leap second table. Availability: Linux. New in version 3.9.
As regards the promised future support for leap seconds, I would not mention it, for now. In fact, leap second support requires a leap second table, which is not available on all platforms supported by numpy. Therefore the leap second table would have to be bundled and updated with every numpy release, with the very undesirable effect that older versions (with outdated tables) would behave differently from newer ones.
The Olson database is much larger, yet it is updated on millions of computers, phones and whatnot without causing extra difficulties (except when a government unexpectedly decides to shift a region from one TZ to another). This way developers have a choice of whether to work with naive datetimes (OK in a single timezone without daylight saving) or with timezone-aware ones (and take care of updating pytz).
This is how astropy deals with updating the table: https://docs.astropy.org/en/stable/api/astropy.utils.iers.LeapSeconds.html Pytz also has this table, both inside the binary tz files and in a text file: https://github.com/stub42/pytz/blob/master/tz/leap-seconds.list which it in turn downloads from NIST: ftp://ftp.nist.gov/pub/time/leap-seconds.list It is in the public domain; NIST updates this file regularly, and it even has an expiration date (presently 28 June 2022). Activation of the 'leap-second-aware mode' could be made dependent on the presence of the pytz module and/or this expiration date.
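For reference, the leap-seconds.list format is simple enough to parse directly. A sketch with a tiny inline sample (two real 1972 entries; the '#@' expiration value here is made up for the example): data lines are '<NTP seconds> <TAI-UTC>', where NTP seconds are counted from 1900-01-01.

```python
from datetime import datetime, timedelta

NTP_EPOCH = datetime(1900, 1, 1)

def parse_leap_list(text):
    """Parse leap-seconds.list text: '#' starts a comment, '#@' marks
    the file's expiration, data lines are '<NTP seconds> <TAI-UTC>'."""
    table, expires = [], None
    for line in text.splitlines():
        if line.startswith('#@'):
            expires = NTP_EPOCH + timedelta(seconds=int(line.split()[1]))
        elif line and not line.startswith('#'):
            ntp, tai_utc = line.split()[:2]
            table.append((NTP_EPOCH + timedelta(seconds=int(ntp)), int(tai_utc)))
    return table, expires

sample = """\
# illustrative excerpt; expiration value made up
#@ 2272060800
2272060800 10 # 1 Jan 1972
2287785600 11 # 1 Jul 1972
"""
table, expires = parse_leap_list(sample)
print(table[0][0], table[0][1])  # first entry: 1972-01-01, TAI-UTC = 10
```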
I don't think having a non-default leap-second-aware mode would hurt anyone, but I also wouldn't consider it a priority. I think when someone needs them he'll make a patch and until that moment it is safe to have them as 'proposed' )
I feel that leap seconds should be mentioned somewhere, in the article or in the docs, because they limit practically precise usage of timedelta64 to the period between 2016 (the last time a leap second was injected) and 2021. A modest timespan for a library claiming to work with years up to 9.2e18 BC ;)
Thank you for your suggestions! I've included them in the article; please have a look at the updated version.
Best regards, Lev
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: lev.maximov@gmail.com
participants (2):
- Lev Maximov
- Stefano Miccoli