Making datetime64 timezone naive

As has come up repeatedly over the past few years, nobody seems to be very happy with the way that NumPy's datetime64 type parses and prints datetimes in local timezones.
The tentative consensus from last year's discussion was that we should make datetime64 timezone naive, like the standard library's datetime.datetime:
http://thread.gmane.org/gmane.comp.python.numeric.general/57184
That makes sense to me, and it's exactly what I'd like to see happen for NumPy 1.11. Here's my PR to make that happen:
https://github.com/numpy/numpy/pull/6453
As a temporary measure, we still will parse datetimes that include a timezone specification by converting them to UTC, but will issue a DeprecationWarning. This is important for a smooth transition, because at the very least I suspect the "Z" modifier for UTC is widely used. Another option would be to preserve this conversion indefinitely, without any deprecation warning.
There's one (slightly) contentious API decision to make: What should we do with the numpy.datetime_to_string function? As far as I can tell, it was never documented as part of the NumPy API and has not been used very much or at all outside of NumPy's own test suite, but it is exposed in the main numpy namespace. If we can remove it, then we can delete and simplify a lot more code related to timezone parsing and display. If not, we'll need to do a bit of work so we can distinguish between the string representations of timezone naive and UTC.
Best,
Stephan
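For readers skimming the thread, here is a minimal sketch of what "timezone naive" means in practice. It assumes NumPy 1.11 or later (the release this proposal targets); the timestamps are illustrative and are not taken from the PR.

```python
import numpy as np

# A string without an offset is taken at face value: it is not shifted
# by the machine's local timezone (assumes NumPy >= 1.11).
t = np.datetime64("1970-01-01T01:00")

# The stored value is just a count of time units since the epoch.
seconds_since_epoch = int(t.astype("datetime64[s]").astype("int64"))

# Differences are plain timedelta64 values, independent of any timezone.
delta = np.datetime64("2015-10-12T00:10") - np.datetime64("2015-10-11T23:10")
```

Under these assumptions the result is the same on every machine, whatever its system timezone.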

On Mon, Oct 12, 2015 at 12:10 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
I'm dubious about supporting conversions in the long run -- even "Z" -- because UTC datetimes and naive datetimes are really not the same thing. OTOH maybe if we dropped this it would break everyone's code and they would hate us -- I actually have no idea what people are doing with datetime64 outside of pandas. One way to find out is to start issuing DeprecationWarnings and see if anyone notices :-). (Though of course this is far from fool-proof.)
One possible strategy here would be to do some corpus analysis to find out whether anyone is actually using it, like I did for the ufunc ABI stuff: https://github.com/njsmith/codetrawl https://github.com/njsmith/ufunc-abi-analysis "datetime_to_string" is an easy token to search for, though it looks like enough people have their own functions named that that you'd have to do a bit of filtering to ignore non-numpy-related uses. A filter("content", "import.*numpy") would collect all files that import numpy into a single group for further examination. -n -- Nathaniel J. Smith -- http://vorpus.org
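The filtering step described above could look roughly like the following. This is a standard-library stand-in and not the codetrawl API; the helper name `mentions_numpy_token` and the import pattern are assumptions made for illustration.

```python
import re

# Assumed heuristic: a file "uses numpy" if it has an import line for it.
IMPORTS_NUMPY = re.compile(r"^\s*(?:import|from)\s+numpy\b", re.MULTILINE)


def mentions_numpy_token(source: str, token: str = "datetime_to_string") -> bool:
    """Return True if the file imports numpy and mentions `token` as a whole word."""
    if not IMPORTS_NUMPY.search(source):
        return False
    return re.search(r"\b" + re.escape(token) + r"\b", source) is not None


numpy_user = "import numpy as np\nprint(np.datetime_to_string(x))\n"
unrelated = "def datetime_to_string(d):\n    return str(d)\n"
```

Files that define their own function of the same name but never import numpy are dropped, which is the kind of non-numpy use the filter is meant to ignore.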

On Mon, Oct 12, 2015 at 12:38 AM, Nathaniel Smith <njs@pobox.com> wrote:
Yes, this is a good approach. I actually mistyped the name here -- it's actually "datetime_as_string". A GitHub search does turn up a handful of uses outside of NumPy: https://github.com/search?utf8=%E2%9C%93&q=numpy.datetime_as_string+in%3Afile%2Cpath+NOT+numpy%2Fcore+NOT+test_datetime.py+NOT+arrayprint.py&type=Code&ref=searchresults That said, I'm not sure it's worth going to the trouble to ensure it continues to work in the future. This function was entirely undocumented, and doesn't even have an inspectable function signature. Stephan

On Mon, Oct 12, 2015 at 12:38 AM, Nathaniel Smith <njs@pobox.com> wrote:
no -- but almost!
OTOH maybe if we dropped this it would break everyone's code and they would hate us --
I think it probably would. In the current implementation, an ISO string without an offset specifier is converted using the system's locale timezone. So to get naive time (or UTC), we need to tack a Z (or 00:00) on there. So killing that would likely break a lot of code! And accepting a Z or 00:00 and then treating it as naive, while being perhaps misleading, would not actually change any results. So I say we keep it. Deprecating it eventually would be good in the long run -- but maybe when we have actual time zone support.
I actually have no idea what people are doing with datetime64 outside of pandas.
What do we need to do with this not to break pandas? I'm guessing more people use datetime64 wrapped by pandas than any other way... (not me, though)
Well, I'm not using it :-) though I can see that it might be pretty useful. Though once we get rid of datetime64 adjusting for the locale timezone, maybe not anymore. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Mon, Oct 12, 2015 at 3:10 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
If you are going to make datetime64 more like datetime.datetime, please consider adding the "fold" bit. See PEP 495. [1] [1]: https://www.python.org/dev/peps/pep-0495/

I'd be totally in support of switching to timezone naive form. While it would be ideal that everyone stores their dates in UTC, the real world is messy and most of the time, people are just loading dates as-is and don't even care about timezones. I work on machines with different TZs, and I hate it when I save a bunch of data on one machine in UTC, but then go to view it on my local machine and everything is shifted. It gets even more confusing around DST switches because it gets all mixed up.
Ben Root
On Mon, Oct 12, 2015 at 2:48 PM, Alexander Belopolsky <ndarray@mac.com> wrote:

On Mon, Oct 12, 2015 at 11:48 AM, Alexander Belopolsky <ndarray@mac.com> wrote:
well, adding any timezone support is not (yet) on the table. (no need for "fold" with purely naive time, yes?) But yes, when we get there, absolutely. -CHB

On Oct 12, 2015 11:48 AM, "Alexander Belopolsky" <ndarray@mac.com> wrote:
On Mon, Oct 12, 2015 at 3:10 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
The tentative consensus from last year's discussion was that we should
make datetime64 timezone naive, like the standard library's datetime.datetime
If you are going to make datetime64 more like datetime.datetime, please
consider adding the "fold" bit. See PEP 495. [1]
The challenge here is that we literally do not have a bit to use :-) Unless we make it datetime65 + 63 bits of padding, stealing a bit to use for fold would halve the range of representable times, and I'm guessing this would not be acceptable? -- pandas's 64-bits-of-nanoseconds already has a somewhat narrow range (584 years). I think for now the two goals are to make the built-in datetime64 minimally functional and self-consistent, and to make it possible for fancier datetime needs to be handled using third-party dtypes. -n
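The 584-year figure and the cost of giving up one bit can be checked with a little arithmetic. This is a back-of-the-envelope sketch; `span_in_years` is a hypothetical helper and the mean Gregorian year length is an assumption used only to convert seconds to years.

```python
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60  # mean Gregorian year (assumption)


def span_in_years(bits: int, units_per_second: int = 10**9) -> float:
    """Total span covered by a `bits`-wide integer count of time units."""
    return 2**bits / units_per_second / SECONDS_PER_YEAR


full = span_in_years(64)           # 64 bits of nanoseconds
with_fold_bit = span_in_years(63)  # one bit given up for a fold flag
```

Here `full` comes out at roughly 584 years and `with_fold_bit` at roughly 292, which is the halving mentioned above.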

On Tue, Oct 13, 2015 at 3:21 PM, Nathaniel Smith <njs@pobox.com> wrote:
If you are going to make datetime64 more like datetime.datetime, please consider adding the "fold" bit. See PEP 495. [1]
The challenge here is that we literally do not have a bit to use :-)
hmm -- I was first thinking that this could all be in the timezone stuff (when we get there), but while I imagine we'll want an entire array to be in a single timezone, each individual value would need its own "fold" flag. But in any case, we don't need it until we do timezones, and my understanding is that we aren't going to do timezones until we have the mythical new-and-improved-dtype-system. So a future datetime dtype could be 64 bits + a byte of extra info, or be 63 bits plus the fold flag, or...
well, not now, with the fixed epoch, but if the epoch could be adjusted, maybe a small range would be fine -- who needs nanosecond accuracy AND centuries of range?
Thinking a bit more here: For those that didn't follow the massive discussion on this on Python-dev and the new datetime list: the fold flag is required to round-trip properly for timezones with discontiguous time -- i.e. Daylight savings. So if you have:
2015-11-01T01:30
do you mean the first 1:30 am or the second one, after the DST transition? (i.e. in the fold, or not?) So it is key, for Python's datetime, to make sure to keep that information around.
However: Python's datetime was designed to be optimized for:
- converting between datetime and other representations in databases, etc.
- fast math for "naive time" -- i.e. basic manipulations within the same timezone, like "one day later"
- Fast math for "absolute relative deltas" is of secondary concern.
The result of this is that datetime stores:
year, month, day, hour, minute, second, microsecond
It does NOT store some time_unit_since_an_epoch, like unix time or numpy datetime64. Also, IIUC, when you associate a datetime with a timezone, it stores the year, month, day, hour, second,... in the specified timezone -- NOT in UTC, or anything else. This makes manipulations within that timezone easy -- the next day simply requires adding a day to the day field (then normalizing to the month).
Given all that -- the "fold" bit is needed, as a particular datetime in a particular timezone may have more than one meaning. Note that to compute a proper time span between two "aware" datetimes, it is necessary to convert to UTC, do the math, then convert back to the timezone you want.
However, numpy datetime is optimized for compact storage and fast computation of absolute deltas (actual hours, minutes, seconds... not calendar units like "the next day").
Because of this, and because it's what we already have, datetime64 stores times as "some number of time units since an epoch" -- a simple integer. And because we probably want fast absolute delta computation, when we add timezones, we'll probably want to store the datetime in UTC, and apply the timezone on I/O.
Alexander: Am I right that we don't need the "fold" bit in this case? You'd still need it when specifying a time in a timezone with folds -- but again, only on I/O.
-Chris
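The 2015-11-01T01:30 ambiguity, and why it disappears once values are stored as a UTC count, can be sketched with the standard library alone. The two fixed offsets below stand in for a US Eastern-style zone before and after the transition; that choice of zone is an assumption, since the message does not name one, and `to_epoch_seconds` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for "before" and "after" the DST transition
# (assumed: US Eastern, which fell back on 2015-11-01).
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_seconds(aware: datetime) -> int:
    """Integer seconds since the epoch, the way an epoch-based dtype stores time."""
    return (aware - EPOCH) // timedelta(seconds=1)


# Two distinct instants, one hour apart in UTC.
first = datetime(2015, 11, 1, 5, 30, tzinfo=timezone.utc)
second = datetime(2015, 11, 1, 6, 30, tzinfo=timezone.utc)

# Both show the same wall-clock reading once the local offset is applied.
wall_first = first.astimezone(EDT).replace(tzinfo=None)
wall_second = second.astimezone(EST).replace(tzinfo=None)
```

The two wall-clock values are identical, while the stored integers differ by 3600, so the ambiguity only arises when converting to or from local time.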

On Oct 13, 2015 3:49 PM, "Chris Barker" <chris.barker@noaa.gov> wrote:
[...]
Except that ironically it actually can't compute absolute deltas accurately with one second resolution, because it does the POSIX time thing of pretending that all UTC days have the same number of seconds, even though this is not true (leap seconds). This isn't really relevant to anything else in this thread, except as a reminder of how freaky date/time handling is. -n
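A small illustration of the point, using the standard library's datetime, which follows the same every-day-has-86400-seconds convention. A leap second was inserted at the end of 2015-06-30; the one-entry table and the helper `elapsed_si_seconds` are hypothetical and deliberately incomplete.

```python
from datetime import datetime

# Partial, illustrative table: the last ordinary second before each
# inserted leap second (only the 2015 one is listed here).
LEAP_SECOND_FOLLOWS = [datetime(2015, 6, 30, 23, 59, 59)]


def posix_delta(start: datetime, end: datetime) -> float:
    """Difference under the 86400-seconds-per-day convention."""
    return (end - start).total_seconds()


def elapsed_si_seconds(start: datetime, end: datetime) -> float:
    """Same difference, corrected with the (incomplete) table above."""
    extra = sum(1 for s in LEAP_SECOND_FOLLOWS if start <= s < end)
    return posix_delta(start, end) + extra


start = datetime(2015, 6, 30, 23, 59, 59)
end = datetime(2015, 7, 1, 0, 0, 0)
```

Across that boundary the naive difference is one second, although two seconds actually elapsed.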

Maybe not directly relevant, but also very clearly why one should ideally not use these at all! Perhaps even less relevant, but if you do need absolute times (and thus work with UTC or TAI or GPS), have a look at astropy's `Time` class. It does use two doubles, but with that maintains "sub-nanosecond precision over times spanning the age of the universe" [1]. And it even converts to strings nicely! -- Marten [1] http://docs.astropy.org/en/latest/time/index.html

On Tue, Oct 13, 2015 at 5:08 PM, Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Seriously, though -- if we are opening all this up, maybe it's worth considering other options, rather than kludging datetime64 -- particularly if there is something someone has already implemented and tested... But for now, Stephan's patches to make datetime64 far more useful and easy are very welcome! -CHB [1] http://docs.astropy.org/en/latest/time/index.html

On Tue, Oct 13, 2015 at 3:58 PM, Nathaniel Smith <njs@pobox.com> wrote:
Note that I said "fast", not "accurate" -- but the leap second thing may be one more reason not to call datetime64 "UTC" -- who's to say that "naive" time should include leap seconds :-) Also, we could certainly add a leap seconds implementation to the current infrastructure -- the real technical problem with that is how to keep the leap-seconds table up to date -- we have no way to know when there will be leap-seconds in the future... Also -- this may be one more reason to have a selectable epoch -- then you'd likely overlap fewer leap-seconds in a given use case.
This isn't really relevant to anything else in this thread, except as a reminder of how freaky date/time handling is.
yup -- it sure is. -CHB

On Wed, Oct 14, 2015 at 10:34 AM, Phil Hodge <hodge@stsci.edu> wrote:
exactly -- so more than six months out, we have no idea. And even within six months, you'd need to update some sort of database of leap seconds to get it. So depending on what version of the DB someone was using, they'd get different answers. That could all get ugly :-( -CHB

On Tue, Oct 13, 2015 at 6:48 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Since Guido hates leap seconds, PEP 495 is silent on this issue, but strictly speaking UTC leap seconds are "folds." AFAICT, a strictly POSIX system must repeat the same value of time_t when a leap second is inserted. While datetime will never extend the second field to allow second=60, with PEP 495, it is now possible to represent 23:59:60 as 23:59:59/fold=1. Apart from leap seconds, there is no need to use "fold" on datetimes that represent time in UTC or any timezone at a fixed offset from UTC.
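A short sketch of that representation, assuming Python 3.6 or later (where the PEP 495 `fold` attribute exists). The date is the 2015 leap second and is chosen only for illustration.

```python
from datetime import datetime

# second=60 is rejected by the datetime constructor.
try:
    datetime(2015, 6, 30, 23, 59, 60)
    second_60_allowed = True
except ValueError:
    second_60_allowed = False

# The PEP 495 spelling: repeat 23:59:59 and mark the repeat with fold=1.
leap = datetime(2015, 6, 30, 23, 59, 59, fold=1)
plain = datetime(2015, 6, 30, 23, 59, 59)
```

Note that `leap == plain` is true: comparisons between naive datetimes ignore `fold`, so the flag carries information but does not affect ordering or equality.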

On Fri, Oct 16, 2015 at 10:19 AM, Alexander Belopolsky <ndarray@mac.com> wrote:
allow second=60, with PEP 495, it is now possible to represent 23:59:60 as 23:59:59/fold=1.
Thanks -- If anyone decides to actually get around to leap seconds support in numpy datetime, s/he can decide whether to do folds or allow second: 60. Off the top of my head, I think allowing a 60th second makes more sense -- just like we do leap years. Granted, external systems often don't understand/allow a 60th second, but they generally don't understand a fold bit, either.... -CHB

On Sat, Oct 17, 2015 at 6:59 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Off the top of my head, I think allowing a 60th second makes more sense -- just like we do leap years.
Yet we don't implement DST by allowing the 24th hour. Even the countries that adjust the clocks at midnight don't do that. In some sense leap seconds are more similar to timezone changes (DST or political) because they are irregular and unpredictable. Furthermore, the notion of "fold" is not tied to a particular 24/60/60 system of encoding times and thus more applicable to numpy where times are encoded as binary integers.

On Sun, Oct 18, 2015 at 12:20 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
Well, isn't that about conforming to already existing standards? DST is a civil construct -- and mst (all?) implementations use the convention of having repeated times -- so that's what software has to deal with. IIUC, at least *some* standards handle leap seconds by adding a 60th (61st) second, rather than having a repeated one. So it's at least an option to do it that way. And it can then fit into the already existing standards for representing datetimes, etc. Does the "fold" flag approach for representing, well, "folds" exist in any widely used standards? It's my impression that it doesn't, since we had to argue a lot about what to call it :-)
In some sense leap seconds are more similar to timezone changes (DST or political) because they are irregular and unpredictable.
in that regard, yes -- you need a constantly updating database to use them. but I don't know that that has any impact on how you represent them. They seem a lot more like leap years to me -- some Februaries have a 29th day -- some hours on some days have a 61st second.
but there are no folds in the underlying integer representation -- that is the "continuous" time scale -- the folds (or leap seconds, or leap years, or any of the 24/60/60 business) come in only when you want to go to/from the "datetime" representation.
If anyone decides to actually get around to leap seconds support in numpy datetime, s/he can decide ...
This attitude is the reason why we will probably never have bug free software when it comes to civil time reckoning.
OK -- fair enough -- good to think about it sooner than later.
Similarly, current numpy.datetime64 design ties arithmetic with encoding. This makes arithmetic easier, but in the long run may preclude designs that better match the problem domain.
I don't follow here -- how can you NOT tie arithmetic to encoding? Sure, you could decide that you are going to overload the arithmetic, and it's up to the object that encodes the data to do that math -- but that's pretty much what datetime64 is doing -- defining an encoding so that it can do math -- numpy dtypes are very much about binary representation. No reason one couldn't make a different numpy dtype for datetimes that encoded it a different way, and then it would have to implement math, too.
Note how the development of PEP 495 has highlighted the fact that allowing binary operations (subtraction, comparison etc.) between times in different timezones was a design mistake. It will be wise to learn from such mistakes when redesigning numpy.datetime64.
So was not considering folds -- frankly, and I think this may be your point, I don't think timezones were well thought out at all when datetime was first introduced -- and however well thought out it was, if you don't provide an implementation, you are not going to find the limitations. And despite Tim's articulate defense of the original implementation decisions, I think encoding the datetime in the local "calendar/clock" just invites a mess. And I'm quite convinced that it wouldn't be the way to go for numpy use-cases.
If you ever plan to support civil time in some form, you should think about it now.
well, the goal for now is naive time -- and unlike the original datetime -- we are not adding on a "you can implement your own timezone handling this way" hook yet.
Indeed. Though will that only occur with timezones that have DST? I know I'd be fine with NOT being able to create a numpy datetime64 from a non-naive datetime object. Which would force the user to think about and convert to the timezone they want before passing off to numpy. Unless you can suggest a sensible default way to handle this. At first blush, I think naive time does not have folds, so there is no way to handle them "properly".
Also -- I think we are at phase one of a (at least) two step process:
1) clean up datetime64 just enough that it is useful, and less error-prone -- i.e. have it not pretend to support anything other than naive datetimes.
2) Do it right -- perhaps adding some time zone support. This is going to wait until the numpy dtype machinery is cleaned up some.
Phase 2 is where we really need the thinking ahead. And I'm still confused about what thinking ahead needs to be done for potential leap second support. -CHB

On Mon, Oct 19, 2015 at 3:34 PM, Chris Barker <chris.barker@noaa.gov> wrote:
datetime.now() returns *naive* datetime objects unless you supply the timezone. In Python 3.6 *naive* datetime objects will have the fold attribute and datetime.now() will occasionally return fold=1 values unless your system timezone has a fixed UTC offset.

On Mon, Oct 19, 2015 at 12:34 PM, Chris Barker <chris.barker@noaa.gov> wrote:
I agree with Chris. My intent with this work for now (for NumPy 1.11) is simply to complete phase 1. Once NumPy stops pretending to be time zone aware (and with a few other small cleanups), datetime64 will be far more usable. For major fixes, we'll have to wait until dtype support is better. Alexander -- by "mst" I think Chris meant "most". Best, Stephan

On Mon, Oct 19, 2015 at 4:12 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
This is fine. Just be aware that *naive* datetimes will also have the PEP 495 "fold" attribute in Python 3.6. You are free to ignore it, but you will lose the ability to round-trip between naive stdlib datetimes and numpy.datetime64.
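The round-trip loss can be sketched without NumPy by mimicking an epoch-based representation with a plain integer. This assumes Python 3.6 or later; `to_microseconds` and `from_microseconds` are hypothetical helpers that stand in for a conversion to and from a microsecond-resolution datetime64, not NumPy's actual code path.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_microseconds(naive: datetime) -> int:
    """Reduce a naive datetime to an integer count, as an epoch-based dtype would."""
    return (naive - EPOCH) // timedelta(microseconds=1)


def from_microseconds(count: int) -> datetime:
    """Rebuild a naive datetime from the integer count."""
    return EPOCH + timedelta(microseconds=count)


original = datetime(2015, 11, 1, 1, 30, fold=1)
restored = from_microseconds(to_microseconds(original))
```

`restored` compares equal to `original`, because equality ignores `fold`, but `restored.fold` is 0: the integer has nowhere to keep the flag.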

This is fine. Just be aware that *naive* datetimes will also have the PEP 495 "fold" attribute in Python 3.6. You are free to ignore it, but you will lose the ability to round-trip between naive stdlib datetimes and numpy.datetime64.
Sigh. I can see why it's there (primarily to support now(), I suppose). But a naive datetime doesn't have a timezone, so how could you know what time one actually corresponds to if fold is True? And what could you do with it if you did know? I've always figured that if you are using naive time for times in a timezone that has DST, then you'd better know whether you were in DST or not. (Which fold tells you, I guess) but the fold isn't guaranteed to be an hour, is it? So without more info, what can you do? And if the fold bit is False, then you still have no idea if you are in DST or not. And then what if you attach a timezone to it? Then the fold bit could be wrong... I take it back, I can't see why the fold bit could be anything but confusing for a naive datetime. :-)
Anyway, all I can see to do here is for the datetime64 docs to say that fold is ignored if it's there.
But what should datetime64 do when provided with a datetime with a timezone?
- Raise an exception?
- Ignore the timezone?
- Convert to UTC?
If the time zone is ignored, then you could get DST and non-DST times in the same array -- that could be ugly.
Is there any way to query a timezone object to ask if it's a constant-offset?
And yes, I did mean "most". There is no way I'm ever going to introduce a three letter "timezone" abbreviation in one of these threads! -CHB
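The three options listed above can be written down as a small standard-library sketch, which may help when comparing them. The helpers `to_naive` and `has_fixed_offset` and the policy names are hypothetical; nothing here is NumPy's behaviour. The fixed-offset check is only a partial answer: it recognises `datetime.timezone` instances, which always have a constant offset, and says nothing about other tzinfo classes.

```python
from datetime import datetime, timedelta, timezone


def to_naive(dt: datetime, policy: str = "raise") -> datetime:
    """Reduce a stdlib datetime to a naive one under one of three policies."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        return dt.replace(fold=0)  # already naive; fold is simply dropped
    if policy == "raise":
        raise ValueError("timezone-aware datetime not accepted")
    if policy == "ignore":
        return dt.replace(tzinfo=None, fold=0)  # keep the wall-clock fields
    if policy == "utc":
        return dt.astimezone(timezone.utc).replace(tzinfo=None, fold=0)
    raise ValueError(f"unknown policy: {policy!r}")


def has_fixed_offset(dt: datetime) -> bool:
    """True only for stdlib fixed-offset zones; other tzinfo types are not inspected."""
    return isinstance(dt.tzinfo, timezone)


aware = datetime(2015, 10, 19, 12, 34, tzinfo=timezone(timedelta(hours=-7)))
ignored = to_naive(aware, "ignore")  # wall-clock fields kept as-is
as_utc = to_naive(aware, "utc")      # shifted to UTC, then made naive
```

With "ignore" the array could mix DST and non-DST wall-clock times, which is the ugliness mentioned above; "utc" avoids that at the cost of changing the displayed fields.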

On Sat, Oct 17, 2015 at 6:59 PM, Chris Barker <chris.barker@noaa.gov> wrote:
If anyone decides to actually get around to leap seconds support in numpy datetime, s/he can decide ...
This attitude is the reason why we will probably never have bug free software when it comes to civil time reckoning. Even though ANSI C has the difftime(time_t time1, time_t time0) function which in theory may not reduce to time1 - time0, in practice it is only useful to avoid overflows in integer to float conversions in cross-platform code and cannot account for the fact some days are longer than others. Similarly, current numpy.datetime64 design ties arithmetic with encoding. This makes arithmetic easier, but in the long run may preclude designs that better match the problem domain. Note how the development of PEP 495 has highlighted the fact that allowing binary operations (subtraction, comparison etc.) between times in different timezones was a design mistake. It will be wise to learn from such mistakes when redesigning numpy.datetime64. If you ever plan to support civil time in some form, you should think about it now. In Python 3.6, datetime.now() will return different values in the first and the second repeated hour in the "fall-back fold." If you allow datetime.datetime to numpy.datetime64 conversion, you should decide what you do with that difference.

On Mon, Oct 12, 2015 at 12:10 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
I'm dubious about supporting conversions in the long run -- even "Z" -- because UTC datetimes and naive datetimes are really not the same thing. OTOH maybe if we dropped this it would break everyone's code and they would hate us -- I actually have no idea what people are doing with datetime64 outside of pandas. One way to find out is to start issuing DeprecationWarnings and see if anyone notices :-). (Though of course this is far from fool-proof.)
One possible strategy here would be to do some corpus analysis to find out whether anyone is actually using it, like I did for the ufunc ABI stuff: https://github.com/njsmith/codetrawl https://github.com/njsmith/ufunc-abi-analysis "datetime_to_string" is an easy token to search for, though it looks like enough people have their own functions named that that you'd have to do a bit of filtering to ignore non-numpy-related uses. A filter("content", "import.*numpy") would collect all files that import numpy into a single group for further examination. -n -- Nathaniel J. Smith -- http://vorpus.org

On Mon, Oct 12, 2015 at 12:38 AM, Nathaniel Smith <njs@pobox.com> wrote:
Yes, this is a good approach. I actually mistyped the name here -- it's actually "datetime_as_string". A GitHub search does turn up a handful of uses outside of NumPy: https://github.com/search?utf8=%E2%9C%93&q=numpy.datetime_as_string+in%3Afile%2Cpath+NOT+numpy%2Fcore+NOT+test_datetime.py+NOT+arrayprint.py&type=Code&ref=searchresults That said, I'm not sure it's worth going to the trouble to ensure it continues to work in the future. This function was entirely undocumented, and doesn't even have an inspectable function signature. Stephan

On Mon, Oct 12, 2015 at 12:38 AM, Nathaniel Smith <njs@pobox.com> wrote:
no -- but almost!
OTOH maybe if we dropped this it would break everyone's code and they would hate us --
I think it probably would. In the current implementation, an ISO string without an offset specifier is converted using the system's locale timezone. So to get naive time (or UTC), we need to tack a Z (or 00:00) on there. So killing that would likely break a lot of code! And excepting a Z or 00:00 and then treating it as naive, while being perhaps misleading, would not actually change any results. So I say we keep it. Depreciating it eventually would be good in the long run -- but maybe when we have actual time zone support. I actually have no idea what people are
doing with datetime64 outside of pandas.
What do we need to do with this not to break Panda? I'm guessing more people use datetime64 wrapped by Pandas than any other way... (not me, though)
Well, I'm not using it :-) though I can see that it might be pretty useful. Though once we get rid of datetime64 adjusting for the locale timezone, maybe not anymore. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Mon, Oct 12, 2015 at 3:10 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
If you are going to make datetime64 more like datetime.datetime, please consider adding the "fold" bit. See PEP 495. [1] [1]: https://www.python.org/dev/peps/pep-0495/

I'd be totally in support of switching to timezone naive form. While it would be ideal that everyone stores their dates in UTC, the real world is messy and most of the time, people are just loading dates as-is and don't even care about timezones. I work on machines with different TZs, and I hate it when I save a bunch of data on one machine in UTC, but then go to view it on my local machine and everything is shifted. It gets even more confusing around DST switches because it gets all mixed up. Ben Root On Mon, Oct 12, 2015 at 2:48 PM, Alexander Belopolsky <ndarray@mac.com> wrote:

On Mon, Oct 12, 2015 at 11:48 AM, Alexander Belopolsky <ndarray@mac.com> wrote:
well, adding any timezone support is not (yet) in the table. (no need for "fold" with purely naive time, yes?) But yes, when we get there, absolutely. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Oct 12, 2015 11:48 AM, "Alexander Belopolsky" <ndarray@mac.com> wrote:
On Mon, Oct 12, 2015 at 3:10 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
The tentative consensus from last year's discussion was that we should
make datetime64 timezone naive, like the standard library's datetime.datetime
If you are going to make datetime64 more like datetime.datetime, please
consider adding the "fold" bit. See PEP 495. [1]
The challenge here is that we literally do not have a bit too use :-) Unless we make it datetime65 + 63 bits of padding, stealing a bit to use for fold would halve the range of representable times, and I'm guessing this would not be acceptable? -- pandas's 64-bits-of-nanoseconds already has a somewhat narrow range (584 years). I think for now the two goals are to make the built in datetime64 minimally functional and self consistent, and to make it possible for fancier datetime needs to be handled using third party dtypes. -n

On Tue, Oct 13, 2015 at 3:21 PM, Nathaniel Smith <njs@pobox.com> wrote:
If you are going to make datetime64 more like datetime.datetime, please consider adding the "fold" bit. See PEP 495. [1]
The challenge here is that we literally do not have a bit too use :-)
hmm -- I was first thinking that this could all be in the timezone stuff (when we get there), but while I imagine we'll want an entire array to be in a single timezone, each individual value would need its own "fold" flag. But in any case, we don't need it 'till we do timezones, and my understanding is that we aren't' going to do timezones until we have the mythical new-and-improved-dtype-system. So a future datetime dtype could be 64 bits + a byte of extra info, or be 63 bits plus the fold flag, or...
well, not now, with eh fixed epoch, but if the epoch could be adjusted, maybe a small range would be fine -- who need nanosecond accuracy, AND centuries of range? Thinking a bit more here: For those that didn't follow the massive discussion on this on Python-dev and the new datetime list: the fold flag is required to round-trip properly for timezones with discontiguous time -- i.e. Daylight savings. So if you have: 2015-11-01T01:30 Do you mean the first 1:30 am or the seconds one, after the DST transition? (i.e. in the fold, or not?) So it is key, for Python's Datetime, to make sure to keep that information around. However: Python's datetime was designed to be optimized for: - converting between datetime and other representations in Database, etc. - fast math for "naive time" -- i.e. basic manipulations within the same timezone, like "one day later" - Fast math for "absolute relative deltas" is of secondary concern. The result of this is that datetime stores: year, month, day, hour minute second, microsecond It does NOT store some time_unit_since_an_epch, like unix time or numpy datetime64. Also, IIUC, when you associate a datetime with a timezone, it stores the year, month, day, hour, second,... in the specified timezone -- NOT in UTC, or anything else. This makes manipulations within that timezone easy -- the next day simply required adding a day to teh day field (then normalizing to the month). Given all that -- the "fold" bit is needed, as a particular datetime in a particular timezone may have more than one meaning. Note that to compute a proper time span between two "aware" datetimes, it is necessary to convert to UTC, do the math, then convert back to the timezone you want. However, numpy datetime is optimized for compact storage and fast computation of absolute deltas (actual hours, minutes, seconds... not calendar units like "the next day" ). 
Because of this, and because it's what we already have, datetime64 stores times as "some number of time units since an epoch -- a simple integer. And because we probably want fast absolute delta computation, when we add timezones, we'll probably want to store the datetime in UTC, and apply the timezone on I/O. Alexander: Am I right that we don't need the "fold" bit in this case? You'd still need it when specifying a time in a timezone with folds.. -- but again, only on I/O -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Oct 13, 2015 3:49 PM, "Chris Barker" <chris.barker@noaa.gov> wrote:
[...]
Except that ironically it actually can't compute absolute deltas accurately with one second resolution, because it does the POSIX time thing of pretending that all UTC days have the same number of seconds, even though this is not true (leap seconds). This isn't really relevant to anything else in this thread, except as a reminder of how freaky date/time handling is. -n
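A small sketch of the point above, using the standard library's `datetime`, which follows the same POSIX convention of 86400-second days. A leap second (23:59:60 UTC) was inserted at the end of 2015-06-30, so two SI seconds actually elapsed between the two instants below, yet the arithmetic reports one.

```python
from datetime import datetime, timezone

before = datetime(2015, 6, 30, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2015, 7, 1, 0, 0, 0, tzinfo=timezone.utc)

# Both views agree with each other and ignore the inserted leap second.
posix_delta = after.timestamp() - before.timestamp()
datetime_delta = (after - before).total_seconds()
```

Both values come out as one second; nothing in either representation records that 23:59:60 happened in between.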

Maybe not directly relevant, but also very clearly why one should ideally not use these at all! Perhaps even less relevant, but if you do need absolute times (and thus work with UTC or TAI or GPS), have a look at astropy's `Time` class. It does use two doubles, but with that maintains "sub-nanosecond precision over times spanning the age of the universe" [1]. And it even converts to strings nicely! -- Marten [1] http://docs.astropy.org/en/latest/time/index.html

On Tue, Oct 13, 2015 at 5:08 PM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Seriously, though -- if we are opening all this up, maybe it's worth considering other options, rather than kludging datetime64 -- particularly if there is something someone has already implemented and tested... But for now, Stephan's patches to make datetime64 far more useful and easy are very welcome! -CHB

On Tue, Oct 13, 2015 at 3:58 PM, Nathaniel Smith <njs@pobox.com> wrote:
Note that I said "fast", not "accurate" -- but the leap second thing may be one more reason not to call datetime64 "UTC" -- who's to say that "naive" time should include leap seconds :-) Also, we could certainly add a leap seconds implementation to the current infrastructure -- the real technical problem with that is how to keep the leap-seconds table up to date -- we have no way to know when there will be leap-seconds in the future... Also -- this may be one more reason to have a selectable epoch -- then you'd likely overlap fewer leap-seconds in a given use case.
This isn't really relevant to anything else in this thread, except as a reminder of how freaky date/time handling is.
yup -- it sure is. -CHB

On Wed, Oct 14, 2015 at 10:34 AM, Phil Hodge <hodge@stsci.edu> wrote:
exactly -- so more than six months out, we have no idea. and even within six months, you'd need to update some sort of database of leap seconds to get it. So depending on what version of the DB someone was using, they'd get different answers. That could all get ugly :-( -CHB

On Tue, Oct 13, 2015 at 6:48 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Since Guido hates leap seconds, PEP 495 is silent on this issue, but strictly speaking UTC leap seconds are "folds." AFAICT, a strictly POSIX system must repeat the same value of time_t when a leap second is inserted. While datetime will never extend the second field to allow second=60, with PEP 495, it is now possible to represent 23:59:60 as 23:59:59/fold=1. Apart from leap seconds, there is no need to use "fold" on datetimes that represent time in UTC or any timezone at a fixed offset from utc.
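A quick check of the two claims above with the standard library (Python 3.6+). Reading `fold=1` as "this is the leap second" is the interpretation suggested in this message, not something the `datetime` module itself does; the variable names are just for illustration.

```python
from datetime import datetime

# datetime refuses second=60 outright.
try:
    datetime(2015, 6, 30, 23, 59, 60)
    accepts_second_60 = True
except ValueError:
    accepts_second_60 = False

# The same instant can instead be spelled as a repeated 23:59:59.
plain = datetime(2015, 6, 30, 23, 59, 59)
leap = datetime(2015, 6, 30, 23, 59, 59, fold=1)

# PEP 495: comparisons between naive datetimes ignore fold.
same_when_compared = (plain == leap)
```

Note that the two values compare equal, so any code that wants to treat the folded second differently has to inspect `fold` explicitly.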

On Fri, Oct 16, 2015 at 10:19 AM, Alexander Belopolsky <ndarray@mac.com> wrote:
allow second=60, with PEP 495, it is now possible to represent 23:59:60 as 23:59:59/fold=1.
Thanks -- If anyone decides to actually get around to leap seconds support in numpy datetime, s/he can decide whether to do folds or allow second: 60. Off the top of my head, I think allowing a 60th second makes more sense -- just like we do leap years. Granted, external systems often don't understand/allow a 60th second, but they generally don't understand a fold bit, either.... -CHB

On Sat, Oct 17, 2015 at 6:59 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Off the top of my head, I think allowing a 60th second makes more sense -- just like we do leap years.
Yet we don't implement DST by allowing the 24th hour. Even the countries that adjust the clocks at midnight don't do that. In some sense leap seconds are more similar to timezone changes (DST or political) because they are irregular and unpredictable. Furthermore, the notion of "fold" is not tied to a particular 24/60/60 system of encoding times and thus more applicable to numpy where times are encoded as binary integers.

On Sun, Oct 18, 2015 at 12:20 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
Well, isn't that about conforming to already existing standards? DST is a civil construct -- and mst (all?) implementations use the convention of having repeated times -- so that's what software has to deal with. IIUC, at least *some* standards handle leap seconds by adding a 60th (61st) second, rather than having a repeated one. So it's at least an option to do it that way. And it can then fit into the already existing standards for representing datetimes, etc. Does the "fold" flag approach for representing, well, "folds" exist in any widely used standard? It's my impression that it doesn't, since we had to argue a lot about what to call it :-)
In some sense leap seconds are more similar to timezone changes (DST or political) because they are irregular and unpredictable.
in that regard, yes -- you need a constantly updating database to use them. but I don't know that that has any impact on how you represent them. They seem a lot more like leap years to me -- some Februaries have a 29th day -- some minutes on some days have a 61st second.
but there are no folds in the underlying integer representation -- that is the "continuous" time scale -- the folds (or leap seconds, or leap years, or any of the 24/60/60 business) come in only when you want to go to/from the "datetime" representation.
If anyone decides to actually get around to leap seconds support in numpy datetime, s/he can decide ...
This attitude is the reason why we will probably never have bug free software when it comes to civil time reckoning.
OK -- fair enough -- good to think about it sooner than later.
Similarly, current numpy.datetime64 design ties arithmetic with encoding. This makes arithmetic easier, but in the long run may preclude designs that better match the problem domain.
I don't follow here -- how can you NOT tie arithmetic to encoding? sure you could decide that you are going to overload the arithmetic, and it's up to the object that encodes the data to do that math -- but that's pretty much what datetime64 is doing -- defining an encoding so that it can do math -- numpy dtypes are very much about binary representation. No reason one couldn't make a different numpy dtype for datetimes that encoded it a different way, and then it would have to implement math, too.
Note how the development of PEP 495 has highlighted the fact that allowing binary operations (subtraction, comparison etc.) between times in different timezones was a design mistake. It will be wise to learn from such mistakes when redesigning numpy.datetime64.
So was not considering folds -- frankly, and I think this may be your point, I don't think timezones were well thought out at all when datetime was first introduced -- and however well thought out it was, if you don't provide an implementation, you are not going to find the limitations. And despite Tim's articulate defense of the original implementation decisions, I think encoding the datetime in the local "calendar/clock" just invites a mess. And I'm quite convinced that it wouldn't be the way to go for numpy use-cases.
If you ever plan to support civil time in some form, you should think about it now.
well, the goal for now is naive time -- and unlike the original datetime -- we are not adding on a "you can implement your own timezone handling this way" hook yet.
Indeed. Though will that only occur with timezones that have DST? I know I'd be fine with NOT being able to create a numpy datetime64 from a non-naive datetime object. Which would force the user to think about and convert to the timezone they want before passing off to numpy. Unless you can suggest a sensible default way to handle this. At first blush, I think naive time does not have folds, so there is no way to handle them "properly".
Also -- I think we are at phase one of an (at least) two-step process:
1) clean up datetime64 just enough that it is useful, and less error-prone -- i.e. have it not pretend to support anything other than naive datetimes.
2) Do it right -- perhaps adding some time zone support. This is going to wait until the numpy dtype machinery is cleaned up some.
Phase 2 is where we really need the thinking ahead. And I'm still confused about what thinking ahead needs to be done for potential leap second support. -CHB

On Mon, Oct 19, 2015 at 3:34 PM, Chris Barker <chris.barker@noaa.gov> wrote:
datetime.now() returns *naive* datetime objects unless you supply the timezone. In Python 3.6 *naive* datetime objects will have the fold attribute and datetime.now() will occasionally return fold=1 values unless your system timezone has a fixed UTC offset.

On Mon, Oct 19, 2015 at 12:34 PM, Chris Barker <chris.barker@noaa.gov> wrote:
I agree with Chris. My intent with this work for now (for NumPy 1.11) is simply to complete phase 1. Once NumPy stops pretending to be time zone aware (and with a few other small cleanups), datetime64 will be far more useable. For major fixes, we'll have to wait until dtype support is better. Alexander -- by "mst" I think Chris meant "most". Best, Stephan

On Mon, Oct 19, 2015 at 4:12 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
This is fine. Just be aware that *naive* datetimes will also have the PEP 495 "fold" attribute in Python 3.6. You are free to ignore it, but you will lose the ability to round-trip between naive stdlib datetimes and numpy.datetime64.
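The round-trip loss can be illustrated with a short sketch. It assumes a NumPy version in which datetime64 is timezone naive (1.11 or later) and Python 3.6+ for `fold`; the variable names are only for illustration.

```python
from datetime import datetime

import numpy as np

# A naive datetime that claims to be the second 01:30 of a fall-back fold.
original = datetime(2015, 11, 1, 1, 30, fold=1)

# datetime64 keeps only a count of microseconds since the epoch.
stored = np.datetime64(original)

# Converting back yields a stdlib datetime with the default fold=0.
restored = stored.item()
```

The restored value still compares equal to the original, because naive comparisons ignore `fold`, but the flag itself does not survive the conversion.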

This is fine. Just be aware that *naive* datetimes will also have the PEP 495 "fold" attribute in Python 3.6. You are free to ignore it, but you will lose the ability to round-trip between naive stdlib datetimes and numpy.datetime64.
Sigh. I can see why it's there (primarily to support now(), I suppose). But a naive datetime doesn't have a timezone, so how could you know what time one actually corresponds to if fold is True? And what could you do with it if you did know? I've always figured that if you are using naive time for times in a timezone that has DST, then you'd better know whether you were in DST or not. (Which fold tells you, I guess) but the fold isn't guaranteed to be an hour, is it? So without more info, what can you do? And if the fold bit is False, then you still have no idea if you are in DST or not. And then what if you attach a timezone to it? Then the fold bit could be wrong... I take it back, I can't see why the fold bit could be anything but confusing for a naive datetime. :-)
Anyway, all I can see to do here is for the datetime64 docs to say that fold is ignored if it's there. But what should datetime64 do when provided with a datetime with a timezone?
- Raise an exception?
- Ignore the timezone?
- Convert to UTC?
If the time zone is ignored, then you could get DST and non-DST times in the same array -- that could be ugly. Is there any way to query a timezone object to ask if it's a constant-offset?
And yes, I did mean "most". There is no way I'm ever going to introduce a three letter "timezone" abbreviation in one of these threads! -CHB
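The three options listed above can be sketched with the standard library alone. Everything here is hypothetical: `to_naive`, its `policy` names, and `is_fixed_offset` are invented for this illustration and are not NumPy behaviour. The `isinstance` check is only a heuristic; as far as I know the `tzinfo` interface has no general way to ask whether a zone has a constant offset.

```python
from datetime import datetime, timedelta, timezone


def to_naive(dt, policy="raise"):
    """Hypothetical helper: reduce a stdlib datetime to a naive one.

    policy is one of "raise", "ignore", or "utc" (names invented here).
    """
    if dt.utcoffset() is None:
        # Already naive; drop the PEP 495 fold flag, which cannot be stored.
        return dt.replace(fold=0)
    if policy == "raise":
        raise ValueError("timezone-aware datetime is not accepted")
    if policy == "ignore":
        # Keep the wall-clock fields and forget the offset.
        return dt.replace(tzinfo=None, fold=0)
    if policy == "utc":
        # Shift to UTC first, then drop the tzinfo.
        return dt.astimezone(timezone.utc).replace(tzinfo=None)
    raise ValueError(f"unknown policy: {policy!r}")


def is_fixed_offset(tz):
    """Heuristic only: datetime.timezone instances always have a constant offset."""
    return isinstance(tz, timezone)


minus_eight = timezone(timedelta(hours=-8))
aware = datetime(2015, 10, 19, 15, 34, tzinfo=minus_eight)

ignored = to_naive(aware, policy="ignore")  # wall clock kept: 15:34
as_utc = to_naive(aware, policy="utc")      # shifted to UTC: 23:34
```

With "ignore", values from different offsets would end up in one array as if they were comparable, which is the ugliness mentioned above; "utc" avoids that at the cost of changing the displayed wall-clock time.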

On Sat, Oct 17, 2015 at 6:59 PM, Chris Barker <chris.barker@noaa.gov> wrote:
If anyone decides to actually get around to leap seconds support in numpy datetime, s/he can decide ...
This attitude is the reason why we will probably never have bug free software when it comes to civil time reckoning. Even though ANSI C has the difftime(time_t time1, time_t time0) function which in theory may not reduce to time1 - time0, in practice it is only useful to avoid overflows in integer to float conversions in cross-platform code and cannot account for the fact some days are longer than others. Similarly, current numpy.datetime64 design ties arithmetic with encoding. This makes arithmetic easier, but in the long run may preclude designs that better match the problem domain. Note how the development of PEP 495 has highlighted the fact that allowing binary operations (subtraction, comparison etc.) between times in different timezones was a design mistake. It will be wise to learn from such mistakes when redesigning numpy.datetime64. If you ever plan to support civil time in some form, you should think about it now. In Python 3.6, datetime.now() will return different values in the first and the second repeated hour in the "fall-back fold." If you allow datetime.datetime to numpy.datetime64 conversion, you should decide what you do with that difference.
participants (8)
- Alexander Belopolsky
- Benjamin Root
- Chris Barker
- Chris Barker - NOAA Federal
- Marten van Kerkwijk
- Nathaniel Smith
- Phil Hodge
- Stephan Hoyer