[Numpy-discussion] Making datetime64 timezone naive

Chris Barker chris.barker at noaa.gov
Tue Oct 13 18:48:38 EDT 2015


On Tue, Oct 13, 2015 at 3:21 PM, Nathaniel Smith <njs at pobox.com> wrote:


> > If you are going to make datetime64 more like datetime.datetime, please
> consider adding the "fold" bit.  See PEP 495. [1]
>
> The challenge here is that we literally do not have a bit to use :-)
>

hmm -- I was first thinking that this could all be in the timezone stuff
(when we get there), but while I imagine we'll want an entire array to be
in a single timezone, each individual value would need its own "fold" flag.

But in any case, we don't need it until we do timezones, and my
understanding is that we aren't going to do timezones until we have the
mythical new-and-improved dtype system.

So a future datetime dtype could be 64 bits + a byte of extra info, or be
63 bits plus the fold flag, or...

> Unless we make it datetime65 + 63 bits of padding, stealing a bit to use
> for fold would halve the range of representable times, and I'm guessing
> this would not be acceptable?
>
Well, not now, with the fixed epoch, but if the epoch could be adjusted,
maybe a smaller range would be fine -- who needs nanosecond accuracy AND
centuries of range?
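For scale, here is a back-of-the-envelope check of what stealing one bit costs. It is a sketch that assumes a signed 64-bit count of nanoseconds around a 1970 epoch, as in datetime64[ns], and a mean Gregorian year of 365.2425 days; `span_years` is just an illustrative helper.

```python
# Mean Gregorian year in seconds (365.2425 days)
SECONDS_PER_YEAR = 365.2425 * 86400

def span_years(bits, unit_seconds):
    """Years reachable on either side of the epoch by a signed `bits`-bit count."""
    return 2 ** (bits - 1) * unit_seconds / SECONDS_PER_YEAR

ns = 1e-9
print(round(span_years(64, ns), 1))  # 292.3 -> roughly 1678 to 2262 around 1970
print(round(span_years(63, ns), 1))  # 146.1 -> one bit given up for "fold"
```

With coarser units the trade-off looks very different: a 63-bit count of microseconds still spans roughly 146 thousand years on each side of the epoch.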

Thinking a bit more here:


For those that didn't follow the massive discussion on this on Python-dev
and the new datetime list:

the fold flag is required to round-trip properly for timezones with
discontiguous time -- i.e., daylight saving time. So if you have:

2015-11-01T01:30

Do you mean the first 1:30 am or the second one, after the DST transition?
(i.e., in the fold, or not?)
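To make the ambiguity concrete, the sketch below uses Python's `fold` attribute (Python 3.6+) with a hypothetical toy tzinfo, `ToyPacificFall2015`, that models only the US Pacific fall-back on 2015-11-01 (02:00 PDT becomes 01:00 PST). It is not a real timezone implementation and is wrong outside autumn 2015; it is written by hand only so the example does not depend on a system tz database.

```python
from datetime import datetime, timedelta, timezone, tzinfo

class ToyPacificFall2015(tzinfo):
    """Hypothetical toy zone: only models the 2015-11-01 fall-back
    (02:00 PDT -> 01:00 PST). Not valid outside autumn 2015."""

    PDT = timedelta(hours=-7)
    PST = timedelta(hours=-8)
    # The wall-clock hour that occurs twice on 2015-11-01
    REPEAT_START = datetime(2015, 11, 1, 1, 0)
    REPEAT_END = datetime(2015, 11, 1, 2, 0)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None, fold=0)
        if wall < self.REPEAT_START:
            return self.PDT
        if wall < self.REPEAT_END:
            # Inside the repeated hour: fold=0 is the first pass (still PDT),
            # fold=1 is the second pass (already PST).
            return self.PST if dt.fold else self.PDT
        return self.PST

    def dst(self, dt):
        return self.utcoffset(dt) - self.PST

    def tzname(self, dt):
        return "PDT" if self.dst(dt) else "PST"


tz = ToyPacificFall2015()
first = datetime(2015, 11, 1, 1, 30, tzinfo=tz)           # fold=0 (default)
second = datetime(2015, 11, 1, 1, 30, tzinfo=tz, fold=1)  # the repeat

# Same wall-clock fields, different instants
print(first.astimezone(timezone.utc))   # 2015-11-01 08:30:00+00:00
print(second.astimezone(timezone.utc))  # 2015-11-01 09:30:00+00:00
```

Without the `fold` value, the fields `2015-11-01T01:30` alone cannot tell these two instants apart.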

So it is key, for Python's datetime, to make sure to keep that information
around.

However: Python's datetime was designed to be optimized for:
  - converting between datetime and other representations in databases, etc.
  - fast math for "naive time" -- i.e. basic manipulations within the same
timezone, like "one day later"
  - fast math for "absolute relative deltas" is of secondary concern.

The result of this is that datetime stores: year, month, day, hour, minute,
second, microsecond

It does NOT store some time_unit_since_an_epoch, like Unix time or numpy
datetime64.
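The two representations can be compared side by side in plain Python. In this sketch the `datetime` attributes stand for the broken-down fields described above, and `count_us` plays the role of the single integer that an epoch-based type like datetime64 keeps instead; the naive 1970-01-01 epoch and microsecond unit are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

dt = datetime(2015, 11, 1, 1, 30)

# Broken-down calendar fields, as exposed by a datetime object
fields = (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, dt.microsecond)
print(fields)  # (2015, 11, 1, 1, 30, 0, 0)

# The alternative scheme: one integer count of units since an epoch
EPOCH = datetime(1970, 1, 1)
count_us = (dt - EPOCH) // timedelta(microseconds=1)
print(count_us)  # 1446341400000000

# The count alone is enough to rebuild the same naive datetime
print(EPOCH + timedelta(microseconds=count_us) == dt)  # True
```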

Also, IIUC, when you associate a datetime with a timezone, it stores the
year, month, day, hour, minute, second,... in the specified timezone -- NOT
in UTC or anything else. This makes manipulations within that timezone
easy -- the next day simply requires adding a day to the day field (then
normalizing to the month).
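A small illustration of that wall-clock arithmetic: per the Python docs, adding a `timedelta` to an aware datetime keeps the same `tzinfo` and does no timezone adjustment, so "one day later" just rolls the calendar fields forward, normalizing into the next month. The fixed-offset `local` zone here is a hypothetical stand-in for "some local timezone".

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset zone standing in for "some local timezone"
local = timezone(timedelta(hours=-8), "PST")

start = datetime(2015, 11, 30, 9, 0, tzinfo=local)
next_day = start + timedelta(days=1)  # day 30 + 1 normalizes to Dec 1

print(next_day.isoformat())  # 2015-12-01T09:00:00-08:00
```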

Given all that -- the "fold" bit is needed, as a particular datetime in a
particular timezone may have more than one meaning.

Note that to compute a proper time span between two "aware" datetimes, it
is necessary to convert to UTC, do the math, then convert back to the
timezone you want.
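The difference between the wall-clock span and the absolute span shows up across a DST transition. This sketch assumes the caller already knows which offset applies to each reading and uses fixed-offset `PDT`/`PST` zones as stand-ins for US Pacific daylight and standard time; noon-to-noon across the 2015-11-01 fall-back is one calendar day but 25 real hours.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins (assumption: offsets are known for each reading)
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")

before = datetime(2015, 10, 31, 12, 0, tzinfo=PDT)  # noon, before fall-back
after = datetime(2015, 11, 1, 12, 0, tzinfo=PST)    # noon, after fall-back

# Wall-clock ("naive") difference: one calendar day
wall = after.replace(tzinfo=None) - before.replace(tzinfo=None)

# Absolute difference: convert both to UTC, do the math
elapsed = after.astimezone(timezone.utc) - before.astimezone(timezone.utc)

print(wall)     # 1 day, 0:00:00
print(elapsed)  # 1 day, 1:00:00
```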

However, numpy datetime is optimized for compact storage and fast
computation of absolute deltas (actual hours, minutes, seconds... not
calendar units like "the next day").

Because of this, and because it's what we already have, datetime64 stores
times as "some number of time units since an epoch" -- a simple integer.

And because we probably want fast absolute delta computation, when we add
timezones, we'll probably want to store the datetime in UTC, and apply the
timezone on I/O.
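A minimal sketch of the "store UTC, apply the timezone on I/O" idea, using plain Python integers in place of a datetime64 array. The helpers `encode`/`decode`, the seconds unit, and the fixed-offset `PDT`/`PST` zones are illustrative assumptions, not an actual NumPy design. The point is that the two 1:30 am readings become distinct integers, so nothing fold-like is needed in storage; the ambiguity is resolved once, when a local reading is encoded.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_SECOND = timedelta(seconds=1)

def encode(dt):
    """Input side: aware datetime -> integer seconds since the epoch, in UTC."""
    return (dt - EPOCH) // ONE_SECOND

def decode(count, tz):
    """Output side: stored integer -> aware datetime shown in timezone `tz`."""
    return (EPOCH + timedelta(seconds=count)).astimezone(tz)

# Fixed-offset stand-ins; choosing between them is exactly the question
# "which 1:30?" that fold answers when parsing local wall-clock time.
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")

stored = [
    encode(datetime(2015, 11, 1, 1, 30, tzinfo=PDT)),  # first 1:30 am
    encode(datetime(2015, 11, 1, 1, 30, tzinfo=PST)),  # second 1:30 am
]

# Absolute deltas are plain integer subtraction on the stored values
print(stored[1] - stored[0])  # 3600

# Both render as the same wall-clock time, distinguished only by offset
for count, tz in zip(stored, (PDT, PST)):
    print(decode(count, tz).isoformat())
```

Displaying such values in a zone with folds would still need to report which pass a time falls on, but again only at the I/O boundary.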

Alexander: Am I right that we don't need the "fold" bit in this case? You'd
still need it when specifying a time in a timezone with folds -- but
again, only on I/O.

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

