[Numpy-discussion] fixing up datetime

Christopher Barker Chris.Barker at noaa.gov
Thu Jun 2 12:57:52 EDT 2011


Mark Wiebe wrote:
> I'm following what I understand the NEP to mean for combining dates and 
> deltas of different units. This means for timedeltas, the metadata 
> becomes more precise, in particular it becomes the GCD of the input 
> metadata, and between timedelta and datetime the datetime always dominates.
> 
> https://github.com/numpy/numpy/blob/master/doc/neps/datetime-proposal.rst

Thanks for posting this link -- a few comments on that doc follow.

> Only Years, Months, and Business Days have a nonlinear relationship with 
> the other units, so they're the only problem case for this. They can be 
> arbitrarily special-cased based on what is decided to make the most sense.

As mentioned in my recent post -- this stuff should be handled by some 
sort of "calendar" classes -- there is no one way to do that! So numpy 
should provide datetime and timedelta data types that can be used, but a 
timedelta should _not_ ever be defined by these weird variable units.

I guess what I'm getting at is that:

a_date_time + a_timedelta

is a fundamentally different operation than:

a_date_time + a_calendar_defined_timespan

The former can follow all the usual math properties for addition, but 
the latter doesn't.
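
A small sketch of that difference, using only the Python standard 
library. The add_months helper and its clamp-to-month-end rule are made 
up for illustration; they are one possible calendar convention, not 
anything taken from the NEP or from numpy:

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Hypothetical calendar rule: shift by whole months and clamp the
    day to the last day of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


start = date(2011, 1, 31)

# Fixed-length timedelta: grouping does not matter, and subtraction
# undoes addition.
week = timedelta(days=7)
fixed_ok = (start + week) + week == start + (week + week)
fixed_roundtrip = (start + week) - week == start

# Calendar-defined span: the result depends on how the steps are grouped.
one_then_one = add_months(add_months(start, 1), 1)  # Jan 31 -> Feb 28 -> Mar 28
two_at_once = add_months(start, 2)                  # Jan 31 -> Mar 31
roundtrip = add_months(add_months(start, 1), -1)    # Jan 31 -> Feb 28 -> Jan 28
```

With the fixed-length timedelta both checks are true. With the 
month-based span, two one-month steps and one two-month step land on 
different dates, and going forward then back does not return to the 
start.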

About the NEP:

"""
A representation is also supported such that the stored date-time 
integer can encode both the number of a particular unit as well as a 
number of sequential events tracked for each unit.
"""

I'm not sure I understand what this really means, but I _think_ I agree 
with Pierre that this is unnecessary complication - couldn't it be 
handled by multiple arrays, or maybe a structured dtype?

"""
The datetime64 represents an absolute time. Internally it is represented 
as the number of time units between the intended time and the epoch 
(12:00am on January 1, 1970 --- POSIX time including its lack of leap 
seconds).
"""

The CF netcdf metadata standard provides for times to be specified as 
"units since a_date_time". Units can be seconds, hours, days, etc. (it 
does allow months and years, but it shouldn't!). This is a nice, flexible 
system that makes it easy to capture the wildly different scales needed: 
from nanoseconds to millennia. Similarly, we might want to consider a 
datetime dtype as containing a reference datetime, and a tick unit.
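
As a rough illustration of the "units since a_date_time" idea, here is 
a simplified decoder in plain Python. The function name 
decode_cf_times is hypothetical and is not the API of any netCDF 
library; it assumes only fixed-length units and a single timestamp 
format:

```python
from datetime import datetime, timedelta

# Seconds per tick, for fixed-length units only (no months or years).
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_cf_times(values, units):
    """Turn offsets such as 'hours since 2011-06-01 00:00:00' into datetimes.

    Simplified: accepts only 'UNIT since YYYY-MM-DD HH:MM:SS'.
    """
    unit, since, reference = units.split(" ", 2)
    if since != "since" or unit not in SECONDS_PER_UNIT:
        raise ValueError("unsupported units string: %r" % units)
    origin = datetime.strptime(reference, "%Y-%m-%d %H:%M:%S")
    tick = timedelta(seconds=SECONDS_PER_UNIT[unit])
    return [origin + v * tick for v in values]


times = decode_cf_times([0, 6, 36], "hours since 2011-06-01 00:00:00")
```

The reference datetime and the tick unit travel together with the 
integer offsets, which is the pairing suggested above for a datetime 
dtype.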

I think the "Time units" section does specify that you can use various 
units, but it looks like the NEP sticks with the single POSIX epoch.

I see later in the NEP:
"""
However, after thinking more about this, we found that the combination 
of an absolute datetime64 with a relative timedelta64 does offer the 
same functionality while removing the need for the additional origin 
metadata. This is why we have removed it from this proposal.
"""
Hmmm -- I don't think that's the case -- you need the "origin" if you 
want to represent something like nanoseconds as a datetime, far away 
from the epoch. Sure, you can supply your own by keeping the origin and 
a timedelta array separately, but you could do that for all uses, also, 
and the point of this is to make working with datetimes easy. If we're 
going to allow different units, we might as well have different "origins".
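
Some back-of-the-envelope arithmetic behind that, assuming the tick 
count is stored in a signed 64-bit integer. The fits_int64_ns helper 
and the example dates are invented for illustration:

```python
from datetime import datetime

INT64_MAX = 2**63 - 1
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 86400  # Julian year, good enough for an estimate

# How far from the origin a nanosecond count can reach, in years.
reach_years = INT64_MAX / NS_PER_SECOND / SECONDS_PER_YEAR  # roughly 292


def fits_int64_ns(when, origin):
    """True if (when - origin), counted in nanoseconds, fits in an int64."""
    delta = when - origin
    ticks = ((delta.days * 86400 + delta.seconds) * NS_PER_SECOND
             + delta.microseconds * 1000)
    return -INT64_MAX - 1 <= ticks <= INT64_MAX


posix_epoch = datetime(1970, 1, 1)
nearby_origin = datetime(990, 1, 1)
event = datetime(1000, 1, 1)

fits_posix = fits_int64_ns(event, posix_epoch)     # about 970 years away
fits_nearby = fits_int64_ns(event, nearby_origin)  # about 10 years away
```

With the epoch fixed at 1970, nanosecond resolution only reaches a few 
centuries either side of it, so the year-1000 event does not fit; with 
an origin close to the data it does.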


I also don't think that units like "month", "year", "business day" 
should be allowed -- it just adds confusion. It's not a killer if they 
are defined in the spec:

1 year = 365.25 days (for instance)
1 month = 1 year / 12

But I think it's better to simply disallow them, and keep that use for 
what I'm calling the "Calendar" functions. And "business day" is 
particularly ugly and, I'm sure, defined differently in different places.
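
For what it's worth, here is what the fixed definitions above give when 
applied with Python's standard datetime module. The start dates are 
arbitrary examples:

```python
from datetime import datetime, timedelta

YEAR = timedelta(days=365.25)  # fixed-length "year", as defined above
MONTH = YEAR / 12              # fixed-length "month": 30 days, 10.5 hours

# 2012 is a leap year, so the calendar anniversary is 366 days away.
one_year_later = datetime(2011, 6, 2) + YEAR
# -> 2012-06-01 06:00, not 2012-06-02 00:00

one_month_later = datetime(2011, 1, 31) + MONTH
# -> 2011-03-02 10:30, which skips February entirely
```

The arithmetic is perfectly well defined, but the results don't match 
what anyone means by "a year later" or "a month later" on a calendar, 
which is why that meaning belongs in the "Calendar" functions.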

-Chris









-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


