[Numpy-discussion] fixing up datetime

Christopher Barker Chris.Barker at noaa.gov
Thu Jun 2 12:22:24 EDT 2011


Charles R Harris wrote:
>  Good support for units and delta times is very useful, but
> parsing dates and times and handling timezones, daylight savings, leap 
> seconds, business days, etc., is probably best served by addon packages 
> specialized to an area of interest. Just my $.02

I agree here -- I think for numpy, what's key is to focus on the kind of 
things needed for computational use -- that is the performance critical 
stuff.

I suppose business-day type calculations would be both key and 
performance-critical, but that sure seems like the kind of thing that 
should go in an add-on package, rather than in numpy.

The stdlib datetime package is a little bit too small for my taste 
(couldn't I at least ask for a TimeDelta to be expressed in, say, 
seconds, without doing any math on my own?), but the idea is good -- 
create the core types, let add-on packages do the more specialized stuff.
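
For illustration, here is a small sketch of the seconds conversion complained about above, using only the stdlib. The variable names are made up for the example. Note that Python 2.7 and later do offer timedelta.total_seconds(), which covers this particular case:

```python
from datetime import datetime

start = datetime(2011, 6, 2, 12, 0, 0)
end = datetime(2011, 6, 3, 12, 30, 15)
delta = end - start  # 1 day, 30 minutes, 15 seconds

# The math one has to do by hand from the stored fields.
manual_seconds = delta.days * 86400 + delta.seconds + delta.microseconds / 1e6

# Python 2.7 and later expose the same number directly.
direct_seconds = delta.total_seconds()

print(manual_seconds, direct_seconds)
```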

> * The existing datetime-related API is probably not useful, and in fact 
> those functions aren't used internally anymore. Is it reasonable to 
> remove the functions, or do we just deprecate them?

I say remove, but some polling to see if anyone is using it might be in 
order first.

> * Leap seconds probably deserve a rigorous treatment, but having an 
> internal representation with leap-seconds overcomplicates otherwise very 
> simple and fast operations.

Could you explain more? I don't get the issues -- leap seconds would come 
in for calculations like: a_given_datetime + a_timedelta, correct? 
Given leap years, and all the other ugliness, do leap seconds really 
make it worse?
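
To make the question concrete, here is a rough sketch of how a leap second shows up in a subtraction. It uses the stdlib datetime, which ignores leap seconds. The LEAP_SECONDS table and the elapsed_si_seconds helper are hypothetical, written only for this example, and are not part of numpy or the stdlib:

```python
from datetime import datetime

# A leap second (23:59:60 UTC) was inserted at the end of 2008-12-31.
before = datetime(2008, 12, 31, 23, 59, 59)
after = datetime(2009, 1, 1, 0, 0, 0)

# Leap-second-unaware arithmetic reports 1 second here.
naive_elapsed = (after - before).total_seconds()

# Hypothetical one-entry table of instants that directly follow a leap second.
LEAP_SECONDS = [datetime(2009, 1, 1, 0, 0, 0)]

def elapsed_si_seconds(t0, t1):
    """Elapsed seconds from t0 to t1 (t0 <= t1), adding one per table entry in (t0, t1]."""
    extra = sum(1 for ls in LEAP_SECONDS if t0 < ls <= t1)
    return (t1 - t0).total_seconds() + extra

print(naive_elapsed, elapsed_si_seconds(before, after))
```

The leap-second-aware version needs a table lookup on every subtraction, and the table has to be updated whenever a new leap second is announced, which is presumably the kind of complication meant above.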

> * Default conversion to string - should it be in UTC or with the local 
> timezone baked in?

Most date-time handling should be time-zone neutral -- i.e. assume 
everything is in the same timezone (and daylight savings status). Libs 
that assume you want the locale setting do nothing but cause major pain 
if you have anything out of the ordinary to do (and sometimes ordinary 
stuff, too).

If you MUST include time-zone, explicit is better than implicit -- have 
the user specify, or, at the very least make it easy for the user to 
override any defaults.

> As UTC it may be confusing because 'now' will print 
> as a different time than people would expect.

I think "now" should be expressed (and also stored) in the local time, 
unless the user asks for UTC. This is consistent with the std lib 
datetime.now(), if nothing else.
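
For reference, a short sketch of the stdlib behaviour referred to here, assuming Python 3.2 or later for datetime.timezone. The variable names are illustrative only:

```python
from datetime import datetime, timezone

# Default: a naive datetime holding the local wall-clock time.
local_now = datetime.now()

# UTC only when the caller asks for it explicitly.
utc_now = datetime.now(timezone.utc)

print(local_now.tzinfo, utc_now.tzinfo)
```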

> * Should the NaT (not-a-time) value behave like floating-point NaN? i.e. 
> NaT == NaT return false, etc. Should operations generating NaT trigger 
> an 'invalid' floating point exception in ufuncs?

Makes sense to me -- at least many folks are used to NaN semantics.
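
As a reminder of the NaN semantics in question, here is a small pure-Python sketch using floats. Whether NaT ends up matching every one of these behaviours is the open question. This only shows the floating-point model it would be copying:

```python
import math

nan = float("nan")

equal_to_itself = (nan == nan)       # NaN never compares equal, even to itself
differs_from_itself = (nan != nan)   # so "!=" is true
detected = math.isnan(nan)           # the reliable way to test for NaN
propagated = math.isnan(nan + 1.0)   # NaN propagates through arithmetic

print(equal_to_itself, differs_from_itself, detected, propagated)
```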

>     And after the removal of datetime from 1.4.1 and now this, I'd be in
>     favor of putting a large "experimental" sticker over the whole thing
>     until further notice.
>  
> Do we have a good way to do that?

Maybe an "experimental" warning, analogous to the "deprecation" warning.
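
A minimal sketch of what such a warning could look like, built on the stdlib warnings module. ExperimentalWarning and experimental_datetime_feature are hypothetical names invented for this example. Neither exists in Python or numpy:

```python
import warnings

class ExperimentalWarning(UserWarning):
    """Hypothetical warning category for APIs that may still change."""

def experimental_datetime_feature():
    # stacklevel=2 attributes the warning to the caller, as is usual for deprecations.
    warnings.warn(
        "datetime support is experimental and may change without notice",
        ExperimentalWarning,
        stacklevel=2,
    )
    return "result"

# Users who accept the risk could silence the whole category in one place.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ExperimentalWarning)
    value = experimental_datetime_feature()
```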

>     Good support for units and delta times is very useful,

> This part works fairly well now, except for some questions like what
> should datetime("2011-01-30", "D") + timedelta(1, "M") produce. Maybe
> "2011-02-28", or "2011-03-02"?

Neither -- "month" should not be a valid unit to express a timedelta in. 
Nor should year, or anything else that is not clearly defined (we can 
argue about day, which does change a bit as the earth slows down, yes?)
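
The ambiguity in the quoted example can be reproduced with the stdlib alone. This sketch shows two plausible readings of "add one month" to 2011-01-30. Both rules are illustrative assumptions, not what numpy does:

```python
import calendar
from datetime import date, timedelta

start = date(2011, 1, 30)

# Reading A: same day number in the next month, clipped to that month's length.
last_day_feb = calendar.monthrange(2011, 2)[1]
clipped = date(2011, 2, min(start.day, last_day_feb))

# Reading B: "one month" means the length of the starting month, here 31 days.
days_in_jan = calendar.monthrange(2011, 1)[1]
fixed_span = start + timedelta(days=days_in_jan)

print(clipped.isoformat(), fixed_span.isoformat())
```

Both answers are defensible, which is the argument for keeping "month" out of the set of strict timedelta units.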

Yes, it's nice to have an easy way of expressing things like 
"every month", or "a month from now" when you mean a calendar month, but 
it's a heck of a can of worms.

We just had a big discussion about this in the netcdf CF metadata 
standards list:

http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2011/007807.html

We more or less came to the conclusion (I did, anyway) that there were 
two distinct, but related concepts:

1) time as a strict unit of measurement, like length, mass, etc. In that 
case, don't use "months" as a unit.

2) Calendars -- these are what months, days of week, etc, etc, etc. are 
from, and these get ugly. I also learned that there are even more 
calendars than I thought. Beyond the Julian, Gregorian, etc, there are 
special ones used for climate modeling and the like, that have nice 
properties like all months being 30 days long, etc. Plus, as discussed, 
various "business" calendars.

So: I think that the calendar-related functions need a fairly 
self-contained library, with various classes for the various calendars one 
might want to use, and a well specified way to define new ones.
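
A rough sketch of the shape such a library might take, assuming a small abstract base class that each calendar implements. The class and method names (Calendar, Gregorian, Day360, days_in_month, days_in_year) are hypothetical and only meant to show how a new calendar could be defined:

```python
import calendar
from abc import ABC, abstractmethod

class Calendar(ABC):
    """Hypothetical interface: a calendar only has to say how long each month is."""

    @abstractmethod
    def days_in_month(self, year, month):
        ...

    def days_in_year(self, year):
        return sum(self.days_in_month(year, m) for m in range(1, 13))

class Gregorian(Calendar):
    def days_in_month(self, year, month):
        # Delegate leap-year handling to the stdlib.
        return calendar.monthrange(year, month)[1]

class Day360(Calendar):
    """Climate-model style calendar in which every month has 30 days."""

    def days_in_month(self, year, month):
        return 30

print(Gregorian().days_in_year(2012), Day360().days_in_year(2012))
```

Defining a new calendar then means writing one small subclass, while the strict time units (seconds, days) stay out of it entirely.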


-Chris





-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


