[Numpy-discussion] fixing up datetime

Mark Wiebe mwwiebe at gmail.com
Thu Jun 2 12:42:06 EDT 2011


On Thu, Jun 2, 2011 at 11:22 AM, Christopher Barker
<Chris.Barker at noaa.gov> wrote:

> Charles R Harris wrote:
> >  Good support for units and delta times is very useful, but
> > parsing dates and times and handling timezones, daylight savings, leap
> > seconds, business days, etc., is probably best served by addon packages
> > specialized to an area of interest. Just my $.02
>
> I agree here -- I think for numpy, what's key is to focus on the kind of
> things needed for computational use -- that is the performance critical
> stuff.
>
> I suppose business-day type calculations would be both key and
> performance-critical, but that sure seems like the kind of thing that
> should go in an add-on package, rather than in numpy.
>

I agree that anything specific to particular workweek or holiday conventions
should be external, but the ability to specify them and do calculations with
them seems reasonable to me for numpy.
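
To make that concrete, here is a rough sketch of specifying a workweek and
holidays and doing calculations with them. It assumes the business-day
functions found in later NumPy releases (np.busday_offset with its weekmask
and holidays arguments), which were not part of the API under discussion
here. The holiday date is made up for illustration.

```python
import numpy as np

# 2011-06-03 is a Friday; in the default Mon-Fri week the next
# business day is Monday 2011-06-06.
next_day = np.busday_offset('2011-06-03', 1, roll='forward')

# Hypothetical holiday on that Monday (illustration only): the
# result moves on to Tuesday.
with_holiday = np.busday_offset('2011-06-03', 1, roll='forward',
                                holidays=['2011-06-06'])

# A Sunday-Thursday workweek: Friday is not a valid day, so a zero
# offset with roll='forward' lands on Sunday 2011-06-05.
sun_thu = np.busday_offset('2011-06-03', 0, roll='forward',
                           weekmask='Mon Tue Wed Thu Sun')

print(next_day, with_holiday, sun_thu)
```

The conventions themselves (which days, which holidays) stay in the caller's
hands; only the mechanism would live in numpy.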


> The stdlib datetime package is a little bit too small for my taste
> (couldn't I at least ask for a TimeDelta to be expressed in, say,
> seconds, without doing any math on my own?), but the idea is good --
> create the core types, let add-on packages do the more specialized stuff.
>
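
For what it's worth, Python 2.7 and later do provide this through
timedelta.total_seconds(). A small sketch, with arbitrary example times:

```python
from datetime import datetime

# Difference between two naive datetimes is a timedelta.
delta = datetime(2011, 6, 2, 12, 42, 6) - datetime(2011, 6, 2, 11, 22, 0)

# total_seconds() returns the whole duration as a float number of seconds.
seconds = delta.total_seconds()
print(seconds)  # 4806.0
```
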
> > * The existing datetime-related API is probably not useful, and in fact
> > those functions aren't used internally anymore. Is it reasonable to
> > remove the functions, or do we just deprecate them?
>
> I say remove, but some polling to see if anyone is using it might be in
> order first.
>
> > * Leap seconds probably deserve a rigorous treatment, but having an
> > internal representation with leap-seconds overcomplicates otherwise very
> > simple and fast operations.
>
> could you explain more? I don't get the issues -- leap seconds would come
> in for calculations like: a_given_datetime + a_timedelta, correct?
> Given leap years, and all the other ugliness, do leap seconds really
> make it worse?
>

Leap years are easy compared with leap seconds. Leap seconds involve a
hardcoded table of particular leap-seconds that are added or subtracted, and
are specified roughly 6 months in advance of when they happen by the
International Earth Rotation and Reference Systems Service (IERS)
<http://en.wikipedia.org/wiki/International_Earth_Rotation_and_Reference_Systems_Service>.
The POSIX time_t doesn't store leap seconds, so if you subtract two
time_t values you may get the wrong answer by up to 34 seconds (or 24, I'm
not totally clear on the initial 10 second jump that happened somewhere).
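
A minimal sketch of the effect, using only the Python standard library.
calendar.timegm follows the POSIX convention of 86400 seconds per day, so it
stands in for time_t arithmetic here. The one-entry correction is added by
hand purely for illustration; a rigorous treatment would need the full IERS
table.

```python
import calendar

# A leap second (23:59:60 UTC) was inserted at the end of 2008-12-31.
before = calendar.timegm((2008, 12, 31, 23, 59, 59))
after = calendar.timegm((2009, 1, 1, 0, 0, 0))

# POSIX-style subtraction ignores the inserted second.
posix_diff = after - before

# Counting the leap second by hand gives the real elapsed time.
elapsed = posix_diff + 1

print(posix_diff, elapsed)  # 1 2
```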

> > * Default conversion to string - should it be in UTC or with the local
> > timezone baked in?
>
> most date_time handling should be time-zone neutral --i.e. assume
> everything is in the same timezone (and daylight savings status). Libs
> that assume you want the locale setting do nothing but cause major pain
> if you have anything out of the ordinary to do (and sometimes ordinary
> stuff, too).
>

The conversion to string would be in ISO 8601 format with the local timezone
offset baked in, so it would still be an unambiguous UTC datetime. It would
just look better to people wanting to see local time.

> If you MUST include time-zone, explicit is better than implicit -- have
> the user specify, or, at the very least make it easy for the user to
> override any defaults.
>

The ISO 8601 timezone format is very explicit. It's a UTC offset in hours and minutes, so
any daylight savings, etc is already baked in and no knowledge of timezones
is required to work with it.
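
As a sketch of why the offset form stays unambiguous, the following uses the
Python 3 standard library rather than anything in numpy. The -04:00 offset
and the example instant are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# One instant, expressed in UTC.
utc = datetime(2011, 6, 2, 16, 42, 6, tzinfo=timezone.utc)

# The same instant shown with a fixed -04:00 offset baked in.
local = utc.astimezone(timezone(timedelta(hours=-4)))

print(utc.isoformat())    # 2011-06-02T16:42:06+00:00
print(local.isoformat())  # 2011-06-02T12:42:06-04:00

# Parsing the offset form needs no timezone database, and it
# compares equal to the UTC form because it is the same instant.
parsed = datetime.fromisoformat("2011-06-02T12:42:06-04:00")
same_instant = (parsed == utc)
```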

> > As UTC it may be confusing because 'now' will print
> > as a different time than people would expect.
>
> I think "now" should be expressed (but also stored) in the local time,
> unless the user asks for UTC. This is consistent with the std lib
> datetime.now(), if nothing else.
>

I personally dislike this approach, and much prefer having datetime
unambiguously representing a particular time instead of depending on
context. I much prefer printing with the timezone baked in to show it as
local.

> > * Should the NaT (not-a-time) value behave like floating-point NaN? i.e.
> > NaT == NaT return false, etc. Should operations generating NaT trigger
> > an 'invalid' floating point exception in ufuncs?
>
> makes sense to me -- at least many folks are used to NaN semantics.
>
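
A short sketch of what NaN-like behaviour looks like in practice. This
reflects how current NumPy releases behave (including np.isnat, which did not
exist at the time of this message), not the code being discussed here.

```python
import numpy as np

nat = np.datetime64('NaT')

# Like NaN, NaT does not compare equal to itself.
same = (nat == nat)
print(same)  # False

# So missing values are detected with a dedicated test instead.
arr = np.array(['2011-06-02', 'NaT'], dtype='datetime64[D]')
mask = np.isnat(arr)
print(mask)  # [False  True]
```
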
> >     And after the removal of datetime from 1.4.1 and now this, I'd be in
> >     favor of putting a large "experimental" sticker over the whole thing
> >     until further notice.
> >
> > Do we have a good way to do that?
>
> Maybe an "experimental" warning, analogous to the "deprecation" warning.
>
> >     Good support for units and delta times is very useful,
>
> > This part works fairly well now, except for some questions like what
> > should datetime("2011-01-30", "D") + timedelta(1, "M") produce. Maybe
> > "2011-02-28", or "2011-03-02"?
>
> Neither -- "month" should not be a valid unit to express a timedelta in.
> Nor should year, or anything else that is not clearly defined (we can
> argue about day, which does change a bit as the earth slows down, yes?)
>

Why not? If a datetime is in months, adding a timedelta in months makes
perfect sense. The problem only arises when crossing between units which
aren't linearly related.


> Yes, it's nice to be able to easily have a way of expressing things like
> every month, or "a month from now" when you mean a calendar month, but
> it's a heck of a can of worms.
>

So perhaps raising an exception in these cases is preferable to having a
default behavior.
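
A sketch of both cases, written against current NumPy releases, where mixing
day and month units does raise. The exact exception message is not shown
because it varies between versions.

```python
import numpy as np

# Month datetime plus month timedelta: well defined.
month_sum = np.datetime64('2011-01', 'M') + np.timedelta64(1, 'M')
print(month_sum)  # 2011-02

# Day datetime plus month timedelta: days and months are not
# linearly related, so this is rejected instead of guessing.
try:
    np.datetime64('2011-01-30', 'D') + np.timedelta64(1, 'M')
    mixed_raises = False
except TypeError:
    mixed_raises = True

print(mixed_raises)
```
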

> We just had a big discussion about this in the netcdf CF metadata
> standards list:
>
> http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2011/007807.html
>
> We more or less came to the conclusion (I did, anyway) that there were
> two distinct, but related concepts:
>
> 1) time as a strict unit of measurement, like length, mass, etc. In that
> case, don't use "months" as a unit.
>
> 2) Calendars -- these are what months, days of week, etc, etc, etc. are
> from, and these get ugly. I also learned that there are even more
> calendars than I thought. Beyond the Julian, Gregorian, etc, there are
> special ones used for climate modeling and the like, that have nice
> properties like all months being 30 days long, etc. Plus, as discussed,
> various "business" calendars.
>
> So: I think that the calendar-related functions need a fairly
> self-contained library, with various classes for the various calendars one
> might want to use, and a well specified way to define new ones.
>

The NumPy datetime is based on 1), using the metadata unit and stored as an
offset from 1970-01-01, modulo leap-seconds in a currently non-rigorous way.
For 2), when parsing/printing it uses a Gregorian calendar (extending beyond
when it's defined historically) from what I understand. Other calendar
systems would be for further add-on libraries to handle, I believe.
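
A minimal sketch of that representation, as it can be observed in current
NumPy releases: the stored value is an integer count of the metadata unit
since 1970-01-01, and dates are interpreted in the Gregorian calendar even
before it was historically in use.

```python
import numpy as np

# The stored integer is an offset from 1970-01-01 in the given unit.
epoch_days = np.datetime64('1970-01-01', 'D').astype('int64')
next_day_days = np.datetime64('1970-01-02', 'D').astype('int64')
next_day_secs = np.datetime64('1970-01-02', 's').astype('int64')
before_epoch = np.datetime64('1969-12-31', 'D').astype('int64')
print(epoch_days, next_day_days, next_day_secs, before_epoch)  # 0 1 86400 -1

# Gregorian rules are applied to years before the calendar existed:
# 1500 is divisible by 100 but not 400, so it has no February 29.
gap_1500 = np.datetime64('1500-03-01', 'D') - np.datetime64('1500-02-28', 'D')
print(gap_1500)  # 1 days
```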

-Mark