[Numpy-discussion] fixing up datetime
Christopher Barker
Chris.Barker at noaa.gov
Thu Jun 2 15:45:37 EDT 2011
Mark Wiebe wrote:
> It is possible to implement the system so that if you don't use Y/M/B,
> things work out unambiguously, but if you do use them you get a behavior
> that's a little weird, but with rules to eliminate the calendar-created
> ambiguities.
yes, but everyone wants different rules -- so it needs to be very clear
which rules are in place, and there needs to be a way for a user to
specify his/her own rules.
> For the business day unit, what I'm currently trying to do
> is get an assessment of whether my proposed design is the right abstraction
> to support all the use cases of people who want it.
Good plan.
> I rather agree here, adding the 'origin' back in is definitely worth
> considering. How is the origin represented in the CF netcdf code?
As an ISO 8601 string. I can't recall if you have the option of
specifying a non-standard calendar.
> So using the calendar specified by ISO 8601 as the default for the
> calendar-based functions is undesirable?
no -- that's fine -- but does ISO 8601 specify stuff like business day?
> I think supporting it to a
> small extent is reasonable, and support for any other calendars or more
> advanced calendar-based functions would go in support libraries.
yup -- what I'm trying to press here is the distinction between linear
time units and the "weird" concepts, like business day, month, etc.
I think there are two related, but distinct issues:
1) representation/specification of a "datetime". The idea here is to
imagine that there is a continuous property called time (which I suppose
has a zero at the Big Bang). We need a way to define where (when) in
that continuum a given event, or set of events, occurred. This is what
the datetime dtype is about. I think the standard of "some-time-unit
since some-reference-datetime, in some-calendar" is fine, but that the
time-unit should be unambiguously and clearly defined, and not change
with when it occurs, i.e. seconds, hours, days, but not months, years,
or business days.
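
A minimal sketch of the "some-time-unit since some-reference-datetime"
idea, using only Python's standard library rather than any NumPy API. The
names decode and SECONDS_PER_UNIT are made up for illustration, and the
calendar is whatever Python's datetime uses (proleptic Gregorian, no leap
seconds). Only fixed-length units are accepted; anything else is rejected:

```python
from datetime import datetime, timedelta

# Fixed-length units only: each one is an exact number of seconds.
# (Hypothetical table for illustration; leap seconds are ignored.)
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}

def decode(value, unit, origin):
    """Return the datetime lying `value` units of `unit` after `origin`."""
    if unit not in SECONDS_PER_UNIT:
        raise ValueError(f"{unit!r} is not a fixed-length time unit")
    return origin + timedelta(seconds=value * SECONDS_PER_UNIT[unit])

origin = datetime(2010, 1, 1)
print(decode(36, "hours", origin))  # 2010-01-02 12:00:00
```

Asking for decode(1, "months", origin) raises ValueError, which is the
point: a month has no single length, so it can't be a unit here.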
2) time spans, and math with time: i.e. timedeltas --- this falls into 2
categories:
a) simple linear time units: seconds, hours, etc. This is quite
straightforward, if working with other time deltas and datetimes all
expressed in well-defined linear units.
b) calendar manipulations: "months since", "business days since",
once a month, "the first Sunday of the month", "next Monday". These
require a well-defined and complex Calendar, and there are many possible
such Calendars.
What I'm suggesting is that (a) and (b) should be kept quite distinct,
and that it should be fairly easy to define and use custom Calendars
defined for (b).
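
To make the (a)/(b) split concrete, here is a rough sketch in plain
Python. It is not a proposed NumPy design: add_months and BusinessCalendar
are hypothetical names, and the rules they encode (clamp to the end of the
month, Saturday/Sunday weekend, user-supplied holidays) are just one
possible set of rules, which is exactly why they live in a separate,
replaceable object:

```python
from datetime import date, timedelta

# (a) linear arithmetic: a timedelta is always the same length.
print(date(2010, 1, 31) + timedelta(days=30))  # 2010-03-02

# (b) calendar arithmetic: the rules are a choice, so they are explicit.
def add_months(d, n):
    """Shift by n calendar months, clamping the day to the month's end."""
    y, m = divmod(d.year * 12 + (d.month - 1) + n, 12)  # m is 0-based
    first_of_next = date(y + (m == 11), (m + 1) % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return date(y, m + 1, min(d.day, last_day))

class BusinessCalendar:
    """User-defined rules: which weekdays and dates are not business days."""
    def __init__(self, holidays=(), weekend=(5, 6)):
        self.holidays = frozenset(holidays)
        self.weekend = frozenset(weekend)  # Monday is 0, so 5, 6 = Sat, Sun

    def is_business_day(self, d):
        return d.weekday() not in self.weekend and d not in self.holidays

    def add_business_days(self, d, n):
        step = timedelta(days=1 if n >= 0 else -1)
        remaining = abs(n)
        while remaining:
            d += step
            if self.is_business_day(d):
                remaining -= 1
        return d

print(add_months(date(2010, 1, 31), 1))  # 2010-02-28
cal = BusinessCalendar(holidays={date(2011, 7, 4)})
print(cal.add_business_days(date(2011, 7, 1), 1))  # 2011-07-05
```

Someone with a different holiday list or a Friday/Saturday weekend builds
a different BusinessCalendar; the linear arithmetic doesn't change at all.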
(a) and (b) could be merged, with various defaults and exceptions raised
for poorly defined operations, but I think that'll be less clear, harder
to implement, and more prone to error.
A little example, again from the CF mailing list (which spawned the
discussion). In the CF standard the units available are defined as
"those supported by the udunits library":
http://www.unidata.ucar.edu/software/udunits/
It turns out that udunits only supports time manipulation of the kind I
described in (a), i.e. only clearly defined linear time units. However,
they do define "months" and "years" as specific fixed values (something
like 365.25 days/year and 12 months/year -- though they also have
"Julian-year", "leap_year", etc).
So folks would specify a time axis as "months since 2010-01" and
expect that they were getting calendar months, so that "1" would mean Feb,
2010, instead of January 31, 2010 (or whatever).
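
Here is that mismatch worked through in plain Python. The 365.25
days/year figure is only the "something like" value from above, not
necessarily what udunits actually uses, and the constant names are mine:

```python
from datetime import datetime, timedelta

# Assumed fixed-length definitions, for illustration only.
DAYS_PER_YEAR = 365.25
DAYS_PER_MONTH = DAYS_PER_YEAR / 12  # 30.4375 days

origin = datetime(2010, 1, 1)

# "1 months since 2010-01" read as a fixed-length (linear) unit:
linear = origin + timedelta(days=1 * DAYS_PER_MONTH)
print(linear)  # 2010-01-31 10:30:00

# What most people expect: one calendar month later.
expected = datetime(2010, 2, 1)
print(expected - linear)  # 13:30:00
```

So a value of "1" lands mid-morning on January 31, thirteen and a half
hours short of the February 1 the user had in mind.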
Anyway, lots of room for confusion, so whatever we come up with needs to
be clearly defined.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov