[Numpy-discussion] fixing up datetime

alan at ajackson.org alan at ajackson.org
Mon Jun 13 12:59:30 EDT 2011


I'm joining this late (I've been traveling), but it might be useful to
look at the fairly new R package "lubridate". They have put quite some
thought into simplifying date handling, and when I have used it I have
generally been quite pleased. The documentation is quite readable.

Just Google it and it will be at the top.

>Hey all,
>
>So I'm doing a summer internship at Enthought, and the first thing they
>asked me to look into is finishing the datetime type in numpy. It turns out
>that the estimates of how complete the type was weren't accurate, and to
>support what the NEP describes required generalizing the ufunc type
>resolution system. I also found that the date/time parsing code (based on
>mxDateTime) was not robust, producing something for almost any arbitrary
>garbage input. I've replaced much of the broken code and implemented a lot
>of the functionality, and thought this might be a good point to do a pull
>request on what I've got and get feedback on the issues I've run into.
>
>* The existing datetime-related API is probably not useful, and in fact
>those functions aren't used internally anymore. Is it reasonable to remove
>the functions, or do we just deprecate them?
>
>* Leap seconds probably deserve a rigorous treatment, but having an internal
>representation with leap-seconds overcomplicates otherwise very simple and
>fast operations. Could we internally use a value matching TAI or GPS time?
>Currently it's a UTC time in the present, but the 1970 epoch is then not the
>UTC 1970 epoch, but tens of seconds off, and this isn't properly specified.
>What are people's opinions? The Python datetime.datetime doesn't support
>leap seconds (seconds == 60 is disallowed).
>
>* Default conversion to string - should it be in UTC or with the local
>timezone baked in? As UTC it may be confusing because 'now' will print as a
>different time than people would expect.
>
>* Business days - The existing business-day idea doesn't seem very useful,
>representing just the western M-F work week and not accounting for holidays.
>I've come up with a design which might address these issues: Extend the
>metadata for business days with a string identifier, like 'M8[B:USA]', then
>have a global internal dictionary which maps 'USA' to a workweek mask and a
>list of holidays. The call to prepare this dictionary for a particular
>business day type might look like np.set_business_days('USA', [1, 1, 1, 1,
>1, 0, 0], np.array([ list of US holidays ], dtype='M8[D]')). Internally,
>business days would be stored the same as regular days, but with special
>treatment where landing on a weekend or holiday gives you back a NaT.
>
>If you are interested in the business day functionality, please comment on
>this design!
>
>* The dtype constructor accepted 'O#' for object types, something I think
>was wrong. I've removed that, but allow # to be 4 or 8, producing a
>deprecation warning if it occurs.
>
>* Would it make sense to offset the week-based datetime's epoch so it aligns
>with ISO 8601's week format? Jan 1, 1970 is a Thursday, but the YYYY-Www
>date format uses weeks starting on Monday. I think producing strings in this
>format when the datetime has units of weeks would be a natural thing to do.
>
>* Should the NaT (not-a-time) value behave like floating-point NaN? i.e. NaT
>== NaT return false, etc. Should operations generating NaT trigger an
>'invalid' floating point exception in ufuncs?
>
>Cheers,
>Mark
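
To make the business-day proposal quoted above a bit more concrete, here is
a minimal pure-Python sketch of the idea: a global dictionary mapping a name
such as 'USA' to a workweek mask and a set of holidays, with weekends and
holidays mapped to a "not a time" marker. Everything here is hypothetical:
set_business_days only mirrors the call shape suggested in the quoted
message, as_business_day is a name made up for illustration, and None
stands in for NaT. It is not NumPy code.

```python
from datetime import date

# Hypothetical registry: calendar name -> (weekmask Mon..Sun, holidays).
_BUSINESS_CALENDARS = {}

def set_business_days(name, weekmask, holidays):
    """Register a calendar, mirroring the call shape in the quoted proposal."""
    if len(weekmask) != 7:
        raise ValueError("weekmask needs seven entries, Monday first")
    mask = tuple(bool(flag) for flag in weekmask)
    _BUSINESS_CALENDARS[name] = (mask, frozenset(holidays))

def as_business_day(name, day):
    """Return day if it is a business day in the named calendar.

    Weekends and holidays give None, which stands in for NaT here.
    """
    weekmask, holidays = _BUSINESS_CALENDARS[name]
    if not weekmask[day.weekday()] or day in holidays:
        return None
    return day

# Assumed sample data: a Monday-Friday week with one holiday.
set_business_days("USA", [1, 1, 1, 1, 1, 0, 0], [date(2011, 7, 4)])
```

With this calendar, as_business_day("USA", date(2011, 7, 5)) returns the
date unchanged, while the Saturday before and the July 4 holiday both give
None.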

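On the week-epoch question in the quoted message, a small sketch using only
the Python standard library shows what the offset would amount to. Since
Jan 1, 1970 is a Thursday, shifting by three days makes week boundaries fall
on Mondays, matching the ISO 8601 convention. The function names are made up
for illustration, and this says nothing about how NumPy stores weeks.

```python
from datetime import date

EPOCH = date(1970, 1, 1)
# Days from the preceding Monday to the epoch; 3 because the epoch is a Thursday.
ISO_WEEK_SHIFT = EPOCH.weekday()

def monday_aligned_week(day):
    """Week count since the epoch, with weeks starting on Monday."""
    return ((day - EPOCH).days + ISO_WEEK_SHIFT) // 7

def iso_week_string(day):
    """Format a date in the ISO 8601 YYYY-Www week form."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year:04d}-W{iso_week:02d}"
```

Under this shift, week 0 runs from Monday 1969-12-29 through Sunday
1970-01-04, which is also ISO week 1970-W01.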

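The quoted remark that Python's datetime.datetime disallows seconds == 60
is easy to check. The sketch below tries to build the leap second at the
end of 2008 and reports whether the constructor accepts it; the helper name
is made up for illustration.

```python
from datetime import datetime

def accepts_leap_second():
    """Report whether datetime accepts a seconds value of 60."""
    try:
        # 2008-12-31 23:59:60 UTC was a real leap second.
        datetime(2008, 12, 31, 23, 59, 60)
    except ValueError:
        return False
    return True
```

The constructor raises ValueError, so the helper returns False.
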
-- 
-----------------------------------------------------------------------
| Alan K. Jackson            | To see a World in a Grain of Sand      |
| alan at ajackson.org       | And a Heaven in a Wild Flower,         |
| www.ajackson.org           | Hold Infinity in the palm of your hand |
| Houston, Texas             | And Eternity in an hour. - Blake       |
-----------------------------------------------------------------------


