[Numpy-discussion] Default unit for datetime/timedelta

Wed Jun 8 19:10:26 EDT 2011

On Wed, Jun 8, 2011 at 5:48 PM, Pierre GM <pgmdevlist at gmail.com> wrote:

>
> On Jun 8, 2011, at 11:05 PM, Mark Wiebe wrote:
>
> > The NEP and the current implementation of datetime specify microseconds
> > as the default unit when constructing and converting to datetimes and
> > timedeltas.
>
> AFAIU, the default is [us] when otherwise unspecified.
>

That's correct.

> > Here are some current behaviors that are inconsistent with the microsecond
> > default, but consistent with the "generic time unit" idea:
> >
> > >>> np.timedelta64(10, 's') + 10
> > numpy.timedelta64(20,'s')
>
> Here, the unit is defined: 's'
>

For the first operand, yes; the inconsistency is with the second. Here's the
reasoning I didn't spell out: we're adding a timedelta and an int, so let's
convert 10 into a timedelta. No units are specified, so under a microsecond
default it's 10 microseconds, and we would add 10 seconds and 10 microseconds,
not 10 seconds and 10 seconds. The intuitive behavior of getting 20 seconds,
which was specified in the NEP for +, follows naturally from having generic
units, but not from having a default of microseconds.
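To make the two readings concrete, here is a minimal pure-Python sketch of the arithmetic. It is not NumPy code; the helper names and the (value, unit) tuple representation are made up for illustration only.

```python
# Microseconds per unit, for the two units used in the example.
US_PER_UNIT = {"s": 1_000_000, "us": 1}

def add_with_default_us(value, unit, bare_int):
    """Hypothetical rule: a bare integer is read as microseconds."""
    total_us = value * US_PER_UNIT[unit] + bare_int * US_PER_UNIT["us"]
    return total_us, "us"

def add_with_generic_unit(value, unit, bare_int):
    """Hypothetical rule: a bare integer is unitless and adopts the
    unit of the other operand."""
    return value + bare_int, unit

# timedelta64(10, 's') + 10 under each rule
default_result = add_with_default_us(10, "s", 10)
generic_result = add_with_generic_unit(10, "s", 10)

print(default_result)  # (10000010, 'us')
print(generic_result)  # (20, 's')
```

Only the generic-unit rule reproduces the 20 seconds shown in the interpreter session quoted above.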

> > >>> np.datetime64('2011-03-12') + 3
> > numpy.datetime64('2011-03-15','D')
>
> OK, here it is not. But the result makes sense... Up to a certain point. If
> you try to guess the unit from a date given as a string, what happens in
> case of ambiguities ? Or do you restrict an input string to be strictly
> ISO8601 to remove those ?
>

Yeah, I'm restricting the string to be (almost) strictly ISO8601. For
supporting other formats, I think creating a 'fancy_date_parser' function or
something like that would be better than having all those date string format
ambiguities in the core type.
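As a rough sketch of what such a separate helper might look like, using only the standard library: the name fancy_date_parser comes from the paragraph above, but the signature, the list of formats, and the first-match-wins policy are all assumptions for illustration, not an existing NumPy function.

```python
from datetime import date, datetime

# Assumed list of accepted formats, tried in order. The order is itself a
# policy decision: "12/03/2011" could be 12 March or 3 December.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def fancy_date_parser(text, formats=CANDIDATE_FORMATS):
    """Hypothetical lenient parser: return the first successful match."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"unrecognized date string: {text!r}")

iso_result = fancy_date_parser("2011-03-12")
slash_result = fancy_date_parser("12/03/2011")
print(iso_result)    # 2011-03-12
print(slash_result)  # 2011-03-12, because day-first is tried before month-first
```

The second call shows the kind of ambiguity that is better kept out of the core type: the answer depends entirely on which format the helper happens to try first.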

> > I'd like to make 'M8' and 'm8' be datetime data types with generic time
> > units instead of microseconds as they are currently. This would also allow
> > the possibility of extending the behavior of detecting the unit from the
> > input string as:
> >
> > >>> np.datetime64('2011-03-12T13')
> > numpy.datetime64('2011-03-12T13-0600','h')
> >
> > to also work with arrays, which currently work like this:
> >
> > >>> np.array(['2011-03-12T13', '2012'], dtype='M8')
> > array(['2011-03-12T13:00:00.000000-0600',
> >        '2011-12-31T18:00:00.000000-0600'], dtype='datetime64[us]')
>
> Why is the second one not '2012-01-01T00:00:00-0600' ?
>

This is because dates are stored as midnight UTC, and converting them to
local time for the default time-based printing shifts them by the UTC offset.
ISO8601 specifies to interpret an input in local time if no "Z" or timezone
offset is given, so that's why the first one matches. I haven't been able to
think of a way around it other than putting warnings in the documentation,
and have made 'today' and 'now' throw errors if you try to use them as times
or dates respectively.
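The shift can be reproduced with the standard library alone. This is only a sketch of the arithmetic, not of the NumPy internals, and the fixed UTC-6 offset is an assumption chosen to match the "-0600" in the output quoted above.

```python
from datetime import datetime, timedelta, timezone

# Assumed local zone: a fixed UTC-6 offset, as in the quoted output.
local_tz = timezone(timedelta(hours=-6))

# '2012' is a date, so it is stored as midnight UTC on 2012-01-01.
utc_midnight = datetime(2012, 1, 1, tzinfo=timezone.utc)
date_as_local = utc_midnight.astimezone(local_tz)

# '2011-03-12T13' has no "Z" or offset, so it is read as local time
# and prints back unchanged.
time_as_local = datetime(2011, 3, 12, 13, tzinfo=local_tz)

print(date_as_local.isoformat())  # 2011-12-31T18:00:00-06:00
print(time_as_local.isoformat())  # 2011-03-12T13:00:00-06:00
```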

>
> Otherwise, I'm all for it.
>

Cool.

-Mark
