Hi Nathaniel,

> 1- You give as an example of "naive" datetime handling:
>
> >>> np.datetime64('2005-02-25T03:00Z')
> np.datetime64('2005-02-25T03:00')
>
> This IIUC is incorrect. The Z modifier is a timezone offset, and for normal "naive" datetimes would cause an error.


From my reading of
http://thread.gmane.org/gmane.comp.python.numeric.general/53805

it looks like any string with a TZ adjustment other than Z, 00:00 or UTC would raise an error, while those specific forms would not. I'm guessing this is because we assume UTC (or the same timezone) internally, so anything that explicitly tells us it is UTC is acceptable, although I may just be misreading it.

However on output we don't use the Z modifier (which is why it's different from the UTC datetime64).
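
To make that reading concrete, here is a minimal sketch of the rule using only the standard library. `parse_naive_iso` is a hypothetical helper written for illustration, not anything in NumPy, and it only handles the `YYYY-MM-DDTHH:MM` form. I am also assuming that "00:00" means a `+00:00` suffix. An explicit UTC marker is accepted and dropped, any other offset raises, and the output carries no Z.

```python
import re
from datetime import datetime

# Hypothetical helper for illustration only; this is not NumPy's parser.
UTC_MARKERS = ("Z", "+00:00", "UTC")          # assumed spellings of "explicitly UTC"
OFFSET_RE = re.compile(r"[+-]\d{2}:?\d{2}$")  # trailing offset such as -05:00

def parse_naive_iso(text):
    """Parse 'YYYY-MM-DDTHH:MM', tolerating only an explicit-UTC suffix."""
    for marker in UTC_MARKERS:
        if text.endswith(marker):
            # Explicit UTC: accept it and drop the marker.
            text = text[: -len(marker)].rstrip()
            break
    else:
        # No UTC marker found: any remaining offset is an error.
        if OFFSET_RE.search(text):
            raise ValueError("non-UTC offset in %r" % text)
    return datetime.strptime(text, "%Y-%m-%dT%H:%M")

parsed = parse_naive_iso("2005-02-25T03:00Z")
print(parsed.isoformat(timespec="minutes"))  # prints 2005-02-25T03:00, no Z
```

With this sketch, `parse_naive_iso("2005-02-25T03:00-05:00")` raises `ValueError`, which is the behaviour I understood from the thread.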

If my understanding is incorrect, I will change the example to raise an error. I will also include examples of conversion from datetimes, as you requested.
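
As a rough idea of the kind of datetime example I have in mind, the sketch below round-trips a naive `datetime.datetime` through `np.datetime64`. It deliberately uses only a datetime without `tzinfo`; how timezone-aware datetimes should be treated is exactly what the proposal has to settle, so I make no claim about that case here.

```python
from datetime import datetime
import numpy as np

naive = datetime(2005, 2, 25, 3, 0)   # no tzinfo attached

as_dt64 = np.datetime64(naive)        # datetime -> datetime64
round_trip = as_dt64.item()           # datetime64 -> datetime

# Both comparisons are expected to hold for a naive input.
print(as_dt64 == np.datetime64("2005-02-25T03:00"))
print(round_trip == naive)
```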

Please let me know if there are any more changes that are required! I look forward to further comments/questions.

Cheers,
Sankarshan

On Fri, Mar 28, 2014 at 5:17 AM, Nathaniel Smith <njs@pobox.com> wrote:

On 28 Mar 2014 05:00, "Sankarshan Mudkavi" <smudkavi@uwaterloo.ca> wrote:
>
> Hi all,
>
> Apologies for the delay in following up, here is an expanded version of the proposal, which hopefully clears up most of the details. I have not included specific implementation details for the code, such as which functions to modify etc. since I think those are not traditionally included in NEPs?

The format seems fine to me. Really the point is just to have a document that we can use as reference when deciding on behaviour, and this does that :-).

Three quick comments:

1- You give as an example of "naive" datetime handling:

>>> np.datetime64('2005-02-25T03:00Z')
np.datetime64('2005-02-25T03:00')

This IIUC is incorrect. The Z modifier is a timezone offset, and for normal "naive" datetimes would cause an error.

2- It would be good to include explicitly examples of conversion to and from datetimes alongside the examples of conversions to and from strings.

3- It would be good to (eventually) include some discussion of the impact of the preferred proposal on existing code. E.g., will this break a lot of people's pipelines? (Are people currently *always* adding timezones to their numpy input to avoid the problem, and now will have to switch to the opposite behaviour depending on numpy version?) And we'll want to make sure to get feedback from the pydata@ (pandas) list explicitly, though that can wait until people here have had a chance to respond to the first draft.

Thanks for pushing this forward!
-n

Hi all,

Apologies for the delay in following up, here is an expanded version of the proposal, which hopefully clears up most of the details. I have not included specific implementation details for the code, such as which functions to modify etc. since I think those are not traditionally included in NEPs?

Please find attached the expanded proposal, and the rendered version is available here:
https://github.com/Sankarshan-Mudkavi/numpy/blob/Enhance-datetime64/doc/neps/datetime-improvement-proposal.rst

<datetime-improvement-proposal.rst>

I look forward to comments, agreements/disagreements with this (and clarification if this needs even further expansion).


On Mar 24, 2014, at 12:39 AM, Chris Barker <chris.barker@noaa.gov> wrote:

On Fri, Mar 21, 2014 at 3:43 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Mar 20, 2014 at 11:27 PM, Chris Barker <chris.barker@noaa.gov> wrote:
> * I think there are more or less three options:
>    1) a) don't have any timezone handling at all -- all datetime64s are UTC. Always
>       b) don't have any timezone handling at all -- all datetime64s are naive
>          (the only difference between these two is I/O of strings, and maybe I/O of datetime objects with a time zone)
>    2) Have a time zone associated with the array -- defaulting to either UTC or None, but don't provide any implementation other than the tagging, with the ability to add in a TZ handler if you want (can this be done efficiently?)
>    3) Full on proper TZ handling.
>
> I think (3) is off the table for now.

I think the first goal is to define what a plain vanilla datetime64
does, without any extra attributes. This is for two practical reasons:
First, our overriding #1 goal is to fix the nasty I/O problems that
default datetime64's show, so until that's done any other bells and
whistles are a distraction. And second, adding parameters to dtypes
right now is technically messy.

This rules out (2) and (3).

yup -- though I'm not sure I agree that we need to do this, if we are going to do something more later anyway. But you have a key point - maybe the dtype system simply isn't ready to do it right, and then it may be better not to try.

In which case, we are down to naive or always UTC -- and again, those really aren't very different. Though I prefer naive -- always UTC adds some complication if you don't actually want UTC, and I'm not sure it actually buys us anything. And maybe it's just me, but all my code would need to use naive, so I'd be doing a bit of working around to use a UTC-always system.
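
A small standard-library sketch of the "working around" mentioned above. It is only an illustration with stdlib `datetime` objects, not datetime64 behaviour, and the fixed UTC-7 offset is an assumption chosen for the example: with naive values the wall-clock time goes in and comes out unchanged, while a UTC-always system makes the caller supply an offset on the way in.

```python
from datetime import datetime, timedelta, timezone

wall_clock = datetime(2014, 3, 20, 15, 43)   # naive: just what the clock says

# Naive handling: string I/O round-trips the value untouched.
naive_text = wall_clock.isoformat()

# UTC-always handling: the caller must know and apply an offset first.
assumed_offset = timezone(timedelta(hours=-7))   # assumption for this example
as_utc = wall_clock.replace(tzinfo=assumed_offset).astimezone(timezone.utc)

print(naive_text)           # 2014-03-20T15:43:00
print(as_utc.isoformat())   # 2014-03-20T22:43:00+00:00
```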
 
If we additionally want to keep the option of adding a timezone
parameter later, and have the result end up looking like stdlib
datetime, then I think 1(b) is the obvious choice. My guess is that
this is also what's most compatible with pandas, which is currently
keeping its own timezone object outside of the dtype.

Good point, all else being equal, compatibility with pandas would be a good thing.

Any downsides? I guess this would mean that we start raising an error
on ISO 8601 strings with offsets attached, which might annoy some people?

yes, but errors are better than incorrect values...

> Writing this made me think of a third option -- tracking, but no real manipulation, of TZ. This would be analogous to what ISO 8601 does -- all it does is note an offset. A given datetime64 array would have a given offset assigned to it, and the appropriate addition and subtraction would happen at I/O. An offset of 0.00 would be UTC, and there would be a None option for naive.

Please no! An integer offset is a terrible way to represent timezones,

well, it would solve the problem of reading ISO strings, and it would allow operations on datetimes in multiple time zones. Though I guess you could get most of that with UTC-always.
 
and hardcoding this would just get in the way of a proper solution.

well, that's a point -- if we think there is any hope of a proper solution down the road, then yes, it would be better not to make that harder.
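
One way to see the objection to a single integer offset, sketched with the standard library only. The helper `to_wall_clock` and the tagged offset are hypothetical, standing in for an array that carries one fixed offset. US Pacific time is UTC-8 in winter and UTC-7 during daylight saving time, so one offset cannot be right for both dates.

```python
from datetime import datetime, timedelta

# Assumption for illustration: the array is tagged with a single fixed offset.
TAGGED_OFFSET = timedelta(hours=-8)   # Pacific Standard Time, UTC-8

def to_wall_clock(utc_value, offset=TAGGED_OFFSET):
    """Apply the array's one fixed offset at output time."""
    return utc_value + offset

winter_utc = datetime(2014, 1, 15, 20, 0)
summer_utc = datetime(2014, 7, 15, 20, 0)

print(to_wall_clock(winter_utc))  # 12:00, matches Seattle clocks in January
print(to_wall_clock(summer_utc))  # 12:00, but Seattle clocks read 13:00 in July
```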

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

-- 
Sankarshan Mudkavi
Undergraduate in Physics, University of Waterloo
www.smudkavi.com