I look forward to comments, agreements/disagreements with this (and clarification if this needs even further expansion).


Please find attached the 
On Mar 24, 2014, at 12:39 AM, Chris Barker <chris.barker@noaa.gov> wrote:

On Fri, Mar 21, 2014 at 3:43 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Mar 20, 2014 at 11:27 PM, Chris Barker <chris.barker@noaa.gov> wrote:
> * I think there are more or less three options:
>    1) a) don't have any timezone handling at all -- all datetime64s are UTC. Always.
>       b) don't have any timezone handling at all -- all datetime64s are naive
>          (the only difference between these two is I/O of strings, and maybe I/O of datetime objects with a time zone)
>    2) Have a time zone associated with the array -- defaulting to either UTC or None -- but don't provide any implementation other than the tagging, with the ability to add in a TZ handler if you want (can this be done efficiently?)
>    3) Full-on proper TZ handling.
>
> I think (3) is off the table for now.

I think the first goal is to define what a plain vanilla datetime64
does, without any extra attributes. This is for two practical reasons:
First, our overriding #1 goal is to fix the nasty I/O problems that
default datetime64s show, so until that's done any other bells and
whistles are a distraction. And second, adding parameters to dtypes
right now is technically messy.

This rules out (2) and (3).

yup -- though I'm not sure I agree that we need to rule them out, if we are going to do something more later anyway. But you have a key point -- maybe the dtype system simply isn't ready to do it right, and then it may be better not to try.

In which case, we are down to naive or always UTC -- and again, those really aren't very different. Though I prefer naive -- always UTC adds some complication if you don't actually want UTC, and I'm not sure it actually buys us anything. And maybe it's just me, but all my code would need to use naive, so I'd be doing a bit of working around to use a UTC-always system.
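
To make the difference between naive and always-UTC concrete, here is a minimal sketch. It uses stdlib datetime as a stand-in for datetime64 (an assumption for illustration only, not how NumPy behaves today). The stored fields and the arithmetic are the same; only the string I/O differs:

```python
from datetime import datetime, timezone

# Same wall-clock fields, once naive and once tagged as UTC.
naive = datetime(2014, 3, 24, 0, 39)
utc = datetime(2014, 3, 24, 0, 39, tzinfo=timezone.utc)

# Differences between two values of the same kind are identical.
naive_delta = naive - datetime(2014, 3, 23, 0, 39)
utc_delta = utc - datetime(2014, 3, 23, 0, 39, tzinfo=timezone.utc)

# The visible difference is in string output: the UTC value carries an offset.
naive_text = naive.isoformat()  # '2014-03-24T00:39:00'
utc_text = utc.isoformat()      # '2014-03-24T00:39:00+00:00'
```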
 
If we additionally want to keep the option of adding a timezone
parameter later, and have the result end up looking like stdlib
datetime, then I think 1(b) is the obvious choice. My guess is that
this is also what's most compatible with pandas, which is currently
keeping its own timezone object outside of the dtype.

Good point, all else being equal, compatibility with pandas would be a good thing.

Any downsides? I guess this would mean that we start raising an error
on ISO 8601's with offsets attached, which might annoy some people?

yes, but errors are better than incorrect values...
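
As a rough sketch of what "raise an error on offsets" could look like, again using stdlib datetime as a stand-in (the function name `parse_naive_iso` is hypothetical, and `datetime.fromisoformat` needs Python 3.7 or later):

```python
from datetime import datetime

def parse_naive_iso(text):
    """Hypothetical strict parser: accept only ISO 8601 strings without an offset."""
    # fromisoformat raises ValueError on strings it cannot parse at all.
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is not None:
        # An offset was attached; refuse rather than silently shift or drop it.
        raise ValueError(f"offset not allowed for naive datetimes: {text!r}")
    return parsed
```

With this, "2014-03-24T00:39:00" parses to a naive value, while "2014-03-24T00:39:00+05:00" raises ValueError instead of being converted behind the user's back.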

> Writing this made me think of a third option -- tracking, but no real manipulation, of TZ. This would be analogous to what ISO 8601 does -- all it does is note an offset. A given datetime64 array would have a given offset assigned to it, and the appropriate addition and subtraction would happen at I/O. An offset of 0.00 would be UTC, and there would be a None option for naive.

Please no! An integer offset is a terrible way to represent timezones,

well, it would solve the problem of reading ISO strings, and it would allow operations on datetimes in multiple time zones. Though I guess you could get most of that with UTC-always.
 
and hardcoding this would just get in the way of a proper solution.

well, that's a point -- if we think there is any hope of a proper solution down the road, then yes, it would be better not to make that harder.
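
For reference, the offset-tagging idea discussed above amounts to roughly the following. This is only a sketch under stated assumptions: the class `OffsetTagged` and its attributes are hypothetical, stdlib datetime stands in for datetime64, and nothing like this exists in NumPy.

```python
from datetime import datetime, timedelta

class OffsetTagged:
    """Hypothetical container: one fixed offset per array, applied only at I/O."""

    def __init__(self, values, offset_minutes=None):
        # With an offset, values are stored as UTC; with None, they are naive.
        self.values = list(values)
        self.offset_minutes = offset_minutes

    def to_strings(self):
        if self.offset_minutes is None:
            # Naive: print the stored fields unchanged, with no offset suffix.
            return [v.isoformat() for v in self.values]
        shift = timedelta(minutes=self.offset_minutes)
        sign = "+" if self.offset_minutes >= 0 else "-"
        hours, minutes = divmod(abs(self.offset_minutes), 60)
        suffix = f"{sign}{hours:02d}:{minutes:02d}"
        # Shift from UTC to local fields on output and note the offset.
        return [(v + shift).isoformat() + suffix for v in self.values]

stored = [datetime(2014, 3, 24, 7, 39)]
pacific = OffsetTagged(stored, offset_minutes=-420).to_strings()
plain = OffsetTagged(stored).to_strings()
```

Note that a single integer offset cannot follow daylight saving changes, which is the objection raised in the quoted reply.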

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

-- 
Sankarshan Mudkavi
Undergraduate in Physics, University of Waterloo