Dates and times and Datetime64 (again)
Hey all,

It's been a while since the last datetime and timezones discussion thread was visited (linked below):

http://thread.gmane.org/gmane.comp.python.numeric.general/53805

It looks like the best approach to follow is the UTC-only approach from the linked thread, with an optional flag to indicate the timezone (to avoid confusing applications that don't expect any timezone info). Since this is slightly more useful than a purely naive datetime64 and would be open to extension if required, it's probably the best way to start improving the datetime64 type.

If we do wish to have full timezone support it would very likely lead to performance drops (as reasoned in the thread) and we would need a dedicated, maintained tzinfo package, at which point it would make much more sense to just incorporate the pytz library. (I also don't have the expertise to implement this, so I would be unable to help resolve the current logjam.)

I would like to start writing a NEP for this, followed by an implementation; however, I'm not sure what the format etc. is. Could someone direct me to a page where this information is provided?

Please let me know if there are any ideas, comments etc.

Cheers,
Sankarshan
On Tue, Mar 18, 2014 at 2:49 PM, Sankarshan Mudkavi <smudkavi@uwaterloo.ca> wrote:
It's been a while since the last datetime and timezones discussion thread was visited (linked below):
http://thread.gmane.org/gmane.comp.python.numeric.general/53805
It looks like the best approach to follow is the UTC only approach in the linked thread with an optional flag to indicate the timezone (to avoid confusing applications where they don't expect any timezone info). Since this is slightly more useful than having just a naive datetime64 package and would be open to extension if required, it's probably the best way to start improving the datetime64 library.
IIUC, I agree -- which is why we need a NEP to specify the details. Thank you for stepping up!

If we do wish to have full timezone support it would very likely lead to
performance drops (as reasoned in the thread) and we would need to have a dedicated, maintained tzinfo package, at which point it would make much more sense to just incorporate the pytz library.
yup -- there is the option of doing what the stdlib datetime does -- provide a hook to incorporate timezones, but don't provide an implementation. But unless that is a low-level hook that can be implemented in C, it's going to be slow -- slow enough that you might as well use a list of stdlib datetimes.... Also, this has gone far too long without getting fixed -- we need something simple to implement more than anything else.
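For reference, here's roughly what the stdlib's "hook but no implementation" approach looks like in plain Python -- the FixedOffset class below is just an illustration, not anything numpy would ship:

from datetime import datetime, timedelta, tzinfo

class FixedOffset(tzinfo):
    # minimal user-supplied tzinfo: the stdlib defines the hook, the user fills it in
    def __init__(self, minutes, name):
        self._offset = timedelta(minutes=minutes)
        self._name = name
    def utcoffset(self, dt):
        return self._offset
    def tzname(self, dt):
        return self._name
    def dst(self, dt):
        return timedelta(0)

d = datetime(2014, 3, 18, 14, 49, tzinfo=FixedOffset(-7 * 60, "PDT"))
print(d.isoformat())   # -> 2014-03-18T14:49:00-07:00

Doing that per element across a large datetime64 array is exactly where the performance worry comes from.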
I would like to start writing a NEP for this followed by implementation, however I'm not sure what the format etc. is, could someone direct me to a page where this information is provided?
I don't know that there is such a thing, but you'll find the existing NEPs here:

https://github.com/numpy/numpy/tree/master/doc/neps

I'd grab one and follow the format.
Please let me know if there are any ideas, comments etc.
Thanks again -- I look forward to seeing it written up -- I'm sure to have something to say then!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA  98115       (206) 526-6317  main reception

Chris.Barker@noaa.gov
Sankarshan Mudkavi <smudkavi <at> uwaterloo.ca> writes:
Hey all, It's been a while since the last datetime and timezones discussion thread
was visited (linked below):
http://thread.gmane.org/gmane.comp.python.numeric.general/53805
It looks like the best approach to follow is the UTC only approach in the
linked thread with an optional flag to indicate the timezone (to avoid confusing applications where they don't expect any timezone info). Since this is slightly more useful than having just a naive datetime64 package and would be open to extension if required, it's probably the best way to start improving the datetime64 library.
I would like to start writing a NEP for this followed by implementation, however I'm not sure what the format etc. is, could someone direct me to a
<snip> page where this information is provided?
Please let me know if there are any ideas, comments etc.
Cheers, Sankarshan
See: http://article.gmane.org/gmane.comp.python.numeric.general/55191

You could use a current NEP as a template:
https://github.com/numpy/numpy/tree/master/doc/neps

I'm a huge +100 on the simplest UTC fix. As is, using numpy datetimes is likely to silently give incorrect results - something I've already seen several times in end-user data analysis code.

Concrete example:

In [16]: dates = pd.date_range('01-Apr-2014', '04-Apr-2014', freq='H')[:-1]
    ...: values = np.array([1,2,3]).repeat(24)
    ...: records = zip(map(str, dates), values)
    ...: pd.TimeSeries(values, dates).groupby(lambda d: d.date()).mean()
    ...:
Out[16]:
2014-04-01    1
2014-04-02    2
2014-04-03    3
dtype: int32

In [17]: df = pd.DataFrame(np.array(records, dtype=[('dates', 'M8[h]'), ('values', float)]))
    ...: df.set_index('dates', inplace=True)
    ...: df.groupby(lambda d: d.date()).mean()
    ...:
Out[17]:
              values
2014-03-31  1.000000
2014-04-01  1.041667
2014-04-02  2.041667
2014-04-03  3.000000

[4 rows x 1 columns]

Try it in your timezone and see what you get!

-Dave
Dave,

Your example is not a problem with numpy per se, rather that the default generation is in the local timezone (same as what Python datetime does). If you localize to UTC you get the results that you expect:

In [49]: dates = pd.date_range('01-Apr-2014', '04-Apr-2014', freq='H')[:-1]

In [50]: pd.TimeSeries(values, dates.tz_localize('UTC')).groupby(lambda d: d.date()).mean()
Out[50]:
2014-04-01    1
2014-04-02    2
2014-04-03    3
dtype: int64

In [51]: records = zip(map(str, dates.tz_localize('UTC')), values)

In [52]: df = pd.DataFrame(np.array(records, dtype=[('dates', 'M8[h]'), ('values', float)]))

In [53]: df.set_index('dates').groupby(lambda x: x.date()).mean()
Out[53]:
            values
2014-04-01       1
2014-04-02       2
2014-04-03       3

[3 rows x 1 columns]

On Wed, Mar 19, 2014 at 5:21 AM, Dave Hirschfeld <novin01@gmail.com> wrote:
<snip>
Jeff Reback <jeffreback <at> gmail.com> writes:
Dave,
your example is not a problem with numpy per se, rather that the default
generation is in local timezone (same as what python datetime does).
If you localize to UTC you get the results that you expect.
The problem is that the default datetime generation in *numpy* is in local time. Note that this *is not* the case in Python - it doesn't try to guess the timezone info based on where in the world you run the code; if it's not provided it sets it to None.

In [7]: pd.datetime?
Type:        type
String Form: <type 'datetime.datetime'>
Docstring:
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints or longs.

In [8]: pd.datetime(2000,1,1).tzinfo is None
Out[8]: True

This may be the best solution, but as others have pointed out it is more difficult to implement and may have other issues.

I don't want to wait for the best solution - assuming UTC on input/output if not specified will solve the problem, and this desperately needs to be fixed because it's completely broken as is, IMHO.
If you localize to UTC you get the results that you expect.
That's the whole point - *numpy* needs to localize to UTC, not to whatever timezone you happen to be in when running the code.

In a real-world data analysis problem you don't start with the data in a DataFrame or a numpy array - it comes from the web, a csv, Excel, a database - and you want to convert it to a DataFrame or numpy array. So what you have from whatever source is a list of tuples of strings and you want to convert them into a typed array.

Obviously you can't localize a string - you have to convert it to a date first, and if you do that with numpy the date you have is wrong.

In [108]: dst = np.array(['2014-03-30 00:00', '2014-03-30 01:00', '2014-03-30 02:00'], dtype='M8[h]')
     ...: dst
     ...:
Out[108]: array(['2014-03-30T00+0000', '2014-03-30T00+0000', '2014-03-30T02+0100'], dtype='datetime64[h]')

In [109]: dst.tolist()
Out[109]:
[datetime.datetime(2014, 3, 30, 0, 0),
 datetime.datetime(2014, 3, 30, 0, 0),
 datetime.datetime(2014, 3, 30, 1, 0)]

AFAICS there's no way to get the original dates back once they've passed through numpy's parser!?

-Dave
On Mar 19, 2014, at 10:01 AM, Dave Hirschfeld <novin01@gmail.com> wrote:
<snip>
Hi all,

I've written a rather rudimentary NEP (lacking in technical details, which I will hopefully add after some further discussion and receiving clarification/help on this thread).

Please let me know how to proceed and what you think should be added to the current proposal (attached to this mail). Here is a rendered version of the same:

https://github.com/Sankarshan-Mudkavi/numpy/blob/Enhance-datetime64/doc/neps...

Cheers,
Sankarshan

--
Sankarshan Mudkavi
Undergraduate in Physics, University of Waterloo
www.smudkavi.com
On 20 Mar 2014 02:07, "Sankarshan Mudkavi" <smudkavi@uwaterloo.ca> wrote:

I've written a rather rudimentary NEP, (lacking in technical details which I will hopefully add after some further discussion and receiving clarification/help on this thread).

Please let me know how to proceed and what you think should be added to the current proposal (attached to this mail).
Here is a rendered version of the same:
https://github.com/Sankarshan-Mudkavi/numpy/blob/Enhance-datetime64/doc/neps...

Your NEP suggests making all datetime64s be in UTC, and treating string representations from unknown timezones as UTC. How does this differ from, and why is it superior to, making all datetime64s be naive?

-n
Hi Nathaniel,

It differs by allowing time zone info to be preserved if supplied. A naive datetime64 would be unable to handle this, and would either have to ignore the tzinfo or would have to throw an exception. The current suggestion is very similar to a naive datetime64 and only differs in being able to handle the given tzinfo, rather than ignoring it or telling the user that the current implementation cannot handle it.

This would be superior to a naive datetime64 for use cases that have the tzinfo available, and would avoid users having to work around NumPy's inability to handle it if provided.

A big thanks to Chris Barker for the write-up linked in the proposal; it makes it very clear what the various possibilities for improvement are.

Cheers,
Sankarshan

On Mar 20, 2014, at 7:16 AM, Nathaniel Smith <njs@pobox.com> wrote:
<snip>
--
Sankarshan Mudkavi
Undergraduate in Physics, University of Waterloo
www.smudkavi.com
On Thu, Mar 20, 2014 at 9:39 AM, Sankarshan Mudkavi <smudkavi@uwaterloo.ca> wrote:
A naive datetime64 would be unable to handle this, and would either have to ignore the tzinfo or would have to throw up an exception.
This is not true. Python's own datetime has no problem handling this:
>>> t1 = datetime(2000, 1, 1, 12)
>>> t2 = datetime(2000, 1, 1, 12, tzinfo=timezone.utc)
>>> print(t1)
2000-01-01 12:00:00
>>> print(t2)
2000-01-01 12:00:00+00:00
On Thu, Mar 20, 2014 at 4:16 AM, Nathaniel Smith <njs@pobox.com> wrote:
Your NEP suggests making all datetime64s be in UTC, and treating string representations from unknown timezones as UTC. How does this differ from, and why is it superior to, making all datetime64s be naive?
This came up in the conversation before -- I think the fact is that a 'naive' datetime and a UTC datetime are almost exactly the same. In essence you can use a UTC datetime and pretend it's naive in almost all cases.
The difference comes down to I/O. If it's UTC, then an ISO 8601 string created from it would include a "Z" on the end (or a +00:00, I think), whereas a naive datetime should have no TZ indicator.

On input, the question is what to do with an ISO string that has a TZ indicator:
1) translate to UTC -- makes sense if we have the "always UTC" definition
2) raise an exception -- makes sense if we have the naive definition
3) ignore it -- which would make some sense if we were naive, but is perhaps a little too prone to error.

But the real issue with the current implementation is how an ISO string with no TZ indicator is handled -- it currently assumes that means "use the locale TZ", which is more often than not wrong, and clearly subject to errors. Also, it time-shifts to the locale TZ when creating an ISO string, with no way to specify otherwise.

So:

* I'm not sure what the new NEP is suggesting at all, actually -- we need a full description, with examples of what various inputs / outputs would give.

* I think there are more or less three options:
1) a) don't have any timezone handling at all -- all datetime64s are UTC. Always.
   b) don't have any timezone handling at all -- all datetime64s are naive
   (the only difference between these two is I/O of strings, and maybe I/O of datetime objects with a time zone)
2) Have a time zone associated with the array -- defaulting to either UTC or None, but don't provide any implementation other than the tagging, with the ability to add in a TZ handler if you want (can this be done efficiently?)
3) Full-on proper TZ handling.

I think (3) is off the table for now.

I think (2) is what the NEP proposes, but I'd need more details and examples to know.

I prefer 1(b), but 1(a) is close enough that I'd be happy with that, too.

Writing this made me think of a third option -- tracking, but no real manipulation, of TZ. This would be analogous to what ISO 8601 does -- all it does is note an offset. A given datetime64 array would have a given offset assigned to it, and the appropriate addition and subtraction would happen at I/O. An offset of 00:00 would be UTC, and there would be a None option for naive.

I haven't thought that out for the inevitable complications, though.

-CHB
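To make the current locale-shifting behaviour concrete, a small illustration -- the outputs shown are what you would see on a machine set to US/Pacific running numpy 1.7/1.8; they will differ on other machines, which is exactly the problem:

import numpy as np

d = np.array(['2005-02-25T03:00'], dtype='M8[m]')   # no TZ indicator in the string
print(d)
# ['2005-02-25T03:00-0800']   <- parsed as locale time, printed back in locale time

print(np.datetime_as_string(d, timezone='UTC'))
# ['2005-02-25T11:00Z']       <- the UTC instant actually stored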
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA  98115       (206) 526-6317  main reception

Chris.Barker@noaa.gov
Hi Chris,
I think there are more or less three options:
1) a) don't have any timezone handling at all -- all datetime64s are UTC. Always.
   b) don't have any timezone handling at all -- all datetime64s are naive
   (the only difference between these two is I/O of strings, and maybe I/O of datetime objects with a time zone)
2) Have a time zone associated with the array -- defaulting to either UTC or None, but don't provide any implementation other than the tagging, with the ability to add in a TZ handler if you want (can this be done efficiently?)
3) Full-on proper TZ handling.
I think (3) is off the table for now.
I think (2) is what the NEP proposes, but I'd need more details, examples to know.
I prefer 1(b), but 1(a) is close enough that I'd be happy with that, too.
Yes, 2) is indeed what I was suggesting. My apologies for being unclear; I was unsure of how much detail and technical information I should include in the proposal. I will update it and add more examples etc. to actually specify what I mean.

I'm not sure how much of a hit the performance would take if we were to take on the TZ handler. Do you have any major concerns as of now regarding that, or do you want to wait till I provide more specific details?

It also looks like the last option you mentioned seems quite reasonable too -- to only do what ISO 8601 does. Perhaps it would be better to implement that first and then look for an improvement later on? Do you have a preference for this or option 2)?

I will expand the NEP and hopefully make it clearer what it entails. Once again, thanks for the earlier write-up.

Cheers,
Sankarshan

On Mar 20, 2014, at 7:27 PM, Chris Barker <chris.barker@noaa.gov> wrote:
<snip>
--
Sankarshan Mudkavi
Undergraduate in Physics, University of Waterloo
www.smudkavi.com
On Thu, Mar 20, 2014 at 4:55 PM, Sankarshan Mudkavi <smudkavi@uwaterloo.ca> wrote:
Yes 2) is indeed what I was suggesting. My apologies for being unclear, I was unsure of how much detail and technical information I should include in the proposal.
well, you need to put enough in that it's clear what it means. I think examples are critical -- at least that's how I learn things.
I'm not sure how much of a hit the performance would take if we were to take on the TZ handler. Do you have any major concerns as of now regarding that, or do you want to wait till I provide more specific details?
more detail would be good. My comment about performance is that if numpy needs to call a Python object to do the time zone handling for each value in an array, that is going to be pretty slow -- but maybe better than not having it at all. And there shouldn't be any reason not to have a fast path for when the array is naive or you are working with two arrays that are in the same TZ -- the really common case that we care about performance for. So it probably comes down to one extra field...
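As a rough illustration of the gap being talked about (just the shape of it, not a benchmark):

import numpy as np
from datetime import timedelta

arr = np.arange('2014-01-01', '2014-02-01', dtype='M8[m]')   # ~44,000 timestamps

# fast path: one vectorized C-level pass, no Python objects involved
shifted = arr + np.timedelta64(1, 'h')

# what a per-element Python tzinfo hook would look like: one Python call per value
shifted_slow = [d + timedelta(hours=1) for d in arr.tolist()]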
It also looks like the last option you mentioned seems quite reasonable too -- to only do what ISO 8601 does. Perhaps it would be better to implement that first and then look for an improvement later on? Do you have a preference for this or option 2)?
I'm liking that one: it seems pretty easy to allow a tag for TZ offset, and not much extra math when converting. And this could be pretty useful. But I'm not writing the code...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA  98115       (206) 526-6317  main reception

Chris.Barker@noaa.gov
On Thu, Mar 20, 2014 at 7:27 PM, Chris Barker <chris.barker@noaa.gov> wrote:
On Thu, Mar 20, 2014 at 4:16 AM, Nathaniel Smith <njs@pobox.com> wrote:
Your NEP suggests making all datetime64s be in UTC, and treating string representations from unknown timezones as UTC. How does this differ from, and why is it superior to, making all datetime64s be naive?
This came up in the conversation before -- I think the fact is that a 'naive' datetime and a UTC datetime are almost exactly the same. In essence you can use a UTC datetime and pretend it's naive in almost all cases.
The difference comes down to I/O.
It is more than I/O. It is also about interoperability with Python's datetime module. Here is the behavior that I don't like in the current implementation:
>>> d = array(['2001-01-01T12:00'], dtype='M8[ms]')
>>> d.item(0)
datetime.datetime(2001, 1, 1, 17, 0)
If I understand the NEP correctly, the proposal is to make d.item(0) return
>>> d.item(0).replace(tzinfo=timezone.utc)
datetime.datetime(2001, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)
instead. But this is not what I would expect: I want
>>> d.item(0)
datetime.datetime(2001, 1, 1, 12, 0)
When I work with naive datetime objects I don't want to be exposed to timezones at all.
On Thu, Mar 20, 2014 at 6:32 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
The difference comes down to I/O.
It is more than I/O. It is also about interoperability with Python's datetime module.
Sorry -- I was using I/O to mean "converting to/from datetime64 and other types", so that includes datetime.datetime.

Here is the behavior that I don't like in the current implementation:
>>> d = array(['2001-01-01T12:00'], dtype='M8[ms]')
>>> d.item(0)
datetime.datetime(2001, 1, 1, 17, 0)
yup, it converted to UTC using your locale setting -- really not good! Then tossed that out when creating a datetime.datetime. This really is quite broken.

But this brings up a good point -- having time zone handling fully compatible with datetime.datetime would have its advantages. So use the same tzinfo API.

If I understand the NEP correctly, the proposal is to make d.item(0) return
>>> d.item(0).replace(tzinfo=timezone.utc)
datetime.datetime(2001, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)
instead. But this is not what I would expect: I want
>>> d.item(0)
datetime.datetime(2001, 1, 1, 12, 0)
When I work with naive datetime objects I don't want to be exposed to timezones at all.
right -- naive time zones really would be good. The problem now with the current code and your example, is that in:
d = array(['2001-01-01T12:00'], dtype='M8[ms]')
'2001-01-01T12:00' is interpreted as meaning "in the machine's locale time zone"; combine that with the UTC assumption, and you have trouble.

The workaround for what you want now is to add TZ info to the string:

In [56]: d = np.array(['2001-01-01T12:00Z'], dtype='M8[ms]')

In [57]: d.item(0)
Out[57]: datetime.datetime(2001, 1, 1, 12, 0)

or:

In [60]: d = np.array(['2001-01-01T12:00-00:00'], dtype='M8[ms]')

In [61]: d.item(0)
Out[61]: datetime.datetime(2001, 1, 1, 12, 0)

I _think_ that's what you want. This is what I mean when I say that naive and UTC are almost the same.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA  98115       (206) 526-6317  main reception

Chris.Barker@noaa.gov
On Fri, Mar 21, 2014 at 5:31 PM, Chris Barker <chris.barker@noaa.gov> wrote:
But this brings up a good point -- having time zone handling fully compatible with datetime.datetime would have its advantages.
I don't know if everyone is aware of this, but the Python stdlib has had support for fixed-offset timezones since version 3.2:

http://docs.python.org/3.2/whatsnew/3.2.html#datetime-and-time

It took many years to bring in that feature, but now we can benefit from not having to reinvent the wheel.

I will try to write up some specific proposal this weekend.
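For anyone who hasn't used it, the 3.2+ API looks like this:

from datetime import datetime, timedelta, timezone

ist = timezone(timedelta(hours=5, minutes=30), 'IST')   # fixed offset, no DST rules
d = datetime(2014, 3, 21, 12, 0, tzinfo=ist)
print(d.astimezone(timezone.utc))
# 2014-03-21 06:30:00+00:00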
On Thu, Mar 20, 2014 at 11:27 PM, Chris Barker <chris.barker@noaa.gov> wrote:
* I think there are more or less three options:
1) a) don't have any timezone handling at all -- all datetime64s are UTC. Always.
   b) don't have any timezone handling at all -- all datetime64s are naive
   (the only difference between these two is I/O of strings, and maybe I/O of datetime objects with a time zone)
2) Have a time zone associated with the array -- defaulting to either UTC or None, but don't provide any implementation other than the tagging, with the ability to add in a TZ handler if you want (can this be done efficiently?)
3) Full-on proper TZ handling.
I think (3) is off the table for now.
I think (2) is what the NEP proposes, but I'd need more details, examples to know.
I prefer 1(b), but 1(a) is close enough that I'd be happy with that, too.
I think the first goal is to define what a plain vanilla datetime64 does, without any extra attributes. This is for two practical reasons: First, our overriding #1 goal is to fix the nasty I/O problems that default datetime64's show, so until that's done any other bells and whistles are a distraction. And second, adding parameters to dtypes right now is technically messy.

This rules out (2) and (3).

If we additionally want to keep the option of adding a timezone parameter later, and have the result end up looking like stdlib datetime, then I think 1(b) is the obvious choice. My guess is that this is also what's most compatible with pandas, which is currently keeping its own timezone object outside of the dtype.

Any downsides? I guess this would mean that we start raising an error on ISO 8601's with offsets attached, which might annoy some people?
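To spell out what 1(b) would mean at the REPL (purely hypothetical -- none of this is current numpy behaviour):

import numpy as np

# Under proposal 1(b), a plain string is stored as-is, with no locale shift:
np.datetime64('2005-02-25T03:00')        # -> 2005-02-25T03:00, no conversion applied

# ...and any offset suffix would be rejected rather than silently converted:
np.datetime64('2005-02-25T03:00Z')       # would raise (e.g. ValueError)
np.datetime64('2005-02-25T03:00+05:30')  # would raise (e.g. ValueError)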
Writing this made me think of a third option -- tracking, but no real manipulation, of TZ. This would be analogous to what ISO 8601 does -- all it does is note an offset. A given datetime64 array would have a given offset assigned to it, and the appropriate addition and subtraction would happen at I/O. An offset of 00:00 would be UTC, and there would be a None option for naive.
Please no! An integer offset is a terrible way to represent timezones, and hardcoding this would just get in the way of a proper solution.

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
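To make "an offset is not a timezone" concrete, a quick illustration (this assumes pytz is installed; it is just pytz, nothing numpy-specific):

from datetime import datetime
import pytz

eastern = pytz.timezone('US/Eastern')
print(eastern.localize(datetime(2014, 1, 1)).utcoffset())   # -1 day, 19:00:00  (UTC-5)
print(eastern.localize(datetime(2014, 7, 1)).utcoffset())   # -1 day, 20:00:00  (UTC-4)
# One zone, two offsets across the year -- a single stored integer can't capture that.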
On Fri, Mar 21, 2014 at 3:43 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, Mar 20, 2014 at 11:27 PM, Chris Barker <chris.barker@noaa.gov> wrote:
* I think there are more or less three options:
1) a) don't have any timezone handling at all -- all datetime64s are UTC. Always.
   b) don't have any timezone handling at all -- all datetime64s are naive
   (the only difference between these two is I/O of strings, and maybe I/O of datetime objects with a time zone)
2) Have a time zone associated with the array -- defaulting to either UTC or None, but don't provide any implementation other than the tagging, with the ability to add in a TZ handler if you want (can this be done efficiently?)
3) Full-on proper TZ handling.
I think (3) is off the table for now.
I think the first goal is to define what a plain vanilla datetime64 does, without any extra attributes. This is for two practical reasons: First, our overriding #1 goal is to fix the nasty I/O problems that default datetime64's show, so until that's done any other bells and whistles are a distraction. And second, adding parameters to dtypes right now is technically messy.
This rules out (2) and (3).
yup -- though I'm not sure I agree that we need to do this, if we are going to do something more later anyway. But you have a key point - maybe the dtype system simply isn't ready to do it right, and then it may be better not to try.

In which case, we are down to naive or always-UTC -- and again, those really aren't very different. Though I prefer naive -- always-UTC adds some complication if you don't actually want UTC, and I'm not sure it actually buys us anything. And maybe it's just me, but all my code would need to use naive, so I'd be doing a bit of working around to use a UTC-always system.
If we additionally want to keep the option of adding a timezone parameter later, and have the result end up looking like stdlib datetime, then I think 1(b) is the obvious choice. My guess is that this is also what's most compatible with pandas, which is currently keeping its own timezone object outside of the dtype.
Good point -- all else being equal, compatibility with pandas would be a good thing.

Any downsides? I guess this would mean that we start raising an error
on ISO 8601's with offsets attached, which might annoy some people?
yes, but errors are better than incorrect values...
Writing this made me think of a third option -- tracking, but no real manipulation, of TZ. This would be analogous to what ISO 8601 does -- all it does is note an offset. A given datetime64 array would have a given offset assigned to it, and the appropriate addition and subtraction would happen at I/O. An offset of 00:00 would be UTC, and there would be a None option for naive.
Please no! An integer offset is a terrible way to represent timezones,
well, it would solve the problem of being able to read ISO strings, and of being able to perform operations with datetimes in multiple time zones -- though I guess you could get most of that with UTC-always.
and hardcoding this would just get in the way of a proper solution.
well, that's a point -- if we think there is any hope of a proper solution down the road, then yes, it would be better not to make that harder.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA  98115       (206) 526-6317  main reception

Chris.Barker@noaa.gov
Hi all,

Apologies for the delay in following up; here is an expanded version of the proposal, which hopefully clears up most of the details. I have not included specific implementation details for the code, such as which functions to modify etc., since I think those are not traditionally included in NEPs?

Please find attached the expanded proposal; the rendered version is available here:

https://github.com/Sankarshan-Mudkavi/numpy/blob/Enhance-datetime64/doc/neps...

I look forward to comments, agreements/disagreements with this (and clarification if this needs even further expansion).

On Mar 24, 2014, at 12:39 AM, Chris Barker <chris.barker@noaa.gov> wrote:
<snip>
--
Sankarshan Mudkavi
Undergraduate in Physics, University of Waterloo
www.smudkavi.com
On 28 Mar 2014 05:00, "Sankarshan Mudkavi" <smudkavi@uwaterloo.ca> wrote:
Hi all,
Apologies for the delay in following up, here is an expanded version of
the proposal, which hopefully clears up most of the details. I have not included specific implementation details for the code, such as which functions to modify etc. since I think those are not traditionally included in NEPs?

The format seems fine to me. Really the point is just to have a document that we can use as reference when deciding on behaviour, and this does that :-).

Three quick comments:

1- You give as an example of "naive" datetime handling:
>>> np.datetime64('2005-02-25T03:00Z')
>>> np.datetime64('2005-02-25T03:00')
This IIUC is incorrect. The Z modifier is a timezone offset, and for normal "naive" datetimes would cause an error.

2- It would be good to explicitly include examples of conversion to and from datetimes alongside the examples of conversions to and from strings.

3- It would be good to (eventually) include some discussion of the impact of the preferred proposal on existing code. E.g., will this break a lot of people's pipelines? (Are people currently *always* adding timezones to their numpy input to avoid the problem, and now will have to switch to the opposite behaviour depending on numpy version?) And we'll want to make sure to get feedback from the pydata@ (pandas) list explicitly, though that can wait until people here have had a chance to respond to the first draft.

Thanks for pushing this forward!

-n
Hi Nathaniel,
1- You give as an example of "naive" datetime handling:
>>> np.datetime64('2005-02-25T03:00Z')
>>> np.datetime64('2005-02-25T03:00')
This IIUC is incorrect. The Z modifier is a timezone offset, and for normal "naive" datetimes would cause an error.
From what I understand from reading:

http://thread.gmane.org/gmane.comp.python.numeric.general/53805

it looks like anything with a TZ adjustment other than Z, 00:00 or UTC would raise an error, and those specific cases would not (I'm guessing this is because we assume it's UTC (or the same timezone) internally, so anything that explicitly tells us it is UTC is acceptable, although that may be just my misreading of it). However, on output we don't use the Z modifier (which is why it's different from the UTC datetime64).

I will change it to raise an error if what I thought is incorrect, and also include examples of conversion from datetimes as you requested (a rough sketch follows below).

Please let me know if there are any more changes that are required! I look forward to further comments/questions.

Cheers,
Sankarshan
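Something along these lines, perhaps -- a rough sketch of the intended naive behaviour, not of anything implemented yet:

import numpy as np
from datetime import datetime

# datetime.datetime -> datetime64: the wall-clock value is kept, nothing is shifted
np.datetime64(datetime(2005, 2, 25, 3, 0))     # -> datetime64 for 2005-02-25T03:00

# datetime64 -> datetime.datetime: comes back naive (tzinfo is None), again unshifted
np.datetime64('2005-02-25T03:00', 'm').item()  # -> datetime.datetime(2005, 2, 25, 3, 0)

# an aware datetime (tzinfo set) would be rejected or require explicit conversion,
# mirroring how strings with an offset suffix are handled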
On Fri, Mar 28, 2014 at 5:17 AM, Nathaniel Smith <njs@pobox.com> wrote:

<snip>
--
Sankarshan Mudkavi
Undergraduate in Physics, University of Waterloo
www.smudkavi.com
FYI, here are the docs for pandas' timezone handling; wesm worked through the various issues w.r.t. conversion, localization, and ambiguous zone crossings:

http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-zone-handli...

The implementation is largely in here (the underlying impl is a datetime64[ns] dtype with a pytz timezone):

https://github.com/pydata/pandas/blob/master/pandas/tseries/index.py

On Fri, Mar 28, 2014 at 4:30 PM, Sankarshan Mudkavi <smudkavi@uwaterloo.ca> wrote:
<snip>
On Fri, Mar 28, 2014 at 9:30 PM, Sankarshan Mudkavi <smudkavi@uwaterloo.ca> wrote:
Hi Nathaniel,
1- You give as an example of "naive" datetime handling:
np.datetime64('2005-02-25T03:00Z')
np.datetime64('2005-02-25T03:00')
This IIUC is incorrect. The Z modifier is a timezone offset, and for normal "naive" datetimes would cause an error.
From what I understand from reading http://thread.gmane.org/gmane.comp.python.numeric.general/53805, it looks like anything other than Z, 00:00 or UTC that has a TZ adjustment would raise an error, and those specific conditions would not. (I'm guessing this is because we assume it's UTC (or the same timezone) internally, so anything that explicitly tells us it is UTC is acceptable -- although that may just be my misreading of it.)
If we assume it's UTC, then that's proposal 2, I think :-). My point is just that "naive datetime" already has a specific meaning in Python, and as I understand that meaning, it says that trying to pass a Z timezone to a naive datetime should be an error. As a separate issue, we might decide that we want to continue to allow "Z" modifiers (or all offset modifiers) temporarily in numpy, to avoid breaking code without warning. It's just that if we do, then we shouldn't say that this is because we are implementing naive datetimes and this is how naive datetimes work. Instead we should either say that we're not implementing naive datetimes, or else say that we're implementing naive datetimes but have some temporary compatibility hacks on top of that (and probably issue a DeprecationWarning if anyone passes a timezone). -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Sat, Mar 29, 2014 at 1:04 PM, Nathaniel Smith <njs@pobox.com> wrote:
1- You give as an example of "naive" datetime handling:
np.datetime64('2005-02-25T03:00Z') np.datetime64('2005-02-25T03:00')
This IIUC is incorrect. The Z modifier is a timezone offset, and for normal "naive" datetimes would cause an error.
I think this is somewhat open for discussion -- yes, it's odd, but in the spirit of practicality beats purity, it seems OK. We could allow any TZ specifier for that matter -- that's kind of how "naive" or "local" timezone (non) handling works -- it's up to the user to make sure that all DTs are in the same timezone. All it would be doing is tossing out some additional information that was in the ISO string. If we are explicitly calling it UTC-always, then anything other than Z or 00:00 (or nothing) would need to be converted. I think when it comes down to it, anything other than "proper" timezone handling will require these user-beware compromises.
As a separate issue, we might decide that we want to continue to allow "Z" modifiers (or all offset modifiers) temporarily in numpy, to avoid breaking code without warning.
Maybe the best tactic -- though it's broken enough now that I'm not sure it matters. A clear direction from here may be a better bet. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On 29 Mar 2014 20:57, "Chris Barker" <chris.barker@noaa.gov> wrote:
I think this is somewhat open for discussion -- yes, it's odd, but in the spirit of practicality beats purity, it seems OK. We could allow any TZ specifier for that matter -- that's kind of how "naive" or "local" timezone (non) handling works -- it's up to the user to make sure that all DTs are in the same timezone.
That isn't how naive timezone handling works in datetime.datetime, though. If you try to mix a timezone (even a Zulu timezone) datetime with a naive datetime, you get an exception. I agree this is open for discussion, but IMO deviating from the stdlib behavior this much would require some more justification. Don't let errors pass silently, etc. -n
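(For reference, a minimal stdlib illustration of the behavior described here -- naive and aware datetimes refuse to mix; timezone.utc requires Python 3.2+:)

>>> from datetime import datetime, timezone
>>> naive = datetime(2014, 3, 29, 12, 0)
>>> aware = datetime(2014, 3, 29, 12, 0, tzinfo=timezone.utc)
>>> naive < aware
Traceback (most recent call last):
  ...
TypeError: can't compare offset-naive and offset-aware datetimes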
On Sat, Mar 29, 2014 at 3:08 PM, Nathaniel Smith <njs@pobox.com> wrote:
On 29 Mar 2014 20:57, "Chris Barker" <chris.barker@noaa.gov> wrote:
I think this is somewhat open for discussion -- yes, it's odd, but in the spirit of practicality beats purity, it seems OK. We could allow any TZ specifier for that matter -- that's kind of how "naive" or "local" timezone (non) handling works -- it's up to the user to make sure that all DTs are in the same timezone.
That isn't how naive timezone handling works in datetime.datetime, though. If you try to mix a timezone (even a Zulu timezone) datetime with a naive datetime, you get an exception.
fair enough. The difference is that datetime.datetime doesn't provide any ISO string parsing. The use case I'm imagining is for folks with ISO strings with a Z on the end -- they'll need to deal with pre-parsing the strings to strip off the Z, when it wouldn't change the result. Maybe this is an argument for "UTC always" rather than "naive"? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On 31 Mar 2014 19:47, "Chris Barker" <chris.barker@noaa.gov> wrote:
On Sat, Mar 29, 2014 at 3:08 PM, Nathaniel Smith <njs@pobox.com> wrote:
On 29 Mar 2014 20:57, "Chris Barker" <chris.barker@noaa.gov> wrote:
I think this is somewhat open for discussion -- yes, it's odd, but in the spirit of practicality beats purity, it seems OK. We could allow any TZ specifier for that matter -- that's kind of how "naive" or "local" timezone (non) handling works -- it's up to the user to make sure that all DTs are in the same timezone.
That isn't how naive timezone handling works in datetime.datetime, though. If you try to mix a timezone (even a Zulu timezone) datetime with a naive datetime, you get an exception.
fair enough.
The difference is that datetime.datetime doesn't provide any ISO string parsing.
Sure it does: datetime.strptime, with the %z modifier in particular.
The use case I'm imagining is for folks with ISO strings with a Z on the end -- they'll need to deal with pre-parsing the strings to strip off the Z, when it wouldn't change the result.
Maybe this is an argument for "UTC always" rather than "naive"?
Probably it is, but that approach seems a lot harder to extend to proper tz support later, plus being more likely to cause trouble for pandas's proper tz support now. -n
On Mon, Mar 31, 2014 at 7:19 PM, Nathaniel Smith <njs@pobox.com> wrote:
The difference is that datetime.datetime doesn't provide any iso string parsing.
Sure it does. datetime.strptime, with the %z modifier in particular.
that's not ISO parsing, that's parsing according to a user-defined format string, which can be used for ISO parsing, but the user is in control of how that's done. And I see this: "For a naive object, the %z and %Z format codes are replaced by empty strings." though I'm not entirely sure what that means -- probably only for writing.
The use case I'm imagining is for folks with ISO strings with a Z on the end -- they'll need to deal with pre-parsing the strings to strip off the Z, when it wouldn't change the result.
Maybe this is an argument for "UTC always" rather than "naive"?
Probably it is, but that approach seems a lot harder to extend to proper tz support later, plus being more likely to cause trouble for pandas's proper tz support now.
I was originally advocating for naive to begin with ;-) Someone else pushed for UTC -- I thought it was you! (but I guess not) It seems this committee of two has come to a consensus on naive -- and you're probably right, raise an exception if there is a time zone specifier. -CHB
-n
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
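(A quick illustration of the %z parsing mentioned above -- a minimal sketch, assuming Python 3; the exact repr of the offset may vary between versions:)

>>> from datetime import datetime
>>> dt = datetime.strptime('2005-02-25T03:00+0000', '%Y-%m-%dT%H:%M%z')
>>> dt.utcoffset()
datetime.timedelta(0)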
On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker <chris.barker@noaa.gov> wrote:
"For a naive object, the %z and %Z format codes are replaced by empty strings."
though I'm not entirely sure what that means -- probably only for writing.
That's right:
>>> from datetime import *
>>> datetime.now().strftime('%z')
''
>>> datetime.now(timezone.utc).strftime('%z')
'+0000'
On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker <chris.barker@noaa.gov> wrote:
It seems this committee of two has come to a consensus on naive -- and you're probably right, raise an exception if there is a time zone specifier.
Count me as +1 on naive, but consider converting garbage (including strings with trailing Z) to NaT.
On Tue, Apr 1, 2014 at 5:22 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker <chris.barker@noaa.gov> wrote:
It seems this committee of two has come to a consensus on naive -- and you're probably right, raise an exception if there is a time zone specifier.
Count me as +1 on naive, but consider converting garbage (including strings with trailing Z) to NaT.
That's not how we handle other types, e.g.:

In [5]: a = np.zeros(1, dtype=float)

In [6]: a[0] = "garbage"
ValueError: could not convert string to float: garbage

(Cf, "Errors should never pass silently".) Any reason why datetime64 should be different?
-n
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
I agree with that interpretation of naive as well. I'll change the proposal to reflect that. So any modifier should raise an error then? (At the risk of breaking people's code.) The only question is, should we consider accepting the modifier and disregarding it with a warning, letting the user know that this is only for temporary compatibility purposes? As of now, it's not clear to me which of those options is better. Cheers, Sankarshan On Apr 1, 2014, at 1:12 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Apr 1, 2014 at 5:22 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker <chris.barker@noaa.gov> wrote:
It seems this committee of two has come to a consensus on naive -- and you're probably right, raise an exception if there is a time zone specifier.
Count me as +1 on naive, but consider converting garbage (including strings with trailing Z) to NaT.
That's not how we handle other types, e.g.:
In [5]: a = np.zeros(1, dtype=float)
In [6]: a[0] = "garbage" ValueError: could not convert string to float: garbage
(Cf, "Errors should never pass silently".) Any reason why datetime64 should be different?
-n
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
-- Sankarshan Mudkavi Undergraduate in Physics, University of Waterloo www.smudkavi.com
On Tue, Apr 1, 2014 at 1:12 PM, Nathaniel Smith <njs@pobox.com> wrote:
In [6]: a[0] = "garbage" ValueError: could not convert string to float: garbage
(Cf, "Errors should never pass silently".) Any reason why datetime64 should be different?
datetime64 is different because it has NaT support from the start. NaN support for floats seems to be an afterthought if not an accident of implementation. And it looks like some errors do pass silently:
a[0] = "1" # not a TypeError
But I withdraw my suggestion. The closer datetime64 behavior is to numeric types the better.
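(For what it's worth, a quick look at what a datetime64 array does with garbage strings today -- behavior observed with a recent numpy, so treat it as illustrative only:)

>>> import numpy as np
>>> a = np.zeros(1, dtype='datetime64[D]')
>>> a[0] = "garbage"
Traceback (most recent call last):
  ...
ValueError: Error parsing datetime string "garbage" at position 0
>>> a[0] = "2011-01-01"   # a plain ISO date is accepted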
So is the consensus that we don't accept any tags at all (not even temporarily)? Would that break too much existing code? Cheers, Sankarshan On Apr 1, 2014, at 2:50 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
On Tue, Apr 1, 2014 at 1:12 PM, Nathaniel Smith <njs@pobox.com> wrote: In [6]: a[0] = "garbage" ValueError: could not convert string to float: garbage
(Cf, "Errors should never pass silently".) Any reason why datetime64 should be different?
datetime64 is different because it has NaT support from the start. NaN support for floats seems to be an afterthought if not an accident of implementation.
And it looks like some errors do pass silently:
a[0] = "1" # not a TypeError
But I withdraw my suggestion. The closer datetime64 behavior is to numeric types the better.
-- Sankarshan Mudkavi Undergraduate in Physics, University of Waterloo www.smudkavi.com
On Fri, Apr 11, 2014 at 11:25 PM, Sankarshan Mudkavi <smudkavi@uwaterloo.ca> wrote:
So is the consensus that we don't accept any tags at all (not even temporarily)? Would that break too much existing code?
Well, we don't know. If anyone has any ideas on how to figure it out then they should speak up :-). Barring any brilliant suggestions though, I suggest we just go ahead with disallowing all timezone tags for now. We can always change our mind as we get closer to the release and people start experimenting with the new code. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
On Fri, Apr 11, 2014 at 4:25 PM, Sankarshan Mudkavi <smudkavi@uwaterloo.ca>wrote:
So is the consensus that we don't accept any tags at all (not even temporarily)? Would that break too much existing code?
Cheers, Sankarshan
On Apr 1, 2014, at 2:50 PM, Alexander Belopolsky <ndarray@mac.com> wrote:
On Tue, Apr 1, 2014 at 1:12 PM, Nathaniel Smith <njs@pobox.com> wrote:
In [6]: a[0] = "garbage" ValueError: could not convert string to float: garbage
(Cf, "Errors should never pass silently".) Any reason why datetime64 should be different?
datetime64 is different because it has NaT support from the start. NaN support for floats seems to be an afterthought if not an accident of implementation.
And it looks like some errors do pass silently:
a[0] = "1" # not a TypeError
But I withdraw my suggestion. The closer datetime64 behavior is to numeric types the better.
Are we in a position to start looking at implementation? If so, it would be useful to have a collection of test cases, i.e., typical uses with specified results. That should also cover conversion from/(to?) datetime.datetime. Chuck
On Fri, Apr 11, 2014 at 3:56 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Are we in a position to start looking at implementation? If so, it would be useful to have a collection of test cases, i.e., typical uses with specified results. That should also cover conversion from/(to?) datetime.datetime.
Indeed, my personal wish-list for np.datetime64 is centered much more on robust conversion to/from native date objects, including comparison. Here are some of my particular points of frustration (apologies for the thread jacking!):
- NaT should have similar behavior to NaN when used for comparisons (i.e., comparisons should always be False).
- You can't compare a datetime object to a datetime64 object.
- datetime64 objects with high precision (e.g., ns) can't compare to datetime objects.
Pandas has a very nice wrapper around datetime64 arrays that solves most of these issues, but it would be nice to get much of that functionality in core numpy, since I don't always want to store my values in a 1-dimensional array + hash-table (the pandas Index):
http://pandas.pydata.org/pandas-docs/stable/timeseries.html
Here's code which reproduces all of the above:

import numpy as np
from datetime import datetime

print np.datetime64('NaT') < np.datetime64('2011-01-01')  # this should not be true
print datetime(2010, 1, 1) < np.datetime64('2011-01-01')  # raises exception
print np.datetime64('2011-01-01T00:00', 'ns') > datetime(2010, 1, 1)  # another exception
print np.datetime64('2011-01-01T00:00') > datetime(2010, 1, 1)  # finally something works!
On Fri, Apr 11, 2014 at 7:58 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
print datetime(2010, 1, 1) < np.datetime64('2011-01-01') # raises exception
This is somewhat consistent with
>>> from datetime import *
>>> datetime(2010, 1, 1) < date(2010, 1, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare datetime.datetime to datetime.date
but I would expect date(2010, 1, 1) < np.datetime64('2011-01-01') to return False.
On Fri, Apr 11, 2014 at 4:58 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
On Fri, Apr 11, 2014 at 3:56 PM, Charles R Harris < charlesr.harris@gmail.com> wrote:
Are we in a position to start looking at implementation? If so, it would be useful to have a collection of test cases, i.e., typical uses with specified results. That should also cover conversion from/(to?) datetime.datetime.
yup -- tests are always good! Indeed, my personal wish-list for np.datetime64 is centered much more on
robust conversion to/from native date objects, including comparison.
A good use case.
Here are some of my particular points of frustration (apologies for the thread jacking!): - NaT should have similar behavior to NaN when used for comparisons (i.e., comparisons should always be False).
makes sense.
- You can't compare a datetime object to a datetime64 object.
that would be nice to have.
- datetime64 objects with high precision (e.g., ns) can't compare to datetime objects.
That's a problem, but how do you think it should be handled? My thought is that it should round to microseconds, and then compare -- kind of like comparing float32 and float64...
Pandas has a very nice wrapper around datetime64 arrays that solves most of these issues, but it would be nice to get much of that functionality in core numpy,
yes -- it would -- but learning from pandas is certainly a good idea.
from numpy import datetime64 from datetime import datetime
print np.datetime64('NaT') < np.datetime64('2011-01-01') # this should not to true print datetime(2010, 1, 1) < np.datetime64('2011-01-01') # raises exception print np.datetime64('2011-01-01T00:00', 'ns') > datetime(2010, 1, 1) # another exception print np.datetime64('2011-01-01T00:00') > datetime(2010, 1, 1) # finally something works!
now to get them into proper unit tests.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
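(To make that concrete, here is a rough sketch of what such unit tests might look like -- note that they encode the desired behavior discussed above, not what numpy currently does, and the names and layout are purely illustrative:)

import numpy as np
from datetime import datetime
from numpy.testing import assert_

def test_nat_comparisons_are_false():
    # NaT should behave like NaN: ordered comparisons involving NaT are False
    assert_(not (np.datetime64('NaT') < np.datetime64('2011-01-01')))
    assert_(not (np.datetime64('NaT') > np.datetime64('2011-01-01')))

def test_compare_datetime_to_datetime64():
    # stdlib datetime and datetime64 should compare without raising
    assert_(datetime(2010, 1, 1) < np.datetime64('2011-01-01'))

def test_compare_high_precision_to_datetime():
    # ns-precision datetime64 should still compare to a datetime
    # (e.g. by casting/rounding to microseconds internally)
    assert_(np.datetime64('2011-01-01T00:00', 'ns') > datetime(2010, 1, 1))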
I think we'll be ready to start implementation once I get the conversion to datetime.datetime on the proposal with some decent examples. It would also be great to have opinions on what test cases should be used, so please speak up if you feel you have anything to say about that. Cheers, Sankarshan On Apr 14, 2014, at 2:59 PM, Chris Barker <chris.barker@noaa.gov> wrote:
On Fri, Apr 11, 2014 at 4:58 PM, Stephan Hoyer <shoyer@gmail.com> wrote: On Fri, Apr 11, 2014 at 3:56 PM, Charles R Harris <charlesr.harris@gmail.com> wrote: Are we in a position to start looking at implementation? If so, it would be useful to have a collection of test cases, i.e., typical uses with specified results. That should also cover conversion from/(to?) datetime.datetime.
yup -- tests are always good!
Indeed, my personal wish-list for np.datetime64 is centered much more on robust conversion to/from native date objects, including comparison.
A good use case.
Here are some of my particular points of frustration (apologies for the thread jacking!): - NaT should have similar behavior to NaN when used for comparisons (i.e., comparisons should always be False).
make sense.
- You can't compare a datetime object to a datetime64 object.
that would be nice to have.
- datetime64 objects with high precision (e.g., ns) can't compare to datetime objects.
That's a problem, but how do you think it should be handled? My thought is that it should round to microseconds, and then compare -- kind of like comparing float32 and float64...
Pandas has a very nice wrapper around datetime64 arrays that solves most of these issues, but it would be nice to get much of that functionality in core numpy,
yes -- it would -- but learning from pandas is certainly a good idea.
from numpy import datetime64 from datetime import datetime
print np.datetime64('NaT') < np.datetime64('2011-01-01') # this should not to true print datetime(2010, 1, 1) < np.datetime64('2011-01-01') # raises exception print np.datetime64('2011-01-01T00:00', 'ns') > datetime(2010, 1, 1) # another exception print np.datetime64('2011-01-01T00:00') > datetime(2010, 1, 1) # finally something works!
now to get them into proper unit tests....
-CHB
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
-- Sankarshan Mudkavi Undergraduate in Physics, University of Waterloo www.smudkavi.com
On Mon, Apr 14, 2014 at 11:59 AM, Chris Barker <chris.barker@noaa.gov>wrote:
- datetime64 objects with high precision (e.g., ns) can't compare to
datetime objects.
That's a problem, but how do you think it should be handled? My thought is that it should round to microseconds, and then compare -- kind of like comparing float32 and float64...
I agree -- if the ns matter, you shouldn't be using datetime.datetime objects. Similarly, it's currently not possible to convert high precision datetime64 objects into datetimes. Worse, this doesn't even raise an error!
>>> from datetime import datetime
>>> import numpy as np
>>> np.datetime64('2000-01-01T00:00:00Z', 'us').astype(datetime)
datetime.datetime(2000, 1, 1, 0, 0)
>>> np.datetime64('2000-01-01T00:00:00Z', 'ns').astype(datetime)
946684800000000000L

Other inconsistent behavior:

>>> np.datetime64('2000', 'M')
numpy.datetime64('2000-01')
>>> np.datetime64('2000', 'D')
numpy.datetime64('2000-01-01')
>>> np.datetime64('2000', 's')
TypeError                                 Traceback (most recent call last)
<ipython-input-67-bf5fc9a2985b> in <module>()
----> 1 np.datetime64('2000', 's')
TypeError: Cannot parse "2000" as unit 's' using casting rule 'same_kind'

More broadly, my recommendation is to look through the unit tests for pandas' datetime handling:
https://github.com/pydata/pandas/tree/master/pandas/tseries/tests
Not everything is relevant but you might find some test cases you could borrow wholesale. Pandas is BSD licensed, so you may even be able to copy them directly into numpy.
Best,
Stephan
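(One workaround in the meantime -- a sketch, assuming a reasonably recent numpy -- is to cast down to microsecond precision explicitly before converting, since that path does produce a datetime.datetime:)

>>> np.datetime64('2000-01-01T00:00:00', 'ns').astype('datetime64[us]').astype(datetime)
datetime.datetime(2000, 1, 1, 0, 0)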
On 14.04.2014 20:59, Chris Barker wrote:
On Fri, Apr 11, 2014 at 4:58 PM, Stephan Hoyer <shoyer@gmail.com <mailto:shoyer@gmail.com>> wrote:
On Fri, Apr 11, 2014 at 3:56 PM, Charles R Harris <charlesr.harris@gmail.com <mailto:charlesr.harris@gmail.com>> wrote:
Are we in a position to start looking at implementation? If so, it would be useful to have a collection of test cases, i.e., typical uses with specified results. That should also cover conversion from/(to?) datetime.datetime.
yup -- tests are always good!
Indeed, my personal wish-list for np.datetime64 is centered much more on robust conversion to/from native date objects, including comparison.
A good use case.
Here are some of my particular points of frustration (apologies for the thread jacking!): - NaT should have similar behavior to NaN when used for comparisons (i.e., comparisons should always be False).
make sense.
- You can't compare a datetime object to a datetime64 object.
that would be nice to have.
- datetime64 objects with high precision (e.g., ns) can't compare to datetime objects.
That's a problem, but how do you think it should be handled? My thought is that it should round to microseconds, and then compare -- kind of like comparing float32 and float64...
Pandas has a very nice wrapper around datetime64 arrays that solves most of these issues, but it would be nice to get much of that functionality in core numpy,
yes -- it would -- but learning from pandas is certainly a good idea.
from numpy import datetime64 from datetime import datetime
print np.datetime64('NaT') < np.datetime64('2011-01-01') # this should not to true print datetime(2010, 1, 1) < np.datetime64('2011-01-01') # raises exception print np.datetime64('2011-01-01T00:00', 'ns') > datetime(2010, 1, 1) # another exception print np.datetime64('2011-01-01T00:00') > datetime(2010, 1, 1) # finally something works!
now to get them into proper unit tests....
As one further suggestion, I think it would be nice if doing arithmetic using np.datetime64 and datetime.timedelta objects would work:

np.datetime64('2011-01-01') + datetime.timedelta(1) == np.datetime64('2011-01-02')

And of course, but this is probably in the loop anyways, np.asarray([list_of_datetime.datetime_objects]) should work as expected.
-- Andreas.
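(For reference, a small sketch of what already works and what doesn't along these lines -- checked against a recent numpy, so treat the comments as illustrative rather than authoritative:)

import numpy as np
from datetime import datetime, timedelta

# datetime64 plus numpy's own timedelta64 works today:
np.datetime64('2011-01-01') + np.timedelta64(1, 'D')             # numpy.datetime64('2011-01-02')

# a stdlib timedelta can at least be converted explicitly:
np.datetime64('2011-01-01') + np.timedelta64(timedelta(days=1))

# without an explicit dtype, a list of datetime objects becomes an object array;
# you currently have to ask for a datetime64 dtype to get the conversion:
np.asarray([datetime(2011, 1, 1), datetime(2011, 1, 2)])                        # dtype=object
np.array([datetime(2011, 1, 1), datetime(2011, 1, 2)], dtype='datetime64[s]')   # datetime64 array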
On 19.04.2014 09:03, Andreas Hilboll wrote:
On 14.04.2014 20:59, Chris Barker wrote:
On Fri, Apr 11, 2014 at 4:58 PM, Stephan Hoyer <shoyer@gmail.com <mailto:shoyer@gmail.com>> wrote:
On Fri, Apr 11, 2014 at 3:56 PM, Charles R Harris <charlesr.harris@gmail.com <mailto:charlesr.harris@gmail.com>> wrote:
Are we in a position to start looking at implementation? If so, it would be useful to have a collection of test cases, i.e., typical uses with specified results. That should also cover conversion from/(to?) datetime.datetime.
yup -- tests are always good!
Indeed, my personal wish-list for np.datetime64 is centered much more on robust conversion to/from native date objects, including comparison.
A good use case.
Here are some of my particular points of frustration (apologies for the thread jacking!): - NaT should have similar behavior to NaN when used for comparisons (i.e., comparisons should always be False).
make sense.
- You can't compare a datetime object to a datetime64 object.
that would be nice to have.
- datetime64 objects with high precision (e.g., ns) can't compare to datetime objects.
That's a problem, but how do you think it should be handled? My thought is that it should round to microseconds, and then compare -- kind of like comparing float32 and float64...
Pandas has a very nice wrapper around datetime64 arrays that solves most of these issues, but it would be nice to get much of that functionality in core numpy,
yes -- it would -- but learning from pandas is certainly a good idea.
from numpy import datetime64 from datetime import datetime
print np.datetime64('NaT') < np.datetime64('2011-01-01') # this should not to true print datetime(2010, 1, 1) < np.datetime64('2011-01-01') # raises exception print np.datetime64('2011-01-01T00:00', 'ns') > datetime(2010, 1, 1) # another exception print np.datetime64('2011-01-01T00:00') > datetime(2010, 1, 1) # finally something works!
now to get them into proper unit tests....
As one further suggestion, I think it would be nice if doing arithmetic using np.datetime64 and datetime.timedelta objects would work:
np.datetime64(2011,1,1) + datetime.timedelta(1) == np.datetime64(2011,1,2)
And of course, but this is probably in the loop anyways, np.asarray([list_of_datetime.datetime_objects]) should work as expected.
One more wish / suggestion from my side (apologies if this isn't the place to make wishes): Array-wide access to the individual datetime components should work, i.e., datetime64array.year should yield an array of dtype int with the years. That would allow boolean indexing to filter data, like datetime64array[datetime64array.year == 2014] would yield all entries from 2014. Cheers, -- Andreas.
On Fri, Apr 25, 2014 at 4:57 AM, Andreas Hilboll <lists@hilboll.de> wrote:
Array-wide access to the individual datetime components should work, i.e.,
datetime64array.year
should yield an array of dtype int with the years. That would allow boolean indexing to filter data, like
datetime64array[datetime64array.year == 2014]
would yield all entries from 2014.
that would be nice, yes, but datetime64 doesn't support anything like that at all -- i.e. array-wide or not access to the components. In this case, you could kludge it with:

In [19]: datetimearray
Out[19]: array(['2014-02-03', '2013-03-08', '2012-03-07', '2014-04-06'], dtype='datetime64[D]')

In [20]: datetimearray[datetimearray.astype('datetime64[Y]') == np.datetime64('2014')]
Out[20]: array(['2014-02-03', '2014-04-06'], dtype='datetime64[D]')

but that wouldn't work for months, for instance. I think the current NEP should stick with simply fixing the timezone thing -- no new functionality of consequence. But: Maybe it's time for a new NEP for what we want datetime64 to be in the future -- maybe borrow from the blaze proposal cited earlier? Or wait and see how that works out, then maybe port that code over to numpy? In the meantime, a set of utilities that do the kind of things you're looking for might make sense. You could do it as a ndarray subclass, and add those sorts of methods, though ndarray subclasses do get messy....
-Chris
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
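(Following up on the "wouldn't work for months" caveat: a sketch of a similar kludge that pulls out year and month components by casting between units -- assumes a recent numpy and is meant purely as an illustration:)

import numpy as np

arr = np.array(['2014-02-03', '2013-03-08', '2012-03-07', '2014-04-06'], dtype='datetime64[D]')

years = arr.astype('datetime64[Y]').astype(int) + 1970   # years since 1970 -> calendar years
months = (arr.astype('datetime64[M]') - arr.astype('datetime64[Y]')).astype(int) + 1  # 1..12

arr[years == 2014]   # same filter as the year kludge above
arr[months == 3]     # entries from March of any year -- the case the year kludge can't do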
On Thu, Mar 20, 2014 at 7:16 AM, Nathaniel Smith <njs@pobox.com> wrote:
Your NEP suggests making all datetime64s be in UTC, and treating string representations from unknown timezones as UTC.
I recall that it was at some point suggested that epoch be part of dtype. I was not able to find the reasons for a rejection, but it would make perfect sense to keep timezone offset in dtype and treat it effectively as an alternative epoch. The way I like to think about datetime is that YYYY-MM-DD hh:mm:ss.nnn is just a fancy way to represent numbers which is more convoluted than decimal notation, but conceptually not so different. So different units, epochs or timezones are just different ways to convert an abstract notion of a point in time to a specific series of bits inside an array. This is what dtype is for - a description of how abstract numbers are stored in memory.
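(To make the representation point concrete -- a tiny illustration; the integers are the underlying storage with the standard 1970 epoch, as seen with a recent numpy:)

>>> import numpy as np
>>> np.datetime64('2000-01-01').astype('int64')            # days since the epoch
10957
>>> np.datetime64('2000-01-01T00:00:00').astype('int64')   # seconds since the epoch
946684800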
On Thu, Mar 20, 2014 at 5:53 PM, Alexander Belopolsky <ndarray@mac.com>wrote:
I recall that it was at some point suggested that epoch be part of dtype. I was not able to find the reasons for a rejection,
I don't think it was rejected, it just wasn't adopted by anyone to write a NEP and write the code... I actually think it's silly to allow changing the units without changing the epoch. But the pre-defined epoch works fine for all my use cases, so I'm not going to push that. I also did think it was a separate issue from timezones, and thus shouldn't clutter up the NEP (though once someone is opening up the code, it would be a good time to do it..)
but it would make perfect sense to keep timezone offset in dtype and treat it effectively as an alternative epoch.
Hmm -- good point -- if we had a dynamic epoch you could just shift that to account for the time zone offset. Though I think that's an implementation issue.
The way I like to think about datetime is that YYYY-MM-DD hh:mm:ss.nnn is just a fancy way to represent numbers which is more convoluted than decimal notation, but conceptually not so different. So different units, epochs or timezones are just different ways to convert an abstract notion of a point in time to a specific series of bits inside an array. This is what dtype is for - a description of how abstract numbers are stored in memory.
yes -- and also how to convert to/from other types -- which is where the trick is here. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Wed, Mar 19, 2014 at 7:07 PM, Sankarshan Mudkavi <smudkavi@uwaterloo.ca>wrote:
I've written a rather rudimentary NEP, (lacking in technical details which I will hopefully add after some further discussion and receiving clarification/help on this thread).
Please let me know how to proceed and what you think should be added to the current proposal (attached to this mail).
Here is a rendered version of the same:
https://github.com/Sankarshan-Mudkavi/numpy/blob/Enhance-datetime64/doc/neps...
I've done a bit of copy-editing, and added some more from this discussion. See the pull request on gitHub. There are a fair number of rough edges, but I think we have a consensus among the small group of folks that participated in this discussion anyway, so now "all" we need is someone to actually fix the code. If someone steps up, then we should also go in and add a bunch of unit tests, as discussed in this thread. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
Thank you very much, I will incorporate it! I've been quite busy for the past few weeks but I should be much freer after next week and can pick up on this (fixing the code and actually implement things). Cheers, Sankarshan On Apr 23, 2014, at 5:58 PM, Chris Barker <chris.barker@noaa.gov> wrote:
On Wed, Mar 19, 2014 at 7:07 PM, Sankarshan Mudkavi <smudkavi@uwaterloo.ca> wrote:
I've written a rather rudimentary NEP, (lacking in technical details which I will hopefully add after some further discussion and receiving clarification/help on this thread).
Please let me know how to proceed and what you think should be added to the current proposal (attached to this mail).
Here is a rendered version of the same: https://github.com/Sankarshan-Mudkavi/numpy/blob/Enhance-datetime64/doc/neps...
I've done a bit of copy-editing, and added some more from this discussion. See the pull request on gitHub.
There are a fair number of rough edges, but I think we have a consensus among the small group of folks that participated in this discussion anyway, so now "all" we need is someone to actually fix the code.
If someone steps up, then we should also go in and add a bunch of unit tests, as discussed in this thread.
-CHB
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
-- Sankarshan Mudkavi Undergraduate in Physics, University of Waterloo www.smudkavi.com
On Apr 23, 2014, at 8:23 PM, Sankarshan Mudkavi <smudkavi@uwaterloo.ca> wrote:
I've been quite busy for the past few weeks but I should be much freer after next week and can pick up on this (fixing the code and actually implement things).
wonderful! Thanks.
Chris
On Thu, Apr 24, 2014 at 10:26 AM, Chris Barker - NOAA Federal < chris.barker@noaa.gov> wrote:
On Apr 23, 2014, at 8:23 PM, Sankarshan Mudkavi <smudkavi@uwaterloo.ca> wrote:
I've been quite busy for the past few weeks but I should be much freer after next week and can pick up on this (fixing the code and actually implement things).
wonderful! Thanks.
Might want to take a look at the datetime proposal for blaze: https://github.com/ContinuumIO/blaze/blob/master/docs/design/blaze-datetime.md
<snip>
Chuck
On Thu, Apr 24, 2014 at 10:07 AM, Charles R Harris < charlesr.harris@gmail.com> wrote:
Might want to take a look at the datetime proposal<https://github.com/ContinuumIO/blaze/blob/master/docs/design/blaze-datetime.md>for blaze.
oh man! not again! Oh well, that is a decidedly different proposal -- maybe better, I don't know. But it's different enough that I think we should pretty much ignore it for now, and still do a few fixes to make the current datetime64 usable. Maybe as that gets mature, we could adopt it, or something like it, for numpy. Or maybe we'll all be using Blaze then anyway ;-) But thanks for the ping... -CHB
<snip>
Chuck
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
participants (10):
- Alexander Belopolsky
- Andreas Hilboll
- Charles R Harris
- Chris Barker
- Chris Barker - NOAA Federal
- Dave Hirschfeld
- Jeff Reback
- Nathaniel Smith
- Sankarshan Mudkavi
- Stephan Hoyer