On Mar 19, 2014, at 10:01 AM, Dave Hirschfeld wrote:

Jeff Reback writes:

Dave,
Your example is not a problem with numpy per se; rather, the default
generation is in the local timezone (same as what Python datetime does).
If you localize to UTC you get the results that you expect.
The problem is that the default datetime generation in *numpy* is in local time.
Note that this *is not* the case in Python: it doesn't try to guess the timezone info based on where in the world you run the code. If tzinfo is not provided, it is set to None.
In [7]: pd.datetime?
Type:        type
String Form:
Docstring:
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints or longs.

In [8]: pd.datetime(2000,1,1).tzinfo is None
Out[8]: True
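To make the contrast concrete, here is a minimal sketch using the standard-library `datetime` whose docstring is shown above (not numpy). A naive `datetime` keeps exactly the fields it was given, even 01:00 on 2014-03-30, a wall-clock time that does not exist in UK local time because the clocks jump from 01:00 to 02:00 that night. Attaching UTC explicitly changes only `tzinfo`, never the fields.

```python
from datetime import datetime, timezone

# A naive datetime: Python does not guess a timezone from the machine's locale.
naive = datetime(2014, 3, 30, 1, 0)
print(naive.tzinfo)       # None
print(naive)              # 2014-03-30 01:00:00

# Explicitly marking it as UTC keeps the same wall-clock fields.
aware = naive.replace(tzinfo=timezone.utc)
print(aware)              # 2014-03-30 01:00:00+00:00
print(aware.utcoffset())  # 0:00:00
```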
This may be the best solution, but as others have pointed out, it is more difficult to implement and may have other issues.
I don't want to wait for the best solution: assuming UTC on input/output when no timezone is specified will solve the problem, and this desperately needs to be fixed because it's completely broken as is, IMHO.
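The "assume UTC when unspecified" policy can be sketched with the standard library. This is only an illustration under stated assumptions: `parse_assume_utc` and `format_utc` are hypothetical helpers, not numpy or pandas APIs, and the input is assumed to be in the ISO 8601-style form `datetime.fromisoformat` accepts (Python 3.7+). Strings without an offset are read as UTC; strings with an explicit offset are converted to UTC.

```python
from datetime import datetime, timezone

def parse_assume_utc(text):
    """Hypothetical helper: parse an ISO-style string, assuming UTC if no offset."""
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # No offset given: treat the wall-clock fields as UTC, never as local time.
        return dt.replace(tzinfo=timezone.utc)
    # Offset given: normalise to UTC.
    return dt.astimezone(timezone.utc)

def format_utc(dt):
    """Hypothetical helper: format back to the 'YYYY-MM-DD HH:MM' form, in UTC."""
    return dt.astimezone(timezone.utc).strftime('%Y-%m-%d %H:%M')

print(format_utc(parse_assume_utc('2014-03-30 01:00')))        # 2014-03-30 01:00
print(format_utc(parse_assume_utc('2014-03-30 02:00+01:00')))  # 2014-03-30 01:00
```

The output never depends on the timezone of the machine running the code, which is the property being asked for here.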
If you localize to UTC you get the results that you expect.
That's the whole point: *numpy* needs to localize to UTC, not to whatever timezone you happen to be in when running the code.
In a real-world data analysis problem you don't start with the data in a DataFrame or a numpy array; it comes from the web, a CSV file, Excel, or a database, and you want to convert it to a DataFrame or numpy array. So what you have, from whatever source, is a list of tuples of strings that you want to convert into a typed array.
Obviously you can't localize a string: you have to convert it to a date first, and if you do that with numpy, the date you get is wrong.
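A minimal standard-library sketch of that workflow: string tuples go in, typed columns come out, and the timestamp strings are parsed into naive datetimes *before* any timezone is attached, so nothing is shifted. The rows and the helper `to_typed_columns` are hypothetical, and labelling the fields as UTC is the assumption argued for above.

```python
from datetime import datetime, timezone

# Hypothetical rows as they might arrive from a CSV file or database:
# every field is still a string.
rows = [
    ('2014-03-30 00:00', '1.5'),
    ('2014-03-30 01:00', '2.0'),
    ('2014-03-30 02:00', '2.5'),
]

def to_typed_columns(rows):
    """Convert (timestamp, value) string tuples into two typed columns."""
    times, values = [], []
    for ts, val in rows:
        # Step 1: parse the string into a naive datetime (no timezone guessed).
        naive = datetime.strptime(ts, '%Y-%m-%d %H:%M')
        # Step 2: only now attach a timezone; here the fields are labelled as UTC.
        times.append(naive.replace(tzinfo=timezone.utc))
        values.append(float(val))
    return times, values

times, values = to_typed_columns(rows)
print([t.hour for t in times])  # [0, 1, 2]
print(values)                   # [1.5, 2.0, 2.5]
```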
In [108]: dst = np.array(['2014-03-30 00:00', '2014-03-30 01:00', '2014-03-30 02:00'], dtype='M8[h]')
     ...: dst
     ...:
Out[108]:
array(['2014-03-30T00+0000', '2014-03-30T00+0000', '2014-03-30T02+0100'], dtype='datetime64[h]')

In [109]: dst.tolist()
Out[109]:
[datetime.datetime(2014, 3, 30, 0, 0),
 datetime.datetime(2014, 3, 30, 0, 0),
 datetime.datetime(2014, 3, 30, 1, 0)]
AFAICS there's no way to get the original dates back once they've passed through numpy's parser!?
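For comparison, a sketch using only the standard library and assuming UTC for strings that carry no offset: the same three input strings survive a parse/format round trip unchanged and remain three distinct instants, whereas in the numpy session above two of them collapsed to `2014-03-30T00+0000`. This illustrates the proposed behaviour; it is not what numpy's parser did at the time.

```python
from datetime import datetime, timezone

inputs = ['2014-03-30 00:00', '2014-03-30 01:00', '2014-03-30 02:00']
FMT = '%Y-%m-%d %H:%M'

# Parse each string as UTC: no local DST rules are applied, so no
# wall-clock time is skipped or merged with another.
parsed = [datetime.strptime(s, FMT).replace(tzinfo=timezone.utc) for s in inputs]

# Format back to strings using the same pattern.
round_trip = [dt.strftime(FMT) for dt in parsed]

print(round_trip == inputs)  # True
print(len(set(parsed)))      # 3
```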
-Dave
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi all,

I've written a rather rudimentary NEP (lacking in technical details, which I will hopefully add after some further discussion and after receiving clarification/help on this thread). Please let me know how to proceed and what you think should be added to the current proposal (attached to this mail).

Here is a rendered version of the same:
https://github.com/Sankarshan-Mudkavi/numpy/blob/Enhance-datetime64/doc/neps...

Cheers,
Sankarshan

--
Sankarshan Mudkavi
Undergraduate in Physics, University of Waterloo
www.smudkavi.com