Currently we allow ONLY datetime64[ns] as an internal representation (and analogously timedelta64[ns] for timedeltas).
There are several issues where things like this are done:
a) Series([np.datetime64(datetime(2013,1,1)),np.datetime64(datetime(2013,1,2))],dtype='M8[ms]')
b) Series([datetime(2013,1,1),datetime(2013,1,2)],dtype='M8[D]')
In a) the np.datetime64s built from Python datetimes are by default [us], so we need to do a conversion to M8[ns]. OK, we can do that to keep the internal rep, but what about the dtype specified? Is this effectively an astype? Or is it conceptually just a display thing, e.g. the user wants to view the data as [ms] rather than [ns]?
Several options to think about:
1) ignore the passed dtype completely and do some conversions on np.datetime64 (which we already do) to guarantee M8[ns] internally (we do this now, but bork on a passed dtype that is not M8[ns] when the data is M8)
2) keep the passed dtype (or the inferred dtype) internally, effectively making datetimes a suite of M8[ms,D,s,ns......]
3) keep the data as M8[ns] internally and provide an asfreq which works kind of like the PeriodIndex method, which can provide a DatetimeIndex, I guess, of the requested frequency?
But then I keep thinking, is there any actual difference between 20130101 15:00:01.12345 in [ms] or [ns]? (right now no)
Any thoughts? I know I am rambling a bit, but I'm confused over what is even necessary here...
Jeff
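To make the unit question concrete, here is a small NumPy-only sketch (not pandas code; the variable names are just for illustration). It assumes a reasonably recent NumPy and shows that a datetime64 built from a Python datetime comes out as [us], that upcasting to [ns] keeps the same instant, and that downcasting to [ms] drops anything finer than a millisecond.

```python
from datetime import datetime

import numpy as np

# A datetime64 built from a Python datetime carries microsecond resolution.
stamp_us = np.datetime64(datetime(2013, 1, 1, 15, 0, 1, 123450))
unit_is_us = stamp_us.dtype == np.dtype("M8[us]")

# Upcasting to [ns] keeps the same instant; only the stored integer changes.
stamp_ns = stamp_us.astype("M8[ns]")
same_instant = bool(stamp_ns == stamp_us)

# Downcasting to [ms] truncates the sub-millisecond part (.12345 -> .123).
stamp_ms = stamp_us.astype("M8[ms]")
truncated = bool(stamp_ms != stamp_us)

# How much was dropped by the [ms] representation, counted in nanoseconds.
lost_ns = int(stamp_ns.astype("i8") - stamp_ms.astype("M8[ns]").astype("i8"))
```

So [ms] and [ns] describe the same instant only when the value has nothing below a millisecond; otherwise a coarser dtype is a lossy conversion rather than a pure display choice.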
On Tue, Apr 30, 2013 at 1:42 PM, Jeff Reback <jeffreback@gmail.com> wrote:
> Currently we allow ONLY datetime64[ns] as an internal representation (and analogously timedelta64[ns] for timedeltas). [...]
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev@python.org
> http://mail.python.org/mailman/listinfo/pandas-dev
Little slow getting back. I'm pretty unhappy with how things turned out in NumPy. I guess it's my fault for not speaking up when the work was being done in 2010 and 2011, but back then no one in the Scientific Python establishment took pandas very seriously.
My thinking has always been that we should either have:
a) A single timestamp and timedelta data type, with different lower frequencies (annual, monthly, etc.) handled by the period data type. This is the approach taken by pandas right now.
b) A timestamp with parametric units. This is the approach taken in NumPy, but with essentially no APIs to help you with that.
I'm fine with always yielding datetime64[ns] out of whatever datetime64 dtype is passed. The NumPy data type system is just an ugly implementation detail at this point, especially in this area.
- Wes
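As a rough illustration of "always yielding datetime64[ns] out of whatever datetime64 dtype is passed", the sketch below uses plain NumPy. The helper name normalize_to_ns is hypothetical and is not part of pandas or NumPy; it only shows the idea of collapsing every datetime64 unit to one internal unit.

```python
import numpy as np


def normalize_to_ns(values):
    """Hypothetical helper: cast any datetime64 array to datetime64[ns]."""
    arr = np.asarray(values)
    if arr.dtype.kind != "M":
        # Only datetime64 input is handled in this sketch.
        raise TypeError(f"expected a datetime64 array, got {arr.dtype}")
    return arr.astype("M8[ns]")


# Inputs with different units end up with the same internal dtype.
days = np.array(["2013-01-01", "2013-01-02"], dtype="M8[D]")
millis = np.array(["2013-01-01T00:00:00.500"], dtype="M8[ms]")

days_ns = normalize_to_ns(days)
millis_ns = normalize_to_ns(millis)
```

Going from a coarser unit to [ns] does not change the instants, so the caller's original unit can be dropped without losing information (as long as the values fit in the [ns] range).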
These 2 PRs (already merged) will basically disallow astype('datetime64[s]') and such, so that we ALWAYS have datetime64[ns] internally:
https://github.com/pydata/pandas/pull/3550
https://github.com/pydata/pandas/pull/3516
This issue:
Timestamp should have alternate constructor for UTC millisecond timestamps
https://github.com/pydata/pandas/issues/3540
will allow passing a 'unit' keyword to the datetime constructors (primarily Timestamp) so that they correctly interpret passed-in values whose unit is not easily discernible, e.g. ints (or epoch times that are, say, in ms), and convert them to the internal datetime64[ns].
The only remaining issue, I think, would be whether we want some sort of conversion to something like datetime64[s] (as an output). Not sure if this is even useful.
Jeff
On Thu, May 9, 2013 at 7:32 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
> Little slow getting back. I'm pretty unhappy with how things turned out in NumPy. [...]
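The 'unit' idea from Jeff's follow-up can be sketched with NumPy alone. This is only an illustration of interpreting a bare epoch integer in a stated unit and converting it to datetime64[ns]; the function name epoch_to_ns is hypothetical and this is not the pandas implementation.

```python
import numpy as np


def epoch_to_ns(value, unit):
    """Hypothetical helper: read an integer epoch count in `unit`, return datetime64[ns]."""
    # np.datetime64(int, unit) interprets the integer as a count of `unit` since 1970-01-01.
    return np.datetime64(int(value), unit).astype("M8[ns]")


# The same instant expressed as seconds and as milliseconds since the epoch.
from_seconds = epoch_to_ns(1356998400, "s")
from_millis = epoch_to_ns(1356998400000, "ms")
```

Without the unit, the two integers above are ambiguous; with it, both resolve to 2013-01-01 00:00:00 and share the same [ns] dtype.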
Participants (2): Jeff Reback, Wes McKinney