RFC: A (second) proposal for implementing some date/time types in NumPy

Hi,

After the tons of excellent feedback received on our first proposal
about the date/time types in NumPy, Ivan and I have had another
brainstorming session and ended up with a new proposal for your
consideration.

While this one does not incorporate each and every suggestion you have
made, we think that it represents a fair balance between capabilities
and simplicity, and that it can be a solid and efficient basis for
building more date/time niceties on top of it (read: a full-fledged
``DateTime`` array class).

Although the proposal is not complete, the essentials are there.  So,
please read on.  We will be glad to hear your opinions.

Thanks!

--
Francesc Alted

====================================================================
 A (second) proposal for implementing some date/time types in NumPy
====================================================================

:Author: Francesc Alted i Abad
:Contact: faltet@pytables.com
:Author: Ivan Vilata i Balaguer
:Contact: ivan@selidor.net
:Date: 2008-07-16

Executive summary
=================

A date/time mark is something very handy to have in many fields where
one has to deal with data sets.  While Python has several modules that
define a date/time type (like the integrated ``datetime`` [1]_ or
``mx.DateTime`` [2]_), NumPy lacks them.

In this document, we propose the addition of a series of date/time
types to fill this gap.  The requirements for the proposed types are
twofold: 1) they have to be fast to operate with and 2) they have to be
as compatible as possible with the existing ``datetime`` module that
comes with Python.

Types proposed
==============

To start with, it is virtually impossible to come up with a single
date/time type that fills the needs of every use case.
So, after pondering different possibilities, we have stuck with *two*
different types, namely ``datetime64`` and ``timedelta64`` (these names
are preliminary and can be changed), that can have different resolutions
so as to cover different needs.

**Important note:** the resolution is conceived here as metadata that
*complements* a date/time dtype, *without changing the base type*.

Now follows a detailed description of the proposed types.

``datetime64``
--------------

It represents a time that is absolute (i.e. not relative).  It is
implemented internally as an ``int64`` type.  The internal epoch is the
POSIX epoch (see [3]_).

Resolution
~~~~~~~~~~

It accepts different resolutions, and each of these resolutions supports
a different time span.  The table below describes the supported
resolutions with their corresponding time spans.

+----------------------+----------------------------------+
|      Resolution      |        Time span (years)         |
+----------------------+----------------------------------+
|  Code  |   Meaning   |                                  |
+======================+==================================+
|   Y    |  year       |      [9.2e18 BC, 9.2e18 AD]      |
|   Q    |  quarter    |      [2.3e18 BC, 2.3e18 AD]      |
|   M    |  month      |      [7.6e17 BC, 7.6e17 AD]      |
|   W    |  week       |      [1.7e17 BC, 1.7e17 AD]      |
|   d    |  day        |      [2.5e16 BC, 2.5e16 AD]      |
|   h    |  hour       |      [1.0e15 BC, 1.0e15 AD]      |
|   m    |  minute     |      [1.7e13 BC, 1.7e13 AD]      |
|   s    |  second     |      [2.9e11 BC, 2.9e11 AD]      |
|   ms   |  millisecond|      [2.9e8 BC, 2.9e8 AD]        |
|   us   |  microsecond|      [290301 BC, 294241 AD]      |
|   ns   |  nanosecond |      [1678 AD, 2262 AD]          |
+----------------------+----------------------------------+

Building a ``datetime64`` dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The proposed ways to specify the resolution in the dtype constructor
are:

Using parameters in the constructor::

  dtype('datetime64', res="us")  # the default res. is microseconds

Using the long string notation::

  dtype('datetime64[us]')   # equivalent to dtype('datetime64')

Using the short string notation::

  dtype('T8[us]')   # equivalent to dtype('T8')

Compatibility issues
~~~~~~~~~~~~~~~~~~~~

This will be fully compatible with the ``datetime`` class of the
``datetime`` module of Python only when using a resolution of
microseconds.  For other resolutions, the conversion process will lose
precision or will overflow as needed.

``timedelta64``
---------------

It represents a time that is relative (i.e. not absolute).  It is
implemented internally as an ``int64`` type.

Resolution
~~~~~~~~~~

It accepts different resolutions, and each of these resolutions supports
a different time span.  The table below describes the supported
resolutions with their corresponding time spans.

+----------------------+--------------------------+
|      Resolution      |        Time span         |
+----------------------+--------------------------+
|  Code  |   Meaning   |                          |
+======================+==========================+
|   W    |  week       |      +- 1.7e17 years     |
|   d    |  day        |      +- 2.5e16 years     |
|   h    |  hour       |      +- 1.0e15 years     |
|   m    |  minute     |      +- 1.7e13 years     |
|   s    |  second     |      +- 2.9e11 years     |
|   ms   |  millisecond|      +- 2.9e8 years      |
|   us   |  microsecond|      +- 2.9e5 years      |
|   ns   |  nanosecond |      +- 292 years        |
|   ps   |  picosecond |      +- 106 days         |
|   fs   |  femtosecond|      +- 2.6 hours        |
|   as   |  attosecond |      +- 9.2 seconds      |
+----------------------+--------------------------+

Building a ``timedelta64`` dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The proposed ways to specify the resolution in the dtype constructor
are:

Using parameters in the constructor::

  dtype('timedelta64', res="us")  # the default res. is microseconds

Using the long string notation::

  dtype('timedelta64[us]')   # equivalent to dtype('timedelta64')

Using the short string notation::

  dtype('t8[us]')   # equivalent to dtype('t8')

Compatibility issues
~~~~~~~~~~~~~~~~~~~~

This will be fully compatible with the ``timedelta`` class of the
``datetime`` module of Python only when using a resolution of
microseconds.  For other resolutions, the conversion process will lose
precision or will overflow as needed.

Example of use
==============

Here is an example of use for ``datetime64``::

  In [10]: t = numpy.zeros(5, dtype="datetime64[ms]")

  In [11]: t[0] = datetime.datetime.now()  # setter in action

  In [12]: t[0]
  Out[12]: '2008-07-16T13:39:25.315'   # representation in ISO 8601 format

  In [13]: print t
  [2008-07-16T13:39:25.315  1970-01-01T00:00:00.0
   1970-01-01T00:00:00.0  1970-01-01T00:00:00.0  1970-01-01T00:00:00.0]

  In [14]: t[0].item()     # getter in action
  Out[14]: datetime.datetime(2008, 7, 16, 13, 39, 25, 315000)

  In [15]: print t.dtype
  datetime64[ms]

And here is an example of use for ``timedelta64``::

  In [8]: t1 = numpy.zeros(5, dtype="datetime64[s]")

  In [9]: t2 = numpy.ones(5, dtype="datetime64[s]")

  In [10]: t = t2 - t1

  In [11]: t[0] = 24  # setter in action (setting to 24 seconds)

  In [12]: t[0]
  Out[12]: 24      # representation as an int64

  In [13]: print t
  [24  1  1  1  1]

  In [14]: t[0].item()     # getter in action
  Out[14]: datetime.timedelta(0, 24)

  In [15]: print t.dtype
  timedelta64[s]

Operating with date/time arrays
===============================

``datetime64`` vs ``datetime64``
--------------------------------

The only operation allowed between absolute dates is subtraction::

  In [10]: numpy.ones(5, "T8") - numpy.zeros(5, "T8")
  Out[10]: array([1, 1, 1, 1, 1], dtype=timedelta64[us])

But not other operations::

  In [11]: numpy.ones(5, "T8") + numpy.zeros(5, "T8")
  TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray'

``datetime64`` vs ``timedelta64``
---------------------------------

It will be possible to add and subtract relative times from absolute
dates::

  In [10]: numpy.zeros(5, "T8[Y]") + numpy.ones(5, "t8[Y]")
  Out[10]: array([1971, 1971, 1971, 1971, 1971], dtype=datetime64[Y])

  In [11]: numpy.ones(5, "T8[Y]") - 2 * numpy.ones(5, "t8[Y]")
  Out[11]: array([1969, 1969, 1969, 1969, 1969], dtype=datetime64[Y])

But not other operations::

  In [12]: numpy.ones(5, "T8[Y]") * numpy.ones(5, "t8[Y]")
  TypeError: unsupported operand type(s) for *: 'numpy.ndarray' and 'numpy.ndarray'

``timedelta64`` vs anything
---------------------------

Finally, it will be possible to operate with relative times as if they
were regular int64 dtypes *as long as* the result can be converted back
into a ``timedelta64``::

  In [10]: numpy.ones(5, 't8')
  Out[10]: array([1, 1, 1, 1, 1], dtype=timedelta64[us])

  In [11]: (numpy.ones(5, 't8[M]') + 2) ** 3
  Out[11]: array([27, 27, 27, 27, 27], dtype=timedelta64[M])

But::

  In [12]: numpy.ones(5, 't8') + 1j
  TypeError: The result cannot be converted into a ``timedelta64``

dtype/resolution conversions
============================

For changing the date/time dtype of an existing array, we propose to use
the ``.astype()`` method.  This will be mainly useful for changing
resolutions.
For example, for absolute dates::

  In[10]: t1 = numpy.zeros(5, dtype="datetime64[s]")

  In[11]: print t1
  [1970-01-01T00:00:00  1970-01-01T00:00:00  1970-01-01T00:00:00
   1970-01-01T00:00:00  1970-01-01T00:00:00]

  In[12]: print t1.astype('datetime64[d]')
  [1970-01-01  1970-01-01  1970-01-01  1970-01-01  1970-01-01]

For relative times::

  In[10]: t1 = numpy.ones(5, dtype="timedelta64[s]")

  In[11]: print t1
  [1 1 1 1 1]

  In[12]: print t1.astype('timedelta64[ms]')
  [1000 1000 1000 1000 1000]

Changing directly from/to relative to/from absolute dtypes will not be
supported::

  In[13]: numpy.zeros(5, dtype="datetime64[s]").astype('timedelta64')
  TypeError: data type cannot be converted to the desired type

Final considerations
====================

Why the ``origin`` metadata disappeared
---------------------------------------

During the discussion of the date/time dtypes in the NumPy list, the
idea of having an ``origin`` metadata that complemented the definition
of the absolute ``datetime64`` was initially found to be useful.

However, after thinking more about this, Ivan and I found that the
combination of an absolute ``datetime64`` with a relative
``timedelta64`` offers the same functionality while removing the need
for the additional ``origin`` metadata.  This is why we have removed it
from this proposal.

Resolution and dtype issues
---------------------------

The date/time dtype's resolution metadata cannot be used in general as
part of typical dtype usage.  For example, in::

  numpy.zeros(5, dtype=numpy.datetime64)

we have yet to find a sensible way to pass the resolution.  Perhaps the
following would work::

  numpy.zeros(5, dtype=numpy.datetime64(res='Y'))

but we are not sure whether this would collide with the spirit of the
NumPy dtypes.
At any rate, one can always do::

  numpy.zeros(5, dtype=numpy.dtype('datetime64', res='Y'))

BTW, prior to all of this, one should also elucidate whether::

  numpy.dtype('datetime64', res='Y')

or::

  numpy.dtype('datetime64[Y]')
  numpy.dtype('T8[Y]')

would be a consistent way to instantiate a dtype in NumPy.  We do really
think that could be a good way, but we would need to hear the opinion of
the expert.  Travis?

.. [1] http://docs.python.org/lib/module-datetime.html
.. [2] http://www.egenix.com/products/python/mxBase/mxDateTime
.. [3] http://en.wikipedia.org/wiki/Unix_time

.. Local Variables:
.. mode: rst
.. coding: utf-8
.. fill-column: 72
.. End:
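As a cross-check of the time spans listed in the resolution tables of the proposal above, the rough sketch below estimates how many years a signed 64-bit counter can cover on each side of the epoch for the fixed-length units. The helper name ``span_years``, the ``UNIT_SECONDS`` table and the use of the mean Gregorian year are assumptions of this sketch, not part of the proposal.

```python
# Hypothetical helper (not part of the proposal): approximate number of
# years that an int64 counter can represent on each side of the epoch.
INT64_MAX = 2**63 - 1

# Assumption: mean Gregorian year of 365.2425 days.
SECONDS_PER_YEAR = 365.2425 * 86400

# Seconds per unit, for units of fixed length only.
UNIT_SECONDS = {
    "W": 7 * 86400,
    "d": 86400,
    "h": 3600,
    "m": 60,
    "s": 1,
    "ms": 1e-3,
    "us": 1e-6,
    "ns": 1e-9,
}


def span_years(unit):
    """Return the approximate one-sided span, in years, for `unit`."""
    return INT64_MAX * UNIT_SECONDS[unit] / SECONDS_PER_YEAR


if __name__ == "__main__":
    for code in UNIT_SECONDS:
        print("%-2s  +- %.2g years" % (code, span_years(code)))
```

For nanoseconds this gives roughly 292 years on each side of 1970, which is where the [1678, 2262] interval comes from.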

Francesc Alted (on 2008-07-16 at 18:44:36 +0200) said::
After tons of excellent feedback received for our first proposal about the date/time types in NumPy Ivan and me have had another brainstorming session and ended with a new proposal for your consideration.
After re-reading the proposal, Francesc and I found some points that
needed small corrections and some clarifications or enhancements.  Here
you have a new version of the proposal.  The changes aren't fundamental:

* Reference to POSIX-like treatment of leap seconds.
* Notes on default resolutions.
* Meaning of the stored values.
* Usage examples for scalar constructor.
* Using an ISO 8601 string as a date value.
* Fixed str() and repr() representations.
* Note on operations with mixed resolutions.
* Other small corrections.

Thanks for the feedback!

----

====================================================================
 A (second) proposal for implementing some date/time types in NumPy
====================================================================

:Author: Francesc Alted i Abad
:Contact: faltet@pytables.com
:Author: Ivan Vilata i Balaguer
:Contact: ivan@selidor.net
:Date: 2008-07-18

Executive summary
=================

A date/time mark is something very handy to have in many fields where
one has to deal with data sets.  While Python has several modules that
define a date/time type (like the integrated ``datetime`` [1]_ or
``mx.DateTime`` [2]_), NumPy lacks them.

In this document, we propose the addition of a series of date/time
types to fill this gap.  The requirements for the proposed types are
twofold: 1) they have to be fast to operate with and 2) they have to be
as compatible as possible with the existing ``datetime`` module that
comes with Python.

Types proposed
==============

To start with, it is virtually impossible to come up with a single
date/time type that fills the needs of every use case.  So, after
pondering different possibilities, we have stuck with *two* different
types, namely ``datetime64`` and ``timedelta64`` (these names are
preliminary and can be changed), that can have different resolutions so
as to cover different needs.

.. Important:: the resolution is conceived here as metadata that
   *complements* a date/time dtype, *without changing the base type*.
   It provides information about the *meaning* of the stored numbers,
   not about their *structure*.

Now follows a detailed description of the proposed types.

``datetime64``
--------------

It represents a time that is absolute (i.e. not relative).  It is
implemented internally as an ``int64`` type.  The internal epoch is the
POSIX epoch (see [3]_).  Like POSIX, the representation of a date
doesn't take leap seconds into account.

Resolution
~~~~~~~~~~

It accepts different resolutions, each of them implying a different time
span.  The table below describes the resolutions supported with their
corresponding time spans.

======== =============== ==========================
      Resolution             Time span (years)
------------------------ --------------------------
Code     Meaning
======== =============== ==========================
Y        year            [9.2e18 BC, 9.2e18 AD]
Q        quarter         [2.3e18 BC, 2.3e18 AD]
M        month           [7.6e17 BC, 7.6e17 AD]
W        week            [1.7e17 BC, 1.7e17 AD]
d        day             [2.5e16 BC, 2.5e16 AD]
h        hour            [1.0e15 BC, 1.0e15 AD]
m        minute          [1.7e13 BC, 1.7e13 AD]
s        second          [2.9e11 BC, 2.9e11 AD]
ms       millisecond     [2.9e8 BC, 2.9e8 AD]
us       microsecond     [290301 BC, 294241 AD]
ns       nanosecond      [1678 AD, 2262 AD]
======== =============== ==========================

When a resolution is not provided, the default resolution of
microseconds is used.

The value of an absolute date is thus *an integer number of units of the
chosen resolution* passed since the internal epoch.

Building a ``datetime64`` dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The proposed ways to specify the resolution in the dtype constructor
are:

Using parameters in the constructor::

  dtype('datetime64', res="us")  # the default res. is microseconds

Using the long string notation::

  dtype('datetime64[us]')   # equivalent to dtype('datetime64')

Using the short string notation::

  dtype('T8[us]')   # equivalent to dtype('T8')

Compatibility issues
~~~~~~~~~~~~~~~~~~~~

This will be fully compatible with the ``datetime`` class of the
``datetime`` module of Python only when using a resolution of
microseconds.  For other resolutions, the conversion process will lose
precision or will overflow as needed.  The conversion from/to a
``datetime`` object doesn't take leap seconds into account.

``timedelta64``
---------------

It represents a time that is relative (i.e. not absolute).  It is
implemented internally as an ``int64`` type.

Resolution
~~~~~~~~~~

It accepts different resolutions, each of them implying a different time
span.  The table below describes the resolutions supported with their
corresponding time spans.

======== =============== ==========================
      Resolution                 Time span
------------------------ --------------------------
Code     Meaning
======== =============== ==========================
W        week            +- 1.7e17 years
d        day             +- 2.5e16 years
h        hour            +- 1.0e15 years
m        minute          +- 1.7e13 years
s        second          +- 2.9e11 years
ms       millisecond     +- 2.9e8 years
us       microsecond     +- 2.9e5 years
ns       nanosecond      +- 292 years
ps       picosecond      +- 106 days
fs       femtosecond     +- 2.6 hours
as       attosecond      +- 9.2 seconds
======== =============== ==========================

When a resolution is not provided, the default resolution of
microseconds is used.

The value of a time delta is thus *an integer number of units of the
chosen resolution*.

Building a ``timedelta64`` dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The proposed ways to specify the resolution in the dtype constructor
are:

Using parameters in the constructor::

  dtype('timedelta64', res="us")  # the default res. is microseconds

Using the long string notation::

  dtype('timedelta64[us]')   # equivalent to dtype('timedelta64')

Using the short string notation::

  dtype('t8[us]')   # equivalent to dtype('t8')

Compatibility issues
~~~~~~~~~~~~~~~~~~~~

This will be fully compatible with the ``timedelta`` class of the
``datetime`` module of Python only when using a resolution of
microseconds.  For other resolutions, the conversion process will lose
precision or will overflow as needed.

Example of use
==============

Here is an example of use for ``datetime64``::

  In [5]: numpy.datetime64(42)    # use default resolution of "us"
  Out[5]: datetime64(42, 'us')

  In [6]: print numpy.datetime64(42)    # use default resolution of "us"
  1970-01-01T00:00:00.000042  # representation in ISO 8601 format

  In [7]: print numpy.datetime64(367.7, 'd')  # decimal part is lost
  1971-01-03  # still ISO 8601 format

  In [8]: numpy.datetime64('2008-07-18T12:23:18', 'm')  # from ISO 8601
  Out[8]: datetime64(20273063, 'm')

  In [9]: print numpy.datetime64('2008-07-18T12:23:18', 'm')
  2008-07-18T12:23

  In [10]: t = numpy.zeros(5, dtype="datetime64[ms]")

  In [11]: t[0] = datetime.datetime.now()  # setter in action

  In [12]: print t
  [2008-07-16T13:39:25.315  1970-01-01T00:00:00.000
   1970-01-01T00:00:00.000  1970-01-01T00:00:00.000
   1970-01-01T00:00:00.000]

  In [13]: t[0].item()     # getter in action
  Out[13]: datetime.datetime(2008, 7, 16, 13, 39, 25, 315000)

  In [14]: print t.dtype
  dtype('datetime64[ms]')

And here is an example of use for ``timedelta64``::

  In [5]: numpy.timedelta64(10)    # use default resolution of "us"
  Out[5]: timedelta64(10, 'us')

  In [6]: print numpy.timedelta64(10)    # use default resolution of "us"
  0:00:00.000010

  In [7]: print numpy.timedelta64(3600.2, 'm')  # decimal part is lost
  2 days, 12:00

  In [8]: t1 = numpy.zeros(5, dtype="datetime64[ms]")

  In [9]: t2 = numpy.ones(5, dtype="datetime64[ms]")

  In [10]: t = t2 - t1

  In [11]: t[0] = datetime.timedelta(0, 24)  # setter in action

  In [12]: print t
  [0:00:24.000  0:00:00.001  0:00:00.001  0:00:00.001  0:00:00.001]

  In [13]: t[0].item()     # getter in action
  Out[13]: datetime.timedelta(0, 24)

  In [14]: print t.dtype
  dtype('timedelta64[ms]')

Operating with date/time arrays
===============================

``datetime64`` vs ``datetime64``
--------------------------------

The only arithmetic operation allowed between absolute dates is
subtraction::

  In [10]: numpy.ones(5, "T8") - numpy.zeros(5, "T8")
  Out[10]: array([1, 1, 1, 1, 1], dtype=timedelta64[us])

But not other operations::

  In [11]: numpy.ones(5, "T8") + numpy.zeros(5, "T8")
  TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray'

Comparisons between absolute dates are allowed.

``datetime64`` vs ``timedelta64``
---------------------------------

It will be possible to add and subtract relative times from absolute
dates::

  In [10]: numpy.zeros(5, "T8[Y]") + numpy.ones(5, "t8[Y]")
  Out[10]: array([1971, 1971, 1971, 1971, 1971], dtype=datetime64[Y])

  In [11]: numpy.ones(5, "T8[Y]") - 2 * numpy.ones(5, "t8[Y]")
  Out[11]: array([1969, 1969, 1969, 1969, 1969], dtype=datetime64[Y])

But not other operations::

  In [12]: numpy.ones(5, "T8[Y]") * numpy.ones(5, "t8[Y]")
  TypeError: unsupported operand type(s) for *: 'numpy.ndarray' and 'numpy.ndarray'

``timedelta64`` vs anything
---------------------------

Finally, it will be possible to operate with relative times as if they
were regular int64 dtypes *as long as* the result can be converted back
into a ``timedelta64``::

  In [10]: numpy.ones(5, 't8')
  Out[10]: array([1, 1, 1, 1, 1], dtype=timedelta64[us])

  In [11]: (numpy.ones(5, 't8[M]') + 2) ** 3
  Out[11]: array([27, 27, 27, 27, 27], dtype=timedelta64[M])

But::

  In [12]: numpy.ones(5, 't8') + 1j
  TypeError: the result cannot be converted into a ``timedelta64``

dtype/resolution conversions
============================

For changing the date/time dtype of an existing array, we propose to use
the ``.astype()`` method.  This will be mainly useful for changing
resolutions.
For example, for absolute dates::

  In[10]: t1 = numpy.zeros(5, dtype="datetime64[s]")

  In[11]: print t1
  [1970-01-01T00:00:00  1970-01-01T00:00:00  1970-01-01T00:00:00
   1970-01-01T00:00:00  1970-01-01T00:00:00]

  In[12]: print t1.astype('datetime64[d]')
  [1970-01-01  1970-01-01  1970-01-01  1970-01-01  1970-01-01]

For relative times::

  In[10]: t1 = numpy.ones(5, dtype="timedelta64[s]")

  In[11]: print t1
  [1 1 1 1 1]

  In[12]: print t1.astype('timedelta64[ms]')
  [1000 1000 1000 1000 1000]

Changing directly from/to relative to/from absolute dtypes will not be
supported::

  In[13]: numpy.zeros(5, dtype="datetime64[s]").astype('timedelta64')
  TypeError: data type cannot be converted to the desired type

Final considerations
====================

Why the ``origin`` metadata disappeared
---------------------------------------

During the discussion of the date/time dtypes in the NumPy list, the
idea of having an ``origin`` metadata that complemented the definition
of the absolute ``datetime64`` was initially found to be useful.

However, after thinking more about this, we found that the combination
of an absolute ``datetime64`` with a relative ``timedelta64`` offers
the same functionality while removing the need for the additional
``origin`` metadata.  This is why we have removed it from this proposal.

Operations with mixed resolutions
---------------------------------

Whenever an operation between two time values of the same dtype with the
same resolution is accepted, the same operation with time values of
different resolutions should be possible (e.g. adding a time delta in
seconds and one in microseconds), resulting in an adequate resolution.
The exact semantics of this kind of operation are yet to be defined,
though.

Resolution and dtype issues
---------------------------

The date/time dtype's resolution metadata cannot be used in general as
part of typical dtype usage.  For example, in::

  numpy.zeros(5, dtype=numpy.datetime64)

we have yet to find a sensible way to pass the resolution.
At any rate, one can explicitly create a dtype::

  numpy.zeros(5, dtype=numpy.dtype('datetime64', res='Y'))

BTW, prior to all of this, one should also elucidate whether::

  numpy.dtype('datetime64', res='Y')

or::

  numpy.dtype('datetime64[Y]')
  numpy.dtype('T8[Y]')
  numpy.dtype('T[Y]')

would be a consistent way to instantiate a dtype in NumPy.  We do really
think that could be a good way, but we would need to hear the opinion of
the expert.  Travis?

.. [1] http://docs.python.org/lib/module-datetime.html
.. [2] http://www.egenix.com/products/python/mxBase/mxDateTime
.. [3] http://en.wikipedia.org/wiki/Unix_time

.. Local Variables:
.. mode: rst
.. coding: utf-8
.. fill-column: 72
.. End:

----

::

  Ivan Vilata i Balaguer   @ Welcome to the European Banana Republic! @
  http://www.selidor.net/  @ http://www.nosoftwarepatents.com/        @
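As a postscript to the resolution conversions and mixed-resolution operations discussed in the proposal above, here is a minimal pure-Python sketch of one possible behaviour. It is only an illustration under stated assumptions: the names ``UNIT_NS``, ``convert`` and ``add_mixed`` are hypothetical, conversion to a coarser unit is assumed to round towards negative infinity, and a mixed operation is assumed to yield the finer of the two resolutions. The proposal itself leaves these semantics undefined.

```python
# Nanoseconds per unit, restricted to units of fixed length (an assumption
# of this sketch; months and years are deliberately left out).
UNIT_NS = {
    "W": 7 * 86400 * 10**9,
    "d": 86400 * 10**9,
    "h": 3600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}


def convert(value, src, dst):
    """Convert an integer count of `src` units into `dst` units.

    Going to a finer unit is exact; going to a coarser unit floors the
    result (an assumption, the proposal does not specify the rounding).
    """
    return value * UNIT_NS[src] // UNIT_NS[dst]


def add_mixed(a, unit_a, b, unit_b):
    """Add two time deltas, returning (value, unit) in the finer unit."""
    finer = unit_a if UNIT_NS[unit_a] <= UNIT_NS[unit_b] else unit_b
    total = convert(a, unit_a, finer) + convert(b, unit_b, finer)
    return total, finer


if __name__ == "__main__":
    print(convert(1, "s", "ms"))
    print(add_mixed(1, "s", 500, "us"))
```

Choosing the finer resolution keeps the sum exact, at the price of a narrower representable time span.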

Hi,

Well, as there were no replies to our second proposal for the date/time
dtype, I assume that everybody agrees with it ;-)  At any rate, we would
like to proceed with the implementation phase very soon now.

However, it happens that Enthought is sponsoring this job and they
clearly stated that the implementation should cover the needs of as many
users as possible.  So, in particular, we would like some of the
heaviest users of date/time objects, i.e. the TimeSeries authors, to be
comfortable with the new date/time dtypes, and especially to be able to
benefit from them.

For this goal, we are proposing a decoupling of the date/time use cases
into two different groups:

1. A pure ``datetime`` dtype (absolute or relative) that would be useful
   for timestamping purposes in general (i.e. registering dates without
   a need that they be evenly spaced in time).

2. A class based on the ``frequency`` concept that would be useful for
   measurements that are done on a regular basis or in business
   applications.

With this, we are preventing the dtype implementation at the core of
NumPy from being too cluttered with the relatively complex needs of the
``frequency`` concept users, factoring it out to an external class
(``Date``, to follow the TimeSeries naming convention).  More
importantly, this decoupling will also avoid mixing those two concepts
which, although both are about time measurements, have quite different
meanings.

Another important advantage of this distinction is that the ``datetime``
timestamp requires less meta-information to worry about (basically, the
'resolution' property), while a ``frequency`` à la TimeSeries will need
more meta-information, like the 'start' and 'end' of periods, as well
as a more complex way to code frequencies (there are many more time
periods to be coded, as can be seen in [1]_).  This can be very
important for allowing NumPy data based on the ``datetime`` dtype to
be quickly saved to and retrieved from databases like ZODB (object
database) or PyTables (HDF5-based database).

Our ultimate goal is that the ``Date`` and ``DateArray`` classes in
TimeSeries would be rewritten in terms of the new date/time dtype so as
to take advantage of its features, but also to get rid of duplicated
code.  I honestly think that this can be a big advantage for TimeSeries
(at the cost of taking some time to do the migration).

Does that approach make sense to people?

.. [1] http://scipy.org/scipy/scikits/wiki/TimeSeries#Frequencies

--
Francesc Alted

Francesc,

Could you clarify a couple of points ?

[datetime64]
If I understand properly, your datetime64 would be time units from the
POSIX epoch (1970/01/01 00:00:00), right ?  So

+7d would be 1970/01/08 (7 days after the epoch)
-7W would be 1969/11/13 (7*7 days before the epoch)

With this approach, a series [1,2,3,7] at a resolution 'd' would
correspond to 1970/01/01, 1970/01/02, 1970/01/03 and 1970/01/07, right ?

I'm all for that, **AS LONG AS we have a business day resolution** 'b',
so that +7b would be 1970/01/09.

[timedelta64]
I like your idea of a timedelta64 being relative, but in that case, why
not have the same resolutions as datetime64 ?

[scikits.timeseries]
We can currently perform the following operations in scikits.timeseries
>>> import scikits.timeseries as ts
>>> series = ts.date_array(['1970-01', '1970-02', '1970-09'], freq='M')
>>> series
DateArray([Jan-1970, Feb-1970, Sep-1970], freq='M')
>>> series.asfreq('A')
DateArray([1970, 1970, 1970], freq='A-DEC')
>>> series.asfreq('A-MAR')
DateArray([1970, 1970, 1971], freq='A-MAR')

"A-MAR" means that year YY ends on 03/31 and that year (YY+1) starts on
04/01.
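To make the "A-MAR" rule above concrete, here is a small sketch that does not use scikits.timeseries at all: it only labels a (year, month) pair with the annual period it falls into. The function name ``annual_period`` and its ``end_month`` parameter are hypothetical and chosen for this illustration.

```python
def annual_period(year, month, end_month=12):
    """Label of the annual period containing (year, month).

    `end_month` is the last month of each annual period: 12 mimics
    'A-DEC' (plain calendar years) and 3 mimics 'A-MAR'.
    """
    if not 1 <= month <= 12 or not 1 <= end_month <= 12:
        raise ValueError("months must be in the range 1..12")
    # Months after the closing month already belong to the next period.
    return year if month <= end_month else year + 1


if __name__ == "__main__":
    months = [(1970, 1), (1970, 2), (1970, 9)]
    print([annual_period(y, m) for y, m in months])
    print([annual_period(y, m, end_month=3) for y, m in months])
```

With ``end_month=3``, September 1970 is labelled 1971, matching the ``asfreq('A-MAR')`` output shown above.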
I use that a lot in my work, when I need to average daily data by water
years (a water year usually starts on 04/01 and ends the following
03/31).

How would I do that with datetime64 and timedelta64 ?

Apart from that, I'd of course be quite happy to help as much as I can.

P.

############################################

On Friday 25 July 2008 07:09:33 Francesc Alted wrote:
Hi,
Well, as there were no replies to our second proposal for the date/time dtype, I assume that everybody agrees with it ;-) At any rate, we would like to proceed with the implementation phase very soon now.
However, it happens that Enthought is sponsoring this job and they clearly stated that the implementation should cover the needs of as many users as possible. So, in particular, we would like some of the heaviest users of date/time objects, i.e. the TimeSeries authors, to be comfortable with the new date/time dtypes, and especially to be able to benefit from them.
For this goal, we are proposing a decoupling of the date/time use cases in two different groups:
1. A pure ``datetime`` dtype (absolute or relative) that would be useful for timestamping purposes in general (i.e. registering dates without a need that they be evenly spaced in time).
2. A class based on the ``frequency`` concept that would be useful for measurements that are done on a regular basis or in business applications.
With this, we are preventing the dtype implementation at the core of NumPy from being too cluttered with the relatively complex needs of the ``frequency`` concept users, factoring it out to an external class (``Date``, to follow the TimeSeries naming convention). More importantly, this decoupling will also avoid mixing those two concepts which, although both are about time measurements, have quite different meanings.
Another important advantage of this distinction is that the ``datetime`` timestamp requires less meta-information to worry about (basically, the 'resolution' property), while a ``frequency`` à la TimeSeries will need more meta-information, like the 'start' and 'end' of periods, as well as a more complex way to code frequencies (there are many more time periods to be coded, as can be seen in [1]_). This can be very important for allowing NumPy data based on the ``datetime`` dtype to be quickly saved to and retrieved from databases like ZODB (object database) or PyTables (HDF5-based database).
Our ultimate goal is that the ``Date`` and ``DateArray`` classes in TimeSeries would be rewritten in terms of the new date/time dtype so as to take advantage of its features, but also to get rid of duplicated code. I honestly think that this can be a big advantage for TimeSeries (at the cost of taking some time to do the migration).
Does that approach make sense for people?
.. [1] http://scipy.org/scipy/scikits/wiki/TimeSeries#Frequencies

Hi Pierre,

On Friday 25 July 2008, Pierre GM wrote:
Francesc,
Could you clarify a couple of points ?
[datetime64] If I understand properly, your datetime64 would be time units from the POSIX epoch (1970/01/01 00:00:00), right ? So
+7d would be 1970/01/08 (7 days after the epoch)
-7W would be 1969/11/13 (7*7 days before the epoch)
With this approach, a series [1,2,3,7] at a resolution 'd' would correspond to 1970/01/01, 1970/01/02, 1970/01/03 and 1970/01/07, right ?
I'm all for that, **AS LONG AS we have a business day resolution** 'b', so that +7b would be 1970/01/09.
We have been analyzing the addition of a business day resolution into
the bag, but this has the problem that such an entity cannot be
considered a 'resolution' as such.  The problem is that the business day
does not have a fixed time span (two days of the week don't count, and
that introduces non-regular behaviour in many situations).

Having said that, it is apparent that the business day is a **strong
requirement** on your side, and you know that we would like to keep you
happy.  So, to allow this to happen, we have concluded that a
conceptual change in our second proposal is needed: instead of a
'resolution', we can introduce the 'time unit' concept.  A 'time unit'
can be considered as an extent of time that doesn't necessarily need to
be fixed, but can change depending on the context of use.  As the 'time
unit' concept has this less restrictive meaning, we think that the user
can be easily persuaded that a 'business day' fits this definition
(which would be difficult or weird to explain when using the
'resolution' concept).

We have given this some thought, and it is certain that this will add a
bit more complexity (not too much, really).  So, yes, we are willing to
rewrite the proposal with the new 'time unit' concept and include the
'business day' too.  With this, we hope to better serve the needs of the
TimeSeries authors and users.

Also, adding the 'time unit' concept (and its corresponding
infrastructure) into the dtype opens the door to the adoption of other
'XXXX units' inside NumPy so that, for example, people can easily
convert from, say, miles to kilometers like this:

lengths_in_miles_array.astype('length[Km]')

but well, that is another story.
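Coming back to the business day: the following sketch illustrates why it has no fixed time span. It uses only the standard ``datetime`` module; the helper ``add_business_days`` is hypothetical, treats Monday to Friday as business days and ignores holidays, which a real implementation would probably have to address.

```python
from datetime import date, timedelta


def add_business_days(start, n):
    """Step forward `n` business days (Mon-Fri) from `start`.

    Weekends are skipped and holidays are ignored (a simplification of
    this sketch).  `n` must be non-negative.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0..4 are Monday..Friday
            remaining -= 1
    return current


if __name__ == "__main__":
    thursday = date(2008, 7, 24)
    friday = date(2008, 7, 25)
    # The same "+1 business day" covers a different number of calendar days.
    print((add_business_days(thursday, 1) - thursday).days)
    print((add_business_days(friday, 1) - friday).days)
```

One business day after a Thursday is one calendar day later, while one business day after a Friday is three calendar days later, so the unit cannot be mapped to a constant number of seconds.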
[timedelta64] I like your idea of a timedelta64 being relative, but in that case, why not having the same resolutions as datetime64 ?
At the beginning, our argument for staying with weeks as the minimum resolution (i.e. the largest time unit) for relative times was that the duration of months and years is not well defined (a month can last between 28 and 31 days, and a year 365 or 366 days) for a time that is meant to be *relative* (for example, the duration of a relative month is different if the reference time is June or July). However, after thinking more about this, we now think that a relative time of months or years does have a clear meaning: it makes a lot of sense to say "3 months after July 1998" or "5 months before August 2008", i.e. they make complete sense when used in combination with an absolute date.

One thing that will not be possible though, is to change the time unit of a relative time expressed in say, years, to another time unit expressed in say, days. This is because it is impossible to know how many days a relative year has (i.e. one not bound to a given year). More generally, it will not be possible to perform 'time unit' conversions between units above and below a relative week (because it is the largest time unit that has a definite number of seconds).

So, yes, we will be adding months and years to the relative times too.
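As a rough illustration of why a relative month is well defined once it is combined with an absolute date, the two examples quoted above can be worked out with plain integer arithmetic. This is only a sketch: the helper name ``add_months`` is hypothetical and is not part of the proposal or of NumPy.

```python
def add_months(year, month, delta_months):
    """Shift a (year, month) pair by a signed number of relative months.

    Hypothetical helper for illustration only; `month` is 1-based.
    """
    # Use a zero-based running month count so divmod handles year rollover.
    index = year * 12 + (month - 1) + delta_months
    new_year, new_month0 = divmod(index, 12)
    return new_year, new_month0 + 1


after = add_months(1998, 7, 3)    # "3 months after July 1998"
before = add_months(2008, 8, -5)  # "5 months before August 2008"
print(after, before)
```

No number of days is ever needed here, which is the reason the operation stays meaningful while a months-to-days conversion does not.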
[scikits.timeseries] We can currently perform the following operations in scikits.timeseries
    >>> import scikits.timeseries as ts
    >>> series = ts.date_array(['1970-01', '1970-02', '1970-09'], freq='M')
    >>> series
    DateArray([Jan-1970, Feb-1970, Sep-1970], freq='M')
    >>> series.asfreq('A')
    DateArray([1970, 1970, 1970], freq='A-DEC')
    >>> series.asfreq('A-MAR')
    DateArray([1970, 1970, 1971], freq='A-MAR')

"A-MAR" means that year YY ends on 03/31 and that year (YY+1) starts on 04/01.
I use that a lot in my work, when I need to average daily data by water years (a water year starts usually on 04/01 and ends the following 03/31).
How would I do that with datetime64 and timedelta64 ?
Well, as we would rather not make an 'origin' part of our proposal, you won't be able to do exactly that with the proposed plain dtype. However, we think that by making rational use of smaller time units (i.e. with more resolution, using the old convention) and a combination of absolute and relative times, it is easy to cover this use case. To continue with your example, you will be able to do:

    >>> series = numpy.array(['1970-01', '1970-02', '1970-09'],
    ...                      dtype='T[M]')
    >>> series.astype('Y')
    array([1970, 1970, 1970], dtype='T8[Y]')
    >>> series2 = series + 9  # Add 9 relative months
    >>> series2.astype('Y')
    array([1970, 1970, 1971], dtype='T8[Y]')
I hope you get the idea.
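The idea of emulating an "A-MAR" year with month-unit values plus a relative shift can be checked without any of the proposed dtypes. The sketch below assumes months are stored as integer counts since 1970-01 (as a month-unit ``datetime64`` would be); all function names are hypothetical. A year that closes in March is obtained by shifting 12 - 3 = 9 months before truncating to years.

```python
EPOCH_YEAR = 1970


def to_month_count(year, month):
    """Months elapsed since 1970-01 (month is 1-based)."""
    return (year - EPOCH_YEAR) * 12 + (month - 1)


def calendar_year(month_count):
    """Truncate a month count to its calendar year (like astype('Y'))."""
    return EPOCH_YEAR + month_count // 12


def shifted_year(month_count, last_month=3):
    """Year label for years that END in `last_month` (3 mimics 'A-MAR')."""
    shift = 12 - last_month  # 9 relative months for a March year end
    return calendar_year(month_count + shift)


series = [to_month_count(1970, m) for m in (1, 2, 9)]
calendar_years = [calendar_year(m) for m in series]
water_years = [shifted_year(m) for m in series]
print(calendar_years, water_years)
```

With ``last_month=12`` the shift is zero and the function reduces to the plain calendar year, which matches the 'A-DEC' default shown earlier.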
Apart from that, I'd be of course quite happy to help as much as I can. P.
Well, I really hope that you would be ok with the modifications that we are planning to do for the new (third) proposal. Many thanks! Francesc
############################################
On Friday 25 July 2008 07:09:33 Francesc Alted wrote:
Hi,
Well, as there were no replies to our second proposal for the date/time dtype, I assume that everybody agrees with it ;-) At any rate, we would like to proceed with the implementation phase very soon now.
However, it happens that Enthought is sponsoring this job and they clearly stated that the implementation should cover the needs of as many users as possible. So, in particular, we would like one of the heaviest users of date/time objects, i.e. the TimeSeries authors, to be comfortable with the new date/time dtypes, and especially to be able to benefit from them.
For this goal, we are proposing a decoupling of the date/time use cases in two different groups:
1. A pure ``datetime`` dtype (absolute or relative) that would be useful for timestamping purposes in general (i.e. registering dates without a need that they be evenly spaced in time).
2. A class based on the ``frequency`` concept that would be useful for measurements that are done on a regular basis or in business applications.
With this, we are preventing the dtype implementation at the core of NumPy from being too cluttered with the relatively complex needs of the ``frequency`` concept users, factoring it out to an external class (``Date``, to follow the TimeSeries naming convention). More importantly, this decoupling will also avoid mixing those two concepts which, although both are about time measurements, have quite different meanings.
Another important advantage of this distinction is that the ``datetime`` timestamp requires less meta-information to worry about (basically, the 'resolution' property), while a ``frequency`` à la TimeSeries will need more meta-information, like the 'start' and 'end' of periods, as well as a more complex way to code frequencies (there are many more time periods to be coded, as can be seen in [1]_). This can be very important for allowing NumPy data based on the ``datetime`` dtype to be quickly saved to and retrieved from databases like ZODB (object database) or PyTables (HDF5-based database).
Our ultimate goal is that the ``Date`` and ``DateArray`` classes in the TimeSeries would be rewritten in terms of the new date/time dtype so as to take advantage of its features but also to get rid of duplicated code. I honestly think that this can be a big advantage for TimeSeries (at the cost of taking some time to do the migration).
Does that approach make sense for people?
.. [1] http://scipy.org/scipy/scikits/wiki/TimeSeries#Frequencies
-- Francesc Alted

On Monday 28 July 2008 12:17:41 Francesc Alted wrote:
So, for allowing this to happen, we have concluded that a conceptual change in our second proposal is needed: instead of a 'resolution', we can introduce the 'time unit' concept.
I'm all for that, thanks !
One thing that will not be possible though, is to change the time unit of a relative time expressed in say, years, to another time unit expressed in say, days. This is because it is impossible to know how many days a relative year has (i.e. one not bound to a given year).
OK, that makes sense for timedeltas. But would I still be able to add a timedelta['Y'] (in years) to a datetime['D'] (in days) and get the proper result ?
More generally, it will not be possible to perform 'time unit' conversions between units above and below a relative week (because it is the largest time unit that has a definite number of seconds).
Could you rephrase that ? You're still talking about conversion for timedelta, not datetime, right ?
series.asfreq('A-MAR')
Well, as we would rather not make an 'origin' part of our proposal, you won't be able to do exactly that with the proposed plain dtype.
That's what I was afraid of. Oh well, I'm sure we'll come up with a way... Looking forward to reading the third version !

A Monday 28 July 2008, Pierre GM escrigué:
On Monday 28 July 2008 12:17:41 Francesc Alted wrote:
So, for allowing this to happen, we have concluded that a conceptual change in our second proposal is needed: instead of a 'resolution', we can introduce the 'time unit' concept.
I'm all for that, thanks !
One thing that will not be possible though, is to change the time unit of a relative time expressed in say, years, to another time unit expressed in say, days. This is because it is impossible to know how many days a relative year has (i.e. one not bound to a given year).
OK, that makes sense for timedeltas. But would I still be able to add a timedelta['Y'] (in years) to a datetime['D'] (in days) and get the proper result ?
Hmmm, good point. Well, provided that we plan to set the casting rules so that the time unit of the outcome will be the largest of the time units of the operands, and assuming approximate values for the number of days in a year (365.2425, i.e. the average year length of the Gregorian calendar) and in a month (30.436875 = 365.2425/12), I think the following operations would be feasible:

    >>> numpy.timedelta(20, unit='Y') + numpy.timedelta(365, unit='D')
    20  # unit is Year
    >>> numpy.timedelta(20, unit='Y') + numpy.timedelta(366, unit='D')
    21  # unit is Year

    >>> numpy.timedelta(43, unit='M') + numpy.timedelta(30, unit='D')
    43  # unit is Month
    >>> numpy.timedelta(43, unit='M') + numpy.timedelta(31, unit='D')
    44  # unit is Month
Would that be ok for you?
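The four results above follow from truncating the day count against the average Gregorian year and month lengths quoted in the message. A minimal sketch of that arithmetic, with a hypothetical ``add_days`` helper standing in for the proposed (not yet existing) mixed-unit addition:

```python
DAYS_PER_YEAR = 365.2425              # average Gregorian year, from the message
DAYS_PER_MONTH = DAYS_PER_YEAR / 12   # 30.436875

DAYS_PER_UNIT = {"Y": DAYS_PER_YEAR, "M": DAYS_PER_MONTH}


def add_days(count, unit, days):
    """Add `days` to a relative time of `count` years ('Y') or months ('M').

    The result stays in the larger unit; partial units are dropped.
    Hypothetical helper, for illustration only.
    """
    whole_units = int(days / DAYS_PER_UNIT[unit])  # int() truncates toward zero
    return count + whole_units


print(add_days(20, "Y", 365), add_days(20, "Y", 366))
print(add_days(43, "M", 30), add_days(43, "M", 31))
```

Note that 365 days is slightly less than one average year, so it contributes nothing, whereas 366 days just crosses the threshold.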
More generally, it will not be possible to perform 'time unit' conversions between units above and below a relative week (because it is the largest time unit that has a definite number of seconds).
Could you rephrase that ? You're still talking about conversion for timedelta, not datetime, right ?
Yes. I was talking about the relative timedelta in that case. The initial idea was to forbid conversions among relative timedeltas with different units that imply assumptions about the number of days. But after pondering the example above at length, I now think that it would be sensible to allow conversions from time units shorter than a week to units longer than a week (but not the inverse), truncating the outcome. For example, the following would be allowed:

    >>> numpy.timedelta(43, unit='D').astype("t8[M]")
    1  # One complete month
    >>> numpy.timedelta(365, unit='D').astype("t8[Y]")
    0  # Not a complete year

But this would not:

    >>> numpy.timedelta(2, unit='M').astype("t8[d]")
    raise ``IncompatibleUnitError``  # How many days could have 2 months?
    >>> numpy.timedelta(1, unit='Y').astype("t8[d]")
    raise ``IncompatibleUnitError``  # How many days could have 1 year?
This will add more complexity to the code, but the functionality looks sensible to my eyes. What do you think?
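To make the asymmetry concrete, here is a small sketch of the conversion rule as described: fixed-length units may be converted up to months or years with truncation, while the reverse direction is rejected. The ``convert`` function and the unit tables are assumptions made for illustration; only the ``IncompatibleUnitError`` name comes from the message, and it is defined locally here because no such exception exists in NumPy.

```python
class IncompatibleUnitError(TypeError):
    """Raised when a conversion would need a definite length for M or Y."""


DAYS_PER_YEAR = 365.2425  # average Gregorian year, as assumed in the thread
FIXED_SECONDS = {"s": 1, "m": 60, "h": 3600, "D": 86400, "W": 7 * 86400}
CALENDAR_SECONDS = {"M": DAYS_PER_YEAR / 12 * 86400, "Y": DAYS_PER_YEAR * 86400}
MONTHS_PER_UNIT = {"M": 1, "Y": 12}


def convert(value, src, dst):
    """Convert an integer relative time from unit `src` to unit `dst`."""
    for unit in (src, dst):
        if unit not in FIXED_SECONDS and unit not in CALENDAR_SECONDS:
            raise ValueError("unknown unit: %r" % unit)
    if src in FIXED_SECONDS and dst in FIXED_SECONDS:
        return value * FIXED_SECONDS[src] // FIXED_SECONDS[dst]
    if src in FIXED_SECONDS:
        # Fixed -> calendar: allowed, keeping only complete months/years.
        return int(value * FIXED_SECONDS[src] / CALENDAR_SECONDS[dst])
    if dst in FIXED_SECONDS:
        # Calendar -> fixed: a relative month/year has no definite length.
        raise IncompatibleUnitError("cannot convert %r to %r" % (src, dst))
    # Calendar -> calendar: a year is always exactly 12 months.
    return value * MONTHS_PER_UNIT[src] // MONTHS_PER_UNIT[dst]


print(convert(43, "D", "M"), convert(365, "D", "Y"))
```

Deriving the error from ``TypeError`` is a design guess; it would let existing ``except TypeError`` handlers keep working if the more specific exception were introduced later.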
series.asfreq('A-MAR')
Well, as we would rather not make an 'origin' part of our proposal, you won't be able to do exactly that with the proposed plain dtype.
That's what I was afraid of. Oh well, I'm sure we'll come up with a way...
Looking forward to reading the third version !
Well, as we are still discussing and changing things, we would like to wait a bit more until all the dust has settled. But we expect to produce the third version of the proposal before the end of this week.

Cheers,

-- Francesc Alted

For this goal, we are proposing a decoupling of the date/time use cases in two different groups:
1. A pure ``datetime`` dtype (absolute or relative) that would be useful for timestamping purposes in general (i.e. registering dates without a need that they be evenly spaced in time).
I agree with this split. A basic datetime data type would be useful to a lot of people that don't need fancier time series capabilities. I would recommend focusing on implementing this first as it will probably provide lots of useful learning experiences and examples for the more complicated task of a "frequency" aware date type later on.
2. A class based on the ``frequency`` concept that would be useful for measurements that are done on a regular basis or in business applications. ... Our ultimate goal is that the ``Date`` and ``DateArray`` classes in the TimeSeries would be rewritten in terms of the new date/time dtype so as to take advantage of its features but also to get rid of duplicated code.
I'm excited to hear such interest in time series work with python and numpy. I certainly support the goals and more collaboration and sharing of code is always a good thing. My biggest concern would be not losing existing functionality. A decent amount of work went into implementing all the different frequencies, and losing any of the currently supported frequencies could mean the difference between the dtype being very useful to someone, or not useful at all.

Just thinking out loud here... but in terms of improving on the Date implementation in the timeseries module, it would be nice to have a more "plug in" kind of architecture for implementing different frequencies so that it could be extended more easily with custom frequencies by other users. There is no end to the list of possible frequencies that people might potentially use and the current timeseries implementation isn't as flexible as it could be in that area.

The automatic string parsing has been mentioned before, but it is a feature I am personally very fond of. I use it all the time, and I suspect a lot of people would like it very much if they used it. It's not suited for high performance code, but is fantastic for interactive and ad-hoc work. This is supported right in the "constructor" of the current Date class, along with conversion from datetime objects. I'd love to see such support built into the new date type, although I guess it could be added on easily enough with a factory function.

Another extra feature (or hack depending on your point of view) in the timeseries Date class is the addition of a couple extra custom directives for string formatting. Specifically the %q and %Q directives for printing out Quarter information. Obviously these are non-standard directives, but when you are talking about dates with custom frequencies I think it sometimes makes sense to have custom format directives. A plug in architecture that somehow lets you define new custom directives for various frequencies would also be really nice.

Anyway, I'm very much in support of this initiative. I'm not sure I'll be able to help much on the initial implementation, but once you have a framework in place I may be able to pitch in with some of the details. Please keep us posted.

- Matt

On Fri, Jul 25, 2008 at 8:22 PM, Matt Knox <mattknox.ca@gmail.com> wrote:
The automatic string parsing has been mentioned before, but it is a feature I am personally very fond of. I use it all the time, and I suspect a lot of people would like it very much if they used it. It's not suited for high performance code, but is fantastic for interactive and ad-hoc work. This is supported right in the "constructor" of the current Date class, along with conversion from datetime objects. I'd love to see such support built into the new date type, although I guess it could be added on easily enough with a factory function.
There is a module dateutil.parser which is released under the PSF license if there is interest in including something like this. Not sure if it is appropriate for numpy because of the speed implications, but it's out there. mpl ships dateutil, so it is already available with all mpl installs.

JDH

A Saturday 26 July 2008, Matt Knox escrigué:
For this goal, we are proposing a decoupling of the date/time use cases in two different groups:
1. A pure ``datetime`` dtype (absolute or relative) that would be useful for timestamping purposes in general (i.e. registering dates without a need that they be evenly spaced in time).
I agree with this split. A basic datetime data type would be useful to a lot of people that don't need fancier time series capabilities.
Excellent, this is our thought too.
I would recommend focusing on implementing this first as it will probably provide lots of useful learning experiences and examples for the more complicated task of a "frequency" aware date type later on.
Definitely. We plan to do exactly this.
2. A class based on the ``frequency`` concept that would be useful for measurements that are done on a regular basis or in business applications. ... Our ultimate goal is that the ``Date`` and ``DateArray`` classes in the TimeSeries would be rewritten in terms of the new date/time dtype so as to take advantage of its features but also to get rid of duplicated code.
I'm excited to hear such interest in time series work with python and numpy. I certainly support the goals and more collaboration and sharing of code is always a good thing. My biggest concern would be not losing existing functionality. A decent amount of work went into implementing all the different frequencies, and losing any of the currently supported frequencies could mean the difference between the dtype being very useful to someone, or not useful at all.
Just thinking out loud here... but in terms of improving on the Date implementation in the timeseries module, it would be nice to have a more "plug in" kind of architecture for implementing different frequencies so that it could be extended more easily with custom frequencies by other users. There is no end to the list of possible frequencies that people might potentially use and the current timeseries implementation isn't as flexible as it could be in that area.
We completely agree with the idea of the plug-in architecture for the ``Date`` class. Do you already have something concrete in mind?
The automatic string parsing has been mentioned before, but it is a feature I am personally very fond of. I use it all the time, and I suspect a lot of people would like it very much if they used it. It's not suited for high performance code, but is fantastic for interactive and ad-hoc work. This is supported right in the "constructor" of the current Date class, along with conversion from datetime objects. I'd love to see such support built into the new date type, although I guess it could be added on easily enough with a factory function.
Well, what we are planning is to support only three kinds of assignments:

- From ``datetime.datetime`` (absolute time) or ``datetime.timedelta`` (relative time) objects.
- From integers or floating point numbers (relative time).
- From ISO-8601 strings (absolute time).

The last input mode does imply a parser, but our intention is to directly support just the standard ISO format. We think that if you want to specify other string formats it is better to rely on the ``datetime`` parsers or, as John Hunter suggests, the ``dateutil`` module. We believe that incorporating more parsers into the ``Date`` class may represent an unnecessary duplication of code.
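The three input kinds listed above can be pictured as three paths into the same integer storage. The sketch below is an assumption-laden illustration using only the standard library: ``to_count``, ``EPOCH`` and the unit table are hypothetical names, naive (timezone-free) datetimes are assumed, and ``datetime.datetime.fromisoformat`` stands in for whatever ISO 8601 parser the dtype would actually use.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)  # POSIX epoch, naive datetime assumed
US_PER_UNIT = {"us": 1, "ms": 10**3, "s": 10**6,
               "m": 60 * 10**6, "h": 3600 * 10**6, "D": 86400 * 10**6}


def _total_us(delta):
    """Exact integer microseconds in a datetime.timedelta."""
    return (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds


def to_count(value, unit="us"):
    """Map one of the three supported inputs to an integer count of `unit`."""
    if isinstance(value, str):
        # ISO 8601 string (absolute time).
        value = datetime.datetime.fromisoformat(value)
    if isinstance(value, datetime.datetime):
        # Absolute time: count from the epoch, dropping finer detail.
        return _total_us(value - EPOCH) // US_PER_UNIT[unit]
    if isinstance(value, datetime.timedelta):
        # Relative time given as a timedelta object.
        return _total_us(value) // US_PER_UNIT[unit]
    if isinstance(value, (int, float)):
        # Plain number, already in `unit`; the decimal part is lost.
        return int(value)
    raise TypeError("unsupported input type: %r" % type(value))


print(to_count("2008-07-18T12:23:18", "m"))
print(to_count(datetime.timedelta(0, 24), "ms"))
```

Any other string format would first go through ``datetime.datetime.strptime`` or ``dateutil``, and the resulting ``datetime`` object would then take the second path.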
Another extra feature (or hack depending on your point of view) in the timeseries Date class is the addition of a couple extra custom directives for string formatting. Specifically the %q and %Q directives for printing out Quarter information. Obviously these are non-standard directives, but when you are talking about dates with custom frequencies I think it sometimes makes sense to have custom format directives. A plug in architecture that somehow lets you define new custom directives for various frequencies would also be really nice.
Maybe you are right, yes. However, I'd consider using ``datetime`` or ``dateutil`` for this first. If there are use cases that the existing modules do not cover, then we can start thinking about this, but not before.
Anyway, I'm very much in support of this initiative. I'm not sure I'll be able to help much on the initial implementation, but once you have a framework in place I may be able to pitch in with some of the details. Please keep us posted.
Yes, that's the idea. We plan to send a third proposal (tomorrow?) based on the latest suggestions by Pierre. Once we reach a consensus, we will start the implementation of the date/time dtype based on the final proposal (hopefully, the third one). It would be great if, based on this, and before or during the implementation phase of the dtype, you could start thinking about the architecture of the new ``Date`` class (with all the added fanciness that you are proposing) so that we have time to include possible details that escaped the final proposal for the date/time dtype.

Thanks a lot!

-- Francesc Alted

Hi, Sorry for the very long delay in commenting on this. In short, it looks great, and thanks for your efforts. A couple small comments:
In [11]: t[0] = datetime.datetime.now() # setter in action
In [12]: t[0]
Out[12]: '2008-07-16T13:39:25.315' # representation in ISO 8601 format
I like that, but what about:
In [8]: t1 = numpy.zeros(5, dtype="datetime64[s]")
In [9]: t2 = numpy.ones(5, dtype="datetime64[s]")
In [10]: t = t2 - t1
In [11]: t[0] = 24 # setter in action (setting to 24 seconds)
Is there a way to set in any other units? (hours, days, etc.)
In [12]: t[0]
Out[12]: 24 # representation as an int64

why not a "pretty" representation of timedelta64 too? I'd like that better (at least for __str__; perhaps __repr__ should be the raw numbers).

how will operations between different types work?

t1 = numpy.ones(5, dtype="timedelta64[s]")
t2 = numpy.ones(5, dtype="timedelta64[ms]")
t1 + t2
??????
-Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

A Monday 28 July 2008, Christopher Barker escrigué:
Hi,
Sorry for the very long delay in commenting on this.
Don't worry, there is still time to receive more comments (but if there are people willing to contribute more comments, hurry up, please!).
In short, it looks great, and thanks for your efforts.
A couple small comments:
In [11]: t[0] = datetime.datetime.now() # setter in action
In [12]: t[0]
Out[12]: '2008-07-16T13:39:25.315' # representation in ISO 8601 format
I like that, but what about:
In [8]: t1 = numpy.zeros(5, dtype="datetime64[s]")
In [9]: t2 = numpy.ones(5, dtype="datetime64[s]")
In [10]: t = t2 - t1
In [11]: t[0] = 24 # setter in action (setting to 24 seconds)
Is there a way to set in any other units? (hours, days, etc.)
Yes. You will be able to use a scalar ``timedelta64``. For example, if t is an array with dtype = 'timedelta64[s]' (i.e. with a time unit of seconds), you will be able to do the following:

    t[0] = numpy.timedelta64(2, unit="[D]")

which sets element 0 of t to 2 days. However, you won't be able to do the following:

    t[0] = numpy.timedelta64(2, unit="[M]")

because a month does not have a definite number of seconds. This will typically raise a ``TypeError`` exception, or perhaps a ``numpy.IncompatibleUnitError``, which would be more self-explanatory.
In [12]: t[0]
Out[12]: 24 # representation as an int64
why not a "pretty" representation of timedelta64 too? I'd like that better (at least for __str__; perhaps __repr__ should be the raw numbers).
That could be an interesting feature. Here is what the ``datetime`` module does:

    >>> delta = datetime.datetime(1980,2,1)-datetime.datetime(1970,1,1)
    >>> delta.__str__()
    '3683 days, 0:00:00'
    >>> delta.__repr__()
    'datetime.timedelta(3683)'

For the NumPy ``timedelta64`` with a time unit of days, it could be something like:

    >>> delta_days.__str__()
    '3683 days'
    >>> delta_days.__repr__()
    3683

while for a ``timedelta64`` with a time unit of microseconds it could be:

    >>> delta_us.__str__()
    '3683 days, 3:04:05.000064'
    >>> delta_us.__repr__()
    318222245000064
But I'm open to other suggestions, of course.
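One inexpensive way to obtain the "pretty" strings suggested above is to delegate to ``datetime.timedelta`` for every unit finer than a day. The following is only a sketch under that assumption; the ``pretty`` function and its unit table are hypothetical and are not part of the proposal.

```python
import datetime

US_PER_UNIT = {"us": 1, "ms": 10**3, "s": 10**6, "D": 86400 * 10**6}


def pretty(count, unit):
    """Human-readable form of an integer relative time stored in `unit`."""
    if unit == "D":
        # Day-unit values carry no time-of-day part, so print days only.
        return "%d day%s" % (count, "" if abs(count) == 1 else "s")
    # Finer units: reuse the standard library's timedelta formatting.
    return str(datetime.timedelta(microseconds=count * US_PER_UNIT[unit]))


print(pretty(3683, "D"))
print(pretty(318222245000064, "us"))
```

The raw integer remains available unchanged for ``__repr__``, so the two representations never disagree about the stored value.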
how will operations between different types work?
t1 = numpy.ones(5, dtype="timedelta64[s]")
t2 = numpy.ones(5, dtype="timedelta64[ms]")
t1 + t2
??????
Yeah. While the proposal stated that these operations should be possible, it is true that the casting rules were not established yet. After thinking a bit about this, we find that we should prioritize avoiding overflows rather than trying to keep the maximum precision. With this rule in mind, the outcome will always have the larger of the units of the operands. In your example, t1 + t2 will have '[s]' units.

Would that make sense for most people?

Cheers,

-- Francesc Alted
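The "larger unit wins" rule can be sketched for scalar values as follows. This is an illustration only: ``add_timedeltas`` and the unit tables are hypothetical, only fixed-length units are covered, and floor division is assumed for the down-conversion (the thread does not say how negative values should round).

```python
UNITS = ["ns", "us", "ms", "s", "m", "h", "D", "W"]  # finest to coarsest
NS_PER_UNIT = {"ns": 1, "us": 10**3, "ms": 10**6, "s": 10**9,
               "m": 60 * 10**9, "h": 3600 * 10**9,
               "D": 86400 * 10**9, "W": 7 * 86400 * 10**9}


def to_coarser(value, src, dst):
    """Re-express `value` from unit `src` in the equal-or-coarser unit `dst`."""
    ratio = NS_PER_UNIT[dst] // NS_PER_UNIT[src]  # exact for these units
    return value // ratio  # dividing first keeps magnitudes small


def add_timedeltas(v1, u1, v2, u2):
    """Add two relative times; the result takes the coarser of the two units."""
    unit = max(u1, u2, key=UNITS.index)
    return to_coarser(v1, u1, unit) + to_coarser(v2, u2, unit), unit


print(add_timedeltas(1, "s", 1, "ms"))
print(add_timedeltas(1, "s", 1500, "ms"))
```

Applied element by element to the arrays in the question, every entry of t1 + t2 would be 1 with '[s]' units, since the single millisecond contributed by t2 is lost in the conversion. That is the precision traded away in exchange for never overflowing.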

A Tuesday 29 July 2008, Francesc Alted escrigué: [snip]
In [12]: t[0]
Out[12]: 24 # representation as an int64
why not a "pretty" representation of timedelta64 too? I'd like that better (at least for __str__; perhaps __repr__ should be the raw numbers).
That could be an interesting feature. Here is what the ``datetime`` module does:

    >>> delta = datetime.datetime(1980,2,1)-datetime.datetime(1970,1,1)
    >>> delta.__str__()
    '3683 days, 0:00:00'
    >>> delta.__repr__()
    'datetime.timedelta(3683)'

For the NumPy ``timedelta64`` with a time unit of days, it could be something like:

    >>> delta_days.__str__()
    '3683 days'
    >>> delta_days.__repr__()
    3683

while for a ``timedelta64`` with a time unit of microseconds it could be:

    >>> delta_us.__str__()
    '3683 days, 3:04:05.000064'
    >>> delta_us.__repr__()
    318222245000064

But I'm open to other suggestions, of course.
Sorry, but I've been a bit inconsistent here, as this is already documented in the proposal. Just to clarify things, here are the str/repr suggestions (just a bit more populated with examples) from the second version of the second proposal.

For absolute times:

    In [5]: numpy.datetime64(42, 'us')
    Out[5]: datetime64(42, 'us')

    In [6]: print numpy.datetime64(42)
    1970-01-01T00:00:00.000042  # representation in ISO 8601 format

    In [7]: print numpy.datetime64(367.7, 'D')  # decimal part is lost
    1971-01-03  # still ISO 8601 format

    In [8]: numpy.datetime64('2008-07-18T12:23:18', 'm')  # from ISO 8601
    Out[8]: datetime64(20273063, 'm')

    In [9]: print numpy.datetime64('2008-07-18T12:23:18', 'm')
    2008-07-18T12:23

    In [10]: t = numpy.zeros(5, dtype="datetime64[D]")

    In [11]: print t
    [1970-01-01 1970-01-01 1970-01-01 1970-01-01 1970-01-01]

    In [12]: repr(t)
    Out[12]: array([0, 0, 0, 0, 0], dtype="datetime64[D]")

    In [13]: print t[0]
    1970-01-01

    In [14]: t[0]
    Out[14]: datetime64(0, unit='D')

    In [15]: t[0].item()  # getter in action
    Out[15]: datetime.datetime(1970, 1, 1, 0, 0)

For relative times:

    In [5]: numpy.timedelta64(10, 'us')
    Out[5]: timedelta64(10, 'us')

    In [6]: print numpy.timedelta64(10, 'ms')
    0:00:00.010

    In [7]: print numpy.timedelta64(3600.2, 'm')  # decimal part is lost
    2 days, 12:00

    In [8]: t0 = numpy.zeros(5, dtype="datetime64[ms]")

    In [9]: t1 = numpy.ones(5, dtype="datetime64[ms]")

    In [10]: t = t1 - t0

    In [11]: t[0] = datetime.timedelta(0, 24)  # setter in action

    In [12]: print t
    [0:00:24.000 0:00:00.001 0:00:00.001 0:00:00.001 0:00:00.001]

    In [13]: repr(t)
    Out[13]: array([24000, 1, 1, 1, 1], dtype="timedelta64[ms]")

    In [14]: print t[0]
    0:00:24.000

    In [15]: t[0]
    Out[15]: timedelta64(24000, unit='ms')

    In [16]: t[0].item()  # getter in action
    Out[16]: datetime.timedelta(0, 24)

Cheers,

-- Francesc Alted
Participants (6): Christopher Barker, Francesc Alted, Ivan Vilata i Balaguer, John Hunter, Matt Knox, Pierre GM