The date/time dtype and the casting issue
Hi,

During the making of the date/time proposals and the subsequent discussions on this list, we have changed our point of view a couple of times about how casting would work between the different date/time types and time units (previously called resolutions). So I'd like to lay out this issue in detail here and give yet another proposal, so as to gather feedback from the community before consolidating it in the final date/time proposal.

Casting proposal for date/time types
====================================

The operations among the proposed date/time types can be divided into three groups:

* Absolute time versus relative time
* Absolute time versus absolute time
* Relative time versus relative time

Now, here are our considerations for each case:

Absolute time versus relative time
----------------------------------

We think that in this case the absolute time should have priority in determining the time unit of the outcome. That represents what people want to do most of the time. For example, this would allow one to do:

series = numpy.array(['1970-01-01', '1970-02-01', '1970-09-01'], dtype='datetime64[D]')
series2 = series + numpy.timedelta(1, 'Y')  # Add 2 relative years
series2
array(['1972-01-01', '1972-02-01', '1972-09-01'], dtype='datetime64[D]')  # the 'D'ay time unit has been chosen

Absolute time versus absolute time
----------------------------------

When operating on (basically, only subtraction will be allowed) two absolute times with different time units, we propose that the outcome be to raise an exception. This is because the ranges and timespans of the different time units can be very different, and it is not at all clear which time unit the user will prefer. For example, this should be allowed:

numpy.ones(3, dtype="T8[Y]") - numpy.zeros(3, dtype="T8[Y]")
array([1, 1, 1], dtype="timedelta64[Y]")

But the next should not:

numpy.ones(3, dtype="T8[Y]") - numpy.zeros(3, dtype="T8[ns]")
raise numpy.IncompatibleUnitError  # what unit to choose?

Relative time versus relative time
----------------------------------

This case would be the same as the previous one (absolute vs. absolute). Our proposal is to forbid this operation if the time units of the operands are different. For example, this should be allowed:

numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[Y]")
array([4, 4, 4], dtype="timedelta64[Y]")

But the next should not:

numpy.ones(3, dtype="t8[Y]") + numpy.zeros(3, dtype="t8[fs]")
raise numpy.IncompatibleUnitError  # what unit to choose?

Introducing a time casting function
-----------------------------------

As forbidding operations among absolute/absolute and relative/relative types can be unacceptable in many situations, we are proposing an explicit casting mechanism so that the user can specify the desired time unit of the outcome. For this, a new NumPy function, called, say, ``numpy.change_unit()`` (this name is for discussion purposes and can be changed), will be provided. The signature of the function will be:

change_unit(time_object, new_unit, reference)

where 'time_object' is the time object whose unit is to be changed, 'new_unit' is the desired new time unit, and 'reference' is an absolute date that will be used to allow the conversion of relative times when using time units with an uncertain number of smaller time units (relative years or months cannot be expressed in days). For example, that would allow one to do:

numpy.change_unit( numpy.array([1,2], 'T[Y]'), 'T[d]' )
array([365, 731], dtype="datetime64[d]")

or:

ref = numpy.datetime64('1971', 'T[Y]')
numpy.change_unit( numpy.array([1,2], 't[Y]'), 't[d]', ref )
array([366, 365], dtype="timedelta64[d]")

Note: we declined to use the ``.astype()`` method because the additional 'time_reference' parameter would sound strange for other typical uses of ``.astype()``.

Opinions?

-- Francesc Alted
On Tuesday 29 July 2008, Francesc Alted wrote:
[snip]

series = numpy.array(['1970-01-01', '1970-02-01', '1970-09-01'], dtype='datetime64[D]')
series2 = series + numpy.timedelta(1, 'Y')  # Add 2 years
                                ^^^

Oops, after reviewing this document, I've discovered a couple of typos. The above line should read:

series2 = series + numpy.timedelta(2, 'Y')  # Add 2 years
series2
array(['1972-01-01', '1972-02-01', '1972-09-01'], dtype='datetime64[D]')  # the 'D'ay time unit has been chosen

[snip]

numpy.change_unit( numpy.array([1,2], 'T[Y]'), 'T[d]' )
array([365, 731], dtype="datetime64[d]")

or:

ref = numpy.datetime64('1971', 'T[Y]')
numpy.change_unit( numpy.array([1,2], 't[Y]'), 't[d]', ref )
array([366, 365], dtype="timedelta64[d]")
                  ^^^

The above line should read:

array([366, 731], dtype="timedelta64[d]")

-- Francesc Alted
Hi,

Silent casting is often a source of bugs, and I appreciate the strict rules you want to enforce. However, I think there should be a simpler mechanism for operations between different types than creating a copy of a variable with the correct type. My suggestion is to have a dtype argument for methods such as add and subs:

numpy.ones(3, dtype="t8[Y]").add(numpy.zeros(3, dtype="t8[fs]"), dtype="t8[fs]")

This way, `implicit` operations (+, -) enforce strict rules, and `explicit` operations (add, subs) let you do what you want at your own risk.

David

On Tue, Jul 29, 2008 at 9:12 AM, Francesc Alted <faltet@pytables.org> wrote:
Hi,
During the making of the date/time proposals and the subsequent discussions on this list, we have changed our point of view a couple of times about how casting would work between the different date/time types and time units (previously called resolutions). So I'd like to lay out this issue in detail here and give yet another proposal, so as to gather feedback from the community before consolidating it in the final date/time proposal.
Casting proposal for date/time types ====================================
The operations among the proposed date/time types can be divided into three groups:
* Absolute time versus relative time
* Absolute time versus absolute time
* Relative time versus relative time
Now, here are our considerations for each case:
Absolute time versus relative time ----------------------------------
We think that in this case the absolute time should have priority in determining the time unit of the outcome. That represents what people want to do most of the time. For example, this would allow one to do:

series = numpy.array(['1970-01-01', '1970-02-01', '1970-09-01'], dtype='datetime64[D]')
series2 = series + numpy.timedelta(1, 'Y')  # Add 2 relative years
series2
array(['1972-01-01', '1972-02-01', '1972-09-01'], dtype='datetime64[D]')  # the 'D'ay time unit has been chosen
Absolute time versus absolute time ----------------------------------
When operating on (basically, only subtraction will be allowed) two absolute times with different time units, we propose that the outcome be to raise an exception. This is because the ranges and timespans of the different time units can be very different, and it is not at all clear which time unit the user will prefer. For example, this should be allowed:

numpy.ones(3, dtype="T8[Y]") - numpy.zeros(3, dtype="T8[Y]")
array([1, 1, 1], dtype="timedelta64[Y]")

But the next should not:

numpy.ones(3, dtype="T8[Y]") - numpy.zeros(3, dtype="T8[ns]")
raise numpy.IncompatibleUnitError  # what unit to choose?
Relative time versus relative time ----------------------------------
This case would be the same as the previous one (absolute vs. absolute). Our proposal is to forbid this operation if the time units of the operands are different. For example, this should be allowed:

numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[Y]")
array([4, 4, 4], dtype="timedelta64[Y]")

But the next should not:

numpy.ones(3, dtype="t8[Y]") + numpy.zeros(3, dtype="t8[fs]")
raise numpy.IncompatibleUnitError  # what unit to choose?
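For concreteness, the three rules above can be sketched in a few lines of plain Python. Nothing here exists in NumPy; `result_unit()`, the `(kind, unit)` pairs, and `IncompatibleUnitError` are made up purely to mirror the proposed behaviour:

```python
# Hypothetical sketch of the proposed casting rules; these names do
# not exist in NumPy, they just restate the proposal above.

class IncompatibleUnitError(ValueError):
    """Raised when two time units cannot be combined implicitly."""

def result_unit(left, right):
    """Return the unit of the outcome, given (kind, unit) operands,
    where kind is 'abs' (datetime64) or 'rel' (timedelta64)."""
    (lkind, lunit), (rkind, runit) = left, right
    if lkind != rkind:
        # Absolute vs relative: the absolute operand's unit wins.
        return lunit if lkind == 'abs' else runit
    if lunit != runit:
        # abs/abs or rel/rel with different units: refuse to guess.
        raise IncompatibleUnitError("what unit to choose?")
    return lunit

print(result_unit(('abs', 'D'), ('rel', 'Y')))   # -> D (absolute wins)
print(result_unit(('rel', 'Y'), ('rel', 'Y')))   # -> Y (same unit is fine)
```

With this, combining `('abs', 'Y')` with `('abs', 'ns')` raises, matching the `IncompatibleUnitError` example above.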
Introducing a time casting function -----------------------------------
As forbidding operations among absolute/absolute and relative/relative types can be unacceptable in many situations, we are proposing an explicit casting mechanism so that the user can specify the desired time unit of the outcome. For this, a new NumPy function, called, say, ``numpy.change_unit()`` (this name is for discussion purposes and can be changed), will be provided. The signature of the function will be:

change_unit(time_object, new_unit, reference)

where 'time_object' is the time object whose unit is to be changed, 'new_unit' is the desired new time unit, and 'reference' is an absolute date that will be used to allow the conversion of relative times when using time units with an uncertain number of smaller time units (relative years or months cannot be expressed in days). For example, that would allow one to do:

numpy.change_unit( numpy.array([1,2], 'T[Y]'), 'T[d]' )
array([365, 731], dtype="datetime64[d]")

or:

ref = numpy.datetime64('1971', 'T[Y]')
numpy.change_unit( numpy.array([1,2], 't[Y]'), 't[d]', ref )
array([366, 365], dtype="timedelta64[d]")

Note: we declined to use the ``.astype()`` method because the additional 'time_reference' parameter would sound strange for other typical uses of ``.astype()``.
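To make the role of the 'reference' argument concrete, here is a rough sketch of the relative-years-to-days conversion that ``change_unit()`` would have to perform internally, using the stdlib `datetime` module. `years_to_days()` is a made-up helper, and the exact day counts depend on the counting convention chosen:

```python
# Sketch of converting relative years into days.  Counting
# convention (an assumption): a span of n relative years starting at
# `reference` covers the calendar days up to the same date n years
# later, so leap years change the answer.
from datetime import date

def years_to_days(n_years, reference=date(1970, 1, 1)):
    """Number of days spanned by n relative years from `reference`."""
    end = reference.replace(year=reference.year + n_years)
    return (end - reference).days

ref = date(1971, 1, 1)
print([years_to_days(n, ref) for n in (1, 2)])  # spans include leap year 1972
```

This is why the 'reference' is needed at all: the same relative value `2` years maps to a different number of days depending on where the span starts.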
Opinions?
-- Francesc Alted

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Francesc,
Absolute time versus relative time ----------------------------------
We think that in this case the absolute time should have priority in determining the time unit of the outcome.
+1
Absolute time versus absolute time ----------------------------------
When operating on (basically, only subtraction will be allowed) two absolute times with different time units, we propose that the outcome be to raise an exception.
+1 (However, I don't think that np.zeros(3, dtype="T8[Y]") is the most useful example ;))
Relative time versus relative time ----------------------------------
This case would be the same as the previous one (absolute vs. absolute). Our proposal is to forbid this operation if the time units of the operands are different.

Mmh, I'm less sure on this one. Can't we use a hierarchy of time units and force to the lowest? For example:

numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[M]")
array([15,15,15], dtype="t8['M']")

I agree that adding ns to years makes no sense, but ns to s? min to hr or days? In short: systematically raising an exception looks a bit too drastic. There are some simple unambiguous cases that should be allowed (Y+M, Y+Q, M+Q, H+D...).
Introducing a time casting function -----------------------------------
change_unit(time_object, new_unit, reference)
where 'time_object' is the time object whose unit is to be changed, 'new_unit' is the desired new time unit, and 'reference' is an absolute date that will be used to allow the conversion of relative times when using time units with an uncertain number of smaller time units (relative years or months cannot be expressed in days).

'reference' defaults to the POSIX epoch, right? So this function could be a first step towards our problem of frequency conversion...
Note: we declined to use the ``.astype()`` method because the additional 'time_reference' parameter would sound strange for other typical uses of ``.astype()``.
A method would be really, really helpful, though... Back to a previous email:
numpy.timedelta(20, unit='Y') + numpy.timedelta(365, unit='D')
20  # unit is Year

I would have expected days, or an exception (as there's an ambiguity in the length in days of a year)

numpy.timedelta(20, unit='Y') + numpy.timedelta(366, unit='D')
21  # unit is Year

numpy.timedelta(43, unit='M') + numpy.timedelta(30, unit='D')
43  # unit is Month

numpy.timedelta(43, unit='M') + numpy.timedelta(31, unit='D')
44  # unit is Month
Would that be ok for you?
Gah, I dunno. Adding relative values is always tricky... I understand the last statement as 43 months and 31 days, which could be 44 months if we're speaking in months, or 3 years, 7 months, and 31 days...
Francesc, The datetime proposal is very impressive in its depth and thought. For me as well as many other people this would be a massive improvement to numpy and allow numpy to get a foothold in areas like econometrics where R/S is now dominant. I had one question regarding casting of strings: I think it would be ideal if things like the following worked:
series = numpy.array(['1970-02-01','1970-09-01'], dtype='datetime64[D]')
series == '1970-02-01'
[True, False]

I view this as similar to:

series = numpy.array([1,2,3], dtype=float)
series == 2
[False,True,False]
1. However, numpy recognizes that an int is comparable with a float and does the float cast. I think you want the same behavior between strings that parse into dates and date arrays. Some might object that the relationship between string and date is more tenuous than float and int, which is true, but having used my own homespun date array numpy extension for over a year, I've found that the first thing I did was wrap it into an object that handles these string->date translations elegantly, and that made it infinitely more usable from an ipython session.

2. Even more important to me, however, is the issue of date parsing. The mx library does many things badly, but it does do a great job of parsing dates of many formats. When you parse '1/1/95' or '1995-01-01' it knows that you mean 19950101, which is really nice. I believe the scipy timeseries code for parsing dates is based on it. I would highly suggest starting with that level of functionality. The one major issue with it is that an uninterpretable date doesn't throw an error but becomes whatever date is right now. That is obviously unfavorable.

3. Finally, my current implementation uses floats, with nan representing an invalid date. When you assign an element of a date array to None, it uses nan as the value. When you assign a real date, it puts in the equivalent floating point value. I have found this to be hugely beneficial, and I just wanted to float the idea of reserving a value to indicate the equivalent of a floating point nan. People might prefer masked arrays as a solution, but I just wanted to float the idea.

Forgive me if any of this has already been covered. There has been a lot of volume on this subject and I've tried to read it all diligently but may have missed a point or two.

--Tom
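Regarding point 2, a strict multi-format parser along these lines would keep mx's convenience while fixing the silent fallback to "now". The format list here is only an illustration, not mx's actual behaviour:

```python
# Illustrative strict date parser: try a few known formats and raise
# on anything uninterpretable, instead of silently returning the
# current date the way mx does.
from datetime import datetime

FORMATS = ('%Y-%m-%d', '%m/%d/%y', '%m/%d/%Y')

def parse_date(s):
    for fmt in FORMATS:
        try:
            return datetime.strptime(s, fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError('uninterpretable date: %r' % s)

print(parse_date('1/1/95'))      # -> 1995-01-01
print(parse_date('1995-01-01'))  # -> 1995-01-01
```

Note the ambiguity cost: the order of FORMATS silently decides whether '1/2/95' is January 2nd or February 1st, which is exactly why a fixed ISO 8601 subset is easier to specify.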
On Tuesday 29 July 2008, David Huard wrote:
Hi,
Silent casting is often a source of bugs and I appreciate the strict rules you want to enforce. However, I think there should be a simpler mechanism for operations between different types than creating a copy of a variable with the correct type.
My suggestion is to have a dtype argument for methods such as add and subs:
numpy.ones(3, dtype="t8[Y]").add(numpy.zeros(3, dtype="t8[fs]"),
dtype="t8[fs]")
This way, `implicit` operations (+, -) enforce strict rules, and `explicit` operations (add, subs) let you do what you want at your own risk.

Hmm, the idea of the ``.add()`` and ``.subtract()`` methods is tempting, but I'm not sure it is a good idea to add new methods to the ndarray object that are meant to be used with just the date/time dtype. I'm afraid that I'm -1 here.

Cheers,

-- Francesc Alted
On Tuesday 29 July 2008 14:08:28 Francesc Alted wrote:
On Tuesday 29 July 2008, David Huard wrote:

Hmm, the idea of the ``.add()`` and ``.subtract()`` methods is tempting, but I'm not sure it is a good idea to add new methods to the ndarray object that are meant to be used with just the date/time dtype.
I'm afraid that I'm -1 here.
I fully agree with Francesc: .add and .subtract would be quite confusing. About in-place conversions: the right end (other) would be cast to the type of the left end (self) by default, following the basic rule of casting when there's no ambiguity and raising an exception otherwise?
David Huard (on 2008-07-29 at 12:31:54 -0400) wrote::
Silent casting is often a source of bugs and I appreciate the strict rules you want to enforce. However, I think there should be a simpler mechanism for operations between different types than creating a copy of a variable with the correct type.
My suggestion is to have a dtype argument for methods such as add and subs:
numpy.ones(3, dtype="t8[Y]").add(numpy.zeros(3, dtype="t8[fs]"), dtype="t8[fs]")
This way, `implicit` operations (+, -) enforce strict rules, and `explicit` operations (add, subs) let you do what you want at your own risk.

Umm, that looks like a big change (or addition) to the NumPy interface. I think similar "include a dtype argument for method X" issues have been discussed before on the list. Given the big change of adding the new explicit operation methods, I think your proposal falls beyond the scope of the project being discussed. However, since yours isn't necessarily a time-related proposal, you may ask what people think of it in a separate thread.

::

Ivan Vilata i Balaguer   @ Intellectual Monopoly hinders Innovation! @
http://www.selidor.net/  @     http://www.nosoftwarepatents.com/     @
On Tuesday 29 July 2008, Tom Denniston wrote:
Francesc,
The datetime proposal is very impressive in its depth and thought. For me as well as many other people this would be a massive improvement to numpy and allow numpy to get a foothold in areas like econometrics where R/S is now dominant.
I had one question regarding casting of strings:
I think it would be ideal if things like the following worked:
series = numpy.array(['1970-02-01','1970-09-01'], dtype='datetime64[D]')
series == '1970-02-01'
[True, False]
I view this as similar to:
series = numpy.array([1,2,3], dtype=float)
series == 2
[False,True,False]
Good point. Well, I agree that adding support for setting elements from strings, i.e.:

t = numpy.ones(3, 'T8[D]')
t[0] = '2001-01-01'

should be supported. With this, and applying the broadcasting rules, the next:

t == '2001-01-01'
[True, False, False]

should work without problems. We will try to add this explicitly into the new proposal.
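In other words, the comparison would reduce to parsing the scalar string once and broadcasting a plain element-wise comparison. A pure-Python sketch of those semantics (no NumPy involved; `parse_iso_day` is a made-up helper):

```python
# Pure-Python sketch of the proposed cast-then-broadcast semantics
# for comparing a date array against a date string.
from datetime import date

def parse_iso_day(s):
    """Parse a 'YYYY-MM-DD' string into a date (day unit assumed)."""
    y, m, d = map(int, s.split('-'))
    return date(y, m, d)

series = [parse_iso_day(s) for s in ('1970-02-01', '1970-09-01')]
scalar = parse_iso_day('1970-02-01')      # cast the string first...
print([x == scalar for x in series])      # ...then compare element-wise
```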
1. However, numpy recognizes that an int is comparable with a float and does the float cast. I think you want the same behavior between strings that parse into dates and date arrays. Some might object that the relationship between string and date is more tenuous than float and int, which is true, but having used my own homespun date array numpy extension for over a year, I've found that the first thing I did was wrap it into an object that handles these string->date translations elegantly, and that made it infinitely more usable from an ipython session.
Well, you should not worry about this. Hopefully, in the

t == '2001-01-01'

comparison, the scalar part of the expression can be cast into a date array, and then the proper comparison will be performed. If this cannot be done for some reason that escapes me, one will always be able to do:

t == N.datetime64('2001-01-01', 'Y')
[True, False, False]

which is a bit more verbose, but much clearer too.
2. Even more important to me, however, is the issue of date parsing. The mx library does many things badly, but it does do a great job of parsing dates of many formats. When you parse '1/1/95' or '1995-01-01' it knows that you mean 19950101, which is really nice. I believe the scipy timeseries code for parsing dates is based on it. I would highly suggest starting with that level of functionality. The one major issue with it is that an uninterpretable date doesn't throw an error but becomes whatever date is right now. That is obviously unfavorable.

Hmmm. We would not like to clutter the NumPy core with too much date-string parsing code. As stated in the proposal, we only plan to support parsing for ISO 8601. That should be enough for most purposes. However, I'm sure that parsing for other formats will be available in the ``Date`` class of the TimeSeries package.
3. Finally, my current implementation uses floats, with nan representing an invalid date. When you assign an element of a date array to None, it uses nan as the value. When you assign a real date, it puts in the equivalent floating point value. I have found this to be hugely beneficial, and I just wanted to float the idea of reserving a value to indicate the equivalent of a floating point nan. People might prefer masked arrays as a solution, but I just wanted to float the idea.

Hmm, that's another very valid point. In fact, Ivan and I had already foreseen the existence of a NaT (Not A Time) value, as the maximum negative integer (-2**63). However, as the underlying type of the proposed time type is an int64, arithmetic operations on time types will be done through integer arithmetic, and unfortunately the majority of platforms out there perform this kind of arithmetic as two's-complement arithmetic. That means that there is no provision for handling NaTs in hardware:

In [58]: numpy.int64(-2**63)
Out[58]: -9223372036854775808  # this is a NaT

In [59]: numpy.int64(-2**63)+1
Out[59]: -9223372036854775807  # no longer a NaT

In [60]: numpy.int64(-2**63)-1
Out[60]: 9223372036854775807  # idem, and besides, positive!

So, well, due to this limitation, I'm afraid we will have to live without proper handling of NaT times. Perhaps this is the biggest limitation of choosing int64 as the base type of the date/time dtype (float64 is better in that regard, but it also has disadvantages, like its intrinsically variable precision).
Forgive me if any of this has already been covered. There has been a lot of volume on this subject and I've tried to read it all diligently but may have missed a point or two.
Not at all. You've touched on important issues. Thanks!

-- Francesc Alted
Tom Denniston (on 2008-07-29 at 12:21:39 -0500) wrote::
[...] I think it would be ideal if things like the following worked:
series = numpy.array(['1970-02-01','1970-09-01'], dtype='datetime64[D]')
series == '1970-02-01'
[True, False]
I view this as similar to:
series = numpy.array([1,2,3], dtype=float)
series == 2
[False,True,False]
1. However, numpy recognizes that an int is comparable with a float and does the float cast. I think you want the same behavior between strings that parse into dates and date arrays. Some might object that the relationship between string and date is more tenuous than float and int, which is true, but having used my own homespun date array numpy extension for over a year, I've found that the first thing I did was wrap it into an object that handles these string->date translations elegantly, and that made it infinitely more usable from an ipython session.
That may be feasible as long as there is a very clear rule for which time unit you get given a string. For instance, '1970' could yield years and '1970-03-12T12:00' minutes, but then we don't have a way of creating a time in business days... However, it looks interesting. Any more people interested in this behaviour?
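A rule like that could simply key off the precision of the ISO 8601 string. A sketch (the unit codes follow the proposal, the regexes and `infer_unit()` are only illustrative, and note there is indeed no string form that would yield business days):

```python
# Illustrative rule: infer the time unit from the precision of an
# ISO 8601 string.  The pattern list is an assumption, not part of
# the proposal.
import re

_UNIT_PATTERNS = [
    (r'\d{4}', 'Y'),                                   # 1970
    (r'\d{4}-\d{2}', 'M'),                             # 1970-03
    (r'\d{4}-\d{2}-\d{2}', 'D'),                       # 1970-03-12
    (r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}', 'm'),           # 1970-03-12T12:00
    (r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}', 's'),     # 1970-03-12T12:00:00
]

def infer_unit(s):
    """Return the time unit implied by the precision of `s`."""
    for pattern, unit in _UNIT_PATTERNS:
        if re.fullmatch(pattern, s):
            return unit
    raise ValueError('unrecognized time string: %r' % s)

print(infer_unit('1970'))              # -> 'Y'
print(infer_unit('1970-03-12T12:00'))  # -> 'm'
```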
2. Even more important to me, however, is the issue of date parsing. The mx library does many things badly, but it does do a great job of parsing dates of many formats. When you parse '1/1/95' or '1995-01-01' it knows that you mean 19950101, which is really nice. I believe the scipy timeseries code for parsing dates is based on it. I would highly suggest starting with that level of functionality. The one major issue with it is that an uninterpretable date doesn't throw an error but becomes whatever date is right now. That is obviously unfavorable.

Umm, that may get quite complex. E.g., does '1/2/95' refer to February the 1st or January the 2nd? There are so many date formats and standards that using an external parser (like mx, TimeSeries or even datetime/strptime) for them may be preferable. I think ISO 8601 is enough for basic, well-defined time string support. At least to start with.
3. Finally, my current implementation uses floats, with nan representing an invalid date. When you assign an element of a date array to None, it uses nan as the value. When you assign a real date, it puts in the equivalent floating point value. I have found this to be hugely beneficial, and I just wanted to float the idea of reserving a value to indicate the equivalent of a floating point nan. People might prefer masked arrays as a solution, but I just wanted to float the idea. [...]

Good news! Our next proposal includes a "Not a Time" value, which came about due to the impossibility of converting some times into business days. Stay tuned. However, I should point out that the NaT value isn't as powerful as the floating-point NaN, since the former means nothing to hardware, and patching that in all cases would make computations quite a bit slower. Using floating point values doesn't look like an option anymore, since they don't have a fixed precision given a time unit.

Cheers,

::

Ivan Vilata i Balaguer   @ Intellectual Monopoly hinders Innovation! @
http://www.selidor.net/  @     http://www.nosoftwarepatents.com/     @
Pierre GM (on 2008-07-29 at 12:38:19 -0400) wrote::
Relative time versus relative time ----------------------------------
This case would be the same as the previous one (absolute vs. absolute). Our proposal is to forbid this operation if the time units of the operands are different.

Mmh, I'm less sure on this one. Can't we use a hierarchy of time units and force to the lowest? For example:

numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[M]")
array([15,15,15], dtype="t8['M']")

I agree that adding ns to years makes no sense, but ns to s? min to hr or days? In short: systematically raising an exception looks a bit too drastic. There are some simple unambiguous cases that should be allowed (Y+M, Y+Q, M+Q, H+D...).

Do you mean using the most precise unit for operations with "near enough", different units? I see the point, but what makes me doubt about it is that it gives the user the false impression that the most precise unit is *always* expected. I'd rather spare the user as many surprises as possible, by simplifying the rules in favour of explicitness (but that may be debated).
Introducing a time casting function -----------------------------------
change_unit(time_object, new_unit, reference)
where 'time_object' is the time object whose unit is to be changed, 'new_unit' is the desired new time unit, and 'reference' is an absolute date that will be used to allow the conversion of relative times when using time units with an uncertain number of smaller time units (relative years or months cannot be expressed in days).
'reference' defaults to the POSIX epoch, right? So this function could be a first step towards our problem of frequency conversion...
Note: we declined to use the ``.astype()`` method because the additional 'time_reference' parameter would sound strange for other typical uses of ``.astype()``.
A method would be really, really helpful, though... [...]
Yay, but what doesn't seem to fit for me is that the method would only make sense for time values. NumPy is pretty orthogonal in that every method and attribute applies to every type. However, if "units" were to be adopted by NumPy, the method would fit in well. In fact, we are thinking of adding a ``unit`` attribute to dtypes to support time units (being ``None`` for normal NumPy types). But full unit support in NumPy looks so far away that I'm not sure about adopting the method. Thanks for the insights.

Cheers,

::

Ivan Vilata i Balaguer   @ Intellectual Monopoly hinders Innovation! @
http://www.selidor.net/  @     http://www.nosoftwarepatents.com/     @
On Tuesday 29 July 2008 15:14:13 Ivan Vilata i Balaguer wrote:
Pierre GM (on 2008-07-29 at 12:38:19 -0400) wrote::
Relative time versus relative time ----------------------------------
This case would be the same as the previous one (absolute vs. absolute). Our proposal is to forbid this operation if the time units of the operands are different.

Mmh, I'm less sure on this one. Can't we use a hierarchy of time units and force to the lowest?
For example:
numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[M]")
array([15,15,15], dtype="t8['M']")

I agree that adding ns to years makes no sense, but ns to s? min to hr or days? In short: systematically raising an exception looks a bit too drastic. There are some simple unambiguous cases that should be allowed (Y+M, Y+Q, M+Q, H+D...).
Do you mean using the most precise unit for operations with "near enough", different units? I see the point, but what makes me doubt about it is that it gives the user the false impression that the most precise unit is *always* expected. I'd rather spare the user as many surprises as possible, by simplifying the rules in favour of explicitness (but that may be debated).

Let me rephrase: adding different relative time units should be allowed when there's no ambiguity in the output. For example, a relative year timedelta is always 12 month timedeltas, or 4 quarter timedeltas. In that case, I should be able to do:

numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[M]")
array([15,15,15], dtype="t8['M']")
numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[Q]")
array([7,7,7], dtype="t8['Q']")

Similarly:

* An hour is always 3600 s, so I could add relative s/ms/us/ns timedeltas to hour timedeltas and get the result in s/ms/us/ns.
* A day is always 24 h, so I could add relative hour and day timedeltas and get an hour timedelta.
* A week is always 7 d, so W+D -> D.

However:

* We can't tell beforehand how many days are in any month, so adding relative days and months would raise an exception.
* Same thing with weeks and months/quarters/years.

There'll be only a limited number of time units, and therefore a limited number of potential combinations between time units. It'd be just a matter of listing which ones are allowed and which ones will raise an exception.
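That listing could literally be a small table. A sketch of how the allowed combinations might be encoded; the table contents are only the examples discussed here, not an exhaustive or agreed-upon list, and `common_unit()` is a made-up helper:

```python
# Sketch: encode which relative-time unit pairs mix unambiguously.
# Each entry maps a coarser unit to how many of a finer unit it
# always contains; month-based and day-based units deliberately
# don't mix, since a month has no fixed number of days.
_FIXED_RATIO = {
    ('Y', 'M'): 12,   ('Y', 'Q'): 4,    ('Q', 'M'): 3,
    ('W', 'D'): 7,    ('D', 'h'): 24,   ('h', 's'): 3600,
}

def common_unit(u1, u2):
    """Return the finer unit if u1 and u2 combine unambiguously,
    otherwise raise (the proposal's IncompatibleUnitError role)."""
    if u1 == u2:
        return u1
    if (u1, u2) in _FIXED_RATIO:
        return u2
    if (u2, u1) in _FIXED_RATIO:
        return u1
    raise ValueError('ambiguous combination: %s + %s' % (u1, u2))

print(common_unit('Y', 'M'))   # -> 'M' (a year is always 12 months)
print(common_unit('W', 'D'))   # -> 'D' (a week is always 7 days)
```

With this table, `common_unit('M', 'D')` raises, which is exactly the month/day ambiguity described above.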
Note: we declined to use the ``.astype()`` method because the additional 'time_reference' parameter would sound strange for other typical uses of ``.astype()``.
A method would be really, really helpful, though... [...]
Yay, but what doesn't seem to fit for me is that the method would only make sense for time values.

Well, what about a .tounit(new_unit, reference=None)? By default, the reference would be None and default to the POSIX epoch. We could also go for .totunit (for "to time unit").

NumPy is pretty orthogonal in that every method and attribute applies to every type. However, if "units" were to be adopted by NumPy, the method would fit in well. In fact, we are thinking of adding a ``unit`` attribute to dtypes to support time units (being ``None`` for normal NumPy types). But full unit support in NumPy looks so far away that I'm not sure about adopting the method.
Thanks for the insights. Cheers,
Pierre GM (on 2008-07-29 at 15:47:52 -0400) wrote::
On Tuesday 29 July 2008 15:14:13 Ivan Vilata i Balaguer wrote:
Pierre GM (on 2008-07-29 at 12:38:19 -0400) wrote::
Relative time versus relative time ----------------------------------
This case would be the same as the previous one (absolute vs. absolute). Our proposal is to forbid this operation if the time units of the operands are different.

Mmh, I'm less sure on this one. Can't we use a hierarchy of time units and force to the lowest?
For example:
numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[M]")
array([15,15,15], dtype="t8['M']")

I agree that adding ns to years makes no sense, but ns to s? min to hr or days? In short: systematically raising an exception looks a bit too drastic. There are some simple unambiguous cases that should be allowed (Y+M, Y+Q, M+Q, H+D...).

Do you mean using the most precise unit for operations with "near enough", different units? I see the point, but what makes me doubt about it is that it gives the user the false impression that the most precise unit is *always* expected. I'd rather spare the user as many surprises as possible, by simplifying the rules in favour of explicitness (but that may be debated).
Let me rephrase: Adding different relative time units should be allowed when there's no ambiguity on the output: For example, a relative year timedelta is always 12 month timedeltas, or 4 quarter timedeltas. In that case, I should be able to do:
numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[M]") array([15,15,15], dtype="t8['M']") numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[Q]") array([7,7,7], dtype="t8['Q']")
Similarly:
* An hour is always 3600 s, so I could add relative s/ms/us/ns timedeltas to hour timedeltas, and get the result in s/ms/us/ns.
* A day is always 24 h, so I could add relative hour and day timedeltas and get an hour timedelta.
* A week is always 7 d, so W+D -> D.
However:
* We can't tell beforehand how many days are in any month, so adding relative days and months would raise an exception.
* Same thing with weeks and months/quarters/years.
There'll be only a limited number of time units, therefore a limited number of potential combinations between time units. It'd be just a matter of listing which ones are allowed and which ones will raise an exception.
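[Editor's illustration] The allow/forbid listing above can indeed be written down as a small table. This is a hypothetical sketch of the proposed rule, not actual NumPy API: units fall into two groups with exact integer conversion factors (calendar units counted in months, fixed-length units counted in nanoseconds); within a group the result takes the finer unit, and mixing across groups raises.

```python
# Hypothetical sketch of the proposed casting rule -- not NumPy API.
# Factors are expressed in the finest unit of each group.
CALENDAR = {'Y': 12, 'Q': 3, 'M': 1}            # months
FIXED = {'W': 7 * 24 * 3600 * 10**9,            # nanoseconds
         'D': 24 * 3600 * 10**9,
         'h': 3600 * 10**9,
         'm': 60 * 10**9,
         's': 10**9,
         'ms': 10**6,
         'us': 10**3,
         'ns': 1}

def result_unit(u1, u2):
    """Return the (finer) unit of the result, or raise if incompatible."""
    for group in (CALENDAR, FIXED):
        if u1 in group and u2 in group:
            return u1 if group[u1] <= group[u2] else u2
    raise TypeError("incompatible time units: %r and %r" % (u1, u2))

def add_deltas(n1, u1, n2, u2):
    """Add two relative times, returning (value, unit) in the finer unit."""
    u = result_unit(u1, u2)
    group = CALENDAR if u in CALENDAR else FIXED
    return n1 * group[u1] // group[u] + n2 * group[u2] // group[u], u
```

With these rules, ``add_deltas(1, 'Y', 3, 'M')`` gives ``(15, 'M')`` and ``add_deltas(1, 'Y', 3, 'Q')`` gives ``(7, 'Q')``, matching the examples in this thread, while ``add_deltas(1, 'M', 1, 'D')`` raises.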
That's "keep the precision" over "keep the range". At first Francesc and I opted for "keep the range" because that's what NumPy does, e.g. when operating an int64 with a uint64. Then, since we weren't sure about what the best choice would be for the majority of users, we decided upon letting (or forcing) the user to be explicit. However, the use of time units and integer values is precisely intended to "keep the precision", and overflow won't be so frequent given the correct time unit and the span of uint64, so you may be right in the end. :)
Note: we refused to use the ``.astype()`` method because of the additional 'time_reference' parameter that will sound strange for other typical uses of ``.astype()``.
A method would be really, really helpful, though... [...]
Yay, but what doesn't seem to fit for me is that the method would only make sense for time values.
Well, what about a .tounit(new_unit, reference=None) ? By default, the reference would be None and default to the POSIX epoch. We could also go for .totunit (for to time unit)
Yes, that'd be the signature for a method. The ``reference`` argument shouldn't be allowed for ``datetime64`` values (absolute times, no ambiguities) but it should be mandatory for ``timedelta64`` ones. Sorry, but I can't see the use of having a default reference, unless one wanted to work with Epoch-based deltas, which looks like an extremely particular case. Could you please show me a use case for having a reference defaulting to the POSIX epoch? Cheers, :: Ivan Vilata i Balaguer @ Intellectual Monopoly hinders Innovation! @ http://www.selidor.net/ @ http://www.nosoftwarepatents.com/ @
A Wednesday 30 July 2008, Ivan Vilata i Balaguer escrigué:
Pierre GM (el 2008-07-29 a les 15:47:52 -0400) va dir::
On Tuesday 29 July 2008 15:14:13 Ivan Vilata i Balaguer wrote:
Pierre GM (el 2008-07-29 a les 12:38:19 -0400) va dir::
Relative time versus relative time ----------------------------------
This case would be the same as the previous one (absolute vs absolute). Our proposal is to forbid this operation if the time units of the operands are different.
Mmh, less sure on this one. Can't we use a hierarchy of time units, and force to the lowest ?
For example:
numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[M]") array([15,15,15], dtype="t8['M']")
I agree that adding ns to years makes no sense, but ns to s ? min to hr or days ? In short: systematically raising an exception looks a bit too drastic. There are some simple unambiguous cases that should be allowed (Y+M, Y+Q, M+Q, H+D...)
Do you mean using the most precise unit for operations with "near enough", different units? I see the point, but what makes me doubt about it is giving the user the false impression that the most precise unit is *always* expected. I'd rather spare the user as many surprises as possible, by simplifying rules in favour of explicitness (but that may be debated).
Let me rephrase: Adding different relative time units should be allowed when there's no ambiguity on the output: For example, a relative year timedelta is always 12 month timedeltas, or 4
quarter timedeltas. In that case, I should be able to do:
numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[M]")
array([15,15,15], dtype="t8['M']")
numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[Q]")
array([7,7,7], dtype="t8['Q']")
Similarly:
* An hour is always 3600 s, so I could add relative s/ms/us/ns timedeltas to hour timedeltas, and get the result in s/ms/us/ns.
* A day is always 24 h, so I could add relative hour and day timedeltas and get an hour timedelta.
* A week is always 7 d, so W+D -> D.
However:
* We can't tell beforehand how many days are in any month, so adding relative days and months would raise an exception.
* Same thing with weeks and months/quarters/years.
There'll be only a limited number of time units, therefore a limited number of potential combinations between time units. It'd be just a matter of listing which ones are allowed and which ones will raise an exception.
That's "keep the precision" over "keep the range". At first Francesc and I opted for "keep the range" because that's what NumPy does, e.g. when operating an int64 with a uint64. Then, since we weren't sure about what the best choice would be for the majority of users, we decided upon letting (or forcing) the user to be explicit. However, the use of time units and integer values is precisely intended to "keep the precision", and overflow won't be so frequent given the correct time unit and the span of uint64, so you may be right in the end. :)
Well, I do think that the "keep the precision" rule can be a quite sensible approach for this case, so I am in favor of it. Also, Pierre's suggestion of allowing automatic casting for all the time units except when the 'Y'ear and 'M'onth are involved makes a lot of sense too. I'll adopt these for the third version of the proposal then.
Note: we refused to use the ``.astype()`` method because of the additional 'time_reference' parameter that will sound strange for other typical uses of ``.astype()``.
A method would be really, really helpful, though... [...]
Yay, but what doesn't seem to fit for me is that the method would only make sense for time values.
Well, what about a .tounit(new_unit, reference=None) ? By default, the reference would be None and default to the POSIX epoch. We could also go for .totunit (for to time unit)
Yes, that'd be the signature for a method. The ``reference`` argument shouldn't be allowed for ``datetime64`` values (absolute times, no ambiguities) but it should be mandatory for ``timedelta64`` ones. Sorry, but I can't see the use of having a default reference, unless one wanted to work with Epoch-based deltas, which looks like an extremely particular case. Could you please show me a use case for having a reference defaulting to the POSIX epoch?
Yeah, I agree with Ivan in that a default reference time makes little sense for general relative times. IMO, provided that we will be allowing implicit casting for most time units in relative vs relative and absolute vs relative operations, the use of forced casting will not be as frequent, and a function would be enough. Having said that, I still see the merit of a method for some situations, so I'll mention that in the third proposal as a possible improvement. -- Francesc Alted
On Wednesday 30 July 2008 06:35:32 Francesc Alted wrote:
A Wednesday 30 July 2008, Ivan Vilata i Balaguer escrigué:
Pierre GM (el 2008-07-29 a les 15:47:52 -0400) va dir::
On Tuesday 29 July 2008 15:14:13 Ivan Vilata i Balaguer wrote:
Pierre GM (el 2008-07-29 a les 12:38:19 -0400) va dir::
[Pierre]
Well, what about a .tounit(new_unit, reference=None) ? By default, the reference would be None and default to the POSIX epoch. We could also go for .totunit (for to time unit)
[Ivan]
Yes, that'd be the signature for a method. The ``reference`` argument shouldn't be allowed for ``datetime64`` values (absolute times, no ambiguities) but it should be mandatory for ``timedelta64`` ones. Sorry, but I can't see the use of having a default reference, unless one wanted to work with Epoch-based deltas, which looks like an extremely particular case. Could you please show me a use case for having a reference defaulting to the POSIX epoch?
[Francesc]
Yeah, I agree with Ivan in that a default reference time makes little sense for general relative times. IMO, provided that we will be allowing implicit casting for most time units in relative vs relative and absolute vs relative operations, the use of forced casting will not be as frequent, and a function would be enough. Having said that, I still see the merit of a method for some situations, so I'll mention that in the third proposal as a possible improvement.
In my mind, .tounit(*args) should be available for both relative (timedeltas) and absolute (datetime) times. I agree that for relative times, a default reference is meaningless. However, for absolute times, there's only one possible reference, the POSIX epoch, right ? Now, what format do you consider for this reference ? Moreover, could you give some more examples of interaction between datetime and timedelta ?
A Wednesday 30 July 2008, Pierre GM escrigué:
On Wednesday 30 July 2008 06:35:32 Francesc Alted wrote:
A Wednesday 30 July 2008, Ivan Vilata i Balaguer escrigué:
Pierre GM (el 2008-07-29 a les 15:47:52 -0400) va dir::
On Tuesday 29 July 2008 15:14:13 Ivan Vilata i Balaguer wrote:
Pierre GM (el 2008-07-29 a les 12:38:19 -0400) va dir::
[Pierre]
Well, what about a .tounit(new_unit, reference=None) ? By default, the reference would be None and default to the POSIX epoch. We could also go for .totunit (for to time unit)
[Ivan]
Yes, that'd be the signature for a method. The ``reference`` argument shouldn't be allowed for ``datetime64`` values (absolute times, no ambiguities) but it should be mandatory for ``timedelta64`` ones. Sorry, but I can't see the use of having a default reference, unless one wanted to work with Epoch-based deltas, which looks like an extremely particular case. Could you please show me a use case for having a reference defaulting to the POSIX epoch?
[Francesc]
Yeah, I agree with Ivan in that a default reference time makes little sense for general relative times. IMO, provided that we will be allowing implicit casting for most time units in relative vs relative and absolute vs relative operations, the use of forced casting will not be as frequent, and a function would be enough. Having said that, I still see the merit of a method for some situations, so I'll mention that in the third proposal as a possible improvement.
In my mind, .tounit(*args) should be available for both relative (timedeltas) and absolute (datetime) times.
Well, what we are proposing is that the time unit conversion method for absolute times would be '.astype()', because its semantics are respected in this case. The problem is with relative times, and only with conversions between years or months and the rest of time units. This is why I propose the adoption of just a humble function for these cases. Introducing a method (.tounit()) for the ndarray object that is only useful for the date/time types seems a bit too much to my eyes (but I can be wrong, indeed).
I agree that for relative times, a default reference is meaningless. However, for absolute times, there's only one possible reference, the POSIX epoch, right ?
That's correct.
Now, what format do you consider for this reference ?
Whatever that can be converted into a datetime64 scalar. Some examples: ref = '2001-04-01' ref = datetime.datetime(2001, 4, 1)
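[Editor's illustration] For what it's worth, the ``numpy.datetime64`` scalar that was eventually released accepts both of the spellings Francesc mentions, and they denote the same absolute time:

```python
import datetime
import numpy as np

# Both an ISO date string and a datetime.datetime object can serve as a
# reference; NumPy converts either into a datetime64 scalar.
ref_from_str = np.datetime64('2001-04-01')
ref_from_obj = np.datetime64(datetime.datetime(2001, 4, 1))

# Comparison works across the two (differently-grained) scalars.
same = ref_from_str == ref_from_obj
```

Note the two scalars end up with different time units (days vs microseconds), but NumPy compares them correctly.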
Moreover, could you give some more examples of interaction between datetime and timedelta ?
In the second proposal there are some examples of this interaction and I'm populating the third proposal with more examples yet. Just wait a bit (maybe a couple of hours) to see the new proposal. Cheers, -- Francesc Alted
On Wed, Jul 30, 2008 at 12:35 PM, Francesc Alted <faltet@pytables.org>wrote:
A Wednesday 30 July 2008, Pierre GM escrigué:
In my mind, .tounit(*args) should be available for both relative (timedeltas) and absolute (datetime) times.
Well, what we are proposing is that the time unit conversion method for absolute times would be '.astype()', because its semantics are respected in this case.
OK
The problem is with relative times, and only with conversions between years or months and the rest of time units. This is why I propose the adoption of just a humble function for these cases.
OK
Introducing a method (.tounit()) for the ndarray object that is only useful for the date/time types seems a bit too much to my eyes (but I can be wrong, indeed).
Ohoh, I see... I was still thinking in terms of subclassing ndarray with a timedelta class, where such a method would have made sense. In fact, you're talking about the dtype. Well, of course, in that case, that makes sense not to have an extra method. We can always implement it in Date/DateArray.
Now, what format do you consider for this reference ?
Whatever that can be converted into a datetime64 scalar. Some examples:
ref = '2001-04-01' ref = datetime.datetime(2001, 4, 1)
Er, should I see ref as having a 'day' unit or 'business day' unit in that case? I know that 'business days' spoil the game, but Matt really needs them, so...
Moreover, could you give some more examples of interaction between datetime and timedelta ?
In the second proposal there are some examples of this interaction and I'm populating the third proposal with more examples yet. Just wait a bit (maybe a couple of hours) to see the new proposal.
OK, with pleasure. It's just that I have trouble understanding the meaning of something like t2 = numpy.ones(5, dtype="datetime64[s]") That's five times one second after the epoch, right ? But in what circumstances would you need t2 ?
A Wednesday 30 July 2008, Pierre GM escrigué:
Now, what format do you consider for this reference ?
Whatever that can be converted into a datetime64 scalar. Some examples:
ref = '2001-04-01' ref = datetime.datetime(2001, 4, 1)
Er, should I see ref as having a 'day' unit or 'business day' unit in that case? I know that 'business days' spoil the game, but Matt really needs them, so...
OK. I was wrong. Of course you need to specify the resolution, so the reference *should* be a NumPy scalar: ref = numpy.datetime64('2001-04-01', unit="B") # 'B'usiness days
Moreover, could you give some more examples of interaction between datetime and timedelta ?
In the second proposal there are some examples of this interaction and I'm populating the third proposal with more examples yet. Just wait a bit (maybe a couple of hours) to see the new proposal.
OK, with pleasure. It's just that I have trouble understanding the meaning of something like t2 = numpy.ones(5, dtype="datetime64[s]")
That's five times one second after the epoch, right ? But in what circumstances would you need t2 ?
I'm not sure I follow you. This is just an example so as to produce an array of time objects quickly. In general, you should also be able to produce the same result by doing: t2 = numpy.array(['1970-01-01T00:00:05', '1970-01-01T00:00:05', '1970-01-01T00:00:05', '1970-01-01T00:00:05', '1970-01-01T00:00:05'], dtype="datetime64[s]") which is more visual, but has the drawback that it's just too long for documenting purposes. When you don't need the values for some examples, conciseness is a virtue. -- Francesc Alted
On Wednesday 30 July 2008 13:16:25 Francesc Alted wrote:
A Wednesday 30 July 2008, Pierre GM escrigué:
It's just that I have trouble understanding the meaning of something like t2 = numpy.ones(5, dtype="datetime64[s]")
That's five times one second after the epoch, right ? But in what circumstances would you need t2 ?
I'm not sure I follow you. This is just an example so as to produce an array of time objects quickly. ... When you don't need the values for some examples, conciseness is a virtue.
I'd prefer something like np.range(5, dtype=datetime64['s']), which is both concise and still has a physical meaning I can wrap my mind around. Which brings me to another question: datetime64 and timedelta64 are just dtypes, therefore they don't impose any restriction (in terms of uniqueness of elements, ordering of the elements...) on the underlying ndarray, right ?
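[Editor's illustration] Pierre's ``np.range`` sketch maps onto constructs that do exist in released NumPy (shown here only to illustrate the idea):

```python
import numpy as np

# Integers cast to datetime64 are interpreted as offsets from the POSIX
# epoch in the dtype's unit, so a ramp of seconds is easy to spell:
t = np.array([0, 1, 2, 3, 4], dtype='datetime64[s]')

# np.arange also accepts datetime64 endpoints (the default step is one
# unit of the endpoints' resolution):
t2 = np.arange(np.datetime64('1970-01-01T00:00:00'),
               np.datetime64('1970-01-01T00:00:05'))
```

Both spellings produce the same five consecutive seconds after the epoch; and indeed, being plain dtypes, they impose no uniqueness or ordering on the array.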
When people are referring to business days, are you talking about weekdays, or weekday non-holidays? On 7/30/08, Francesc Alted <faltet@pytables.org> wrote:
A Wednesday 30 July 2008, Pierre GM escrigué:
Now, what format do you consider for this reference ?
Whatever that can be converted into a datetime64 scalar. Some examples:
ref = '2001-04-01' ref = datetime.datetime(2001, 4, 1)
Er, should I see ref as having a 'day' unit or 'business day' unit in that case? I know that 'business days' spoil the game, but Matt really needs them, so...
OK. I was wrong. Of course you need to specify the resolution, so the reference *should* be a NumPy scalar:
ref = numpy.datetime64('2001-04-01', unit="B") # 'B'usiness days
Moreover, could you give some more examples of interaction between datetime and timedelta ?
In the second proposal there are some examples of this interaction and I'm populating the third proposal with more examples yet. Just wait a bit (maybe a couple of hours) to see the new proposal.
OK, with pleasure. It's just that I have trouble understanding the meaning of something like t2 = numpy.ones(5, dtype="datetime64[s]")
That's five times one second after the epoch, right ? But in what circumstances would you need t2 ?
I'm not sure I follow you. This is just an example so as to produce an array of time objects quickly. In general, you should also be able to produce the same result by doing:
t2 = numpy.array(['1970-01-01T00:00:05', '1970-01-01T00:00:05', '1970-01-01T00:00:05', '1970-01-01T00:00:05', '1970-01-01T00:00:05'], dtype="datetime64[s]")
which is more visual, but has the drawback that it's just too long for documenting purposes. When you don't need the values for some examples, conciseness is a virtue.
-- Francesc Alted _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
A Wednesday 30 July 2008, Pierre GM escrigué:
Which brings me to another question: datetime64 and timedelta64 are just dtypes, therefore they don't impose any restriction (in terms of uniqueness of elements, ordering of the elements...) on the underlying ndarray, right ?
That's right. Perhaps this is the reason why you got mystified about the numpy.ones(5, dtype="datetime64[s]") thing. -- Francesc Alted
A Wednesday 30 July 2008, Tom Denniston escrigué:
When people are referring to business days, are you talking about weekdays, or weekday non-holidays?
Plain weekdays. Taking into account holidays all around the world would certainly be much more complex than timezones, which are not being considered in this proposal either. -- Francesc Alted
If it's really just weekdays why not call it that instead of using a term like business days that (quite confusingly) suggests holidays are handled properly? Also, I view the timezone and holiday issues as totally separate. I would definitely NOT recommend basing holidays on a timezone because holidays are totally unrelated to timezones. Usually when you deal with holidays, because they vary by application and country and change over time, you provide a calendar as an outside input. That would be very useful if it were allowed, but might make the implementation rather complex. --Tom On 7/30/08, Francesc Alted <faltet@pytables.org> wrote:
A Wednesday 30 July 2008, Tom Denniston escrigué:
When people are referring to business days, are you talking about weekdays, or weekday non-holidays?
Plain weekdays. Taking into account holidays all around the world would certainly be much more complex than timezones, which are not being considered in this proposal either.
-- Francesc Alted
A Wednesday 30 July 2008, Tom Denniston escrigué:
If it's really just weekdays why not call it that instead of using a term like business days that (quite confusingly) suggests holidays are handled properly?
Well, we were adopting the name from the TimeSeries package. Perhaps the authors can answer this better than me.
Also, I view the timezone and holiday issues as totally separate. I would definitely NOT recommend basing holidays on a timezone because holidays are totally unrelated to timezones. Usually when you deal with holidays, because they vary by application and country and change over time, you provide a calendar as an outside input. That would be very useful if it were allowed, but might make the implementation rather complex.
Yeah, I agree that the timezone and holiday issues are totally separate. I only wanted to stress that the implementation of these things is *complex*, and that this was the reason not to consider them. -- Francesc Alted
Tom Denniston (el 2008-07-30 a les 13:12:45 -0500) va dir::
If it's really just weekdays why not call it that instead of using a term like business days that (quite confusingly) suggests holidays are handled properly?
Yes, that may be a better term. I guess we didn't choose that because we aren't native English speakers, and because TimeSeries was already using the other term.
Also, I view the timezone and holiday issues as totally separate. I would definitely NOT recommend basing holidays on a timezone because holidays are totally unrelated to timezones. Usually when you deal with holidays, because they vary by application and country and change over time, you provide a calendar as an outside input. That would be very useful if it were allowed, but might make the implementation rather complex.
I think that what Francesc was trying to say is that taking holidays into account would be way too difficult to implement. Timezones were just an example of another (unrelated) feature we left out due to its complexity.
On 7/30/08, Francesc Alted <faltet@pytables.org> wrote:
A Wednesday 30 July 2008, Tom Denniston escrigué:
When people are referring to business days, are you talking about weekdays, or weekday non-holidays?
Plain weekdays. Taking into account holidays all around the world would certainly be much more complex than timezones, which are not being considered in this proposal either.
:: Ivan Vilata i Balaguer @ Intellectual Monopoly hinders Innovation! @ http://www.selidor.net/ @ http://www.nosoftwarepatents.com/ @
Yes this all makes a lot of sense. I would propose changing the name from business days to weekdays though. Does anyone object to that? On 7/30/08, Ivan Vilata i Balaguer <ivan@selidor.net> wrote:
Tom Denniston (el 2008-07-30 a les 13:12:45 -0500) va dir::
If it's really just weekdays why not call it that instead of using a term like business days that (quite confusingly) suggests holidays are handled properly?
Yes, that may be a better term. I guess we didn't choose that because we aren't native English speakers, and because TimeSeries was already using the other term.
Also, I view the timezone and holiday issues as totally separate. I would definitely NOT recommend basing holidays on a timezone because holidays are totally unrelated to timezones. Usually when you deal with holidays, because they vary by application and country and change over time, you provide a calendar as an outside input. That would be very useful if it were allowed, but might make the implementation rather complex.
I think that what Francesc was trying to say is that taking holidays into account would be way too difficult to implement. Timezones were just an example of another (unrelated) feature we left out due to its complexity.
On 7/30/08, Francesc Alted <faltet@pytables.org> wrote:
A Wednesday 30 July 2008, Tom Denniston escrigué:
When people are referring to business days, are you talking about weekdays, or weekday non-holidays?
Plain weekdays. Taking into account holidays all around the world would certainly be much more complex than timezones, which are not being considered in this proposal either.
If it's really just weekdays why not call it that instead of using a term like business days that (quite confusingly) suggests holidays are handled properly?
Well, we were adopting the name from the TimeSeries package. Perhaps the authors can answer this better than me.
A lot of the inspiration for the original prototype of the timeseries module came from FAME (http://www.sungard.com/Fame/). The proprietary FAME 4GL language does a lot of things well when it comes to time series analysis, but is (not surprisingly) very lacking as a general purpose programming language. Python was the glue language I was using at work, and naturally I wanted to do a lot of the stuff I could do in FAME using Python instead.

Most of the frequencies in the timeseries package are named the same as their FAME counterparts. I'm not especially attached to the name "business" instead of "weekday" for the frequency, it is just what I was used to from FAME so I went with it. I won't lose any sleep if you decide to call it "weekday" instead.

While on the topic of FAME... being a financial analyst, I really am quite fond of the multitude of quarterly frequencies we have in the timeseries package (with different year end points) because they are very useful when doing things like "calenderizing" earnings from companies with different fiscal year ends. These frequencies are included in FAME, which makes sense since it targets financial users. I know Pierre likes them too for working with different seasons. I think it would be ok to leave them out of an initial implementation, but it might be worth keeping in mind during the design phase how the dtype could be extended to incorporate such things.
As forbidding operations among absolute/absolute and relative/relative types can be unacceptable in many situations, we are proposing an explicit casting mechanism so that the user can specify the desired time unit of the outcome. For this, a new NumPy function, called, say, ``numpy.change_unit()`` (this name is for the purposes of the discussion and can be changed) will be provided. The signature for the function will be:
change_unit(time_object, new_unit, reference)
where 'time_object' is the time object whose unit is to be changed, 'new_unit' is the desired new time unit, and 'reference' is an absolute date that will be used to allow the conversion of relative times in case of using time units with an uncertain number of smaller time units (relative years or months cannot be expressed in days). For example, that would allow one to do:
numpy.change_unit( numpy.array([1,2], 'T[Y]'), 'T[d]' ) array([365, 731], dtype="timedelta64[d]")
If I understand you correctly, this is very close to the "asfreq" method of the Date/DateArray/TimeSeries classes in the timeseries module. One key element missing here (from my point of view anyway) is an equivalent of the 'relation' parameter in the asfreq method in the timeseries module. This is only used when converting from a lower frequency to a higher frequency (eg. annual to daily). For example...
a = ts.Date(freq='Annual', year=2007) a.asfreq('Daily', 'START') <D : 01-Jan-2007> a.asfreq('Daily', 'END') <D : 31-Dec-2007>
This is another one of those things that I use all the time. Now whether it belongs in the core dtype, or some extension module I'm not sure... but it's an important feature in the timeseries module.
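[Editor's illustration] A hypothetical sketch of the 'relation' parameter Matt describes, for the annual-to-daily case (the function name is invented; the real timeseries API is ``Date.asfreq``): converting a coarse period to a finer frequency needs to know whether to anchor at the start or the end of the period.

```python
import datetime

def annual_to_daily(year, relation):
    """Convert an annual period to a daily date, anchored by 'relation'.

    Hypothetical helper mirroring ts.Date(freq='Annual').asfreq('Daily', rel).
    """
    if relation == 'START':
        return datetime.date(year, 1, 1)
    elif relation == 'END':
        return datetime.date(year, 12, 31)
    raise ValueError("relation must be 'START' or 'END'")
```

So ``annual_to_daily(2007, 'START')`` anchors at 01-Jan-2007 and ``annual_to_daily(2007, 'END')`` at 31-Dec-2007, matching the ``asfreq`` session quoted above.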
A Thursday 31 July 2008, Matt Knox escrigué:
While on the topic of FAME... being a financial analyst, I really am quite fond of the multitude of quarterly frequencies we have in the timeseries package (with different year end points) because they are very useful when doing things like "calenderizing" earnings from companies with different fiscal year ends. These frequencies are included in FAME, which makes sense since it targets financial users. I know Pierre likes them too for working with different seasons. I think it would be ok to leave them out of an initial implementation, but it might be worth keeping in mind during the design phase about how the dtype could be extended to incorporate such things.
Well, introducing a quarter should not be difficult. We just wanted to keep the set of supported time units to a minimum (the list is already quite large). We thought that the quarter fits better as a 'derived' time unit, much like biweekly, semester or biyearly (to name just a few examples). However, if quarters are found to be much more important than other derived time units, they can go into the proposal too.
As forbidding operations among absolute/absolute and relative/relative types can be unacceptable in many situations, we are proposing an explicit casting mechanism so that the user can specify the desired time unit of the outcome. For this, a new NumPy function, called, say, ``numpy.change_unit()`` (this name is for the purposes of the discussion and can be changed) will be provided. The signature for the function will be:
change_unit(time_object, new_unit, reference)
where 'time_object' is the time object whose unit is to be changed, 'new_unit' is the desired new time unit, and 'reference' is an absolute date that will be used to allow the conversion of relative times in case of using time units with an uncertain number of smaller time units (relative years or months cannot be expressed in days). For example, that would allow one to do:
numpy.change_unit( numpy.array([1,2], 'T[Y]'), 'T[d]' )
array([365, 731], dtype="timedelta64[d]")
If I understand you correctly, this is very close to the "asfreq" method of the Date/DateArray/TimeSeries classes in the timeseries module. One key element missing here (from my point of view anyway) is an equivalent of the 'relation' parameter in the asfreq method in the timeseries module. This is only used when converting from a lower frequency to a higher frequency (eg. annual to daily). For example...
a = ts.Date(freq='Annual', year=2007) a.asfreq('Daily', 'START')
<D : 01-Jan-2007>
a.asfreq('Daily', 'END')
<D : 31-Dec-2007>
This is another one of those things that I use all the time. Now whether it belongs in the core dtype, or some extension module I'm not sure... but it's an important feature in the timeseries module.
I agree that such a 'relation' parameter in the proposed 'change_timeunit' could be handy in many situations. It should be applicable only to absolute times though. With this, the signature for the function would be: change_timeunit(time_object, new_unit, relation, reference) where 'relation' can only be used with absolute times and 'reference' only with relative times. Who knows, perhaps in the future one can find a way to implement such a 'change_timeunit' function as methods without disturbing the method schema of the ndarray objects too much. Cheers, -- Francesc Alted
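[Editor's illustration] The role of 'reference' in the relative-time branch can be sketched in plain Python (hypothetical helper names, stdlib calendar arithmetic): relative years only become a definite number of days once a reference date fixes which leap years are spanned.

```python
import datetime

def add_years(d, n):
    """Shift a date by n calendar years, clamping Feb 29 to Feb 28."""
    try:
        return d.replace(year=d.year + n)
    except ValueError:  # Feb 29 in a non-leap target year
        return d.replace(year=d.year + n, day=28)

def change_timeunit_years_to_days(n_years, reference):
    """Express a relative time of n_years as a count of days, given an
    absolute 'reference' date (the ambiguous case discussed above)."""
    return (add_years(reference, n_years) - reference).days
```

For instance, one relative year is 365 days from 1970-01-01 but 366 days from 2000-01-01, which is exactly why the conversion needs a reference; the ``array([365, 731])`` outcome quoted earlier is consistent with a reference such as 1971-01-01, since 1972 is a leap year.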
A Thursday 31 July 2008, Matt Knox escrigué:
While on the topic of FAME... being a financial analyst, I really am quite fond of the multitude of quarterly frequencies we have in the timeseries package (with different year end points) because they are very useful when doing things like "calenderizing" earnings from companies with different fiscal year ends.
On Thu, 31 Jul 2008, Francesc Alted apparently wrote:
Well, introducing a quarter should not be difficult. We just wanted to keep the set of supported time units to a minimum (the list is already quite large). We thought that the quarter fits better as a 'derived' time unit, much like biweekly, semester or biyearly (to name just a few examples). However, if quarters are found to be much more important than other derived time units, they can go into the proposal too.
Quarterly frequency is probably the most analyzed frequency in macroeconometrics. Widely used macroeconometrics packages (e.g., EViews) traditionally support only three explicit frequencies: annual, quarterly, and monthly. Cheers, Alan Isaac
On Thursday 31 July 2008, Alan G Isaac wrote:
I see. However, I forgot to mention that another reason for not including quarters is that they normally need the flexibility to start in *any* month of the year. As we didn't want to provide an ``origin`` metadata in the proposal (things got too complex already, as you can see in the third proposal that I sent to this list yesterday), the usefulness of such 'inflexible' quarters would be rather limited. So, in the end, I think it is best to leave them out of the dtype (and add support for them in the ``Date`` class). Cheers, -- Francesc Alted
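For illustration, a 'derived' quarter with a configurable starting month (the flexibility discussed above) can be layered on top of a month-based dtype. 'to_quarter' and 'start_month' are made-up names for this sketch, not part of the proposal:

```python
import numpy as np

def to_quarter(dates, start_month=1):
    """Map absolute dates to an integer quarter index (quarters elapsed
    since 1970), where the fiscal year starts in 'start_month'
    (1 = January, i.e. ordinary calendar quarters)."""
    months = dates.astype('datetime64[M]').astype(int)  # months since 1970-01
    return (months - (start_month - 1)) // 3

d = np.array(['2008-01', '2008-03', '2008-04'], dtype='datetime64[M]')
print(to_quarter(d))        # calendar quarters: [152 152 153]
print(to_quarter(d, 2))     # a fiscal year starting in February shifts them
```

This shows why a flexible quarter needs an origin-like parameter: the same dates land in different quarters depending on the fiscal year's starting month, which is exactly the metadata the dtype proposal chose not to carry.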
participants (7)
- Alan G Isaac
- David Huard
- Francesc Alted
- Ivan Vilata i Balaguer
- Matt Knox
- Pierre GM
- Tom Denniston