datetime64/timedelta64 support in linspace
On Sat, 2020-09-26 at 09:52 -0500, Lee Johnston wrote:
> I propose adding support for datetime64/timedelta64 in linspace and solicit feedback on the feature. As is, linspace raises UFuncTypeError when parameters start and stop are datetime64/timedelta64. The complementary function arange supports these types. Work was started on this feature in PR 14700 <https://github.com/numpy/numpy/pull/14700> but has stalled. I would like to complete it, but there are some issues worth getting feedback on:
>
> 1. Supporting datetime64/timedelta64 will require a special-case code path within linspace. The code path is selected based on the data type of the start parameter.
> 2. The output dtype has to be set explicitly.
> 3. The step-size resolution is determined by the lesser resolution of start and dtype.
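Items 1 and 2 can be illustrated with a small dispatch sketch. The helper `choose_linspace_path` and its return values are hypothetical (not NumPy code and not the code in PR 14700); it only shows the idea of branching on the dtype kind of `start` and requiring an explicit `dtype` on the datetime/timedelta path.

```python
import numpy as np


def choose_linspace_path(start, dtype=None):
    """Hypothetical dispatch helper (not part of NumPy).

    Picks a code path from the dtype of `start`: datetime64/timedelta64
    starts take a separate path and, under this proposal, must come with
    an explicit output dtype.
    """
    kind = np.asarray(start).dtype.kind  # 'M' = datetime64, 'm' = timedelta64
    if kind in "mM":
        if dtype is None:
            raise TypeError("datetime64/timedelta64 linspace needs an explicit dtype")
        return "datetime"
    return "numeric"


print(choose_linspace_path(np.timedelta64(0, "s"), dtype="timedelta64[ms]"))  # datetime
print(choose_linspace_path(0.0))  # numeric
```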
> Issue 3 may lead to an unexpected result for an end-user. For example,
>
> >>> import numpy as np
> >>> np.linspace(np.timedelta64(0, "s"), np.timedelta64(1, "s"), 4,
> ...             dtype="timedelta64[ms]")
> array([   0,    0,    0, 1000], dtype='timedelta64[ms]')
>
> The existing solution in PR 14700 does not override the end-user's start and dtype resolution. In this case, the end-user would have to set both start and dtype to "ms" resolution to get the expected result.
>
> >>> np.linspace(np.timedelta64(0, "ms"), np.timedelta64(1, "s"), 4,
> ...             dtype="timedelta64[ms]")
> array([   0,  333,  666, 1000], dtype='timedelta64[ms]')
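A minimal sketch of why the working unit matters, assuming the points are computed by exact integer arithmetic on the tick counts of a single unit. The helper `points_in_unit` is hypothetical and is not the code in PR 14700. Computing in seconds and casting afterwards should reproduce the 0, 0, 0, 1000 result above, while casting `start` and `stop` to milliseconds first should give 0, 333, 666, 1000.

```python
import numpy as np


def points_in_unit(start, stop, num, unit):
    """Hypothetical sketch (not NumPy API): `num` evenly spaced timedelta64
    values from start to stop, computed on integer ticks of `unit`."""
    td = np.dtype(f"timedelta64[{unit}]")
    # Cast both endpoints to the working unit, then take their tick counts.
    lo = start.astype(td).astype(np.int64)
    hi = stop.astype(td).astype(np.int64)
    # Floor division on exact integers; nothing finer than `unit` survives.
    ticks = lo + (np.arange(num) * (hi - lo)) // (num - 1)
    return ticks.astype(td)


start, stop = np.timedelta64(0, "s"), np.timedelta64(1, "s")

# Working in the unit of `start` (seconds) and casting afterwards loses the 333 ms steps.
coarse = points_in_unit(start, stop, 4, "s").astype("timedelta64[ms]")
# Casting everything to the output unit first keeps them.
fine = points_in_unit(start, stop, 4, "ms")
print(coarse)
print(fine)
```

This is essentially the "cast everything to a single unit" rule: once both endpoints live in the output unit, the only rounding left is the final floor division.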
Thanks for taking the time and looking into this!

Can you explain why your solution of using the input units to represent the step size is better than using the provided one? If this turns out to be tricky, we could also make the rule: cast everything to a single unit (as long as the cast is considered "safe"). That may force the user to do the cast in the long run, but maybe most users are not dealing with a mix of units here to begin with?

The approach in the last state of the PR had issues with the timedelta/datetime equivalent of:

>>> np.diff(np.linspace(0, 1000, 33, dtype='int64'))
array([31, 31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31,
       31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32])

which has an uneven step size that was not spread out (note the 32 values). I assume you have a solution for that?

Maybe it is best if you just pick up the PR and create a new one (if possible, pull in the existing commits, or the tests as well, for attribution), so that we can discuss more easily by reading the tests.
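One way to keep integer steps spread out, sketched here under the assumption that the computation works on the int64 view of a single unit, is exact integer arithmetic: point i is lo + floor(i * (hi - lo) / (num - 1)). The helper `spread_ticks` is hypothetical and is not the approach taken in PR 14700. For these inputs it matches the int64 `np.linspace` result, and the same rule carries over to datetime64 by viewing the endpoints as int64 milliseconds.

```python
import numpy as np


def spread_ticks(lo, hi, num):
    """Hypothetical sketch: `num` int64 values from lo to hi inclusive.

    Point i is lo + floor(i * (hi - lo) / (num - 1)), computed exactly in
    integers, so the remainder is distributed over the steps instead of
    piling up in one place.
    """
    return lo + (np.arange(num, dtype=np.int64) * (hi - lo)) // (num - 1)


ints = spread_ticks(0, 1000, 33)
print(np.diff(ints))  # 31s, with a 32 every fourth step

# The same rule applied to datetime64 through the int64 view of one unit ("ms").
start = np.datetime64("2020-09-26T00:00:00", "ms")
stop = np.datetime64("2020-09-26T00:00:01", "ms")
ticks = spread_ticks(start.astype(np.int64), stop.astype(np.int64), 33)
stamps = ticks.astype("datetime64[ms]")
```

The product i * (hi - lo) can overflow int64 when a wide range is expressed in a fine unit such as nanoseconds, so a real implementation would need a guard or a different formulation.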
> In PR 14700, there is some discussion of "NaT" handling. In my implementation, "NaT" works the same as "NaN" and I am not aware of any corner cases.
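To make the NaT/NaN analogy concrete, here is a sketch of an explicit NaT code path, assuming `endpoint=True`, `num >= 2`, and endpoints that already share one unit. The function `linspace_with_nat` is hypothetical; it copies what `np.linspace` does with a NaN endpoint (interior points become NaN and the last element is `stop`) and is exercised with both a NaT start and a NaT stop.

```python
import numpy as np


def linspace_with_nat(start, stop, num):
    """Hypothetical sketch (not NumPy API) of an explicit NaT code path.

    Assumes endpoint=True, num >= 2, and that start and stop already share
    one datetime64/timedelta64 unit.
    """
    if num < 2:
        raise ValueError("this sketch only handles num >= 2")
    dtype = np.asarray(start).dtype
    if np.isnat(start) or np.isnat(stop):
        # Mirror float linspace with NaN: all points are NaT except the
        # last, which is `stop` itself (and so is NaT when stop is NaT).
        out = np.full(num, dtype.type("NaT"), dtype=dtype)
        out[-1] = stop
        return out
    # Regular path: exact integer arithmetic on the int64 tick counts.
    lo, hi = start.astype(np.int64), stop.astype(np.int64)
    ticks = lo + (np.arange(num, dtype=np.int64) * (hi - lo)) // (num - 1)
    return ticks.astype(dtype)


nat = np.timedelta64("NaT", "ms")
one_s = np.timedelta64(1000, "ms")
nat_start = linspace_with_nat(nat, one_s, 4)
nat_stop = linspace_with_nat(np.timedelta64(0, "ms"), nat, 4)
```

Filling the interior with NaT is only one possible rule; raising an error for NaT endpoints would be another.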
There may not be; I think this had to do with how we approached certain difficulties in the PR (probably around viewing as int64 or using floats). We should just make sure to have tests for both start and end being NaT. Maybe NaT is not a big issue, because we can probably add an explicit code path if necessary.

Cheers,

Sebastian
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
participants (2)
- Lee Johnston
- Sebastian Berg