[Numpy-discussion] Numpy 1.9 release date
Dave Hirschfeld
dave.hirschfeld at gmail.com
Sun Nov 10 13:15:09 EST 2013
Ralf Gommers <ralf.gommers <at> gmail.com> writes:
>
> On Fri, Nov 8, 2013 at 8:22 PM, Charles R Harris
> <charlesr.harris <at> gmail.com> wrote:
>
>
> and think that the main thing missing at this point is fixing the
> datetime problems.
>
>
> Is anyone planning to work on this? If yes, you need a rough estimate of
> when this is ready to go. If no, it needs to be decided if this is critical
> for the release. From the previous discussion I tend to think so. If it's
> critical but no one does it, why plan a release.......
>
>
> Ralf
>
Just want to pipe up here about how critical the datetime bug is.
Below is a minimal example from some data analysis code I found in our
company that was giving incorrect results (fortunately it was caught by
thorough testing):
In [110]: import numpy as np
...: import pandas as pd
...:
...: records = [
...: ('2014-03-29 23:00:00', '2014-03-29 23:00:00'),
...: ('2014-03-30 00:00:00', '2014-03-30 00:00:00'),
...: ('2014-03-30 01:00:00', '2014-03-30 01:00:00'),
...: ('2014-03-30 02:00:00', '2014-03-30 02:00:00'),
...: ('2014-03-30 03:00:00', '2014-03-30 03:00:00'),
...: ('2014-10-25 23:00:00', '2014-10-25 23:00:00'),
...: ('2014-10-26 00:00:00', '2014-10-26 00:00:00'),
...: ('2014-10-26 01:00:00', '2014-10-26 01:00:00'),
...: ('2014-10-26 02:00:00', '2014-10-26 02:00:00'),
...: ('2014-10-26 03:00:00', '2014-10-26 03:00:00')]
...:
...: data = np.asarray(records,
...:                   dtype=[('date obj', 'M8[h]'), ('str repr', object)])
...: df = pd.DataFrame(data)
In [111]: df
Out[111]:
              date obj             str repr
0  2014-03-29 23:00:00  2014-03-29 23:00:00
1  2014-03-30 00:00:00  2014-03-30 00:00:00
2  2014-03-30 00:00:00  2014-03-30 01:00:00
3  2014-03-30 01:00:00  2014-03-30 02:00:00
4  2014-03-30 02:00:00  2014-03-30 03:00:00
5  2014-10-25 22:00:00  2014-10-25 23:00:00
6  2014-10-25 23:00:00  2014-10-26 00:00:00
7  2014-10-26 01:00:00  2014-10-26 01:00:00
8  2014-10-26 02:00:00  2014-10-26 02:00:00
9  2014-10-26 03:00:00  2014-10-26 03:00:00
Note that the `date obj` column has been adjusted to the local timezone.
This produces a duplicate value at the clock change in March (rows 1 and 2
are both 2014-03-30 00:00:00) and a missing value at the clock change in
October (2014-10-26 00:00:00 never appears). As you can imagine, this could
very easily lead to incorrect analysis.
If you ran this exact same code in the (Eastern) US, you'd see the following
results, with every `date obj` shifted four hours from its string:
              date obj             str repr
0  2014-03-30 03:00:00  2014-03-29 23:00:00
1  2014-03-30 04:00:00  2014-03-30 00:00:00
2  2014-03-30 05:00:00  2014-03-30 01:00:00
3  2014-03-30 06:00:00  2014-03-30 02:00:00
4  2014-03-30 07:00:00  2014-03-30 03:00:00
5  2014-10-26 03:00:00  2014-10-25 23:00:00
6  2014-10-26 04:00:00  2014-10-26 00:00:00
7  2014-10-26 05:00:00  2014-10-26 01:00:00
8  2014-10-26 06:00:00  2014-10-26 02:00:00
9  2014-10-26 07:00:00  2014-10-26 03:00:00
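A round-trip check along the following lines could flag this kind of shift
in a test suite, whatever timezone the machine is in. It is only a minimal
sketch: the helper name `wall_clock_mismatches` is made up for illustration
(it is not part of numpy or pandas), and it assumes the strings are naive
wall-clock times that should come through parsing unchanged. On a numpy
release that treats datetime64 as timezone naive (1.11 and later, as far as
I know), the list it returns should be empty.

```python
from datetime import datetime

import numpy as np


# Hypothetical helper for illustration; not part of numpy or pandas.
def wall_clock_mismatches(strings, fmt='%Y-%m-%d %H:%M:%S'):
    """Return (input string, parsed value) pairs that changed during parsing."""
    parsed = np.array(strings, dtype='M8[s]')
    # At second resolution, astype(object) yields datetime.datetime objects.
    as_python = parsed.astype(object)
    mismatches = []
    for text, got in zip(strings, as_python):
        expected = datetime.strptime(text, fmt)
        if got != expected:
            mismatches.append((text, got))
    return mismatches


# The March clock-change rows from the example above.
stamps = [
    '2014-03-29 23:00:00',
    '2014-03-30 00:00:00',
    '2014-03-30 01:00:00',
    '2014-03-30 02:00:00',
    '2014-03-30 03:00:00',
]

print(wall_clock_mismatches(stamps))
```

Comparing against `datetime.strptime` keeps the expected values independent
of numpy's own string parsing, which is the part under suspicion here.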
Unfortunately I don't have the skills to meaningfully contribute in this
area, but it is a very real problem for users of numpy, many of whom are not
active on the mailing list.
HTH,
Dave