[Numpy-discussion] Numpy 1.9 release date

Dave Hirschfeld dave.hirschfeld at gmail.com
Sun Nov 10 13:15:09 EST 2013


Ralf Gommers <ralf.gommers <at> gmail.com> writes:

> On Fri, Nov 8, 2013 at 8:22 PM, Charles R Harris
> <charlesr.harris <at> gmail.com> wrote:
>
> and think that the main thing missing at this point is fixing the
> datetime problems.
>
> Is anyone planning to work on this? If yes, you need a rough estimate of
> when this is ready to go. If no, it needs to be decided if this is
> critical for the release. From the previous discussion I tend to think
> so. If it's critical but no one does it, why plan a release.......
>
> Ralf

I just want to pipe up here about how critical the datetime bug is.

Below is a minimal example from some data analysis code I found in our 
company that was giving incorrect results (fortunately it was caught by 
thorough testing):

In [110]: import numpy as np
     ...: import pandas as pd
     ...: 
     ...: records = [
     ...:  ('2014-03-29 23:00:00', '2014-03-29 23:00:00'),
     ...:  ('2014-03-30 00:00:00', '2014-03-30 00:00:00'),
     ...:  ('2014-03-30 01:00:00', '2014-03-30 01:00:00'),
     ...:  ('2014-03-30 02:00:00', '2014-03-30 02:00:00'),
     ...:  ('2014-03-30 03:00:00', '2014-03-30 03:00:00'),
     ...:  ('2014-10-25 23:00:00', '2014-10-25 23:00:00'),
     ...:  ('2014-10-26 00:00:00', '2014-10-26 00:00:00'),
     ...:  ('2014-10-26 01:00:00', '2014-10-26 01:00:00'),
     ...:  ('2014-10-26 02:00:00', '2014-10-26 02:00:00'),
     ...:  ('2014-10-26 03:00:00', '2014-10-26 03:00:00')]
     ...: 
     ...: 
     ...: data = np.asarray(records,
     ...:                   dtype=[('date obj', 'M8[h]'), ('str repr', object)])
     ...: df = pd.DataFrame(data)

In [111]: df
Out[111]: 
             date obj             str repr
0 2014-03-29 23:00:00  2014-03-29 23:00:00
1 2014-03-30 00:00:00  2014-03-30 00:00:00
2 2014-03-30 00:00:00  2014-03-30 01:00:00
3 2014-03-30 01:00:00  2014-03-30 02:00:00
4 2014-03-30 02:00:00  2014-03-30 03:00:00
5 2014-10-25 22:00:00  2014-10-25 23:00:00
6 2014-10-25 23:00:00  2014-10-26 00:00:00
7 2014-10-26 01:00:00  2014-10-26 01:00:00
8 2014-10-26 02:00:00  2014-10-26 02:00:00
9 2014-10-26 03:00:00  2014-10-26 03:00:00


Note that the `date obj` column has been adjusted by the local timezone, 
which produces a duplicate value at the clock change in March (2014-03-30 
00:00:00 appears twice) and a missing value at the clock change in October 
(2014-10-26 00:00:00 never appears). As you can imagine, this could very 
easily lead to incorrect analysis.
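
A check along the following lines would flag this kind of corruption. It is only a sketch using the standard library: the `parsed` list is copied from the `date obj` column of Out[111] above, and the helper name `find_duplicates_and_gaps` and its `step` and `max_gap` parameters are made up for illustration.

```python
from datetime import datetime, timedelta

# Values copied from the `date obj` column of Out[111] above.
parsed = [
    "2014-03-29 23:00:00", "2014-03-30 00:00:00", "2014-03-30 00:00:00",
    "2014-03-30 01:00:00", "2014-03-30 02:00:00", "2014-10-25 22:00:00",
    "2014-10-25 23:00:00", "2014-10-26 01:00:00", "2014-10-26 02:00:00",
    "2014-10-26 03:00:00",
]


def find_duplicates_and_gaps(stamps, step=timedelta(hours=1),
                             max_gap=timedelta(days=1)):
    """Hypothetical helper: report repeated and skipped stamps in a series.

    Jumps larger than `max_gap` are treated as the start of a separate
    block of data rather than as missing values.
    """
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    duplicates, missing = [], []
    for prev, cur in zip(times, times[1:]):
        delta = cur - prev
        if delta == timedelta(0):
            duplicates.append(cur)
        elif step < delta <= max_gap:
            # Record every expected stamp strictly between prev and cur.
            t = prev + step
            while t < cur:
                missing.append(t)
                t += step
    return duplicates, missing


duplicates, missing = find_duplicates_and_gaps(parsed)
print("duplicated:", [str(d) for d in duplicates])
print("missing:   ", [str(m) for m in missing])
```

Run against the `str repr` column instead, the same check reports nothing, since those strings are evenly spaced within each block.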

If you ran this exact same code in the (Eastern) US, you'd see the 
following results:

             date obj             str repr
0 2014-03-30 03:00:00  2014-03-29 23:00:00
1 2014-03-30 04:00:00  2014-03-30 00:00:00
2 2014-03-30 05:00:00  2014-03-30 01:00:00
3 2014-03-30 06:00:00  2014-03-30 02:00:00
4 2014-03-30 07:00:00  2014-03-30 03:00:00
5 2014-10-26 03:00:00  2014-10-25 23:00:00
6 2014-10-26 04:00:00  2014-10-26 00:00:00
7 2014-10-26 05:00:00  2014-10-26 01:00:00
8 2014-10-26 06:00:00  2014-10-26 02:00:00
9 2014-10-26 07:00:00  2014-10-26 03:00:00
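
To make the difference between the two outputs concrete, the sketch below computes `date obj` minus `str repr`, in hours, for every row. All values are copied from the two tables above; the names `intended`, `local_result`, `eastern_result` and `hour_shifts` are made up for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# `str repr` column: the wall-clock values the code was given.
intended = [
    "2014-03-29 23:00:00", "2014-03-30 00:00:00", "2014-03-30 01:00:00",
    "2014-03-30 02:00:00", "2014-03-30 03:00:00", "2014-10-25 23:00:00",
    "2014-10-26 00:00:00", "2014-10-26 01:00:00", "2014-10-26 02:00:00",
    "2014-10-26 03:00:00",
]

# `date obj` column from Out[111].
local_result = [
    "2014-03-29 23:00:00", "2014-03-30 00:00:00", "2014-03-30 00:00:00",
    "2014-03-30 01:00:00", "2014-03-30 02:00:00", "2014-10-25 22:00:00",
    "2014-10-25 23:00:00", "2014-10-26 01:00:00", "2014-10-26 02:00:00",
    "2014-10-26 03:00:00",
]

# `date obj` column from the (Eastern) US output.
eastern_result = [
    "2014-03-30 03:00:00", "2014-03-30 04:00:00", "2014-03-30 05:00:00",
    "2014-03-30 06:00:00", "2014-03-30 07:00:00", "2014-10-26 03:00:00",
    "2014-10-26 04:00:00", "2014-10-26 05:00:00", "2014-10-26 06:00:00",
    "2014-10-26 07:00:00",
]


def hour_shifts(actual, expected):
    """Return (actual - expected) in whole hours for each pair of stamps."""
    shifts = []
    for a, e in zip(actual, expected):
        delta = datetime.strptime(a, FMT) - datetime.strptime(e, FMT)
        shifts.append(int(delta.total_seconds() // 3600))
    return shifts


local_shifts = hour_shifts(local_result, intended)
eastern_shifts = hour_shifts(eastern_result, intended)
print("Out[111] shifts:  ", local_shifts)
print("Eastern US shifts:", eastern_shifts)
```

In the first output some rows are shifted by an hour and others are not, while the Eastern US output is shifted by a uniform four hours, so the same script yields different data depending on where it is run.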


Unfortunately, I don't have the skills to contribute meaningfully in this 
area, but it is a very real problem for users of numpy, many of whom are 
not active on the mailing list.

HTH,
Dave




