[Tutor] using datetime and calculating hourly average

Tue Jul 7 19:20:12 CEST 2009

2009/7/7 John [H2O] <washakie at gmail.com>:
>
> The data is just x,y data where x = datetime objects from the datetime
> module. y are just floats. It is bundled in a numpy array.

I might be totally off, but did you know that you can compare datetime objects?

>>> from datetime import datetime
>>> d1 = datetime.now()
>>> d2 = datetime.now()
>>> d1
datetime.datetime(2009, 7, 7, 18, 43, 15, 952000)
>>> d2
datetime.datetime(2009, 7, 7, 18, 43, 23, 252000)
>>> d1 < d2
True

You can also subtract one datetime from another, which gives a timedelta
object, and add a timedelta to a datetime.

>>> diff = d2 - d1
>>> diff
datetime.timedelta(0, 7, 300000)
>>> diff.seconds
7
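
A small sketch of the same idea with fixed timestamps (the values are made up
for illustration, they are not from your data). Note that "seconds" only holds
the seconds part of a timedelta; whole days are kept separately in "days".

```python
from datetime import datetime, timedelta

# Fixed, hypothetical example values so the results are reproducible.
d1 = datetime(2009, 7, 7, 18, 43, 15)
d2 = datetime(2009, 7, 8, 18, 43, 22)

earlier = d1 < d2      # datetimes compare chronologically
diff = d2 - d1         # subtracting two datetimes gives a timedelta

# .seconds ignores whole days, so combine both fields for the full span.
total = diff.days * 86400 + diff.seconds
```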

> import datetime as dt
> import numpy as np
>
> I pass the array X, where X is a numpy array of shape [n,2] where n is the
> number of points in the data.
>
>>> It seems a bit slow, however... any thoughts on how to improve it?
>>>
>>> def calc_hravg(X):
>>>     """Calculates hourly average from input data"""
>>>
>>>     X_hr = []
>>>     minX = X[:,0].min()

Why do you need minX? You only use it to create hr which can be done
differently, see below.

>>>     hr = dt.datetime(*minX.timetuple()[0:4])

If I read this correctly, you create a datetime object from a datetime
object via a sliced timetuple? The slice [0:4] keeps year, month, day and
hour, so the result is minX truncated to the hour. You can get the same
thing directly with "minX.replace(minute=0, second=0, microsecond=0)".
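
For reference, a minimal sketch of what the sliced timetuple produces,
compared with datetime.replace(). The sample value "t" is hypothetical.

```python
import datetime as dt

t = dt.datetime(2009, 7, 7, 18, 43, 15, 952000)  # hypothetical sample value

# The approach from the quoted code: keep year, month, day and hour only.
via_tuple = dt.datetime(*t.timetuple()[0:4])

# The same truncation without the round trip through a time tuple.
via_replace = t.replace(minute=0, second=0, microsecond=0)
```

Both names end up holding 18:00 on the same day, so either form works as the
start of the first hourly bin.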

You should create the timedelta here, outside the loop, which makes more
sense because it does not change.

        delta = dt.timedelta(hours=1) # Added by me
>>>     while hr <= dt.datetime(*X[-1,0].timetuple()[0:4]):

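Here "X[-1,0].replace(minute=0, second=0, microsecond=0)" gives the same
hour-truncated datetime without the timetuple. It also does not change
inside the loop, so you could compute it once before the while.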

>>>         nhr = hr + dt.timedelta(hours=1)

If you create the delta one level up, the above changes to "nhr = hr + delta".

>>>         ind = np.where( (X[:,0] > hr) & (X[:,0] < nhr) )
>>>         vals = X[ind,1][0].T
>>>         try:
>>>             #hr_avg = np.sum(vals) / len(vals)
>>>             hr_avg = np.average(vals)
>>>
>>>         except:
>>>             hr_avg = np.nan
>>>         X_hr.append([hr,hr_avg])
>>>         hr = hr + dt.timedelta(hours=1)
>>>
>>>     return np.array(X_hr)

What should X_hr look like? Can you give an example?

To me it looks overly complicated (but that might just be me).
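
One way it might be simplified, as a rough sketch only. It assumes X is a
non-empty object array of shape [n,2] as you describe; the function name
calc_hravg_grouped is made up. It walks the rows once instead of running
np.where for every hour. Unlike your version, a point that falls exactly on
the hour is counted in the hour it starts, and the range runs from the
earliest to the latest timestamp rather than to X[-1,0].

```python
import datetime as dt
from collections import defaultdict

import numpy as np


def calc_hravg_grouped(X):
    """Hourly means of X[:, 1], one row per hour covered by X[:, 0].

    Hypothetical rewrite: a single pass over the rows groups the values
    by hour. Hours without any data come out as NaN.
    """
    buckets = defaultdict(list)
    for t, y in X:
        # Truncate each timestamp to the start of its hour.
        buckets[t.replace(minute=0, second=0, microsecond=0)].append(y)

    delta = dt.timedelta(hours=1)  # created once, it never changes
    hr, last = min(buckets), max(buckets)
    X_hr = []
    while hr <= last:
        vals = buckets.get(hr)  # None for hours with no data
        X_hr.append([hr, np.mean(vals) if vals else np.nan])
        hr += delta
    return np.array(X_hr, dtype=object)
```

Called on three points at 18:10, 18:50 and 20:05, this gives three rows (18:00,
19:00, 20:00) with NaN for the empty 19:00 hour.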

Greets
Sander