[SciPy-user] calculations using the datetime information of timeseries

Pierre GM pgmdevlist at gmail.com
Wed Nov 12 20:35:57 EST 2008


Timmie,

Let's go through method #1 first:

> snew = series_dummy
>
> ###method 1
>
> for i in range(0,snew.size):
>     snew[i] = snew[i]* 2 #snew.dates[i].datetime


Your `snew` object is only a reference to `series_dummy`. When you
modify an element of snew, you're in fact modifying the corresponding
element of `series_dummy`. That's how assignment works in Python; you
would get the same result with lists:
 >>> a = [0,0,0]
 >>> b = a
 >>> b[0] = 1
 >>> a
[1, 0, 0]

If you want to avoid that, you can make snew a copy of series_dummy:
snew = series_dummy.copy()
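
To make the difference between a reference and a copy concrete, here is a
minimal sketch. It assumes a plain NumPy array as a stand-in for the
time series (the names `alias` and `independent` are only illustrative);
the aliasing behaviour is the same for any mutable Python object.

```python
import numpy as np

# Stand-in for series_dummy: a plain ndarray (an assumption for this
# sketch, not an actual TimeSeries object).
series_dummy = np.ones(5)

alias = series_dummy               # a second name for the same object
alias[0] = 2.0                     # also changes series_dummy[0]

independent = series_dummy.copy()  # a new object with its own data
independent[1] = 99.0              # series_dummy[1] is left untouched
```

After this runs, `alias is series_dummy` holds, while `independent` can be
modified freely without touching the original.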

Now, method #2:
>
> for i in range(0,snew.size):
>     snew = snew*2

Are you sure that's what you want to do? You could do
snew = snew*(2**snew.size)
and get the same result.
Anyway: here, you change what snew is at each iteration: initially, it
was a reference to series_dummy, now, it's a reference to another
(temporary) object, snew*2. Nothing is propagated back to
series_dummy.
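
A small sketch of that rebinding, again assuming a plain NumPy array in
place of the time series and a deliberately short length so the numbers
stay readable (`n` and `closed_form` are illustrative names):

```python
import numpy as np

series_dummy = np.ones(4)     # stand-in for the original series
snew = series_dummy           # snew starts as a reference to it
n = snew.size                 # loop count, fixed before snew is rebound

for _ in range(n):
    snew = snew * 2           # builds a new array and rebinds the name

# The loop doubles every element n times, i.e. multiplies by 2**n.
closed_form = series_dummy * (2 ** n)
```

Here `series_dummy` still holds ones, and `snew` equals `closed_form`.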

Finally, some comments for method #3:
You want to create a new timeseries based on the result of some  
calculation on the data part, but still using the dates of the initial  
series ?
If you don't have any missing values, perform the computation on
series._data, that'll be faster. If you have missing values, use
series._series instead to access the MaskedArray methods directly, and
not the timeseries ones (you don't want to carry the dates around if
you don't need them).
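
The same trade-off can be sketched with a bare numpy.ma array, used here
as an assumed stand-in for the masked data underlying a series (the
values and mask are made up for illustration):

```python
import numpy as np
import numpy.ma as ma

# Four values, the second one flagged as missing.
values = ma.array([1.0, 2.0, 3.0, 4.0],
                  mask=[False, True, False, False])

# Mask-aware computation: the missing entry stays masked in the result
# and is skipped by reductions such as sum().
scaled = values * 2
total = scaled.sum()

# Raw-data computation: operates on the plain ndarray and ignores the
# mask, so it is only appropriate when nothing is actually missing.
raw = values.data * 2
```

Working on the raw data skips the mask bookkeeping, which is why it is
only safe when there are no missing values.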

As a wrap-up:
Try to avoid looping if you can. You said a generic form of your  
function is:
>
> def myfunction(datetime_obj, scaling_factor):
>    pass

Do you really need datetime objects? In your example, you were using
series.dates[i].datetime.hour, one element at a time. You should have
used series.dates.hour, which is an array. Applying a function to an
array as a whole is far more efficient than applying it to each
element in turn.
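
As a rough illustration of working on the whole array of hours, here is
a sketch that uses NumPy datetime64 values as an assumed stand-in for
series.dates. The function `scale_by_hour` is hypothetical; it only
shows the shape such a function could take when it receives an array
instead of one datetime at a time.

```python
import numpy as np

# 48 hourly time stamps starting at midnight (stand-in for series.dates).
start = np.datetime64("2008-11-12T00", "h")
dates = start + np.arange(48)

# Hour of day for every stamp at once: subtract each stamp's own day.
days = dates.astype("datetime64[D]")
hours = (dates - days).astype(int)

def scale_by_hour(hours, scaling_factor):
    """Hypothetical example: scale each hour-of-day value."""
    return hours * scaling_factor

# One call handles all elements; no explicit Python loop is needed.
result = scale_by_hour(hours, 10.0)
```

The element-by-element loop would produce the same numbers, just with
one Python-level call per data point.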


Let me know how it goes, and don't hesitate to contact me off-list if  
you need some help with your function.

Cheers
P.


>
> I found out that I can get the datetime for each entry with
>
> for i in range(0, series.size):
> 	series[i] =  myfunction(series.dates.tolist()[i], 10.)
>
> Now, I noticed a strange thing.
>
> If I have a base series "base_series" and assign it to a new one with
>
> new_series = base_series
>
> The base_series gets updated/changed according to all calculations I
> perform on new_series (Please see method 1 below).
>
> The only way I could imagine to make my code work is creating lots of
> template series like in method 3 below. This way lets me calculate my
> new values in new_series using the datetime information and still
> retain base_series with its original values.
>
> I kindly ask you to shed some light on why the base_series gets changed
> when I change a derived series.
>
> Is there a more efficient way to accomplish my task that I may not have
> thought of so far?
>
> Thanks in advance!
> Kind regards,
> Timmie
>
>
>
> #### BELOW A SAMPLE SCRIPT THAT MAY ILLUSTRATE ####
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> import datetime
> import scikits.timeseries as ts
>
> import numpy as np
>
> #create dummy series
> data = np.zeros(600)+1
> now = datetime.datetime.now()
> start = datetime.datetime(now.year, now.month, now.day)
> #print start
> start_date = ts.Date('H', datetime=start)
> #print start_date
> series_dummy = ts.time_series(data, dtype=np.float_, freq='H',
> start_date=start_date)
>
> snew = series_dummy
>
> ###method 1
>
> for i in range(0,snew.size):
>     snew[i] = snew[i]* 2 #snew.dates[i].datetime
>
> print "method 1:", snew.sum()-series_dummy.sum()
>
> ###method 2
>
> for i in range(0,snew.size):
>     snew = snew*2
>
> print "method 2:", snew.sum()-series_dummy.sum()
>
> #method 3:
>
> data = np.zeros(series_dummy.size)+1
> dt_arr = series_dummy.dates
> cser = ts.time_series(data.astype(np.float_), dt_arr)
> for i in range(0,cser.size):
>     # note: cser.dates[i].datetime.hour is just used as an example;
>     # my function performs calculations based on the datetime of
>     # each data point (the current datetime is the input parameter).
>     cser[i] = cser.dates[i].datetime.hour
>
> print "method 3:", cser.sum()-series_dummy.sum()
>