[SciPy-user] sorting timeseries data.

Mon May 5 09:46:33 EDT 2008

Thanks Pierre, I'll have a look at the scikits.timeseries package when I have some time. Is it part of scipy/numpy or do I have to download it separately?

Another question, about the duplicates: I have a dataset with multiple data points on each day. Is there a simple way to take the maximum (or minimum, or mean) for each day and assign it to that day?

i.e., if my data looks like:

1986 10 01 16.3
1986 10 01 22.9
1986 10 01 13.2
1986 10 02 24.3
1986 10 02 22.1
1986 10 03 19.8
1986 10 03 20.1
1986 10 03 23.4
...

take the max of each day to get:

1986 10 01 22.9
1986 10 02 24.3
1986 10 03 23.4
...

thanks

- dharhas

>>> Pierre GM <pgmdevlist at gmail.com> 5/2/2008 11:34 AM >>>
On Friday 02 May 2008 12:11:04 Dharhas Pothina wrote:
> I want to sort the data to be monotonically increasing by the variable
> seconds and filter out duplicate values (say by deleting the second
> occurrence).

Dharhas,
>>> idx = seconds.argsort()
>>> sorted_seconds = seconds[idx]
>>> sorted_data = data[idx]
will do the trick. Look at the help for the argsort method if you need to use
a specific sorting algorithm. 'mergesort' is stable, so it may be preferable
when the original order of equal values matters.
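
As a minimal sketch of the stable sort mentioned above, the snippet below passes kind="mergesort" to argsort. The sample values for seconds and data are made up for illustration and are not from the original dataset.

```python
import numpy as np

# Hypothetical sample data: unsorted timestamps with repeated values.
seconds = np.array([30, 10, 20, 10, 30])
data = np.array([3.0, 1.0, 2.0, 1.5, 3.5])

# A stable sort keeps entries with equal keys in their original
# relative order, so the first occurrence of a duplicate stays first.
idx = seconds.argsort(kind="mergesort")
sorted_seconds = seconds[idx]
sorted_data = data[idx]

print(sorted_seconds)
print(sorted_data)
```

With a stable sort, the value recorded first for a repeated timestamp still comes first after sorting, which matters if the later occurrence is the one to be dropped.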

Then, you can try to find the duplicates this way:
>>> diffs = numpy.ediff1d(sorted_seconds, to_begin=1)
>>> unq = (diffs != 0)
>>> final_seconds = sorted_seconds.compress(unq)
>>> final_data = sorted_data.compress(unq)
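
To see what the duplicate filter does, here is a small self-contained sketch. The input arrays are invented for illustration and are assumed to be already sorted. Boolean indexing is used in place of compress; the two are equivalent for 1-D arrays.

```python
import numpy as np

# Hypothetical, already sorted sample data with repeated timestamps.
sorted_seconds = np.array([10, 10, 20, 30, 30])
sorted_data = np.array([1.0, 1.5, 2.0, 3.0, 3.5])

# Difference between consecutive timestamps; to_begin=1 prepends a
# nonzero value so that the first element is always kept.
diffs = np.ediff1d(sorted_seconds, to_begin=1)

# A zero difference means the timestamp repeats the previous one.
unq = diffs != 0

final_seconds = sorted_seconds[unq]
final_data = sorted_data[unq]

print(final_seconds)
print(final_data)
```

Only the first entry of each run of equal timestamps survives, so the second and later occurrences are the ones deleted.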

On a side note, you may want to give scikits.timeseries a try: we develop this
package specifically to handle time series (i.e., series indexed in time). The
sorting part would be automatic, and finding the duplicates is also quite
easy.
HIH