how to get only complete years from series?
![](https://secure.gravatar.com/avatar/6f3cb304671ae5b6ea04dfe0e7948651.jpg?s=120&d=mm&r=g)
Hello,

I am using scikits.timeseries to evaluate a long-term measurement data set. How can I extract those years that have complete measurements? In the example below, the years 2004 and 2008 are not complete. Is there a generic way to mask all incomplete years?

Thanks & regards,
Timmie

###code
import numpy as np
import numpy.ma as ma
import scikits.timeseries as ts

data = np.arange(0, 40800)
start_dt = ts.Date(freq='H', year=2004, month=3, day=1, hour=0)
s_all = ts.time_series(data, freq='H', start_date=start_dt)
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
Timmie,

There's no generic function to do what you want, as it depends on the frequency. What you can do is:

1. Get a list of years:
singleyears = set(s_all.years)
2. for each year, check what are the first and last days of the year:
firstandlast = [tuple([year] +s_all[s_all.years==year].yeardays[[0,-1]].tolist()) for year in singleyears]
That gives you a list of tuples (year, first day, last day).
3. Find the years for which the first day is strictly greater than 1 or the last day is strictly less than 365:
maskyears = [y for (y,f,l) in firstandlast if f>1 or l<365]
4. Mask the corresponding years
for y in maskyears: s_all[s_all.years==y] = ma.masked
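The four steps above can also be sketched without scikits.timeseries, using only the standard library. This is a rough illustration rather than the library's API: `stamps`, `values`, `first_last`, `incomplete` and `masked_values` are hypothetical names, and `None` stands in for `ma.masked`. It assumes the timestamps are in chronological order. Instead of the day-of-year test it compares against Jan 1 00:00 and Dec 31 23:00, which should also cover leap years and hourly data. Like the steps above, it only looks at the endpoints of each year, so gaps inside a year go unnoticed.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the hourly series in the question:
# 40800 hourly timestamps starting on 2004-03-01 00:00.
start = datetime(2004, 3, 1, 0)
stamps = [start + timedelta(hours=i) for i in range(40800)]
values = list(range(40800))

# Steps 1 and 2: record the first and last timestamp seen in each year
# (relies on the timestamps being in chronological order).
first_last = {}
for t in stamps:
    first, _ = first_last.get(t.year, (t, t))
    first_last[t.year] = (first, t)

# Step 3: a year counts as incomplete unless it runs from
# Jan 1 00:00 through Dec 31 23:00.
incomplete = sorted(
    year
    for year, (first, last) in first_last.items()
    if first != datetime(year, 1, 1, 0) or last != datetime(year, 12, 31, 23)
)

# Step 4: "mask" every value that falls in an incomplete year.
masked_values = [
    None if t.year in incomplete else v for t, v in zip(stamps, values)
]
```

For other frequencies the two endpoint timestamps in step 3 would need to change accordingly.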
That's far from efficient and rather ugly, but it should give you a general idea. Let me know how it goes.

P.

On Nov 17, 2008, at 3:34 PM, Timmie wrote:

Hello, I am using scikits.timeseries to evaluate a long-term measurement data set.
How can I extract those years that have complete measurements?
In the example below, the years 2004 and 2008 are not complete. Is there a generic way to mask all incomplete years?
Thanks & regards, Timmie
###code
import numpy as np
import numpy.ma as ma
import scikits.timeseries as ts

data = np.arange(0, 40800)
start_dt = ts.Date(freq='H', year=2004, month=3, day=1, hour=0)
s_all = ts.time_series(data, freq='H', start_date=start_dt)
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
Timmie,

There's a smarter way than the previous answer, if you're not afraid of temporary arrays. Here's a copy-pasted version, commented. Let me know how it goes.

Cheers
P.

#### BELOW A SAMPLE SCRIPT THAT MAY ILLUSTRATE ####
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import numpy.ma as ma
import scikits.timeseries as ts

data = np.arange(0, 40800)
start_dt = ts.Date(freq='H', year=2004, month=3, day=1, hour=0)
s_all = ts.time_series(data, freq='H', start_date=start_dt)

# Convert to a (5, 24*366) annual series: each row is a year, each column an hour.
# Because of leap years, we have 24*366 columns, not 24*365.
a_s_all = s_all.convert('A')
# If the first column (the first date) is masked, mask the row.
a_s_all[a_s_all[:,0].mask] = ma.masked
# If column -25 (the last hour of 12/31, or of 12/30 in a leap year) is masked,
# mask the row.
a_s_all[a_s_all[:,-25].mask] = ma.masked

# Make a new series from the annual series.
# We can't use convert because the annual series is 2D.
# Instead, we create a new series starting at the first date of the annual series,
# converted to the correct frequency (s_all.freq).
# As the method asfreq defaults to END, we need to force 'START' for relation
# (check the docstring of asfreq).
starting_date = a_s_all.dates[0].asfreq(s_all.freq, relation='START')
# For the data, we can't use a_s_all.ravel() directly because a_s_all is 2D,
# but we only need the data actually, not the dates.
# Caveat: rows for non-leap years presumably end with 24 masked padding hours,
# which ravel() keeps, so check that the dates of s_new line up with s_all.
s_new = ts.time_series(a_s_all._series.ravel(), start_date=starting_date)
# And if you want, you can force the starting and ending dates of this new series
# to the initial ones.
s_mod = ts.align_with(s_all, s_new)
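The year-by-hour layout used in the script can be mimicked with plain NumPy masked arrays, which may help to see what the 2D trick does. This is only a sketch under assumptions: it does not use scikits.timeseries at all, and `HOURS`, `stamps`, `grid`, `col` and `result` are hypothetical names. As a variation on the script, it tests the last real hour of each year (taking leap years into account) rather than the fixed column -25, and it maps the grid back to the original samples by indexing instead of `ravel()`, so the padding columns are dropped.

```python
import calendar
import numpy as np
import numpy.ma as ma

HOURS = 24 * 366  # columns per row, sized for a leap year

# Hypothetical stand-in data: hourly values from 2004-03-01 00:00 onward.
start = np.datetime64("2004-03-01T00", "h")
stamps = start + np.arange(40800)
values = np.arange(40800)

# Calendar year of each sample and its hour offset from Jan 1 00:00.
years = stamps.astype("datetime64[Y]")
year_num = years.astype(int) + 1970
col = (stamps - years.astype("datetime64[h]")).astype(int)

# One row per year, fully masked at first; assigning a value unmasks its slot.
first_year = int(year_num.min())
last_year = int(year_num.max())
grid = ma.masked_all((last_year - first_year + 1, HOURS), dtype=values.dtype)
grid[year_num - first_year, col] = values

# Mask every row whose first hour or last real hour is missing.
for row, year in enumerate(range(first_year, last_year + 1)):
    last_col = (366 if calendar.isleap(year) else 365) * 24 - 1
    if grid.mask[row, 0] or grid.mask[row, last_col]:
        grid[row] = ma.masked

# Back to one value per original sample, skipping the padding columns.
result = grid[year_num - first_year, col]
```

Here `result` has the same length as `values`, with the samples from 2004 and 2008 masked.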
participants (2)
- Pierre GM
- Timmie