[SciPy-User] Status of TimeSeries SciKit

Wes McKinney wesmckinn at gmail.com
Wed Jul 27 13:16:35 EDT 2011


On Wed, Jul 27, 2011 at 12:28 PM, Andreas <lists at hilboll.de> wrote:
> While we're at it:
>
>> Frequency conversion flexibility:
>>     - when going from a higher frequency to lower frequency (eg. daily to
>>       monthly), the timeseries module adds an extra dimension and groups the
>>       points so you still have all the original data rather than discarding
>>       data
>
> I'm using scikits.timeseries for analysis of atmospheric measurements.
> I've always wanted several things, and now that discussion is under way,
> maybe it's a good time to point them out:
>
> * When plotting a series, have the flexibility to have the value marked
> down at the center of the frequency. What I mean is, when I have monthly
> data and make a plot of one year, have each value be printed at the
> middle of the corresponding month, e.g. Jan 16, etc. Otherwise, it's not
> obvious to the reader whether the value printed on July 1 is actually
> that for June or that for July.

Seems like this could be pretty easy to do; it should only need a
"tick_offset" option added to the plotting function, I think.
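
As a rough sketch of what such an offset would compute (the helper name
mid_month is hypothetical, and this is not part of scikits.timeseries or
pandas), the mid-period position for monthly data can be derived with the
standard library alone:

```python
import calendar
from datetime import date, timedelta

def mid_month(year, month):
    """Return the date roughly halfway through the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    # Shift from the first of the month by half the month's length,
    # rounded down to whole days.
    return date(year, month, 1) + timedelta(days=days_in_month // 2)

# Positions at which twelve monthly values for 2011 would be drawn.
tick_positions = [mid_month(2011, m) for m in range(1, 13)]
```

With this rounding, the January value lands on Jan 16, as in the example
above. A plotting function could use these dates as x positions instead
of the first day of each month.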

> * Have full support for n-dimensional series. When I have an n-d array of
> data values for each point in time (n>0), many things don't work. The
> biggest problem here seems to be that pickling actually *seems* to work
> (a file is created), but when I load the file again, the entries in the
> array are somehow screwed up (like transposed).

Support in pandas is very good for working with multiple univariate
time series using DataFrame, and not quite as good for panel data (3d),
but I'm planning to build out an n-dimensional NDFrame which could
potentially address your needs. If you can show me your data and tell
me what you need to be able to do with it, that would be helpful to me.
The majority of my work in pandas has been motivated by use cases I've
experienced in applications.

> * Enable rolling means for sparse data. For example, if I have irregular
> (in time) measurements, say, every one to six days, I would still like
> to be able to calculate a rolling n-day-average. Missing values should
> be ignored (speaking numpy: timeslice.compressed().mean())

Either pandas or bottleneck will do this for you, so you can say something like:

rolling_mean(ts, window=50, min_periods=5)

and any sample with at least 5 data points in the window will compute
a value; missing (NaN) data will be excluded. Bottleneck has move_mean
and move_nanmean, which will outperform pandas.rolling_mean a little
bit since the Cython code is more specialized.
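
To make the min_periods behaviour concrete, here is a minimal pure-Python
sketch of the semantics described above. The function name rolling_nanmean
is made up for illustration; it is not the pandas or Bottleneck
implementation, only a model of a trailing window that skips NaN and
requires a minimum number of valid points:

```python
import math

def rolling_nanmean(values, window, min_periods):
    """Trailing-window mean that ignores NaN entries.

    A position yields NaN unless its window holds at least
    `min_periods` non-NaN observations.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        # Keep only the valid (non-NaN) points in the current window.
        valid = [v for v in values[start:i + 1] if not math.isnan(v)]
        if len(valid) >= min_periods:
            out.append(sum(valid) / len(valid))
        else:
            out.append(math.nan)
    return out

nan = math.nan
data = [1.0, nan, 3.0, 4.0, nan, 6.0]
result = rolling_nanmean(data, window=3, min_periods=2)
# result: [nan, nan, 2.0, 3.5, 3.5, 5.0]
```

Note that the window here counts samples, not days. For measurements that
are irregular in time, a true n-day average would need the window bounds
to be chosen from the timestamps instead.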

> I don't know if any of this is already implemented in pandas, as I've
> never used it up till now. But perhaps someone would be interested in
> implementing these issues ...
>
> Cheers,
> Andreas.
