Dear list members,
I am looking for some hints and recommendations. I want to use Python to analyse and evaluate (measurement) time series. Therefore I am looking for modules, documentation, or tutorials that could help me with:
* filtering time series by a certain criterion
* reducing them to a lower time resolution, e.g. from hourly values to daily mean values
* filling data gaps for days or hours without valid data using statistical methods like regression
* general data quality and trend assessment
I found a conversion of the |Stat ("pipe-stat") programs written by Gary Perlman into a Python module: pstat.py - http://www.nmr.mgh.harvard.edu/Neural_Systems_Group/gary/python/pstat.py
This can be used as a start, but I am interested in what you would recommend.
Thanks.
Kind regards,
Timmie
Timmie,
I want to use python to analyse and evaluate (measurement) time series.
The package TimeSeries was designed to perform the kind of operations you want (filtering, frequency conversion, handling missing data...). It is currently available in the sandbox of scipy. More info here: http://www.scipy.org/SciPyPackages/TimeSeries
* filling data gaps for days or hours without valid data using statistical methods like regression
This one might be a tad trickier. You may want to give pyloess a try, it's a wrapper around the dloess functions. Also available in the scipy sandbox.
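As a much simpler stand-in for the local regression that pyloess provides, gaps can also be filled by plain linear interpolation. The sketch below is not loess and does not use the sandbox packages; it assumes only NumPy, and the sample values and the helper name fill_gaps_linear are made up for illustration.

```python
import numpy as np

# Hypothetical hourly measurements; NaN marks hours without valid data.
values = np.array([1.0, 2.0, np.nan, np.nan, 5.0, 6.0, np.nan, 8.0])

def fill_gaps_linear(y):
    """Fill NaN gaps by linear interpolation between valid neighbours."""
    y = np.asarray(y, dtype=float)
    x = np.arange(y.size)            # sample positions (evenly spaced hours)
    valid = ~np.isnan(y)             # True where a measurement exists
    filled = y.copy()
    # Interpolate only at the missing positions, using the valid points.
    filled[~valid] = np.interp(x[~valid], x[valid], y[valid])
    return filled

filled = fill_gaps_linear(values)
print(filled)
```

Linear interpolation is only reasonable for short gaps; for longer gaps a regression-based method such as loess is probably the better choice.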
Thanks for your answer.
The package TimeSeries was designed to perform the kind of operations you want (filtering, frequency conversion, handling missing data...). It is currently available in the sandbox of scipy. More info here: http://www.scipy.org/SciPyPackages/TimeSeries
Wow, this looks great. But a little complex ;-) Well, one could write functions for common tasks that facilitate it a bit...
* filling data gaps for days or hours without valid data using statistical methods like regression
This one might be a tad trickier. You may want to give pyloess a try, it's a wrapper around the dloess functions. Also available in the scipy sandbox.
Any idea when there will be a first binary release of all this together with maskedarray? Has anyone else done this kind of thing? Maybe with an interface to R?
Regards,
Timmie
On Monday 05 November 2007 14:24:24 Timmie wrote:
Wow, this looks great. But a little complex ;-)
Well, one could write functions for common tasks that facilitate it a bit...
You'd be surprised at how easy it is to use after a while. The conversion functions are quite useful. Please don't hesitate to contact me off-list if you have any special requests.
Any idea when there will be a first binary release of all this together with maskedarray?
Nope. The inclusion of maskedarray in numpy is still in the air, timeseries may become a scikit sooner or later.
Has anyone else done this kind of things?
What kind of things?
Maybe with a interface to R?
I started that way (with the rpy package), but decided to stick to pure Python to avoid surprises...
Wow, this looks great. But a little complex
Well, one could write functions for common tasks that facilitate it a bit...
If you have any ideas for simplifying/improving things, we are certainly open to suggestions and would love the feedback. Being a sandbox package currently, there is no better time than now to get your ideas incorporated into the timeseries module.
Any idea when there will be a first binary release of all this together with maskedarray?
Nope. The inclusion of maskedarray in numpy is still in the air, timeseries may become a scikit sooner or later.
The timeseries package won't likely be moving anywhere until maskedarray moves somewhere else since it is dependent on it. I won't personally be providing any binaries while it is still in the sandbox either, but I am happy to provide advice on how to compile it on Windows. Assuming maskedarray eventually moves into the core of numpy, my preference would be to put the timeseries module right into the main scipy trunk as I believe it is general purpose enough to warrant inclusion in scipy, and many people have expressed interest in it. But much discussion and debate will have to take place before that will happen. Looking ahead really long term, I think built in support in matplotlib for TimeSeries objects (along the lines of the "plotlib" timeseries sub module) would be an improvement over the current approach to time series plotting in matplotlib too. But I have not discussed such things with any of the matplotlib developers and have no idea how they feel about that, and it is too early to discuss that yet anyway. - Matt
Wow, this looks great. But a little complex
Well, one could write functions for common tasks that facilitate it a bit...
I believe you. Time is the issue.
If you have any ideas for simplifying/improving things, we are certainly open to suggestions and would love the feedback. Being a sandbox package currently, there is no better time than now to get your ideas incorporated into the timeseries module.
Well, since I am still something of a Python beginner and still need to learn how to use it efficiently for my data analysis, I will not really be able to contribute to the core. At my current level of Python knowledge I can already write my own simple modules consisting of functions that I find useful.
Some things I can imagine are the following: create a tools directory under the timeseries tree. There we could place things like:
* common frequency conversions: reduce to hourly values
* error checking of measurement data: statistically and logically
Any idea when there will be a first binary release of all this together with maskedarray?
Nope. The inclusion of maskedarray in numpy is still in the air, timeseries may become a scikit sooner or later.
Is it safe for me to replace/patch my current maskarray?
The timeseries package won't likely be moving anywhere until maskedarray moves somewhere else since it is dependent on it. I won't personally be providing any binaries while it is still in the sandbox either, but I am happy to provide advice on how to compile it on Windows.
I have experience compiling on Linux but have to work on a Windows box, so advice on that would be useful.
Is there a way to subscribe to SVN to get an email on changes?
Thanks for your responses.
Timmie
Some things I can imagine are the following: create a tools directory under the timeseries tree.
There is a "lib" sub-directory for stuff that falls outside the core Date/TimeSeries classes. It currently includes a sub-module for "moving functions" (moving average, etc...), and interpolation.
* common frequency conversions: reduce to hourly values
Frequency conversions are simple to do using the "convert" method of the TimeSeries class. Here is an example converting an hourly frequency series to daily...
>>> import numpy as np
>>> import maskedarray as ma
>>> import timeseries as ts
>>> h = ts.time_series(np.arange(50, dtype=np.float32), start_date=ts.today('hourly'))
>>> h
timeseries([ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14.
 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29.
 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44.
 45. 46. 47. 48. 49.],
 dates = [06-Nov-2007 06:00 ... 08-Nov-2007 07:00],
 freq = H)
>>> d = h.convert('daily')
>>> d
timeseries(
 [[-- -- -- -- -- -- 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0
 13.0 14.0 15.0 16.0 17.0]
 [18.0 19.0 20.0 21.0 22.0 23.0 24.0 25.0 26.0 27.0 28.0 29.0 30.0 31.0
 32.0 33.0 34.0 35.0 36.0 37.0 38.0 39.0 40.0 41.0]
 [42.0 43.0 44.0 45.0 46.0 47.0 48.0 49.0 -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- --]],
 dates = [06-Nov-2007 ... 08-Nov-2007],
 freq = D)
>>> d_avg = h.convert('daily', ma.average)
>>> d_avg
timeseries([ 8.5 29.5 45.5],
 dates = [06-Nov-2007 ... 08-Nov-2007],
 freq = D)
===============================================
If any of the above seems mysterious, let me know and I can offer a more detailed explanation.
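To make the conversion above less mysterious, here is a rough sketch of what the hourly-to-daily step amounts to, written with numpy.ma alone rather than the sandbox timeseries package. It assumes the same 50 hourly values starting at 06:00; the variable names are illustrative and this is not how convert is implemented internally.

```python
import numpy as np
import numpy.ma as ma

hourly = np.arange(50, dtype=np.float32)   # same values as the series above
start_hour = 6                             # the series starts at 06:00

# Number of calendar days touched by the data (ceiling division).
n_days = -(-(start_hour + hourly.size) // 24)

# Start fully masked, then fill in the hours that actually have data.
padded = ma.masked_all(n_days * 24, dtype=hourly.dtype)
padded[start_hour:start_hour + hourly.size] = hourly

daily = padded.reshape(n_days, 24)   # one row per day, masked where no data
daily_mean = daily.mean(axis=1)      # masked slots are ignored by the mean
print(daily_mean)
```

The 2-D masked array corresponds to the result of h.convert('daily'), and the row means reproduce the 8.5, 29.5 and 45.5 shown for d_avg.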
* error checking of measurement data: statistically and logically
Some data error checking algorithms could be useful, yes. I won't likely be working on them in the near future though.
Is it safe for me to replace/patch my current maskarray?
Generally speaking, the maskedarray package is mostly backwards compatible with the current numpy.ma package, and thus the API is very stable, so you should be able to update it without any problems.
I have experience compiling on Linux but have to work on a Windows box, so advice on that would be useful.
If you are using Python 2.5 on Windows, it is easy to do using mingw. If you are using an earlier version of Python, Visual Studio 2003 is the easiest way to go.
Is there a way to subscribe to SVN to get an email on changes?
I don't know of any svn clients that have a built-in way to do this, but it likely wouldn't be difficult to write your own script to do this since subversion has a command line interface.
- Matt
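A script along those lines could poll the repository with the svn command-line client and compare revision numbers. The sketch below assumes that svn is on the PATH and prints its usual English "Revision: N" line; the function names are hypothetical, and the repository URL and the actual mail sending (for instance with smtplib) are left to the reader.

```python
import re
import subprocess

def parse_revision(svn_info_output):
    """Extract the revision number from the text printed by `svn info`.

    Assumes the English output format, which contains a line
    such as "Revision: 1234". Returns None if no such line is found.
    """
    match = re.search(r"^Revision:\s*(\d+)", svn_info_output, re.MULTILINE)
    return int(match.group(1)) if match else None

def current_revision(url):
    """Ask the svn command-line client for the latest revision at `url`."""
    result = subprocess.run(
        ["svn", "info", url],
        capture_output=True, text=True, check=True,
    )
    return parse_revision(result.stdout)

def has_changed(url, last_seen):
    """Return (changed, revision) relative to the last revision we saw."""
    revision = current_revision(url)
    return (revision is not None and revision != last_seen), revision
```

Run from a scheduler such as cron or the Windows task scheduler, has_changed can be called with the revision stored from the previous run, and a notification sent whenever it reports a change.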
Hi, I am just starting with Python. I usually use Maple, and I am having trouble doing the same project in Python. I would like to solve an equation like x+y+1=0 for x. In Maple I use "solve(x+y+1=0,x);". Is there a similar command in Python/NumPy/SciPy?
Regards,
Eduardo.
These examples for solving equations are in scipy_tutorial.pdf
from scipy import mat, linalg
"""
Solves three simultaneous equations:
x + 3y + 5z = 10
2x + 5y + z = 8
2x + 3y + 8z = 3
"""
XYparameters = mat('[1 3 5; 2 5 1; 2 3 8]')
constants = mat('[10; 8; 3]')
print linalg.solve( XYparameters, constants )
# array([[-9.28],
# [ 5.16],
# [ 0.76]])
"""
Solves two simultaneous equations:
5x + 2y - 9 = 0
3x + 4y - 4 = 0
"""
XYparameters = mat('[5 2; 3 4]')
constants = mat('[9; 4]')
print linalg.solve( XYparameters, constants ) # should print (2,-0.5)
Eduardo Rodrigues <elr1979@gmail.com> wrote:
> Hi, I am starting in Python. I usually use Maple and I am with a problem to do the same project in Python. I would like to solve an equation like x+y+1=0 for x. In Maple I use "solve(x+y+1=0,x);". Do exist a similar command in Python/NumPy/SciPy? Regards, Eduardo.
_______________________________________________
SciPy-user mailing list
SciPy-user@scipy.org
http://projects.scipy.org/mailman/listinfo/scipy-user
On Tuesday, 06 November 2007 at 13:42 +0000, Michael Nandris wrote:
These examples for solving equations are in scipy_tutorial.pdf
Solves three simultaneous equations:
x + 3y + 5z = 10
2x + 5y + z = 8
2x + 3y + 8z = 3
"""
print linalg.solve( XYparameters, constants )
Eduardo Rodrigues <elr1979@gmail.com> wrote: Hi, I am starting in Python. I usually use Maple and I am with a problem to do the same project in Python. I would like to solve an equation like x+y+1=0 for x. In Maple I use "solve(x+y+1=0,x);". Do exist a similar command in Python/NumPy/SciPy?
Isn't the request about symbolic computation, as in Maple?
--
Fabrice Silva <Fabrice.Silva@crans.org>
If I want to solve an equation with linalg.solve I must know the equation, right?
From help: "solve(a, b) Return the solution of a*x = b"
But if I don't know - or if I don't want to look - the equation for x, how can I solve it? I will import equations from a file. []'s.
Maybe I'm misunderstanding, but it seems you look at solve(a,b) as a function that returns the solution of the linear symbolic equation a * x = b (whose solution is x = b / a), with x scalar.
Instead, solve(a,b) gives the numerical solution of the linear system a * x = b, with x and b 1d-arrays and a a 2d-array. For example, something like:
2 * x1 + 3 * x2 = 4
4 * x1 + 5 * x2 = 6
where x = [x1, x2], b = [4, 6] and a = [[2, 3], [4, 5]]. solve(a,b) returns [-1, 2].
If you import from the file a string with the symbolic representation of your equation, and you want to solve it symbolically, stay with sympy.
hth,
L.
On 11/6/07, Eduardo Rodrigues <elr1979@gmail.com> wrote:
If I want to solve an equation with linalg.solve I must know the equation, right?
From help: "solve(a, b) Return the solution of a*x = b"
But if I don't know - or if I don't want to look - the equation for x, how can I solve it? I will import equations from a file.
[]'s.
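The 2x2 system used as an illustration in this message can be checked numerically. This is a minimal sketch assuming a current NumPy installation and using numpy.linalg.solve instead of the scipy.linalg call quoted in the thread; the variable names follow the message.

```python
import numpy as np

# Coefficient matrix and right-hand side of
#   2*x1 + 3*x2 = 4
#   4*x1 + 5*x2 = 6
a = np.array([[2.0, 3.0],
              [4.0, 5.0]])
b = np.array([4.0, 6.0])

x = np.linalg.solve(a, b)   # numerical solution of a @ x = b
print(x)

# Sanity check: substituting x back reproduces the right-hand side.
residual_ok = np.allclose(a @ x, b)
```

Note that a and b must be known numbers here; nothing in this call manipulates the equation symbolically.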
On Tue, Nov 06, 2007 at 01:31:39PM -0300, Eduardo Rodrigues wrote:
If I want to solve an equation with linalg.solve I must know the equation, right?
From help: "solve(a, b) Return the solution of a*x = b"
But if I don't know - or if I don't want to look - the equation for x, how can I solve it? I will import equations from a file.
How are these equations defined? Can you give us an example?
Cheers,
Gaël
On Tue, Nov 06, 2007 at 10:41:17AM -0300, Eduardo Rodrigues wrote:
Hi, I am starting in Python. I usually use Maple and I am with a problem to do the same project in Python. I would like to solve an equation like x+y+1=0 for x. In Maple I use "solve(x+y+1=0,x);". Do exist a similar command in Python/NumPy/SciPy?
Do you want a numerical result, given the value of y, or do you want a formal result?
The numerical value can be calculated using scipy: for linear equations, with the method shown by Michael Nandris; for nonlinear equations, you should look at scipy.optimize.zeros.
If you want a formal result, or a functional result, you can use sympy:
from sympy import solve, Symbol
x = Symbol('x')
y = Symbol('y')
f = solve(x+y+1, x)
# f gives you the symbolic result
F = lambda v: f.subs(y, v)
F is a function that returns the exact expression of the root of your equation. If you have several roots, then you need to do something like:
F = lambda v: [r.subs(y, v) for r in f]
Currently sympy is still young, and won't be able to solve everything you throw at it.
HTH,
Gaël
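For readers trying the sympy approach today, the following sketch assumes a current SymPy release, where solve returns a list of roots even when there is only one, so the list form of F is the one that applies. Reading the equation from a string with sympify is an added assumption, meant to mimic equations imported from a file.

```python
from sympy import Symbol, solve, sympify

x = Symbol('x')
y = Symbol('y')

# An equation as it might arrive from a file: the left-hand side of "... = 0".
expr = sympify("x + y + 1")

roots = solve(expr, x)   # list of symbolic solutions for x
print(roots)

# Substitute a numerical value for y into every root.
F = lambda v: [r.subs(y, v) for r in roots]
print(F(2))
```

Here roots holds the symbolic answer, the analogue of Maple's solve(x+y+1=0,x), while F evaluates it for a given y.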
Matt Knox <mattknox_ca <at> hotmail.com> writes:
>
> > Some things I can imagine are the following:
> > create a tools directory under the timeseries tree.
>
> There is a "lib" sub-directory for stuff that falls outside the core
> Date/TimeSeries classes. It currently includes a sub-module for "moving
> functions" (moving average, etc...), and interpolation.
>
> > * common frequency conversions: reduce to hourly values
>
> Frequency conversions are simple to do using the "convert" method of the
> TimeSeries class. Here is an example converting an hourly frequency series to
> daily...
>
> >>> import numpy as np
> >>> import maskedarray as ma
> >>> import timeseries as ts
> >>> h = ts.time_series(np.arange(50, dtype=np.float32), start_date=ts.today
> ('hourly'))
> >>> h
> timeseries([ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.
> 13. 14.
> 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29.
> 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44.
> 45. 46. 47. 48. 49.],
> dates = [06-Nov-2007 06:00 ... 08-Nov-2007 07:00],
> freq = H)
>
> >>> d = h.convert('daily')
> >>> d
> timeseries(
> [[-- -- -- -- -- -- 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0
> 13.0 14.0 15.0 16.0 17.0]
> [18.0 19.0 20.0 21.0 22.0 23.0 24.0 25.0 26.0 27.0 28.0 29.0 30.0 31.0
> 32.0 33.0 34.0 35.0 36.0 37.0 38.0 39.0 40.0 41.0]
> [42.0 43.0 44.0 45.0 46.0 47.0 48.0 49.0 -- -- -- -- -- -- -- -- -- -- --
> -- -- -- -- --]],
> dates =
> [06-Nov-2007 ... 08-Nov-2007],
> freq = D)
>
> >>> d_avg = h.convert('daily', ma.average)
> >>> d_avg
> timeseries([ 8.5 29.5 45.5],
> dates = [06-Nov-2007 ... 08-Nov-2007],
> freq = D)
>
> ===============================================
> If any of the above seems mysterious, let me know and I can offer a more
> detailed explanation.
>
> > * error checking of measurement data: statistically and logically
>
> Some data error checking algorithms could be useful, yes. I won't likely be
> working on them in the near future though.
I think that I will have a closer look at the package and then see how I can
use it. Maybe I can contribute something or at least give you feedback.
Well, I am very happy that I found this package. I did quite a bit of Google
searching, but nothing helpful had turned up.
What I also found (I think via the moin wiki at www.python.org):
* pyclimate is a Python package designed to accomplish some usual tasks during the
analysis of climate variability using Python: http://www.pyclimate.org/
* CDAT makes use of an open-source, object-oriented, easy-to-learn scripting
language (Python) to link together separate software subsystems and packages to
form an integrated environment for data analysis.
http://www-pcmdi.llnl.gov/software-portal/cdat
These two packages seem to do quite similar tasks, or at least head in the same
direction. But they depend mainly on netCDF file storage. Still,
maybe there could be some linkage?
Kind regards and thanks for your help,
Timmie
P.S.: You may also send me a PM on this.
Timmie,
Matt and I started TimeSeries each on our own, before merging our efforts last Christmas. On my side, I was trying to find something equivalent to pyclimate for my own purposes, but supporting numpy. I poked around this package a bit, wasn't completely happy with it, then figured it would be as easy to redo on top of numpy. It was while I was struggling with subclassing the masked arrays of numpy.core.ma that I decided to reimplement maskedarray, which eventually led to the current version of timeseries.
If I'm not completely mistaken, numpy.core.ma is a translation via Numeric of Paul Dubois's implementation for CDAT. So yeah, I could have followed that path, but I was learning Python at the same time and it was a very good exercise to reinvent the wheel, thus adding more noise and confusion.
I do agree that there could be some linkage between timeseries, pyclimate and CDAT, at least in terms of converting objects from one package to another. However, it's unlikely to happen any time soon. Nevertheless, I may try to implement some of the functions of pyclimate and CDAT in numpy/scipy when I have the need for them.
What I would suggest is that you start with timeseries, as it's pretty easy to use. Then, depending on what your needs are, you can start developing your own functions. In any case, don't hesitate to contact Matt or myself if you need some specific help. The TimeSeriesPackage page of the scipy wiki is a good start; Matt did a terrific job.
Sincerely,
P.
Matt Knox <mattknox_ca <at> hotmail.com> writes:
Wow, this looks great. But a little complex
Well, one could write functions for common tasks that facilitate it a bit...
If you have any ideas for simplifying/improving things, we are certainly open to suggestions and would love the feedback. Being a sandbox package currently, there is no better time than now to get your ideas incorporated into the timeseries module.
Hi, I was just looking through this list for ideas on how I could better use scipy. My issue is that I'm working with time series data, with missing values, that is in half-hour increments. As far as I can tell, with the current implementation of timeseries many operations will only work if the data has a point for every increment of a built-in frequency. This means I have to either put my data in hourly form, which would require averaging, or convert it to minute frequency with missing values, which expands the size of my data by 30x. Is there a way around this, where I could just process half-hourly data?
Thanks for your help! Timeseries looks like a really useful package if I can get it to play well with my data!
Ben Sulman
participants (9)
* Ben Sulman
* Eduardo Rodrigues
* Fabrice Silva
* Gael Varoquaux
* lorenzo bolla
* Matt Knox
* Michael Nandris
* Pierre GM
* Timmie