OT: advice on modelling a time series
Hello, While not directly Python related I am always impressed with the quality of scientific advice available on this list, and was hoping I could receive some... I have a limited amount of an experimentally obtained time series from a biological system. I would like to come up with a generative model which would allow production of large quantities of data with statistics as similar as possible to the experimental data. The time series represents a position, and I am particularly interested in transient high velocity/acceleration events (which are often not very visible by eye in the position trace), so ideally any model should reproduce those with particular care. An example plot of a small section of the data (pos vel and acc) (1s) is available here: http://i41.tinypic.com/ou42de.jpg If it makes any difference it is sampled at 4kHz. I tried fitting a basic autoregressive model. An order 38 model reproduced the position signal visually quite well, but velocity and acceleration were far too regular. I tried fitting one to the velocity, but I think the events of interest are too far apart in bins so the order required is too large. So, could anyone point me to anything that would be helpful in python (so far I did the AR with a matlab package I found)? Also any suggestions for how to proceed would be great - other than reading the wikipedia article I am completely new to this type of AR modelling. So far the only ideas I have involve either downsampling the signal (to try to reduce the order of AR model needed), or splitting it in frequency to low f/high f components and attempting to model them separately then recombine. Do either of these seem sensible? Is it likely some non-linear model would be required (pos,vel and acc all have high kurtosis), or are normal AR models capable of recreating this kind of fine structure if tweaked sufficiently? Thanks in advance for any pointers, Cheers Robin
On Fri, Mar 12, 2010 at 09:46, Robin <robince@gmail.com> wrote:
So, could anyone point me to anything that would be helpful in python (so far I did the AR with a matlab package I found)? Also any suggestions for how to proceed would be great - other than reading the wikipedia article I am completely new to this type of AR modelling. So far the only ideas I have involve either downsampling the signal (to try to reduce the order of AR model needed), or splitting it in frequency to low f/high f components and attempting to model them separately then recombine. Do either of these seem sensible?
The econometricians here will probably have more to add here, but I suspect that you could use an ARCH or GARCH model: http://en.wikipedia.org/wiki/Autoregressive_conditional_heteroskedasticity Josef was working on implementing GARCH, but I don't think he finished. There is MATLAB code: http://www.kevinsheppard.com/wiki/MFE_Toolbox And R code: http://cran.r-project.org/web/views/Finance.html
Is it likely some non-linear model would be required (pos,vel and acc all have high kurtosis), or are normal AR models capable of recreating this kind of fine structure if tweaked sufficiently?
It is possible that once you allow heterskedasticity, you can account for the marginal kurtosis. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Fri, Mar 12, 2010 at 10:46 AM, Robin <robince@gmail.com> wrote:
Hello,
While not directly Python related I am always impressed with the quality of scientific advice available on this list, and was hoping I could receive some...
I have a limited amount of an experimentally obtained time series from a biological system. I would like to come up with a generative model which would allow production of large quantities of data with statistics as similar as possible to the experimental data. The time series represents a position, and I am particularly interested in transient high velocity/acceleration events (which are often not very visible by eye in the position trace), so ideally any model should reproduce those with particular care.
An example plot of a small section of the data (pos vel and acc) (1s) is available here: http://i41.tinypic.com/ou42de.jpg
If it makes any difference it is sampled at 4kHz. I tried fitting a basic autoregressive model. An order 38 model reproduced the position signal visually quite well, but velocity and acceleration were far too regular. I tried fitting one to the velocity, but I think the events of interest are too far apart in bins so the order required is too large.
So, could anyone point me to anything that would be helpful in python (so far I did the AR with a matlab package I found)? Also any suggestions for how to proceed would be great - other than reading the wikipedia article I am completely new to this type of AR modelling. So far the only ideas I have involve either downsampling the signal (to try to reduce the order of AR model needed), or splitting it in frequency to low f/high f components and attempting to model them separately then recombine. Do either of these seem sensible?
Is it likely some non-linear model would be required (pos,vel and acc all have high kurtosis), or are normal AR models capable of recreating this kind of fine structure if tweaked sufficiently?
Thanks in advance for any pointers,
In statsmodels we are working on some time series analysis, but it is still a bit to early for real use. We have AR, but for this kind of data I would recommend scikits.talkbox which has a Levinson-Durbin recursion implemented that gives a more robust estimate of longer AR polynomial (maybe nitime also has it now.) I don't know of any implementation of non-linear models for time series analysis in python, e.g. a markov switching or threshold model, or of any models that would allow for fat-tailes or asymmetric shock distributions. If you just want to generate sample data with similar features, then this will be much easier than estimation. (I have some tentative simulation code for continuous time diffusion processes but not cleaned up) Your acceleration data looks like a GARCH process, that is the variance is autocorrelated but not (much) the mean. There also, I have an initial version but not yet good enough to be reliable.
From the graph, it also looks like the three observations are strongly related, so separate (univariate) modeling doesn't look like the most appropriate choice.
Josef
Cheers
Robin _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
On Fri, Mar 12, 2010 at 11:32 AM, <josef.pktd@gmail.com> wrote:
On Fri, Mar 12, 2010 at 10:46 AM, Robin <robince@gmail.com> wrote:
Hello,
While not directly Python related I am always impressed with the quality of scientific advice available on this list, and was hoping I could receive some...
I have a limited amount of an experimentally obtained time series from a biological system. I would like to come up with a generative model which would allow production of large quantities of data with statistics as similar as possible to the experimental data. The time series represents a position, and I am particularly interested in transient high velocity/acceleration events (which are often not very visible by eye in the position trace), so ideally any model should reproduce those with particular care.
An example plot of a small section of the data (pos vel and acc) (1s) is available here: http://i41.tinypic.com/ou42de.jpg
If it makes any difference it is sampled at 4kHz. I tried fitting a basic autoregressive model. An order 38 model reproduced the position signal visually quite well, but velocity and acceleration were far too regular. I tried fitting one to the velocity, but I think the events of interest are too far apart in bins so the order required is too large.
So, could anyone point me to anything that would be helpful in python (so far I did the AR with a matlab package I found)? Also any suggestions for how to proceed would be great - other than reading the wikipedia article I am completely new to this type of AR modelling. So far the only ideas I have involve either downsampling the signal (to try to reduce the order of AR model needed), or splitting it in frequency to low f/high f components and attempting to model them separately then recombine. Do either of these seem sensible?
Is it likely some non-linear model would be required (pos,vel and acc all have high kurtosis), or are normal AR models capable of recreating this kind of fine structure if tweaked sufficiently?
Thanks in advance for any pointers,
In statsmodels we are working on some time series analysis, but it is still a bit to early for real use. We have AR, but for this kind of data I would recommend scikits.talkbox which has a Levinson-Durbin recursion implemented that gives a more robust estimate of longer AR polynomial (maybe nitime also has it now.)
I don't know of any implementation of non-linear models for time series analysis in python, e.g. a markov switching or threshold model, or of any models that would allow for fat-tailes or asymmetric shock distributions.
If you just want to generate sample data with similar features, then this will be much easier than estimation. (I have some tentative simulation code for continuous time diffusion processes but not cleaned up)
Your acceleration data looks like a GARCH process, that is the variance is autocorrelated but not (much) the mean.
There also, I have an initial version but not yet good enough to be reliable.
From the graph, it also looks like the three observations are strongly related, so separate (univariate) modeling doesn't look like the most appropriate choice.
from looking at the graphs: If acceleration is an independent GARCH process, then velocity would just be the integral (plus noise), isn't it? Position should also be heteroscedastic, because the change in position will depend on the velocity, but it looks mean reverting, so it might be highly autocorrelated in mean with variance given by velocity, or maybe with an MA error if the position doesn't change direction too suddenly. (I'm not sure about how the steering wheel should be modeled) Josef
Josef
Cheers
Robin _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
On Fri, Mar 12, 2010 at 10:46, <josef.pktd@gmail.com> wrote:
On Fri, Mar 12, 2010 at 11:32 AM, <josef.pktd@gmail.com> wrote:
From the graph, it also looks like the three observations are strongly related, so separate (univariate) modeling doesn't look like the most appropriate choice.
from looking at the graphs:
If acceleration is an independent GARCH process, then velocity would just be the integral (plus noise), isn't it?
Robin will have to confirm, but I suspect that only the position was actually measured and that he derived the velocity and acceleration from the position time series numerically. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Fri, Mar 12, 2010 at 12:01 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Fri, Mar 12, 2010 at 10:46, <josef.pktd@gmail.com> wrote:
On Fri, Mar 12, 2010 at 11:32 AM, <josef.pktd@gmail.com> wrote:
From the graph, it also looks like the three observations are strongly related, so separate (univariate) modeling doesn't look like the most appropriate choice.
from looking at the graphs:
If acceleration is an independent GARCH process, then velocity would just be the integral (plus noise), isn't it?
Robin will have to confirm, but I suspect that only the position was actually measured and that he derived the velocity and acceleration from the position time series numerically.
In that case it might not be to difficult to go in reverse, from acceleration to position. From the graph, I would think that acceleration is the random input for the process. Maybe with some adjustments to correct for specification errors. Josef
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
On Fri, Mar 12, 2010 at 5:10 PM, <josef.pktd@gmail.com> wrote:
On Fri, Mar 12, 2010 at 12:01 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Fri, Mar 12, 2010 at 10:46, <josef.pktd@gmail.com> wrote:
On Fri, Mar 12, 2010 at 11:32 AM, <josef.pktd@gmail.com> wrote:
From the graph, it also looks like the three observations are strongly related, so separate (univariate) modeling doesn't look like the most appropriate choice.
from looking at the graphs:
If acceleration is an independent GARCH process, then velocity would just be the integral (plus noise), isn't it?
Robin will have to confirm, but I suspect that only the position was actually measured and that he derived the velocity and acceleration from the position time series numerically.
In that case it might not be to difficult to go in reverse, from acceleration to position. From the graph, I would think that acceleration is the random input for the process. Maybe with some adjustments to correct for specification errors.
Right - I am primarily working with position - vel and acc are shown just to illustrate the features I am hoping to preserve/replicate (ie transient high acceleration events which aren't really visible in the position trace, even when blown up quite a lot). I did the differentiation with basic finite differences (diff), smoothed with a 1ms (4bin moving average) (this was the method used by others previously). Arguably acceleration is the most important of the representations (this is part of the hypothesis we are planning to test) so I agree starting from acceleration and integrating to get the actual position to use could be a good idea. In the mean time I have to look up a lot of the other things you mentioned before I have anything sensible to add (heteroscedastic is a new one on me and I will look up GARCH and ARCH also) . Thanks very much for the quick and useful responses! Cheers Robin
On Fri, Mar 12, 2010 at 12:16 PM, Robin <robince@gmail.com> wrote:
On Fri, Mar 12, 2010 at 5:10 PM, <josef.pktd@gmail.com> wrote:
On Fri, Mar 12, 2010 at 12:01 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Fri, Mar 12, 2010 at 10:46, <josef.pktd@gmail.com> wrote:
On Fri, Mar 12, 2010 at 11:32 AM, <josef.pktd@gmail.com> wrote:
From the graph, it also looks like the three observations are strongly related, so separate (univariate) modeling doesn't look like the most appropriate choice.
from looking at the graphs:
If acceleration is an independent GARCH process, then velocity would just be the integral (plus noise), isn't it?
Robin will have to confirm, but I suspect that only the position was actually measured and that he derived the velocity and acceleration from the position time series numerically.
In that case it might not be to difficult to go in reverse, from acceleration to position. From the graph, I would think that acceleration is the random input for the process. Maybe with some adjustments to correct for specification errors.
Right - I am primarily working with position - vel and acc are shown just to illustrate the features I am hoping to preserve/replicate (ie transient high acceleration events which aren't really visible in the position trace, even when blown up quite a lot). I did the differentiation with basic finite differences (diff), smoothed with a 1ms (4bin moving average) (this was the method used by others previously).
Arguably acceleration is the most important of the representations (this is part of the hypothesis we are planning to test) so I agree starting from acceleration and integrating to get the actual position to use could be a good idea.
In the mean time I have to look up a lot of the other things you mentioned before I have anything sensible to add (heteroscedastic is a new one on me and I will look up GARCH and ARCH also) .
Thanks very much for the quick and useful responses!
If you have rpy and R installed, you can try something like this on your position data, to see whether GARCH is helpful (ret is my 1d numpy.array with the time series) from rpy import r r.library('fGarch') # pure garch on mean corrected data f = r.formula('~garch(1, 1)') fit = r.garchFit(f, data = ret - ret.mean(), include_mean=False) #ARMA in mean, GARCH in variance f = r.formula('~arma(1,1) + ~garch(1, 1)') fit = r.garchFit(f, data = ret) I haven't figured out how to do many of the options for GARCH with rpy. Josef
Cheers
Robin _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
-----Original Message----- From: scipy-user-bounces@scipy.org [mailto:scipy-user-bounces@scipy.org] On Behalf Of josef.pktd@gmail.com Sent: Friday, March 12, 2010 9:53 AM To: SciPy Users List Subject: Re: [SciPy-User] OT: advice on modelling a time series If you have rpy and R installed, you can try something like this on your position data, to see whether GARCH is helpful (ret is my 1d numpy.array with the time series)
from rpy import r r.library('fGarch')
# pure garch on mean corrected data f = r.formula('~garch(1, 1)') fit = r.garchFit(f, data = ret - ret.mean(), include_mean=False)
#ARMA in mean, GARCH in variance f = r.formula('~arma(1,1) + ~garch(1, 1)') fit = r.garchFit(f, data = ret)
I haven't figured out how to do many of the options for GARCH with rpy.
I don't know about rpy (classic), but in rpy2, you can input any R expression as a string. import rpy2.robjects as ro R = ro.r [code that defines x and y in python] # set the variables up in the R environment R.globalEnv['xR'] = x # needs to be a list? R.globalEnv['yR'] = y Fit = R("some pure R expression using xR and yR") Probably won't be all that fast, but it'll work. -paul h.
participants (4)
-
josef.pktd@gmail.com
-
PHobson@Geosyntec.com
-
Robert Kern
-
Robin