[Numpy-discussion] Numpy correlate

Mon Mar 18 13:10:16 EDT 2013

On Mon, Mar 18, 2013 at 1:00 PM, Pierre Haessig <pierre.haessig at crans.org> wrote:

>  Hi Sudheer,
>
> On 14/03/2013 10:18, Sudheer Joseph wrote:
>
> Dear Numpy/Scipy experts,
> Attached is a script which I made to test numpy.correlate (which is
> called by plt.xcorr) to see how the cross correlation is calculated. From
> this it appears that if I call
> plt.xcorr(x,y)
> y is shifted back in time compared to x, i.e. if y is a process that causes
> a delayed response in x after 5 timesteps then there should be a high
> correlation at lag 5. However, in the attached plot the response is seen
> only on the negative side of the lags.
> Can anyone advise me on how to see exactly which way the 2 series
> are shifted back or forth, and how to understand the cause-result relation
> better? (I understand that one cannot assume a cause-result relation merely
> from correlation, but it is important to know which series is older in time
> at a given lag.)
>
> You have indeed pointed out a lack of documentation in matplotlib's xcorr
> function, because the definition of covariance can be ambiguous.
>
> The way I would try to get an interpretation of xcorr function (& its
> friends) is to go back to the theoretical definition of cross-correlation,
> which is a normalized version of the covariance.
>
> In your example you've created a time series X(k) and a lagged one : Y(k)
> = X(k-5)
>
> Now, the covariance function of X and Y is commonly defined as :
>  Cov_{X,Y}(h) = E(X(k+h) * Y(k))   where E is the expectation
>  (assuming that X and Y are centered for the sake of clarity).
>
> If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This
> yields naturally the fact that the covariance is indeed maximal at h=-5 and
> not h=+5.
>
> Note that this reasoning yields the opposite result with a different
> definition of the covariance, i.e. Cov_{X,Y}(h) = E(X(k) * Y(k+h))  (and
> that's what I first did!).
>
>
> Therefore, I think there should be a definition of cross correlation in
> the matplotlib xcorr docstring. R's acf doc contains this statement: "The
> lag k value returned by ccf(x, y) estimates the correlation between x[t+k]
> and y[t]. "
> (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html)
>
> Now, I believe the discussion above really belongs on the matplotlib ML.
> I'll open an issue on github (I just spotted a mistake in the definition
> of normalization anyway)
>
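
The sign convention derived above can be checked numerically. The following is a minimal sketch, not the script attached to the original message: the seed, the series length and the variable names are arbitrary choices, and it assumes (as stated in the original question) that plt.xcorr(x, y) hands x and y to numpy.correlate in that order. It builds Y(k) = X(k-5) from white noise and looks for the lag at which the "full" correlation peaks.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed (arbitrary) for reproducibility
n, delay = 500, 5

x = rng.standard_normal(n)
# y[k] = x[k - delay]: y is x delayed by 5 steps (first 5 samples set to zero)
y = np.zeros(n)
y[delay:] = x[:-delay]

# center both series, as assumed in the derivation
x_c = x - x.mean()
y_c = y - y.mean()

# For equal-length inputs, element i of the "full" output corresponds to
# lag h = i - (n - 1), with c(h) = sum_k a[k + h] * v[k]
lags = np.arange(-(n - 1), n)
c_xy = np.correlate(x_c, y_c, mode="full")
c_yx = np.correlate(y_c, x_c, mode="full")

peak_xy = lags[np.argmax(c_xy)]
peak_yx = lags[np.argmax(c_yx)]
print(peak_xy, peak_yx)             # -5 5
```

Swapping the two arguments flips the sign of the peak lag, which is the same ambiguity as between the two covariance definitions discussed above.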

You might be interested in the statsmodels implementation which should be
similar to the R functionality.

http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb
http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html
http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html
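
For comparison, the convention quoted from the R documentation (lag k relates x[t+k] to y[t]) can be written out by hand. This is only an illustrative sketch: lagged_corr is a hypothetical helper, not a function from statsmodels or R, and it normalizes over the overlapping samples only, which is not necessarily how those libraries normalize.

```python
import numpy as np

def lagged_corr(x, y, k):
    """Sample correlation between x[t+k] and y[t] over the overlapping samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = min(len(x), len(y))
    if k >= 0:
        xs, ys = x[k:n], y[:n - k]      # pairs (x[t+k], y[t]) for t = 0..n-k-1
    else:
        xs, ys = x[:n + k], y[-k:n]     # pairs (x[t+k], y[t]) for t = -k..n-1
    return np.corrcoef(xs, ys)[0, 1]

rng = np.random.default_rng(1)          # arbitrary seed
x = rng.standard_normal(300)
y = np.roll(x, 5)                       # y[t] = x[t-5] (circular shift)

r_minus = lagged_corr(x, y, -5)         # x[t-5] against y[t] = x[t-5]: close to 1
r_plus = lagged_corr(x, y, 5)           # x[t+5] against x[t-5]: close to 0
print(r_minus, r_plus)
```

With this convention, too, a series y that lags x by 5 steps shows its peak at k = -5.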

Skipper