[Numpy-discussion] Numpy correlate

Mon Mar 18 13:00:02 EDT 2013

Hi Sudheer,

On 14/03/2013 10:18, Sudheer Joseph wrote:
> Dear Numpy/Scipy experts,
> Attached is a script which I made to test numpy.correlate (which is
> called by plt.xcorr) to see how the cross correlation is calculated.
> From this it appears that if I call plt.xcorr(x, y), y is slid back in
> time compared to x, i.e. if y is a process that causes a delayed
> response in x after 5 timesteps, then there should be a high
> correlation at lag 5. However, in the attached plot the response is
> seen only on the negative side of the lags.
> Can anyone advise me on how to see which way exactly the 2 series are
> slid back or forth, and understand the cause-result relation better?
> (I understand that one cannot assume a cause and result relation merely
> from correlation, but it is important to know which series is older in
> time at a given lag.)
You have indeed pointed out a lack of documentation in matplotlib's
xcorr function, which matters because the definition of covariance can
be ambiguous.

The way I would try to interpret the xcorr function (& its friends) is
to go back to the theoretical definition of cross-correlation, which is
a normalized version of the covariance.

In your example you've created a time series X(k) and a lagged one :
Y(k) = X(k-5)

Now, the covariance function of X and Y is commonly defined as :
 Cov_{X,Y}(h) = E(X(k+h) * Y(k))   where E is the expectation
 (assuming that X and Y are centered for the sake of clarity).

If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)).
This expectation is largest when both factors are the same sample, i.e.
when k+h = k-5, so the covariance is indeed maximal at h=-5 and not
h=+5.

Note that this reasoning yields the opposite result with a different
definition of the covariance, i.e. Cov_{X,Y}(h) = E(X(k) * Y(k+h))  (and
that's what I first did!).
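To make the sign convention concrete, here is a small numerical sketch
of the two definitions above. It is only an illustration: the helper
name cross_cov, the white-noise input, the sample size and the seed are
my own assumptions and are not taken from the attached script.

```python
import numpy as np

rng = np.random.default_rng(0)
n, delay = 500, 5

# Build X and Y from one white-noise series s so that Y(k) = X(k - delay)
# holds exactly on the overlap, with no wrap-around.
s = rng.standard_normal(n + delay)
x = s[delay:]    # X(k) = s[k + delay]
y = s[:-delay]   # Y(k) = s[k] = X(k - delay)


def cross_cov(u, w, h):
    """Estimate E[U(k+h) * W(k)] from the overlapping samples (hypothetical helper)."""
    m = len(u)
    if h >= 0:
        return np.mean(u[h:] * w[:m - h])
    return np.mean(u[:m + h] * w[-h:])


lags = np.arange(-10, 11)

# First definition: Cov(h) = E[X(k+h) * Y(k)]
cov_first = np.array([cross_cov(x, y, h) for h in lags])
# Alternative definition: Cov(h) = E[X(k) * Y(k+h)], i.e. the roles are swapped
cov_alt = np.array([cross_cov(y, x, h) for h in lags])

best_lag_first = int(lags[np.argmax(cov_first)])
best_lag_alt = int(lags[np.argmax(cov_alt)])
print(best_lag_first, best_lag_alt)
```

With this construction the first definition should peak at a negative
lag and the alternative one at the mirrored positive lag, which is the
whole ambiguity in a nutshell.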

Therefore, I think there should be a definition of cross-correlation in
the matplotlib xcorr docstring. In R's acf doc, there is this mention:
"The lag k value returned by ccf(x, y) estimates the correlation between
x[t+k] and y[t]. "
(see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html)

Now, I believe the discussion above really belongs on the matplotlib ML.
I'll open an issue on GitHub (I just spotted a mistake in the definition
of the normalization anyway).

Coming back to numpy:
There's a strange thing: the documentation of numpy.correlate seems to
give the other definition, "z[k] = sum_n a[n] * conj(v[n+k])" (
http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html),
although its actual behaviour proves otherwise. What did I miss?
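Here is a minimal check of what I mean, on a tiny real-valued example
(so the conj plays no role). The helper corr_sum and the test arrays are
made up for illustration, and the lag axis assumes mode="full" with
len(a) >= len(v). It compares the output of numpy.correlate with both
candidate formulas, evaluated by brute force:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 0.5])

full = np.correlate(a, v, mode="full")
# Lag attached to each output sample in "full" mode (assumes len(a) >= len(v)).
lags = np.arange(-(len(v) - 1), len(a))


def corr_sum(a, v, h, shift_first):
    """Brute-force sum over n, skipping out-of-range indices (hypothetical helper).

    shift_first=True  -> sum_n a[n+h] * v[n]
    shift_first=False -> sum_n a[n]   * v[n+h]
    """
    total = 0.0
    span = len(a) + len(v)
    for n in range(-span, span):
        i, j = (n + h, n) if shift_first else (n, n + h)
        if 0 <= i < len(a) and 0 <= j < len(v):
            total += a[i] * v[j]
    return total


shift_a = np.array([corr_sum(a, v, h, True) for h in lags])
shift_v = np.array([corr_sum(a, v, h, False) for h in lags])

print("numpy.correlate   :", full)
print("sum a[n+h] * v[n] :", shift_a)
print("sum a[n] * v[n+h] :", shift_v)
```

On the installation I tried, the output matches the sum with the shift
on a, and the documented formula (shift on v) only matches once the lag
axis is reversed.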

best,
Pierre