<html><body><div style="color:#000; background-color:#fff; font-family:times new roman, new york, times, serif;font-size:12pt"><div></div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt;"><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt;"><div dir="ltr"><font size="2" face="Arial"><b><span style="font-weight:bold;">Thank you Pierre,</span></b></font></div><div dir="ltr"><font size="2" face="Arial"><span><span style="font-weight: bold;">                        </span>It appears that numpy.correlate uses the frequency-domain method to compute the CCF. I would like to know how serious the normalization issue is, and what exactly it is. I have computed cross-correlations with this function and am interpreting my results based on them, so it would be helpful if you could tell me whether there is a significant bug in the function.</span></font></div><div dir="ltr"><font size="2" face="Arial"><span>With best regards,</span></font></div><div dir="ltr"><font size="2" face="Arial"><span>Sudheer</span></font></div><div dir="ltr"><font size="2" face="Arial">  <b><span style="font-weight:bold;">From:</span></b> Pierre Haessig &lt;pierre.haessig@crans.org&gt;<br> <b><span style="font-weight: bold;">To:</span></b> numpy-discussion@scipy.org <br> <b><span style="font-weight: bold;">Sent:</span></b> Monday, 18 March 2013 10:30 PM<br> <b><span style="font-weight: bold;">Subject:</span></b> Re: [Numpy-discussion] Numpy correlate<br> </font> </div> <br><div id="yiv47526519">

  
  <div>

    Hi Sudheer,<br>

    <br>

    On 14/03/2013 10:18, Sudheer Joseph wrote:

    <blockquote type="cite">

      <div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt;"><span>Dear Numpy/Scipy experts,</span></div>

      <div style="font-family: 'times new roman', 'new york', times, serif; font-size: 16px; color: rgb(0, 0, 0); background-color: transparent; font-style: normal;"><span>Attached is a script which I made to
          test numpy.correlate (which is called by plt.xcorr) to
          see how the cross-correlation is calculated. From this it
          appears that if I call plt.xcorr(x,y),</span></div>

      <div style="background-color:transparent;"><span>y is shifted back
          in time compared to x, i.e. if y is a process that causes a
          delayed response in x after 5 time steps, then there should be a
          high correlation at lag 5. However, in the attached plot the
          response is seen only on the negative side of the lags.</span></div>

      <div style="background-color:transparent;"><span>Can anyone
          advise me on how to see exactly which way the two series
          are shifted back or forth, and so understand the cause-and-effect
          relation better? (I understand that correlation alone does not
          establish cause and effect, but it is important
          to know which series is earlier in time at a given lag.)</span></div>

    </blockquote>

    You have indeed pointed out a lack of documentation in the
    matplotlib xcorr function: the definition of covariance can
    be ambiguous.<br>

    <br>

    The way I would try to interpret the xcorr function
    (and its friends) is to go back to the theoretical definition of
    cross-correlation, which is a normalized version of the covariance.<br>

    <br>

    In your example you've created a time series X(k) and a lagged one :

    Y(k) = X(k-5)<br>

    <br>

    Now, the covariance function of X and Y is commonly defined as :<br>

     Cov_{X,Y}(h) = E(X(k+h) * Y(k))   where E is the expectation<br>

     (assuming that X and Y are centered for the sake of clarity).<br>

    <br>
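    As a rough sketch of this definition and of its normalized version
    (not matplotlib's implementation), the expectation can be estimated
    with a sample mean. The function names, the 1/n scaling and the use of
    the population standard deviation are choices made here for
    illustration only:<br>
<pre>
```python
import numpy as np

def cross_cov(x, y, h):
    """Sample estimate of E(X(k+h) * Y(k)) for two equal-length series.

    Assumes abs(h) is smaller than the series length.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean()   # center both series, as assumed in the definition
    y = y - y.mean()
    n = len(x)
    # values of k for which both k and k+h are valid indices
    lo, hi = max(0, -h), min(n, n - h)
    # 1/n scaling is an illustrative choice, not a library convention
    return np.dot(x[lo + h:hi + h], y[lo:hi]) / n

def cross_corr(x, y, h):
    """Covariance at lag h divided by the two standard deviations."""
    return cross_cov(x, y, h) / (np.std(x) * np.std(y))

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
print(cross_corr(x, x, 0))   # a series against itself at lag 0
```
</pre>
    With these choices, a series correlated with itself at lag 0 gives 1
    (up to rounding), and no lag can exceed 1 in absolute value.<br>
    <br>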

    If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)).
    This is largest when the two factors coincide, i.e. when k+h = k-5,
    so the covariance is indeed maximal
    at h=-5 and not h=+5.<br>

    <br>

    Note that this reasoning yields the opposite result with a
    different definition of the covariance, i.e. Cov_{X,Y}(h) = E(X(k) *
    Y(k+h)) (and that's what I first did!).<br>

    <br>
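    The sign flip between the two definitions can be checked numerically.
    The snippet below is only a sketch: it uses white noise for X, builds
    Y(k) = X(k-5), and evaluates both definitions with a small helper
    (lagged_product is a name made up here, not a library function):<br>
<pre>
```python
import numpy as np

rng = np.random.default_rng(0)
n, delay = 500, 5
x = rng.standard_normal(n)
y = np.zeros(n)
y[delay:] = x[:n - delay]   # y[k] = x[k - 5]; the first 5 samples stay at 0

def lagged_product(a, b, h):
    """Mean of a[k + h] * b[k] over the k where both indices are valid."""
    lo, hi = max(0, -h), min(n, n - h)
    return np.dot(a[lo + h:hi + h], b[lo:hi]) / n

lags = np.arange(-20, 21)
# First definition: E(X(k+h) * Y(k))
cov_first = np.array([lagged_product(x, y, h) for h in lags])
# Second definition: E(X(k) * Y(k+h))
cov_second = np.array([lagged_product(y, x, h) for h in lags])

peak_first = int(lags[np.argmax(cov_first)])
peak_second = int(lags[np.argmax(cov_second)])
print(peak_first, peak_second)
```
</pre>
    Run as written, the first peak should land at lag -5 and the second at
    lag +5, so the sign of the lag depends only on which series carries
    the shift h in the definition.<br>
    <br>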

    <br>

    Therefore, I think there should be a definition of
    cross-correlation in the matplotlib xcorr docstring. R's acf
    documentation has this note: "The lag k value returned by ccf(x, y) estimates the
    correlation between x[t+k] and y[t]."<br>

    (see

    <a rel="nofollow" class="yiv47526519moz-txt-link-freetext" target="_blank" href="http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html">http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html</a>)<br>

    <br>

    Now, I believe the discussion above really belongs on the matplotlib
    mailing list. I'll open an issue on GitHub (I just spotted a mistake in the
    definition of the normalization anyway).<br>

    <br>

    <br>

    <br>

    Coming back to numpy :<br>

    There's a strange thing: the documentation of numpy.correlate seems to
    give the other definition, "z[k] = sum_n a[n] * conj(v[n+k])" (<a rel="nofollow" class="yiv47526519moz-txt-link-freetext" target="_blank" href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html">http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html</a>),
    although its usage proves otherwise. What did I miss?<br>

    <br>
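    One way to see which convention numpy.correlate follows in practice,
    without relying on the docstring, is to compare its output against a
    direct sum. This is a minimal sketch for real, equal-length inputs;
    numpy.correlate returns only the values, so the lag axis below is
    built by hand:<br>
<pre>
```python
import numpy as np

rng = np.random.default_rng(1)
n, delay = 200, 5
x = rng.standard_normal(n)
y = np.zeros(n)
y[delay:] = x[:n - delay]   # y[k] = x[k - 5]

c = np.correlate(x, y, mode="full")   # 2*n - 1 values for equal-length inputs
lags = np.arange(-(n - 1), n)          # hand-built axis: index i maps to lag i - (n - 1)

# Direct evaluation of sum_k x[k + h] * y[k] at a single lag
h = -5
direct = np.dot(x[:n + h], y[-h:])

print(np.isclose(c[h + n - 1], direct))   # does the output match this convention?
peak_lag = int(lags[np.argmax(c)])
print(peak_lag)
```
</pre>
    If the comparison holds, the output at lag h is sum_k x[k+h] * y[k],
    i.e. the shift is applied to the first argument, and the peak for this
    delayed series sits on the negative side of the lag axis.<br>
    <br>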

    best,<br>

    Pierre<br>

  </div>


</div><br>_______________________________________________<br>NumPy-Discussion mailing list<br><a ymailto="mailto:NumPy-Discussion@scipy.org" href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a><br><a href="http://mail.scipy.org/mailman/listinfo/numpy-discussion" target="_blank">http://mail.scipy.org/mailman/listinfo/numpy-discussion</a><br><br><br> </div> </div>  </div></body></html>