[Numpy-discussion] Definition of correlation, correlate and so on ?

Tim Hochberg tim.hochberg at ieee.org
Tue Dec 12 12:07:24 EST 2006


David Cournapeau wrote:
> Charles R Harris wrote:
>   
>> On 12/12/06, *David Cournapeau* <david at ar.media.kyoto-u.ac.jp 
>> <mailto:david at ar.media.kyoto-u.ac.jp>> wrote:
>>
>>     Hi,
>>
>>         I am polishing some code to compute autocorrelation using fft, and
>>     when testing the code against numpy.correlate, I realised that I
>>     am not
>>     sure about the definition... There are various functions related to
>>     correlation as far as numpy/scipy is concerned:
>>
>>         numpy.correlate
>>         numpy.corrcoef
>>         scipy.signal.correlate
>>
>>         For me, the correlation between two sequences X and Y at lag t is
>>     the sum(X[i] * Y*[i+t]) where Y* is the complex conjugate of Y.
>>     numpy.correlate does not use the conjugate, nor does
>>     scipy.signal.correlate, and I don't understand numpy.corrcoef. I've
>>     never seen complex correlation used without the conjugate, so I was
>>     curious why this
>>
>>
>> Neither have I, it is one of those oddities that may have been 
>> inherited from Numeric. I wouldn't mind seeing it changed but it is 
>> probably a bit late for that.
>>     
> Well, I would myself call this a bug, not a feature, unless at least the 
> doc specifies the behaviour; the point of my question was to get the 
> opinions of others on this point. Anyway, a function that implements the 
> 'real' cross correlation as defined in signal processing and statistics 
> is a must have IMHO,
>   
It's unfriendly to modify the behavior of a function like this in a 
point release. And this particular type of modification is particularly 
unfriendly, since any code that depends on the current behavior won't 
break cleanly, but will start producing possibly intermittent, 
data-dependent failures, which are especially troublesome. In addition, 
neither the name correlate nor its docstring is strongly, cough, 
correlated with cross-correlation. The docstring claims that it's the 
"discrete, linear correlation", which appears to mean nothing, judging 
by my far from exhaustive web search.
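
To make the "data dependent" part concrete, here is a rough sketch. The 
two helper names are made up for illustration, and I build both sums 
from numpy.convolve so the example doesn't depend on which convention a 
given correlate happens to use. With real input the two definitions 
agree, so nothing looks wrong until complex data shows up:

```python
import numpy as np

def corr_no_conj(x, y):
    # Hypothetical helper: sum_i x[i] * y[i + lag], with no conjugate.
    return np.convolve(np.asarray(y), np.asarray(x)[::-1])

def corr_conj(x, y):
    # Hypothetical helper: sum_i x[i] * conj(y[i + lag]).
    return np.convolve(np.conj(y), np.asarray(x)[::-1])

real_x, real_y = np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.5])
cplx_x, cplx_y = np.array([1, 2j]), np.array([1j, 1])

# Real data: conjugation is a no-op, so the two definitions agree.
real_agree = np.allclose(corr_no_conj(real_x, real_y),
                         corr_conj(real_x, real_y))
# Complex data: the results differ, with no error raised anywhere.
cplx_agree = np.allclose(corr_no_conj(cplx_x, cplx_y),
                         corr_conj(cplx_x, cplx_y))

print(real_agree)  # True
print(cplx_agree)  # False
```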

So rather than "fixing" the function, I would first propose introducing 
a function with a more descriptive name and docstring; for example, you 
could steal the name 'xcorr' from matlab. Then, if in fact the behavior 
of correlate is deemed to be an error, deprecate it and start issuing a 
warning in the next point release, then remove it in the next major release.
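
Something along these lines, purely as a sketch: xcorr here follows the 
definition quoted above (sum over i of X[i] * conj(Y[i+lag])), returns 
the lags alongside the values, and is not meant to match matlab's 
signature or scaling. old_correlate is a hypothetical stand-in for the 
existing no-conjugate behavior, not numpy's actual implementation.

```python
import warnings
import numpy as np

def xcorr(x, y):
    """Hypothetical: r[lag] = sum_i x[i] * conj(y[i + lag]).

    Returns (lags, r), with lags running from -(len(x) - 1) to len(y) - 1.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    lags = np.arange(-(len(x) - 1), len(y))
    # Convolving conj(y) with reversed x evaluates the sum at every lag.
    r = np.convolve(np.conj(y), x[::-1])
    return lags, r

def old_correlate(x, y):
    """Stand-in for the existing behavior (no conjugate), now deprecated."""
    warnings.warn("old_correlate is deprecated; use xcorr instead",
                  DeprecationWarning, stacklevel=2)
    return np.convolve(np.asarray(y), np.asarray(x)[::-1])
```

Old code keeps producing the numbers it always did, but now with a 
warning attached, and new code gets a function whose docstring says 
what it computes.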

Even better, IMO, would be if someone who cares about this stuff pulls 
together all the related signal processing stuff and moves it to a 
submodule so we could actually find what signal processing primitives 
are available. At the same time, more informative docstrings would be 
great.

-tim






