[Numpy-discussion] Bug in numpy.correlate documentation

Bernhard Spinnler Bernhard.Spinnler at gmx.net
Thu Oct 10 15:31:12 EDT 2013

On 10.10.2013, at 19:27, David Goldsmith <d.l.goldsmith at gmail.com> wrote:
> On Wed, Oct 9, 2013 at 7:48 PM, Bernhard Spinnler
> <Bernhard.Spinnler at gmx.net> wrote:
> > Hi Richard,
> >
> > Ah, I searched the list but didn't find those posts before?
> >
> > I can easily imagine that correlation is defined differently in different
> > disciplines. Both ways are correct and it's just a convention or definition.
> > In my field (Digital Communications, Digital Signal Processing) the vast
> > majority uses the convention implemented by the code. Here are a few
> > examples of prominent text books:
> >
> > - Papoulis, "Probaility, Random Variables, and Stochastic Processes",
> > McGraw-Hill, 2nd ed.
> > - Benvenuto, Cherubini, "Algorithms for Communications Systems and their
> > Applications", Wiley.
> > - Carlson, "Communication Systems" 4th ed. 2002, McGraw-Hill.
> >
> > Last not least, Matlab's xcorr() function behaves exactly like correlate()
> > does right now, see
> > - http://www.mathworks.de/de/help/signal/ref/xcorr.html
> >
> > But, as you say, the most important aspect might be, that most people will
> > probably prefer changing the docs instead of changing the code.
> Yeah, unless the current behaviour is actually broken or redundant in
> some way, we're not going to switch from one perfectly good convention
> to another perfectly good convention and break everyone's code in the
> process.
> The most helpful thing would be if you could file a pull request that
> just changes the docstring to what you think it should be. Extra bonus
> points if it points out that there is another definition some people
> might be expecting instead, and explains how those people can use the
> existing functions to get what they want. :-)
> -n
> IMHO, "point[ing] out that there is another definition some people might be expecting instead, and explain[ing] how those people can use the existing functions to get what they want" should be a requirement for the docstring ("Notes" section), not merely worth "extra bonus points."  But then I'm not, presently, in a position to edit the docstring myself, so that's just MHO.  
> IAE, I found what appears to me to be another "vote" for the extant docstring: Box & Jenkins, 1976, "Time Series Analysis: Forecasting and Control," Holden-Day, Oakland, pg. 374.  Perhaps a "switch" (with a default value that maintains current definition, so that extant uses would not require a code change) c/should be added to the function signature so that users can get easily get what they want?

As pointed out in another post in this thread, there are now at least three different definitions of correlation which are in use in different disciplines of science and engineering:

Numpy code:

z_numpyCode[k] = sum_n a[n+k] * conj(v[n])

Numpy docs:

z_numpyDoc[k] = sum_n a[n] * conj(v[n+k])
                         = sum_n a[n-k] * conj(v[n])
                         = z_numpyCode[-k]

Wolfram Mathworld:

z_mmca[k] = sum_n conj(a[n]) * v[n+k]
                 = conj( sum_n a[n] * conj(v[n+k]) )
                 = conj( z_numpyDoc[k] )
                 = conj( z_numpyCode[-k] )

I'm sure there are even more if you search long enough. But shouldn't the primary objective be to bring the docs in line with the code (which is definitely not "broken")? It took me 2 days of debugging my code recently only to discover that numpy correlate() was calculating a different correlation than the docs said.

I can try to come up with a proposal for the docs. Could anyone point me to where I can find the docs? I can clone the numpy repo, however, I'm not a numpy developer.

Best wishes,

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20131010/af03125b/attachment.html>

More information about the NumPy-Discussion mailing list