Re: [Numpy-discussion] Bug in numpy.correlate documentation

Date: Wed, 9 Oct 2013 21:54:07 +0100
From: Nathaniel Smith <njs@pobox.com> Subject: Re: [Numpy-discussion] Bug in numpy.correlate documentation To: Discussion of Numerical Python <numpy-discussion@scipy.org> Message-ID: <CAPJVwBmMZKN= z8V-aHUU+85LZ88xYWmAwxgzHk5GhtfuW8HN9A@mail.gmail.com> Content-Type: text/plain; charset=UTF-8
On Wed, Oct 9, 2013 at 7:48 PM, Bernhard Spinnler <Bernhard.Spinnler@gmx.net> wrote:
Hi Richard,
Ah, I searched the list but didn't find those posts before?
I can easily imagine that correlation is defined differently in different disciplines. Both ways are correct and it's just a convention or definition. In my field (Digital Communications, Digital Signal Processing) the vast majority uses the convention implemented by the code. Here are a few examples of prominent text books:
- Papoulis, "Probaility, Random Variables, and Stochastic Processes", McGraw-Hill, 2nd ed. - Benvenuto, Cherubini, "Algorithms for Communications Systems and their Applications", Wiley. - Carlson, "Communication Systems" 4th ed. 2002, McGraw-Hill.
Last not least, Matlab's xcorr() function behaves exactly like correlate() does right now, see - http://www.mathworks.de/de/help/signal/ref/xcorr.html
But, as you say, the most important aspect might be, that most people will probably prefer changing the docs instead of changing the code.
Yeah, unless the current behaviour is actually broken or redundant in some way, we're not going to switch from one perfectly good convention to another perfectly good convention and break everyone's code in the process.
The most helpful thing would be if you could file a pull request that just changes the docstring to what you think it should be. Extra bonus points if it points out that there is another definition some people might be expecting instead, and explains how those people can use the existing functions to get what they want. :-)
-n
IMHO, "point[ing] out that there is another definition some people might be expecting instead, and explain[ing] how those people can use the existing functions to get what they want" should be a requirement for the docstring ("Notes" section), not merely worth "extra bonus points." But then I'm not, presently, in a position to edit the docstring myself, so that's just MHO. IAE, I found what appears to me to be another "vote" for the extant docstring: Box & Jenkins, 1976, "Time Series Analysis: Forecasting and Control," Holden-Day, Oakland, pg. 374. Perhaps a "switch" (with a default value that maintains current definition, so that extant uses would not require a code change) c/should be added to the function signature so that users can get easily get what they want? DG

On 10.10.2013, at 19:27, David Goldsmith <d.l.goldsmith@gmail.com> wrote:
On Wed, Oct 9, 2013 at 7:48 PM, Bernhard Spinnler <Bernhard.Spinnler@gmx.net> wrote:
Hi Richard,
Ah, I searched the list but didn't find those posts before?
I can easily imagine that correlation is defined differently in different disciplines. Both ways are correct and it's just a convention or definition. In my field (Digital Communications, Digital Signal Processing) the vast majority uses the convention implemented by the code. Here are a few examples of prominent text books:
- Papoulis, "Probaility, Random Variables, and Stochastic Processes", McGraw-Hill, 2nd ed. - Benvenuto, Cherubini, "Algorithms for Communications Systems and their Applications", Wiley. - Carlson, "Communication Systems" 4th ed. 2002, McGraw-Hill.
Last not least, Matlab's xcorr() function behaves exactly like correlate() does right now, see - http://www.mathworks.de/de/help/signal/ref/xcorr.html
But, as you say, the most important aspect might be, that most people will probably prefer changing the docs instead of changing the code.
Yeah, unless the current behaviour is actually broken or redundant in some way, we're not going to switch from one perfectly good convention to another perfectly good convention and break everyone's code in the process.
The most helpful thing would be if you could file a pull request that just changes the docstring to what you think it should be. Extra bonus points if it points out that there is another definition some people might be expecting instead, and explains how those people can use the existing functions to get what they want. :-)
-n
IMHO, "point[ing] out that there is another definition some people might be expecting instead, and explain[ing] how those people can use the existing functions to get what they want" should be a requirement for the docstring ("Notes" section), not merely worth "extra bonus points." But then I'm not, presently, in a position to edit the docstring myself, so that's just MHO.
IAE, I found what appears to me to be another "vote" for the extant docstring: Box & Jenkins, 1976, "Time Series Analysis: Forecasting and Control," Holden-Day, Oakland, pg. 374. Perhaps a "switch" (with a default value that maintains current definition, so that extant uses would not require a code change) c/should be added to the function signature so that users can get easily get what they want?
As pointed out in another post in this thread, there are now at least three different definitions of correlation which are in use in different disciplines of science and engineering: Numpy code: z_numpyCode[k] = sum_n a[n+k] * conj(v[n]) Numpy docs: z_numpyDoc[k] = sum_n a[n] * conj(v[n+k]) = sum_n a[n-k] * conj(v[n]) = z_numpyCode[-k] Wolfram Mathworld: z_mmca[k] = sum_n conj(a[n]) * v[n+k] = conj( sum_n a[n] * conj(v[n+k]) ) = conj( z_numpyDoc[k] ) = conj( z_numpyCode[-k] ) I'm sure there are even more if you search long enough. But shouldn't the primary objective be to bring the docs in line with the code (which is definitely not "broken")? It took me 2 days of debugging my code recently only to discover that numpy correlate() was calculating a different correlation than the docs said. I can try to come up with a proposal for the docs. Could anyone point me to where I can find the docs? I can clone the numpy repo, however, I'm not a numpy developer. Best wishes, Bernhard

On 10.10.2013 21:31, Bernhard Spinnler wrote:
On 10.10.2013, at 19:27, David Goldsmith <d.l.goldsmith@gmail.com <mailto:d.l.goldsmith@gmail.com>> wrote:
On Wed, Oct 9, 2013 at 7:48 PM, Bernhard Spinnler <Bernhard.Spinnler@gmx.net <mailto:Bernhard.Spinnler@gmx.net>> wrote: > Hi Richard, > > Ah, I searched the list but didn't find those posts before? > > I can easily imagine that correlation is defined differently in different > disciplines. Both ways are correct and it's just a convention or definition. > In my field (Digital Communications, Digital Signal Processing) the vast > majority uses the convention implemented by the code. Here are a few > examples of prominent text books: > > - Papoulis, "Probaility, Random Variables, and Stochastic Processes", > McGraw-Hill, 2nd ed. > - Benvenuto, Cherubini, "Algorithms for Communications Systems and their > Applications", Wiley. > - Carlson, "Communication Systems" 4th ed. 2002, McGraw-Hill. > > Last not least, Matlab's xcorr() function behaves exactly like correlate() > does right now, see > - http://www.mathworks.de/de/help/signal/ref/xcorr.html > > But, as you say, the most important aspect might be, that most people will > probably prefer changing the docs instead of changing the code.
Yeah, unless the current behaviour is actually broken or redundant in some way, we're not going to switch from one perfectly good convention to another perfectly good convention and break everyone's code in the process.
The most helpful thing would be if you could file a pull request that just changes the docstring to what you think it should be. Extra bonus points if it points out that there is another definition some people might be expecting instead, and explains how those people can use the existing functions to get what they want. :-)
-n
IMHO, "point[ing] out that there is another definition some people might be expecting instead, and explain[ing] how those people can use the existing functions to get what they want" should be a requirement for the docstring ("Notes" section), not merely worth "extra bonus points." But then I'm not, presently, in a position to edit the docstring myself, so that's just MHO.
IAE, I found what appears to me to be another "vote" for the extant docstring: Box & Jenkins, 1976, "Time Series Analysis: Forecasting and Control," Holden-Day, Oakland, pg. 374. Perhaps a "switch" (with a default value that maintains current definition, so that extant uses would not require a code change) c/should be added to the function signature so that users can get easily get what they want?
As pointed out in another post in this thread, there are now at least three different definitions of correlation which are in use in different disciplines of science and engineering:
Numpy code:
z_numpyCode[k] = sum_n a[n+k] * conj(v[n])
Numpy docs:
z_numpyDoc[k] = sum_n a[n] * conj(v[n+k]) = sum_n a[n-k] * conj(v[n]) = z_numpyCode[-k]
Wolfram Mathworld:
z_mmca[k] = sum_n conj(a[n]) * v[n+k] = conj( sum_n a[n] * conj(v[n+k]) ) = conj( z_numpyDoc[k] ) = conj( z_numpyCode[-k] )
I'm sure there are even more if you search long enough. But shouldn't the primary objective be to bring the docs in line with the code (which is definitely not "broken")? It took me 2 days of debugging my code recently only to discover that numpy correlate() was calculating a different correlation than the docs said.
I can try to come up with a proposal for the docs. Could anyone point me to where I can find the docs? I can clone the numpy repo, however, I'm not a numpy developer.
yes we should only change the documentation to match the (hopefully correct) code. the documentation is in the docstring of the correlate function in numpy/core/numeric.py line 819

On 11.10.2013, at 01:19, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
Yeah, unless the current behaviour is actually broken or redundant in some way, we're not going to switch from one perfectly good convention to another perfectly good convention and break everyone's code in the process.
The most helpful thing would be if you could file a pull request that just changes the docstring to what you think it should be. Extra bonus points if it points out that there is another definition some people might be expecting instead, and explains how those people can use the existing functions to get what they want. :-)
-n
IMHO, "point[ing] out that there is another definition some people might be expecting instead, and explain[ing] how those people can use the existing functions to get what they want" should be a requirement for the docstring ("Notes" section), not merely worth "extra bonus points." But then I'm not, presently, in a position to edit the docstring myself, so that's just MHO.
IAE, I found what appears to me to be another "vote" for the extant docstring: Box & Jenkins, 1976, "Time Series Analysis: Forecasting and Control," Holden-Day, Oakland, pg. 374. Perhaps a "switch" (with a default value that maintains current definition, so that extant uses would not require a code change) c/should be added to the function signature so that users can get easily get what they want?
As pointed out in another post in this thread, there are now at least three different definitions of correlation which are in use in different disciplines of science and engineering:
Numpy code:
z_numpyCode[k] = sum_n a[n+k] * conj(v[n])
Numpy docs:
z_numpyDoc[k] = sum_n a[n] * conj(v[n+k]) = sum_n a[n-k] * conj(v[n]) = z_numpyCode[-k]
Wolfram Mathworld:
z_mmca[k] = sum_n conj(a[n]) * v[n+k] = conj( sum_n a[n] * conj(v[n+k]) ) = conj( z_numpyDoc[k] ) = conj( z_numpyCode[-k] )
I'm sure there are even more if you search long enough. But shouldn't the primary objective be to bring the docs in line with the code (which is definitely not "broken")? It took me 2 days of debugging my code recently only to discover that numpy correlate() was calculating a different correlation than the docs said.
I can try to come up with a proposal for the docs. Could anyone point me to where I can find the docs? I can clone the numpy repo, however, I'm not a numpy developer.
yes we should only change the documentation to match the (hopefully correct) code. the documentation is in the docstring of the correlate function in numpy/core/numeric.py line 819 _______________________________________________
Ok, corrected the docstring, mentioning one alternative definition of correlation. Pull request filed: https://github.com/numpy/numpy/pull/3913. Bernhard
participants (3)
-
Bernhard Spinnler
-
David Goldsmith
-
Julian Taylor