
Okay! I think I now actually understand what was going on with np.gradient! The discussion has been pretty confused, and I'm worried that what's in master right now is not the right solution, which is a problem because we're supposed to cut 1.9.1 tomorrow.

Background:

np.gradient computes gradients using a finite difference method. Wikipedia has a good explanation of how this works: https://en.wikipedia.org/wiki/Numerical_differentiation The key point is that there are multiple formulas one can use that all converge to the derivative, e.g.

(f(x + h) - f(x)) / h

or

(f(x + h) - f(x - h)) / 2h

The first is the textbook definition of the derivative. As h -> 0, the error in that approximation shrinks like O(h). The second formula also converges to the derivative as h -> 0, but it converges like O(h^2), i.e., much faster. And there are many, many formulas like this with different trade-offs: https://en.wikipedia.org/wiki/Finite_difference_coefficient In practice, given a grid of values of f(x) and a grid stepsize of h, all of these formulas come down to doing a simple convolution of the data with certain coefficients (e.g. [-1/h, 1/h] or [-1/2h, 0, 1/2h] for the two formulas above).

Traditionally np.gradient has used the second formula above, with its quadratically diminishing error term, for interior points in the grid. For points on the boundary of the grid, though, this formula has a problem, b/c it requires you to look at points to both the left and the right of the current point -- but for boundary points one of these doesn't exist. In such situations np.gradient has traditionally used the first formula instead (the "forward (or backward) finite difference approximation with linear error", where "forward"/"backward" means that it works on the boundary). As the web page linked above shows, though, there's an easy alternative formula that works on

The change in numpy 1.9.0 that has caused all the fuss is that we switched to using the quadratically accurate approximation instead of the linearly accurate approximation for boundary points. This had two effects:

1) it produced (hopefully small) changes in the calculation for the boundaries. In general these should have made the calculations more accurate, but apparently this did cause problems for some people who wanted perfectly reproducible output.

2) it tickled a nasty bug in how gradient and masked arrays work together. It turns out gradient applied to masked arrays has always been seriously buggy; there's some horribleness in the masked array implementation that means the output array that np.gradient is writing into ends up sharing a mask array with the input. So as we go along calculating the output, we also mutate the input. This can definitely corrupt the input, for example here: https://gist.github.com/njsmith/551738469b74d175e039 ...and I strongly suspect it's possible to find cases where it means you get the wrong answer.

For mysterious reasons involving the order in which values were calculated or something, the version of np.gradient in 1.9.0 tickled this bug in a much worse manner, with early mutations to the input array ending up affecting later calculations using the same input array, causing cascading errors.
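A minimal way to probe for the shared-mask problem described above (a diagnostic sketch, not part of the original report; whether the masks actually come out shared, and whether the input mask gets mutated, depends on which numpy version is installed):

    import numpy as np

    x = np.ma.masked_array(np.linspace(0.0, 1.0, 8) ** 2,
                           mask=[0, 0, 1, 0, 0, 0, 1, 0])
    mask_before = x.mask.copy()

    g = np.gradient(x)  # 1-D input, so a single array comes back

    print("output shares mask memory with input: %s"
          % np.may_share_memory(np.ma.getmaskarray(g), np.ma.getmaskarray(x)))
    print("input mask mutated by the call: %s"
          % (not np.array_equal(mask_before, np.ma.getmaskarray(x))))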
This is the cause of the massive problems that matplotlib encountered in their test suite: https://github.com/matplotlib/matplotlib/issues/3598#issuecomment-58859663

There's a proposed fix for the masked array issues at: https://github.com/numpy/numpy/pull/5203

For the reasons described above, I would be very wary about using *any* currently released version of np.gradient on masked arrays. A workaround would be to make a local copy of the definition of np.gradient, and immediately after the 'out' array is allocated do 'out.mask = out.mask.copy()'.

Once this bug is fixed, the "new" gradient code produces results which are identical to the "old" gradient code on the interior; only the boundary is different.

------

So here are my concerns:

- We decided to revert the changes to np.gradient in 1.9.1 (at least by default). I'm not sure how much of that decision was based on the issues with masked arrays, though, which turn out to be a different issue entirely. The previous discussion conflated the two issues above.

- We decided to gate the more-accurate boundary calculation behind a kwarg called "edge_order=2". AFAICT, now that I actually understand what this code is doing, this is a terrible name -- we're using a 3-coefficient kernel to compute a quadratically accurate approximation to a first-order derivative. There probably exist other kernels that are also quadratically accurate. "Order 2" simply doesn't refer to this in any unique or meaningful way. And it will be even more confusing if we ever add the option to select which kernel to use on the interior, where "quadratically accurate" is definitely not enough information to uniquely define a kernel. Maybe we want something like edge_accuracy=2 or edge_accuracy="quadratic" or something? I'm not sure.

- If edge_order=2 escapes into a release tomorrow then we'll be stuck with it.

Some possible options for 1.9.1:

- Just revert the entire np.gradient code to match 1.8, and put off a real fix until 1.9. This at least avoids getting stuck with a poor API.

- Put the masked array bug fix into 1.9.1, and leave the gradient code the same as 1.9.0; put off designing a proper API for selecting the old behavior until 1.9.2. (Or 1.10, I guess.) This doesn't solve the pure reproducibility problems, but it might be enough to make matplotlib work again, at least?

- Delay 1.9.1 to rehash the issues and make a decision on what we want to support long-term.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
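To see concretely that the two edge treatments only disagree at the endpoints, here is a short sketch. It assumes a numpy new enough to accept the edge_order keyword (the very kwarg under discussion); on such a version the interior values should come out identical, and only the first and last entries differ:

    import numpy as np

    x = np.linspace(0.0, 2.0, 21)
    h = x[1] - x[0]
    y = np.exp(x)

    g1 = np.gradient(y, h, edge_order=1)  # old-style linear edge formula
    g2 = np.gradient(y, h, edge_order=2)  # new-style quadratic edge formula

    print("interior identical: %s" % np.array_equal(g1[1:-1], g2[1:-1]))
    print("endpoint differences: %s" % (g1[[0, -1]] - g2[[0, -1]]))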

Hi,

On Sat, Oct 18, 2014 at 6:17 PM, Nathaniel Smith <njs@pobox.com> wrote:
Traditionally np.gradient has used the second formula above, with its quadratically diminishing error term, for interior points in the grid. For points on the boundary of the grid, though, this formula has a problem, b/c it requires you to look at points to both the left and the right of the current point -- but for boundary points one of these doesn't exist. In such situations np.gradient has traditionally used the first formula instead (the "forward (or backward) finite difference approximation with linear error", where "forward"/"backward" means that it works on the boundary). As the web page linked above shows, though, there's an easy alternative formula that works on
Did you lose some text here?
Excellent summary, thanks for doing all the work to get to the bottom of the problem.

Is there any significant disadvantage to delaying the 1.9.1 release for a few days? I would appreciate a short delay to see if it's possible to agree a workaround for the OSX Accelerate bug: https://github.com/numpy/numpy/pull/5205 - but in any case, unless there is a compelling reason to release tomorrow, it seems reasonable to take the time to agree on a good solution to the np.gradient API.

Cheers,

Matthew

On Sun, Oct 19, 2014 at 2:23 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
Did you lose some text here?
"There's an easy alternative formula that works on edge points and provides quadratic accuracy." Not too critical, you probably figured out the gist of it :-) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org

On Sat, Oct 18, 2014 at 7:17 PM, Nathaniel Smith <njs@pobox.com> wrote:
- We decided to revert the changes to np.gradient in 1.9.1 (at least by default). I'm not sure how much of that decision was based on the issues with masked arrays, though, which turns out to be a different issue entirely. The previous discussion conflated the two issues above.
My concern was reproducibility. The old behavior wasn't a bug, so we should be careful that old results are reproducible.
- We decided to gate the more-accurate boundary calculation behind a kwarg called "edge_order=2". AFAICT now that I actually understand what this code is doing, this is a terrible name -- we're using a 3-coefficient kernel to compute a quadratically accurate approximation to a first-order derivative.
Accuracy is a different problem and depends on the function being interpolated. As you point out above, the order refers to the *rate* of convergence. Normally that is illustrated with a loglog plot of the absolute value of the error against h, resulting in a straight line once the function is sufficiently smooth over the range of h. The end points are a bit special because of the lack of bracketing. Relative to the interior, one is effectively extrapolating rather than interpolating and things can become a bit unstable. Hence it is useful to have a safe option, here linear extrapolation, as an alternative to a higher order method.
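A sketch of the kind of check described here, estimating the empirical order from the slope of log(error) against log(h) at a boundary point (it assumes a numpy that accepts the edge_order keyword; exp is used simply as a smooth test function with a known derivative):

    import numpy as np

    def boundary_error(n, edge_order):
        x = np.linspace(0.0, 1.0, n)
        h = x[1] - x[0]
        g = np.gradient(np.exp(x), h, edge_order=edge_order)
        return h, abs(g[0] - np.exp(x[0]))  # error at the left edge

    for edge_order in (1, 2):
        hs, errs = zip(*[boundary_error(n, edge_order) for n in (11, 21, 41, 81, 161)])
        slope = np.polyfit(np.log(hs), np.log(errs), 1)[0]
        print("edge_order=%d: empirical order ~ %.2f" % (edge_order, slope))

The fitted slopes come out close to 1 and 2 respectively, i.e. the straight lines on the loglog plot have those slopes once the function is smooth enough over the range of h.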
There probably exist other kernels that are also quadratically accurate. "Order 2" simply doesn't refer to this in any unique or meaningful way. And it will be even more confusing if we ever add the option to select which kernel to use on the interior, where "quadratically accurate" is definitely not enough information to uniquely define a kernel. Maybe we want something like edge_accuracy=2 or edge_accuracy="quadratic" or something? I'm not sure.
- If edge_order=2 escapes into a release tomorrow then we'll be stuck with it.
Order has two common meanings in the numerical context, either degree, or, for polynomials, the number of coefficients (degree + 1). I've noticed that the meaning has evolved over the years, and these days an equivalence with degree seems to be pretty common. In the present context, it refers to the power of the h term in the error term of the Taylor's series approximation of the derivative. The precise meaning needs to be elucidated in the notes, as the order doesn't map one-to-one into methods.
Some possible options for 1.9.1: - Just revert the entire np.gradient code to match 1.8, and put off a real fix until 1.9. This at least avoids getting stuck with a poor API.
- Put the masked array bug fix into 1.9.1, and leave the gradient code the same as 1.9.0; put off designing a proper API for selecting the old behavior for 1.9.2. (Or 1.10 I guess.) This doesn't solve the pure reproducibility problems, but it might be enough to make matplotlib work again at least?
The only reason I see to keep the current default is to maintain behavior from 1.9.0 to 1.9.1. I don't think 1.9.0 will last long in the wild. Anyone working on the cutting edge will likely update almost immediately, and distro releases in the near future will probably have 1.8.2.
- Delay 1.9.1 to rehash the issues and make a decision on what we want to support long-term.
Might want to delay a few days in any case to settle the Mac/Windows issues.

Chuck

On Sun, Oct 19, 2014 at 3:37 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
My concern was reproducibility. The old behavior wasn't a bug, so we should be careful that old results are reproducible.
Yep.
Accuracy is a different problem and depends on the function being interpolated. As you point out above, the order refers to the *rate* of convergence. Normally that is illustrated with a loglog plot of the absolute value of the error against h, resulting in a straight line once the function is sufficiently smooth over the range of h. The end points are a bit special because of the lack of bracketing. Relative to the interior, one is effectively extrapolating rather than interpolating and things can become a bit unstable. Hence it is useful to have a safe option, here linear extrapolation, as an alternative to a higher order method.
This sounds plausible! But given that we have to (a) pick defaults, (b) pick which ones are included at all, and (c) document the difference in such a way that users can make an informed choice, then I'd kinda like... more precision :-). I'm sure there are situations where one or the other is better, but which situations are those? Do you know some way to tell which is which? Does one situation arise substantially more often than the other?
Order has two common meanings in the numerical context, either degree, or, for polynomials, the number of coefficients (degree + 1). I've noticed that the meaning has evolved over the years, and these days an equivalence with degree seems to be pretty common. In the present context, it refers to the power of the h term in the error term of the Taylor's series approximation of the derivative. The precise meaning needs to be elucidated in the notes, as the order doesn't map one-to-one into methods.
Surely this is still an argument for using a word that requires less elucidation, like degree or accuracy? (I'm particularly concerned because "2nd order derivative" has a *very* well known meaning that's very importantly different.)

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

On Sun, Oct 19, 2014 at 8:13 AM, Nathaniel Smith <njs@pobox.com> wrote:
This sounds plausible! But given that we have to (a) pick defaults, (b) pick which ones are included at all, and (c) document the difference in such a way that users can make an informed choice, then I'd kinda like... more precision :-). I'm sure there are situations where one or the other is better, but which situations are those? Do you know some way to tell which is which? Does one situation arise substantially more often than the other?
A challenge ;)

    In [34]: x = arange(11)/10.

    In [35]: y = exp(-1./x + -1./(1 - x))

    In [36]: y[-1] = y[0] = 0

    In [37]: plot(x, gradient(y, .01, edge_order=1))
    Out[37]: [<matplotlib.lines.Line2D at 0x4c319d0>]

    In [38]: plot(x, gradient(y, .01, edge_order=2))
    Out[38]: [<matplotlib.lines.Line2D at 0x4c49f10>]

A bit artificial, I'll admit. Most functions with reasonable sample points do better with the second order version. The main argument for keeping the linear default is backward compatibility. Absent that, second order would be preferable.
Surely this is still an argument for using a word that requires less elucidation, like degree or accuracy? (I'm particularly concerned because "2nd order derivative" has a *very* well known meaning that's very importantly different.)
Order is well understood in context, but you have a point for naive users. Maybe a string would be better, "edge_method='linear'", etc.

Chuck
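To make the string-keyword idea concrete, here is a rough sketch of what it might look like for the 1-D, uniform-spacing case. The name edge_method and its values are hypothetical (nothing like this exists in numpy); the kernels are just the standard central, one-sided linear, and one-sided quadratic differences discussed earlier in the thread:

    import numpy as np

    def gradient_1d(y, h, edge_method='linear'):
        # Hypothetical API sketch: 1-D gradient with a string-valued edge keyword.
        y = np.asarray(y, dtype=float)
        out = np.empty_like(y)
        # interior: central differences, quadratically accurate
        out[1:-1] = (y[2:] - y[:-2]) / (2 * h)
        if edge_method == 'linear':
            # one-sided first-order differences at the edges
            out[0] = (y[1] - y[0]) / h
            out[-1] = (y[-1] - y[-2]) / h
        elif edge_method == 'quadratic':
            # one-sided three-point second-order differences at the edges
            out[0] = (-3 * y[0] + 4 * y[1] - y[2]) / (2 * h)
            out[-1] = (3 * y[-1] - 4 * y[-2] + y[-3]) / (2 * h)
        else:
            raise ValueError("unknown edge_method: %r" % (edge_method,))
        return out

    x = np.linspace(0.0, 1.0, 11)
    print(gradient_1d(np.exp(x), x[1] - x[0], edge_method='quadratic')[:3])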
participants (3)
- Charles R Harris
- Matthew Brett
- Nathaniel Smith