[Numpy]: making np.gradient support unevenly spaced data

Hi all, current version of numpy gradient supports only data with uniform spacing. I have implemented a proposed enhancement for the np.gradient function that allows to compute the gradient on non uniform grids. (PR: https://github.com/numpy/numpy/pull/8446) Since it may be of interest also for the scipy user/developers and it may be useful also for scipy.diff <https://github.com/scipy/scipy/wiki/Proposal:-add-finite-difference-numerica...> I am posting here too. The proposed implementation has a behaviour/signature is similar to that of Matlab/Octave. As argument it can take: 1. A single scalar to specify a sample distance for all dimensions. 2. N scalars to specify a constant sample distance for each dimension. i.e. `dx`, `dy`, `dz`, ... 3. N arrays to specify the coordinates of the values along each dimension of F. The length of the array must match the size of the corresponding dimension 4. Any combination of N scalars/arrays with the meaning of 2. and 3. e.g., you can do the following: >>> f = np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float) >>> dx = 2. >>> y = [1., 1.5, 3.5] >>> np.gradient(f, dx, y) [array([[ 1. , 1. , -0.5], [ 1. , 1. , -0.5]]), array([[ 2. , 2. , 2. ], [ 2. , 1.7, 0.5]])] It should not break any existing code since as of 1.12 only scalars or list of scalars are allowed. A possible alternative API could be pass arrays of sampling steps instead of the coordinates. On the one hand, this would have the advantage of having "differences" both in the scalar case and in the array case. On the other hand, if you are dealing with non uniformly-spaced data (e.g, data is mapped on a grid or it is a time-series), in most cases you already have the coordinates/timestamps. Therefore, in the case of difference as argument, you would almost always have a call np.diff before np.gradient. In the end, I would rather prefer the coordinates option since IMHO it is more handy, I don't think that would be too much "surprising" and it is what Matlab already does. Also, it could not easily lead to "silly" mistakes since the length have to match the size of the corresponding dimension. What do you think? Thanks Alessandro -- -------------------------------------------------------------------------- NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. --------------------------------------------------------------------------

On Fri, Jan 20, 2017 at 5:21 AM, alebarde@gmail.com <alebarde@gmail.com> wrote:
> [...]
> What do you think?
In general I also like specifying x better than dx.

I looked at np.gradient two days ago to see if it does what I need. But it was missing the unequal spacing, and it is missing the second derivative.

More general question: I found the dimension story with dx, dy, dz, ... confusing. Isn't this just applying gradient with respect to each axis separately?

(Aside: If I read your link in the PR http://websrv.cs.umt.edu/isis/index.php/Finite_differencing:_Introduction correctly, then I might have just implemented a rescaled version of second (kth) order differencing with unequal spaced x using sparse linear algebra. I didn't see anything premade in numpy or scipy for it.)

Josef

2017-01-20 14:44 GMT+01:00 <josef.pktd@gmail.com>:
> [...]
> In general I also like specifying x better than dx.
> I looked at np.gradient two days ago to see if it does what I need. But it was missing the unequal spacing, and it is missing the second derivative.
Yes, and I believe it is quite a useful feature; that's why I have implemented it. On the other hand, I don't think that `gradient` should also be able to compute the second or k-th derivative. In that case a more general function called `differentiate` would probably be better.
> More general question: I found the dimension story with dx, dy, dz, ... confusing. Isn't this just applying gradient with respect to each axis separately?
Yes, it is just a way to pass the sampling step in each direction (and actually they are not keywords). I have kept the original documentation on this, but it is probably better to change it to h1, h2, h3, ...
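In other words (assuming the PR's behaviour together with the existing `axis` keyword, available since numpy 1.11), the N-D call is just the 1-D gradient applied independently along each axis with its own spacing or coordinate array; a sketch:

import numpy as np

f = np.array([[1, 2, 6], [3, 4, 5]], dtype=float)
dx = 2.0
y = np.array([1.0, 1.5, 3.5])

# One spacing/coordinate argument per axis of f, as in the proposal.
gx, gy = np.gradient(f, dx, y)

# Expected to match axis-by-axis calls (an assumption, not verified against the PR).
gx_alt = np.gradient(f, dx, axis=0)
gy_alt = np.gradient(f, y, axis=1)

print(np.allclose(gx, gx_alt), np.allclose(gy, gy_alt))  # True True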
> (Aside: If I read your link in the PR http://websrv.cs.umt.edu/isis/index.php/Finite_differencing:_Introduction correctly, then I might have just implemented a rescaled version of second (kth) order differencing with unequal spaced x using sparse linear algebra. I didn't see anything premade in numpy or scipy for it.)
What do you mean by "rescaled version"? And did you mean second order or second derivative? In the first case, I think that would be equivalent to what my PR does. In the latter, I confirm that there is currently no implementation of a second derivative.

On Fri, Jan 20, 2017 at 9:09 AM, alebarde@gmail.com <alebarde@gmail.com> wrote:
> [...]
> What do you mean by "rescaled version"? And did you mean second order or second derivative? In the first case, I think that would be equivalent to what my PR does. In the latter, I confirm that there is currently no implementation of a second derivative.
I think it's the k-th derivative. "Rescaled" means that I don't actually compute the derivative; I want to get a nonparametric estimate of the residual, which assumes, I think, that the underlying function for the non-noise version has approximately constant k-th derivative (e.g. a k-th or (k+1)-th order polynomial). The sum of squared window coefficients is normalized to 1. I didn't try to understand the math, but the kernels or windows in the equal-spaced case are:

>>> _diff_kernel(0)
array([ 1.])
>>> _diff_kernel(1)
array([ 0.70710678, -0.70710678])
>>> _diff_kernel(2)
array([ 0.40824829, -0.81649658,  0.40824829])
>>> _diff_kernel(2) / _diff_kernel(2)[1]
array([-0.5,  1. , -0.5])
>>> _diff_kernel(3)
array([ 0.2236068 , -0.67082039,  0.67082039, -0.2236068 ])
>>> _diff_kernel(4)
array([ 0.11952286, -0.47809144,  0.71713717, -0.47809144,  0.11952286])
>>> (_diff_kernel(4)**2).sum()
0.99999999999999989
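`_diff_kernel` is from Josef's statsmodels branch and its definition is not shown here; the printed values are consistent with plain k-th order differencing coefficients rescaled to unit sum of squares, as this guess reproduces:

import numpy as np
from scipy.special import comb

def diff_kernel_guess(k):
    # k-th order differencing coefficients (alternating-sign binomials),
    # rescaled so that the squared coefficients sum to 1.
    w = np.array([(-1)**i * comb(k, i) for i in range(k + 1)])
    return w / np.sqrt((w ** 2).sum())

print(diff_kernel_guess(2))  # [ 0.40824829 -0.81649658  0.40824829]
print(diff_kernel_guess(4))  # [ 0.11952286 -0.47809144  0.71713717 -0.47809144  0.11952286]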
Josef

2017-01-20 16:20 GMT+01:00 <josef.pktd@gmail.com>:
> [...]
> I didn't try to understand the math, but the kernels or windows in the equal-spaced case are:
> [...]
Yes, if I am not wrong you are getting the coefficients of the first-order-accurate forward/backward approximation of the k-th derivative (https://en.wikipedia.org/wiki/Finite_difference_coefficient#Forward_and_backward_finite_difference).

Fornberg (http://www.ams.org/journals/mcom/1988-51-184/S0025-5718-1988-0935077-0/S0025-5718-1988-0935077-0.pdf) proposed an algorithm to get the coefficients for a general N-th order, k-th derivative, but I don't know how efficient it is.

Also, I am not sure if it applies to your case, but mind that when going to higher order, strange effects happen unless the "nodes" are properly chosen; see e.g. the figures on the last page of https://www.rsmas.miami.edu/users/miskandarani/Courses/MSC321/lectfiniteDifference.pdf.
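A compact sketch of that recursion, following the linked Fornberg (1988) paper (the function name is illustrative; column k of the result holds the weights of the k-th derivative at the point z built from the arbitrarily spaced nodes x):

import numpy as np

def fornberg_weights(z, x, m):
    # Weights for the derivatives 0..m at the point z from samples at nodes x,
    # using Fornberg's one-pass recursion on arbitrarily spaced grids.
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = np.zeros((n, m + 1))
    c1, c4 = 1.0, x[0] - z
    c[0, 0] = 1.0
    for i in range(1, n):
        mn = min(i, m)
        c2, c5, c4 = 1.0, c4, x[i] - z
        for j in range(i):
            c3 = x[i] - x[j]
            c2 *= c3
            if j == i - 1:
                for k in range(mn, 0, -1):
                    c[i, k] = c1 * (k * c[i - 1, k - 1] - c5 * c[i - 1, k]) / c2
                c[i, 0] = -c1 * c5 * c[i - 1, 0] / c2
            for k in range(mn, 0, -1):
                c[j, k] = (c4 * c[j, k] - k * c[j, k - 1]) / c3
            c[j, 0] = c4 * c[j, 0] / c3
        c1 = c2
    return c

# First-derivative weights at x = 1.5 on the thread's grid [1, 1.5, 3.5]:
print(fornberg_weights(1.5, [1.0, 1.5, 3.5], 1)[:, 1])  # [-1.6  1.5  0.1]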

On Mon, Jan 23, 2017 at 1:56 PM, alebarde@gmail.com <alebarde@gmail.com> wrote:
> [...]
> Yes, if I am not wrong you are getting the coefficients of the first-order-accurate forward/backward approximation of the k-th derivative (https://en.wikipedia.org/wiki/Finite_difference_coefficient#Forward_and_backward_finite_difference).
> Fornberg (http://www.ams.org/journals/mcom/1988-51-184/S0025-5718-1988-0935077-0/S0025-5718-1988-0935077-0.pdf) proposed an algorithm to get the coefficients for a general N-th order, k-th derivative, but I don't know how efficient it is.
Thanks for the links. My underlying story ended up being close to Savitzky-Golay, e.g. central differences with increasing accuracy:

>>> signal.savgol_coeffs(3, 2, deriv=1)
array([  5.00000000e-01,  -4.53646149e-17,  -5.00000000e-01])
>>> signal.savgol_coeffs(5, 4, deriv=1)
array([ -8.33333333e-02,   6.66666667e-01,   1.31151738e-15,
        -6.66666667e-01,   8.33333333e-02])
In the simple differencing approach the window length is tied to the number of derivatives/differencings. But with local polynomials like Savitzky-Golay, the window length (and order of approximation) can be separated from the derivative order. This works with equally spaced points, but I gave up for now on going beyond simple differencing for unequally spaced points, because I don't see a fast way to create the windows for general local polynomial regression. (In case anyone is interested: https://github.com/statsmodels/statsmodels/pull/3380, where it took me a while to figure out the relationship to Savitzky-Golay.)
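For equally spaced data, scipy's existing Savitzky-Golay filter already exposes that separation of window length, polynomial order and derivative order; a small sketch (the noisy sine is just an illustrative test signal):

import numpy as np
from scipy.signal import savgol_filter

dx = 0.1
x = np.arange(0.0, 2 * np.pi, dx)
y = np.sin(x) + 0.05 * np.random.randn(x.size)   # noisy samples

# Window length and polynomial order are chosen independently of the
# derivative order; delta rescales the result to the sample spacing.
dy = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=dx)
d2y = savgol_filter(y, window_length=11, polyorder=3, deriv=2, delta=dx)

err = np.abs(dy - np.cos(x)).max()   # compare with the true first derivative
print(err)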
> Also, I am not sure if it applies to your case, but mind that when going to higher order, strange effects happen unless the "nodes" are properly chosen; see e.g. the figures on the last page of https://www.rsmas.miami.edu/users/miskandarani/Courses/MSC321/lectfiniteDifference.pdf.
In my experience this is more of a problem for interpolation than for lower-order smoothing, but it's not ruled out. It should be even less of a problem with local, usually low-order, polynomials as in Savitzky-Golay.

Josef
Participants (2):
- alebarde@gmail.com
- josef.pktd@gmail.com