
Here is a python code snippet:
# python vers. 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)]
import numpy as np   # numpy vers. 1.14.3
#import matplotlib.pyplot as plt

N = 21
amp = 10
t = np.linspace(0.0,N-1,N)
arg = 2.0*np.pi/(N-1)

y = amp*np.sin(arg*t)
print('y:\n',y)
print('mean(y): ',np.mean(y))

#plt.plot(t,y)
#plt.show()

ypad = np.pad(y, (3,2),'mean')
print('ypad:\n',ypad)

When I execute this the outputs are:
y:
 [ 0.00000000e+00  3.09016994e+00  5.87785252e+00  8.09016994e+00
   9.51056516e+00  1.00000000e+01  9.51056516e+00  8.09016994e+00
   5.87785252e+00  3.09016994e+00  1.22464680e-15 -3.09016994e+00
  -5.87785252e+00 -8.09016994e+00 -9.51056516e+00 -1.00000000e+01
  -9.51056516e+00 -8.09016994e+00 -5.87785252e+00 -3.09016994e+00
  -2.44929360e-15]
mean(y):  -1.3778013372117948e-16
ypad:
 [-1.37780134e-16 -1.37780134e-16 -1.37780134e-16  0.00000000e+00
   3.09016994e+00  5.87785252e+00  8.09016994e+00  9.51056516e+00
   1.00000000e+01  9.51056516e+00  8.09016994e+00  5.87785252e+00
   3.09016994e+00  1.22464680e-15 -3.09016994e+00 -5.87785252e+00
  -8.09016994e+00 -9.51056516e+00 -1.00000000e+01 -9.51056516e+00
  -8.09016994e+00 -5.87785252e+00 -3.09016994e+00 -2.44929360e-15
  -7.40148683e-17 -7.40148683e-17]

The left pad is correct, but the right pad is different and is not the mean of y. Why?

This is how np.pad computes mean padding: https://github.com/numpy/numpy/blob/01541f2822d0d4b37b96f6b42e35963b132f1947...

    elif mode == 'mean':
        for axis, ((pad_before, pad_after), (chunk_before, chunk_after)) \
                in enumerate(zip(pad_width, kwargs['stat_length'])):
            newmat = _prepend_mean(newmat, pad_before, chunk_before, axis)
            newmat = _append_mean(newmat, pad_after, chunk_after, axis)

That is, first the mean is prepended, then appended, and in the latter step the updated (front-padded) array is used for computing the mean again. Note that with arbitrary precision this is fine, since appending n copies of `mean` to an array with mean `mean` should preserve the mean. But with doubles you can get errors on the order of the machine epsilon, which is what happens here:

    In [16]: ypad[3:-2].mean()
    Out[16]: -1.1663302849022412e-16

    In [17]: ypad[:-2].mean()
    Out[17]: -3.700743415417188e-17

So the prepended values are `y.mean()`, but the appended values are `ypad[:-2].mean()` which includes the near-zero padding values. I don't think this error should be a problem in practice, but I agree it's surprising.
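
To make the two-step behaviour concrete, here is a minimal sketch that emulates it by hand with np.concatenate. This is not the NumPy implementation itself; the names left_value, right_value, front_padded, two_step and ypad_const are only illustrative. The last lines show one possible workaround, assuming you want both pads to be exactly y.mean(): compute the mean once and pad with it as a constant.

```python
import numpy as np

# Same signal as in the original post.
N = 21
y = 10 * np.sin(2.0 * np.pi / (N - 1) * np.linspace(0.0, N - 1, N))

# Step 1: prepend the mean of y (this is what the left pad receives).
left_value = y.mean()
front_padded = np.concatenate([np.full(3, left_value), y])

# Step 2: the appended value is the mean of the front-padded array,
# which may differ from y.mean() by rounding error.
right_value = front_padded.mean()
two_step = np.concatenate([front_padded, np.full(2, right_value)])

print('left pad value: ', left_value)
print('right pad value:', right_value)

# Workaround sketch: compute the mean once and pad with it as a constant,
# so both sides receive the identical value.
ypad_const = np.pad(y, (3, 2), mode='constant', constant_values=y.mean())
print('pads identical: ', ypad_const[0] == ypad_const[-1])
```

The exact digits printed for the two pad values depend on the platform and NumPy build, so they may not match the numbers quoted in this thread.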
András

PS. my exact numbers are different from yours (probably a multithreaded thing?), but `ypad[:-2].mean()` agrees with the last 2 elements in `ypad` in my case and I'm sure this is true for yours too.
(In reply to the message from Andras Deak deak.andris@gmail.com, Sun, Apr 29, 2018 at 11:36 PM.)

I would consider this a bug, and think we should fix this.
(In reply to the message from Virgil Stokes vs@it.uu.se, Sun, 29 Apr 2018 at 13:48.)
NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

On Sun, Apr 29, 2018 at 11:39 PM, Eric Wieser wieser.eric+numpy@gmail.com wrote:
I would consider this a bug, and think we should fix this.
In that case `mode='median'` should probably be fixed as well.
participants (3)
- Andras Deak
- Eric Wieser
- Virgil Stokes