Enhancement: np.convolve(..., mode="normalized")

Convolution is often used for smoothing noisy data; a typical use will keep the 'same' length of data and may look like this:
convol = 2**-np.linspace(-2, 2, 100)**2
y2 = np.convolve(y, convol/np.sum(convol), mode='same')   ## simple smoothing
ax.plot(x, y2, label="simple smoothing", color='g')
However, when the smoothed curve has some nonzero background value at its edges, this convolution mode internally pads it with zeros, and the resulting curve droops at both ends like the moustache of Frank Zappa. I made an example plot illustrating this here: https://www.fzu.cz/~dominecf/misc/numpy_smoothing_example.png. Such a result, i.e. the green curve, is not publication ready!

1) One workaround is to np.pad(..., mode='edge'), convolve, and truncate the curve back to its original length (a sketch follows the approach-2 code below). This is not a correct approach, however: it makes the curve edges smooth, but their actual shape becomes very sensitive to pointwise noise, and it artificially removes the curve's slope at its edges.

2) Another workaround is to generate an auxiliary "Zappa's moustache" by applying the same convolution to a fresh array of np.ones_like(y), and then to normalize the convolved curve by this auxiliary function. Its only downside is that the curve stays noisier at its edges, which, however, appears more scientifically honest to me - at the dataset edges one simply has fewer means to filter out noise:
convol = 2**-np.linspace(-2, 2, 100)**2
norm = np.convolve(np.ones_like(y), convol/np.sum(convol), mode='same')
y2 = np.convolve(y, convol/np.sum(convol), mode='same') / norm   ## normalized smoothing
ax.plot(x, y2, label="approach 2", color='k')
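For comparison, approach 1 could be sketched roughly like this (not recommended, as explained above; x, y and ax as in the first example, numpy imported as np):

convol = 2**-np.linspace(-2, 2, 100)**2            # same kernel as above
half = len(convol) // 2
y_pad = np.pad(y, half, mode='edge')               # repeat the edge values outward
y1 = np.convolve(y_pad, convol/np.sum(convol), mode='same')[half:-half]   # truncate back to len(y)
ax.plot(x, y1, label="approach 1", color='b')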
In my experimental work, I miss this behaviour of np.convolve as a single function call. I suggest it be made accessible in numpy under a mode="normalized" option. (Actually I believe this could have been the default, but that would break compatibility.) Thanks for consideration, Filip

I wonder whether you are looking for the solution in the right direction. Is there a theory for the shape of the curve? In that case it might be better to treat the problem as a fitting problem. Other than that, I think option 2 is too ad hoc for scientific work; I would opt for simply not showing the smoothed curve where it is not available. The convol function you specified here is a very narrow Gaussian - is that the function you actually used? Note: the code you provided cannot be executed as posted (x, y and ax are undefined).
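To illustrate the "not showing" option, numpy's own mode='valid' keeps only the positions where the kernel fully overlaps the data. A sketch (assuming x, y and ax from your example):

import numpy as np

convol = 2**-np.linspace(-2, 2, 100)**2
kernel = convol / convol.sum()
y_valid = np.convolve(y, kernel, mode='valid')     # only positions with full kernel overlap
offset = (len(kernel) - 1) // 2                    # index of the first fully covered sample
ax.plot(x[offset:offset + len(y_valid)], y_valid, label="valid region only")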

Hi Filip, On Wed, 22 Nov 2023 at 14:24, <filip.dominec@gmail.com> wrote:
Convolution is often used for smoothing noisy data; a typical use will keep the 'same' length of data and may look like this:
convol = 2**-np.linspace(-2, 2, 100)**2
y2 = np.convolve(y, convol/np.sum(convol), mode='same')   ## simple smoothing
ax.plot(x, y2, label="simple smoothing", color='g')
First, it might be useful to compute the convolution kernel `convol` as the exponent of the normal distribution in its full glory, such that its standard deviation is known. Currently it is just "some" normal distribution which goes down to 0.0625 at its edges.

What you might want to achieve is to use a *different kernel* at the edges. It seems you are trying to use, at the edges, a kernel version normalised over only those positions which overlap with the domain of `y`.

Before diving into this: it would possibly be safest to just discard the positions in the support of `y` which suffer from the boundary effect. In your example, that would mean cutting away, say, 20 positions at each side; the choice can be made systematically from the standard deviation of the kernel. This would produce reliable results without much headache or subsidiary conditions, and it would make the results much easier to interpret, as no extra maths needs to be kept in mind.
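A sketch of what I mean (names are illustrative; x, y and ax as in your example). Note that 2**-u**2 equals exp(-u**2 ln 2), i.e. a Gaussian with sigma_u = 1/sqrt(2 ln 2) in the units of u, which on your grid is about 21 samples:

import numpy as np

n = 100
u = np.linspace(-2, 2, n)
sigma_u = 1 / np.sqrt(2 * np.log(2))        # since 2**-u**2 == np.exp(-u**2 / (2*sigma_u**2))
kernel = np.exp(-u**2 / (2 * sigma_u**2))   # identical to 2**-u**2, sigma now explicit
kernel /= kernel.sum()
sigma_samples = sigma_u * (n - 1) / 4       # grid spacing is 4/(n-1): about 21 samples
cut = int(np.ceil(sigma_samples))           # discard boundary-affected positions
y_smooth = np.convolve(y, kernel, mode='same')
ax.plot(x[cut:-cut], y_smooth[cut:-cut], label="interior only")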
convol = 2**-np.linspace(-2, 2, 100)**2
norm = np.convolve(np.ones_like(y), convol/np.sum(convol), mode='same')
y2 = np.convolve(y, convol/np.sum(convol), mode='same') / norm   ## normalized smoothing
ax.plot(x, y2, label="approach 2", color='k')
`norm` holds the sums of the "truncated" Gaussians. Dividing by it should mean the same as using truncated kernels at the edges, normalised *on their truncated domain* - so it should implement what I described above. Maybe this can be checked by applying the method to an artificial test function, most easily a constant input: it should then produce completely constant values also at the edges. I would be interested whether this works; looking at your "real world" result I am not entirely sure that I am not mistaken at some point here.
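The check could look like this (by linearity of convolution, a constant input must come out exactly constant if the normalisation is right; the constant 3.7 and the length 500 are arbitrary):

import numpy as np

convol = 2**-np.linspace(-2, 2, 100)**2
kernel = convol / convol.sum()
const = np.full(500, 3.7)                               # arbitrary constant test input
norm = np.convolve(np.ones_like(const), kernel, mode='same')
out = np.convolve(const, kernel, mode='same') / norm
print(np.allclose(out, 3.7))                            # should print True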
In my experimental work, I miss this behaviour of np.convolve as a single function call. I suggest it be made accessible in numpy under a mode="normalized" option. (Actually I believe this could have been the default, but that would break compatibility.)
I deem your proposed solution too special-purpose for this. It can be implemented in a few lines in "numpy user space", if needed. Best, Friedrich
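P.S.: such a user-space helper could be sketched e.g. like this (the function name is just an illustration):

import numpy as np

def convolve_normalized(y, kernel):
    """Smooth y with kernel, renormalizing near the edges (Filip's approach 2)."""
    kernel = np.asarray(kernel, dtype=float) / np.sum(kernel)
    norm = np.convolve(np.ones(len(y)), kernel, mode='same')   # kernel weight falling inside the data
    return np.convolve(y, kernel, mode='same') / norm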