how to name "contagious" keyword in np.ma.convolve

Hi all,

Eric Wieser has a PR which defines new functions np.ma.correlate and np.ma.convolve:

https://github.com/numpy/numpy/pull/7922

We're deciding how to name the keyword arg which determines whether masked elements are "propagated" in the convolution sums. Currently we are leaning towards calling it "contagious", with default of True:

    def convolve(a, v, mode='full', contagious=True):

Any thoughts?

Cheers,
Allan

On Fri, 2016-10-14 at 13:00 -0400, Allan Haldane wrote:

That sounds a bit odd to me, to be honest. Just brainstorming: could you name it the other way around, i.e. by whether the masked values should be treated as zero/ignored?

- Sebastian

I think the possibilities that have been mentioned so far (here or in the PR) are:

  contagious
  contagious_mask
  propagate
  propagate_mask
  propagated

`propagate_mask=False` seemed to imply that the mask would never be set, so Eric also suggested propagate_mask='any' or propagate_mask='all'.

I would be happy with 'propagated=False' as the name/default. As Eric pointed out, most MaskedArray functions, such as sum, currently do not propagate the mask, so maybe we should do likewise here.

Allan

On 10/14/2016 01:44 PM, Benjamin Root wrote:

+1 for propagate_mask. That is the only proposal that immediately makes sense to me. "contagious" may be cute but I think approximately 0% of users would guess its purpose on first use.

Can you elaborate on what happens with the masks exactly? I didn't quite get why propagate_mask=False was unintuitive. My expectation is that any mask present in the input will not be set in the output, but the mask will be "respected" by the function.

On 15 Oct. 2016, 5:23 AM +1100, Allan Haldane <allanhaldane@gmail.com>, wrote:

On 10/14/2016 07:49 PM, Juan Nunez-Iglesias wrote:
Here's an illustration of how the PR currently works with convolve, using the name "propagate_mask":

    >>> m = np.ma.masked
    >>> a = np.ma.array([1,1,1,m,1,1,1,m,m,m,1,1,1])
    >>> b = np.ma.array([1,1,1])
    >>>
    >>> print(np.ma.convolve(a, b, propagate_mask=True))
    [1 2 3 -- -- -- 3 -- -- -- -- -- 3 2 1]
    >>> print(np.ma.convolve(a, b, propagate_mask=False))
    [1 2 3 2 2 2 3 2 1 -- 1 2 3 2 1]

Allan
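
The two results in the illustration can be worked out by hand from two rules: masked entries contribute zero to each sum, and an output element is masked either when any contributing input is masked (propagate_mask=True) or only when every contributing input is masked (propagate_mask=False). The following is a minimal sketch of those rules, not the PR's implementation. The helper name masked_convolve_sketch is hypothetical, it covers only 1-D input in 'full' mode, and it writes the mask explicitly instead of using the `m` shorthand.

```python
import numpy as np


def masked_convolve_sketch(a, v, propagate_mask=True):
    """Hypothetical helper illustrating the two mask rules (1-D, 'full' mode)."""
    a = np.ma.asarray(a)
    v = np.ma.asarray(v)
    a_valid = ~np.ma.getmaskarray(a)
    v_valid = ~np.ma.getmaskarray(v)

    # Masked entries contribute zero to every sum they take part in.
    data = np.convolve(a.filled(0), v.filled(0))

    # Number of (a[i], v[j]) pairs in each output sum where both are valid...
    valid_pairs = np.convolve(a_valid.astype(int), v_valid.astype(int))
    # ...and the total number of pairs in each output sum.
    total_pairs = np.convolve(np.ones(a.size, dtype=int),
                              np.ones(v.size, dtype=int))

    if propagate_mask:
        # Any masked input masks the output element.
        mask = valid_pairs < total_pairs
    else:
        # Masked only if no valid pair contributed at all.
        mask = valid_pairs == 0
    return np.ma.array(data, mask=mask)


a = np.ma.array([1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
                mask=[0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0])
b = np.ma.array([1, 1, 1])

propagated = masked_convolve_sketch(a, b, propagate_mask=True)
not_propagated = masked_convolve_sketch(a, b, propagate_mask=False)
print(propagated)
print(not_propagated)
```

Under these assumptions the two printed arrays should have the same values and masked positions as the two outputs quoted in the illustration.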

Given this behaviour, I'm actually more concerned about the logic ma.convolve uses in the propagate_mask=False case. It appears that the masked values are essentially replaced by zero. Is my interpretation correct, and if so, does this make sense?

When I have similar situations, I usually interpolate between the valid values. I assume there are a lot of use cases for convolutions, but I have difficulty imagining that ignoring a missing value and, for the purpose of the computation, treating it as zero is useful in many of them.

Hanno
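
The alternative Hanno describes, interpolating across the gaps before convolving, can be sketched as follows. This is an illustration under stated assumptions and not part of the PR: the helper name fill_by_interpolation is hypothetical, the input is 1-D with at least one valid sample, and linear interpolation via np.interp is only one possible choice.

```python
import numpy as np


def fill_by_interpolation(a):
    """Hypothetical helper: linearly interpolate across masked samples (1-D)."""
    data = np.ma.getdata(a).astype(float)  # astype returns a fresh copy
    mask = np.ma.getmaskarray(a)
    x = np.arange(data.size)
    # np.interp holds the nearest valid value constant beyond the end points.
    data[mask] = np.interp(x[mask], x[~mask], data[~mask])
    return data


# The value stored under the mask (99.0) is deliberately wrong; it is ignored.
a = np.ma.array([0.0, 1.0, 99.0, 3.0, 4.0], mask=[0, 0, 1, 0, 0])
kernel = np.ones(3) / 3.0

filled = fill_by_interpolation(a)
smoothed = np.convolve(filled, kernel, mode='same')
print(filled)
print(smoothed)
```

The trade-off is that the result no longer records which outputs depended on missing data, which is the information the mask-based approaches keep.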

Hi,

On 16/10/2016 at 11:52, Hanno Klemm wrote:

Also, coming back to the initial question, I feel that the word "mask" (or "na" or similar) needs to appear in the parameter name. Otherwise, people will wonder: "what on earth is contagious/being propagated?"

Just thinking of yet another keyword name: ignore_masked (or drop_masked).

If I remember well, in R it is dropna. It would be nice if the boolean switch followed the same logic.

Now of course the convolution function is more general than just autocorrelation...

best,
Pierre

On Mon, Oct 17, 2016 at 1:01 PM, Pierre Haessig <pierre.haessig@crans.org> wrote:
I think "drop" or "ignore" is too generic; for correlation it could mean, for example, ignore pairs versus ignore cases.

To me, propagate sounds OK, but something with `valid` might be more explicit for convolution or `correlate`. However, `valid` also refers to the end points, so maybe valid_na or valid_masked=True.

Josef

On Tue, Oct 18, 2016 at 1:30 PM, <josef.pktd@gmail.com> wrote:
Aside to the aside: statsmodels was just catching up in this.

The original for masked array acf, including correct counting of "valid" terms, is

https://github.com/pierregm/scikits.timeseries/blob/master/scikits/timeserie...

(which I looked at way before statsmodels had any acf)

Josef

On 10/17/2016 01:01 PM, Pierre Haessig wrote:
There is an old unimplemented NEP which uses similar language, like "ignorena", and np.NA.

http://docs.scipy.org/doc/numpy/neps/missing-data.html

But right now that isn't part of numpy, so I think it would be confusing to use that terminology.

Allan

On 10/17/2016 01:01 PM, Pierre Haessig wrote:
Based on feedback so far, I think "propagate_mask" sounds like the best word to use. Let's go with that.

As for whether it should default to "True" or "False", the arguments I see are:

* False, because that is the way most functions like `np.ma.sum` already work, as do the similar "nanconv" functions for MATLAB and Octave.

* True, because its effects are more visible and might lead to fewer surprises. The "False" case seems like it is often not what the user intended. E.g., it affects the overall normalization of normalized kernels, and the choice of 0 seems arbitrary.

If no one says anything, I'd probably go with True.

Allan

On 10/16/2016 05:52 AM, Hanno Klemm wrote:
I think that's right. Its usefulness wasn't obvious to me either, but googling shows that MATLAB users like the file "nanconv.m", which works this way, using NaNs similarly to how the mask is used here.

Just as convolution functions often add zero-padding around an image, here the mask behavior would allow you to have different borders, e.g. [m,m,m,1,1,1,1,m,m,m,m] using my notation from before.

Octave's "nanconv" does this too.

I still agree that in most cases people should be handling the missing values more carefully (manually) if they are doing convolutions, but this default behaviour seems reasonable enough to me.

Allan
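
The masked-border case can be tried out as follows. This is a minimal sketch assuming a NumPy release that includes np.ma.convolve with the propagate_mask keyword; the mask is written out explicitly instead of using the `m` shorthand.

```python
import numpy as np

# [m, m, m, 1, 1, 1, 1, m, m, m, m] with the mask spelled out.
a = np.ma.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
                mask=[1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1])
kernel = np.ones(3, dtype=int)

# With propagate_mask=False, an output element is masked only when every
# input that contributes to it is masked, so the masked border survives.
out = np.ma.convolve(a, kernel, propagate_mask=False)
print(out)
```

Outputs whose window overlaps at least one valid sample are computed with the masked samples counted as zero; outputs whose window lies entirely in the masked border stay masked.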

participants (8)
- Allan Haldane
- Benjamin Root
- Hanno Klemm
- josef.pktd@gmail.com
- Juan Nunez-Iglesias
- Pierre Haessig
- Sebastian Berg
- Stephan Hoyer