On Tue, Feb 26, 2008 at 2:32 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
Alexander,

The rationale behind the current behavior is to avoid an accidental propagation of the mask. Consider the following example:
>>> m = numpy.array([1,0,0,1,0], dtype=bool_)
>>> x = numpy.array([1,2,3,4,5])
>>> y = numpy.sqrt([5,4,3,2,1])
>>> mx = masked_array(x,mask=m)
>>> my = masked_array(y,mask=m)
>>> mx[0] = 0
>>> print mx,my, m
[0 2 3 -- 5] [-- 4 3 -- 1] [ True False False True False]
At the creation, mx._sharedmask and my._sharedmask are both True. Setting mx[0]=0 forces mx._mask to be copied, so that we don't affect the mask of my.
Now,
>>> m = numpy.array([1,0,0,1,0], dtype=bool_)
>>> x = numpy.array([1,2,3,4,5])
>>> y = numpy.sqrt([5,4,3,2,1])
>>> mx = masked_array(x,mask=m)
>>> my = masked_array(y,mask=m)
>>> mx._sharedmask = False
>>> mx[0] = 0
>>> print mx,my, m
[0 2 3 -- 5] [5 4 3 -- 1] [False False False True False]
By setting mx._sharedmask=False, we deceived numpy.ma into thinking that it's OK to update the mask of mx (that is, m) in place, so my gets updated as well. Sometimes that's what you want (your case, for example), but often it is not: I was bitten more than once before reintroducing the _sharedmask flag.
As you've observed, setting a private flag isn't a very good idea: you should use the .unshare_mask() method instead, which copies the mask and sets _sharedmask to False. OK, in your example, copying the mask is not needed, but in more general cases, it is.
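For reference, a minimal sketch of the .unshare_mask() route applied to the example above. It assumes a reasonably recent NumPy with Python 3 syntax, and it uses the read-only sharedmask property rather than the private flag. Whether a plain item assignment copies a shared mask on its own has not been the same in every NumPy release, so the sketch unshares explicitly before assigning and does not rely on that behavior.

```python
import numpy as np

m = np.array([True, False, False, True, False])
mx = np.ma.masked_array(np.array([1, 2, 3, 4, 5]), mask=m)
my = np.ma.masked_array(np.sqrt([5, 4, 3, 2, 1]), mask=m)

# Both arrays start out flagged as sharing the mask they were given.
print(mx.sharedmask, my.sharedmask)

# Give mx a private copy of the mask through the public method,
# instead of flipping the private _sharedmask flag by hand.
mx.unshare_mask()
mx[0] = 0

print(mx.sharedmask)       # mx now owns its mask
print(mx.mask[0])          # the first element of mx is unmasked
print(m[0], my.mask[0])    # m and my keep their original first entry
```

Since mx writes into its own copy after the call, neither m nor my is touched by the assignment.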
At initialization, self._sharedmask is set to (not copy). That is, if you didn't specify copy=True at creation (the default being copy=False), self._sharedmask is True. Now, I recognize it's not obvious, and perhaps we could introduce yet another parameter to masked_array/array/MaskedArray, share_mask, that would take a default value of True and set self._sharedmask=(not copy)&share_mask.
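The (not copy) rule can be checked from the outside with something like the sketch below, assuming a recent NumPy. np.shares_memory is used here only as an independent check that the flag matches reality. The proposed_sharedmask helper is purely hypothetical: it spells out the rule suggested above, and share_mask is not an actual constructor argument.

```python
import numpy as np

m = np.array([True, False, False, True, False])
data = np.arange(5)

# copy=False is the default, so the mask is used as given and flagged as shared.
shared = np.ma.masked_array(data, mask=m)
# copy=True copies both data and mask, so the array owns its mask.
private = np.ma.masked_array(data, mask=m, copy=True)

print(shared.sharedmask, np.shares_memory(shared.mask, m))
print(private.sharedmask, np.shares_memory(private.mask, m))


def proposed_sharedmask(copy, share_mask=True):
    """Hypothetical rule from the proposal; share_mask is not a real NumPy parameter."""
    return (not copy) and share_mask
```

With the default share_mask=True the helper reduces to today's (not copy), which is why the extra parameter would be backward compatible.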
Thank you for your thorough explanation. I was providing the mask array to the constructor in order to do my own allocating, mostly to ensure that the MaskedArray had a dense mask that *wouldn't* be replaced with a copy without my intentional instruction. I didn't realize that the MaskedArray was not taking ownership of the provided mask (even though copy was False), because the implied usage for providing the mask explicitly is to alias another MaskedArray's mask read-only. I was working against my own goal! Now that I understand a little better, the easiest/best thing for me to do is change the way I create the MaskedArray to:
>>> a = numpy.ma.MaskedArray(
...     data=numpy.zeros((4,5), dtype=float),
...     mask=True,
...     fill_value=0.0
... )
This appears to cause MaskedArray to create a dense mask which persists (i.e. isn't replaced by a copy) for the lifetime of the MaskedArray.
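A quick way to confirm the dense mask is sketched below, assuming a recent NumPy. It only looks at the shape and contents of the mask before and after an assignment. It does not prove anything about the identity of the underlying mask object, which is the part I am inferring.

```python
import numpy as np

a = np.ma.MaskedArray(
    data=np.zeros((4, 5), dtype=float),
    mask=True,
    fill_value=0.0,
)

# mask=True is expanded to a full boolean array matching the data's shape.
print(a.mask.shape, a.mask.dtype, a.mask.all())

# Assigning a value unmasks just that element; the rest of the mask stays dense.
a[1, 2] = 3.0
print(a.mask.sum())
print(a.filled())
```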
So: should we introduce this extra parameter?
The propagation semantics and mechanics are definitely tricky, especially considering that it seems that the "right behavior" is context dependent. Are the mask propagation rules spelled out anywhere (aside from the code! :-))?

I could see some potential value to an additional argument, but the constructor is already quite complicated so I'm reluctant to say "Yes" outright, especially with my current level of understanding. At the very least, perhaps the doc-string should be amended to include the note that if a mask is provided, it is assumed to be shared and a copy of it will be made when/if it is modified.

How does the keep_mask option play into this? I don't understand what that one does yet.

Thanks!
Alex