Trouble With MaskedArray and Shared Masks
I'm having trouble with MaskedArray's _sharedmask flag. I would like to create a sub-view of a MaskedArray, fill it, and have the modifications reflected in the original array. This works with regular ndarrays, but only works with MaskedArrays if _sharedmask is set to False. Here's an example:
    >>> import numpy
    >>> a = numpy.ma.MaskedArray(
    ...     data=numpy.zeros((4,5), dtype=float),
    ...     mask=numpy.ones((4,5), dtype=numpy.ma.MaskType),
    ...     fill_value=0.0
    ... )
    >>> sub_a = a[:2,:3]
    >>> sub_a[0,0] = 1.0
    >>> print(sub_a)
    [[1.0 -- --]
     [-- -- --]]
    >>> print(a)
    [[-- -- -- -- --]
     [-- -- -- -- --]
     [-- -- -- -- --]
     [-- -- -- -- --]]
    >>> print(a.data)
    [[ 1. 0. 0. 0. 0.]
     [ 0. 0. 0. 0. 0.]
     [ 0. 0. 0. 0. 0.]
     [ 0. 0. 0. 0. 0.]]
The data array receives the new value, but the mask array does not.
    >>> a._sharedmask = False
    >>> sub_a = a[:2,:3]
    >>> sub_a[0,0] = 1.0
    >>> print(sub_a)
    [[1.0 -- --]
     [-- -- --]]
    >>> print(a)
    [[1.0 -- -- -- --]
     [-- -- -- -- --]
     [-- -- -- -- --]
     [-- -- -- -- --]]
This sort of (for me) unexpected behavior extends to other ways I've been using numpy arrays as well: a[:] = 1.0 (set to constant); a[:] = b (copy into); a[:5] = a[-5:] (rotating copy), etc. I wasn't seeing this behavior before because I was working on an array that had already been sliced and was therefore "unshared", which caused a good deal of confusion when I started working on an array that wasn't the product of slicing.

All of this leads me to some questions. What is the rationale for initializing a new MaskedArray with _sharedmask=True when its mask isn't (actively) being shared yet? Is there a better way to say "a=MaskedArray(...); a._sharedmask=False" that does not require touching a "private" attribute? Or am I going about this all wrong? What are the correct MaskedArray idioms for these actions that don't cause a new mask to be created?

Thanks!
Alex
Alexander,

The rationale behind the current behavior is to avoid an accidental propagation of the mask. Consider the following example:
    >>> import numpy
    >>> from numpy.ma import masked_array
    >>> m = numpy.array([1,0,0,1,0], dtype=numpy.bool_)
    >>> x = numpy.array([1,2,3,4,5])
    >>> y = numpy.array([5,4,3,2,1])
    >>> mx = masked_array(x, mask=m)
    >>> my = masked_array(y, mask=m)
    >>> mx[0] = 0
    >>> print(mx, my, m)
    [0 2 3 -- 5] [-- 4 3 -- 1] [ True False False True False]
At creation, mx._sharedmask and my._sharedmask are both True. Setting mx[0]=0 forces mx._mask to be copied, so that we don't affect the mask of my. Now:
    >>> import numpy
    >>> from numpy.ma import masked_array
    >>> m = numpy.array([1,0,0,1,0], dtype=numpy.bool_)
    >>> x = numpy.array([1,2,3,4,5])
    >>> y = numpy.array([5,4,3,2,1])
    >>> mx = masked_array(x, mask=m)
    >>> my = masked_array(y, mask=m)
    >>> mx._sharedmask = False
    >>> mx[0] = 0
    >>> print(mx, my, m)
    [0 2 3 -- 5] [5 4 3 -- 1] [False False False True False]
By setting mx._sharedmask=False, we deceived numpy.ma into thinking that it's OK to update the mask of mx (that is, m) in place, and so my gets updated too. Sometimes that's what you want (your case, for example), but often it is not: I've been bitten more than once before reintroducing the _sharedmask flag.

As you've observed, setting a private flag isn't a very good idea: you should use the .unshare_mask() method instead, which copies the mask and sets _sharedmask to False. OK, in your example, copying the mask is not needed, but in more general cases, it is.

At initialization, self._sharedmask is set to (not copy). That is, if you didn't specify copy=True at creation (the default being copy=False), self._sharedmask is True. Now, I recognize it's not obvious, and perhaps we could introduce yet another parameter to masked_array/array/MaskedArray, share_mask, that would take a default value of True and set

    self._sharedmask = (not copy) & share_mask

So: should we introduce this extra parameter?

In any case, I hope it helps.
P.
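The unshare_mask() route described above can be sketched as follows. This is a minimal illustration, assuming a reasonably recent NumPy; the variable names mirror the example in the message, and numpy.shares_memory and the read-only sharedmask property are used here only as checks that were not part of the original discussion.

```python
import numpy as np
import numpy.ma as ma

# One boolean mask handed to two masked arrays, as in the example above.
m = np.array([1, 0, 0, 1, 0], dtype=bool)
mx = ma.masked_array(np.array([1, 2, 3, 4, 5]), mask=m)
my = ma.masked_array(np.array([5, 4, 3, 2, 1]), mask=m)

# Give mx a private copy of the mask through the public method,
# rather than by setting mx._sharedmask by hand.
mx.unshare_mask()

# mx no longer aliases m, so unmasking mx[0] cannot leak into m or my.
mx[0] = 0

print(mx.sharedmask)                 # share status of mx's mask
print(np.shares_memory(mx.mask, m))  # does mx's mask still alias m?
print(mx.mask[0], my.mask[0], m[0])
```

The point of the design is that the copy happens once, explicitly, at a place the caller chooses; afterwards in-place updates of mx cannot reach any other array built from m.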
On Tue, Feb 26, 2008 at 2:32 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> [...]
Thank you for your thorough explanation. I was providing the mask array to the constructor in order to do my own allocating, mostly to ensure that the MaskedArray had a dense mask that *wouldn't* be replaced with a copy without my intentional instruction. I didn't realize that the MaskedArray was not taking ownership of the provided mask (even though copy was False), because the implied usage for providing the mask explicitly is to read-only alias another MaskedArray's mask. I was working against my own goal!

Now that I understand a little better, the easiest/best thing for me to do is change the way I create the MaskedArray to:
    >>> a = numpy.ma.MaskedArray(
    ...     data=numpy.zeros((4,5), dtype=float),
    ...     mask=True,
    ...     fill_value=0.0
    ... )
This appears to cause MaskedArray to create a dense mask which persists (i.e. isn't replaced by a copy) for the lifetime of the MaskedArray.
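A quick way to check what mask=True produces is sketched below, assuming a reasonably recent NumPy. It only inspects the shape and contents of the mask and the effect of a sub-view assignment on the data; it makes no claim about how mask sharing behaves in any particular NumPy version.

```python
import numpy as np
import numpy.ma as ma

a = ma.MaskedArray(
    data=np.zeros((4, 5), dtype=float),
    mask=True,          # scalar True is expanded to a full boolean mask
    fill_value=0.0,
)

print(a.mask.shape)     # same shape as the data
print(a.mask.all())     # every element starts out masked

# Assigning through a sub-view writes into the data buffer shared with a.
sub_a = a[:2, :3]
sub_a[0, 0] = 1.0
print(a.data[0, 0])
print(sub_a.mask[0, 0])
```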
> So: should we introduce this extra parameter?
The propagation semantics and mechanics are definitely tricky, especially considering that it seems that the "right behavior" is context dependent. Are the mask propagation rules spelled out anywhere (aside from the code! :-))?

I could see some potential value to an additional argument, but the constructor is already quite complicated so I'm reluctant to say "Yes" outright, especially with my current level of understanding. At the very least, perhaps the doc-string should be amended to include the note that if a mask is provided, it is assumed to be shared and a copy of it will be made when/if it is modified.

How does the keep_mask option play into this? I don't understand what that one does yet.

Thanks!
Alex
Alexander,
> create the MaskedArray to:
>
>     a = numpy.ma.MaskedArray(
>         data=numpy.zeros((4,5), dtype=float),
>         mask=True,
>         fill_value=0.0
>     )
By far the easiest indeed.
>> So: should we introduce this extra parameter?
>
> The propagation semantics and mechanics are definitely tricky, especially considering that it seems that the "right behavior" is context dependent. Are the mask propagation rules spelled out anywhere (aside from the code! :-))?
Mmh, no: we tried to avoid mask propagation as much as possible, as it can have some fairly disastrous side-effects. In other words: no propagation by default when a mask is shared, propagation when the mask is not shared.
> I could see some potential value to an additional argument, but the constructor is already quite complicated so I'm reluctant to say "Yes" outright, especially with my current level of understanding.
Yes, there are already a lot of parameters, some more useful than others:

    hard_mask: if True, prevents a masked value from being accidentally unmasked.
    shrink: if True, forces a mask full of False to nomask.
    keep_mask: when creating a new masked_array from an existing one, specifies whether the old mask should be taken into account or not. By default, keep_mask is True.

For example:
    >>> import numpy.ma as ma
    >>> x = ma.array([1,2,3,4,5], mask=[1,0,0,1,0])
    >>> y = ma.array(x)
    >>> y
    masked_array(data = [-- 2 3 -- 5],
          mask = [ True False False True False],
          fill_value=999999)
We just inherited the mask from x: y._mask and x._mask are the same object, and y._sharedmask is True. Now, let's change keep_mask to False:
    >>> y = ma.array(x, keep_mask=False)
    >>> y
    masked_array(data = [1 2 3 4 5],
          mask = False,
          fill_value=999999)

We keep the data from x, but we force the mask to the default (viz., nomask).

Now for some more fun: remember that we keep the mask by default.
    >>> y = ma.array(x, mask=[0,0,0,0,1])
    >>> y
    masked_array(data = [-- 2 3 -- --],
          mask = [ True False False True True],
          fill_value=999999)
We kept the mask of x ([1,0,0,1,0]) and combined it with our new mask ([0,0,0,0,1]), so y._mask=[1,0,0,1,1]. If you really want [0,0,0,0,1] as a mask, just drop the initial mask:
    >>> y = ma.array(x, mask=[0,0,0,0,1], keep_mask=False)
    >>> y
    masked_array(data = [1 2 3 4 --],
          mask = [False False False False True],
          fill_value=999999)
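Of the parameters listed above, only keep_mask is illustrated; hard_mask can be sketched in the same way. This is a minimal example assuming a reasonably recent NumPy, with the names soft and hard chosen purely for illustration.

```python
import numpy.ma as ma

# Same data and mask; only the hard_mask setting differs.
soft = ma.array([1, 2, 3], mask=[1, 0, 0])                  # default: soft mask
hard = ma.array([1, 2, 3], mask=[1, 0, 0], hard_mask=True)  # hard mask

# Assigning to a masked element unmasks it only when the mask is soft.
soft[0] = 10
hard[0] = 10

print(soft.mask[0])  # mask state of the element after a soft-mask assignment
print(hard.mask[0])  # mask state of the element after a hard-mask assignment
print(soft, hard)
```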
> At the very least, perhaps the doc-string should be amended to include the note that if a mask is provided, it is assumed to be shared and a copy of it will be made when/if it is modified.

Sounds like a good idea. Is there a wiki page for MaskedArrays somewhere? If not, Alexander, feel free to start one from your experience; I'll update it if needed.
Participants (2): Alexander Michael, Pierre GM