maskedarray: how to force mask to expand
Probably I'm just overlooking something obvious, but I'm having problems with maskedarrays (numpy.ma from svn: '1.3.0.dev5861'), the mask by default being a single bool value ('False') instead of a properly sized bool array. If I then try to mask one value by assigning values to certain mask positions (a.mask[0,0]=True) I get an error, logically. I know I can use masked_where, but I like the mask[...] idiom. And I have to expand the mask anyway, as I'm gonna write it to a file at the end.

1) Is there a way to have ma always use properly expanded masks (bool arrays instead of single bool values)? I tried the shrink=False keyword, but that does not do what I want, and is not available for numpy.ma.zeros, which I conveniently use a lot.

2) Is there a method/function to request the mask, be it a single bool value or an array, as a properly sized array? I found shrink_mask but no opposite method, and shrink_mask seems to do something subtly different even.

Regards, Vincent.
Vincent,

You should really consider putting an example next time. I must admit that I'm not sure what you're trying to do, and where/why it fails.

Yes, by default, the mask of a new MaskedArray is set to the value 'nomask', which is the boolean False. Directly setting an element of the mask in that condition fails of course. The reasons behind using this behavior are (1) backward compatibility and (2) speed, as you can bypass a lot of operations on the mask when it is empty.

If you need to mask one or several elements, the easiest is not to modify the mask itself, but to use the special value `masked`:
>>> a = ma.array(np.arange(6).reshape(3,2))
>>> a
masked_array(data = [[0 1] [2 3] [4 5]], mask = False, fill_value=999999)
>>> # Mask the first element.
>>> a[0,0] = ma.masked
>>> a
masked_array(data = [[-- 1] [2 3] [4 5]], mask = [[ True False] [False False] [False False]], fill_value=999999)
This value, `masked`, is also useful to check whether one particular element is masked:
>>> a[0,0] is ma.masked
True
>>> a[0,1] is ma.masked
False
You can also force the mask to be full of False with the proper shape this way:
>>> a = ma.array(np.arange(6).reshape(3,2))
>>> # Force the mask to have the proper shape and be full of False:
>>> a.mask = False
>>> a
masked_array(data = [[0 1] [2 3] [4 5]], mask = [[False False] [False False] [False False]], fill_value=999999)
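A minimal sketch of how this ties back to the original question, assuming a recent NumPy release rather than the 2008 svn version discussed here: once the mask has been expanded with `a.mask = False`, it behaves like an ordinary boolean array, so the `mask[...]` idiom can be used. The variable names are illustrative only.

```python
import numpy as np
import numpy.ma as ma

a = ma.array(np.arange(6).reshape(3, 2))

# A freshly created masked array carries the shared `nomask` placeholder.
starts_as_nomask = ma.getmask(a) is ma.nomask

# Assigning the Python boolean False expands the mask to the data's shape.
a.mask = False
expanded_shape = a.mask.shape

# The expanded mask can now be indexed like any boolean array.
a.mask[0, 0] = True

print(starts_as_nomask, expanded_shape, a[0, 0] is ma.masked)
```

Assigning `ma.masked` to the element, as shown earlier in this message, reaches the same result without touching the mask directly.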
The shrink argument of ma.array collapses a mask full of False to nomask, once again for speed reasons. So no, it won't do what you seem to want.

I agree that having to deal with nomask is not completely intuitive. However, it is required for backward compatibility. One day, the class will be ported to C, and then I'll push to have the mask set to the proper shape ab initio, because then speed will be less of an issue.

In the meantime, I hope I answered your question.

On Wednesday 24 September 2008 06:25:57 Vincent Schut wrote:
Probably I'm just overlooking something obvious, but I'm having problems with maskedarrays (numpy.ma from svn: '1.3.0.dev5861'), the mask by default being a single bool value ('False') instead of a properly sized bool array. If I then try to mask one value by assigning values to certain mask positions (a.mask[0,0]=True) I get an error, logically. I know I can use masked_where, but I like the mask[...] idiom. And I have to expand the mask anyway, as I'm gonna write it to a file at the end.
1) Is there a way to have ma always use properly expanded masks (bool arrays instead of single bool values)? I tried the shrink=False keyword, but that does not do what I want, and is not available for numpy.ma.zeros, which I conveniently use a lot.
2) Is there a method/function to request the mask, be it a single bool value or an array, as a properly sized array? I found shrink_mask but no opposite method, and shrink_mask seems to do something subtly different even.
Regards, Vincent.
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Pierre GM wrote:
Vincent,
You should really consider putting an example next time. I must admit that I'm not sure what you're trying to do, and where/why it fails.
Pierre, sorry for that, I was posting hastily before leaving work, and was myself pretty confused about ma's behaviour on this too, so it was hard for me to explain or phrase my question clearly.

It just feels a bit strange that ma.array by default gives a mask without shape and of False. I mean, what's the difference then between that and a normal numpy array? If I did not want a mask, I'd use numpy.array. I do want a mask, so I'd expect ma to give me a mask, which it in fact does not (or does, on which we can have different opinions, but a default mask of False imho == nomask == no mask).

OK, that being said, I understand the argument of backwards compatibility. I disagree on the argument of speed, because for that the same applies: if I were really concerned about speed, I'd use numpy arrays, keep a separate mask myself, and before any operation I'd get a flattened copy of all my data that is not masked and run the operation on that. IMHO masked arrays are there to trade speed for convenience, so that's what I expect.

Just for clarity, to rephrase my question: how do I force ma to give me (always/by default/by some method on a maskedarray) a full shaped mask instead of 'False' or nomask? Because I am sure from the beginning that I'll need this mask in full shape, I want it, and I want to be able to treat it like any normal bool array :-)
Yes, by default, the mask of a new MaskedArray is set to the value 'nomask', which is the boolean False. Directly setting an element of the mask in that condition fails of course. The reasons behind using this behavior are (1) backward compatibility and (2) speed, as you can bypass a lot of operations on the mask when it is empty.
1) is clear. 2) seems unintuitive to me. I'd say, use numpy arrays then, use .filled() before you do something, or use a flag 'bypass_mask=True', etc. Any of these seem more intuitive to me than what it does now. No offence, I really appreciate your work, just my 2c for a possible future...
If you need to mask one or several elements, the easiest is not to modify the mask itself, but to use the special value `masked`:
>>> a = ma.array(np.arange(6).reshape(3,2))
>>> a
masked_array(data = [[0 1] [2 3] [4 5]], mask = False, fill_value=999999)
>>> # Mask the first element.
>>> a[0,0] = ma.masked
Ah, I did not know that one. Does that always work, I mean, with slices, fancy indexing, etc.? Like 'a[a<0 | a>100] = ma.masked'? It's kind of clean to fiddle with the mask of the array without really interacting with the mask itself, if you understand what I mean... :) And is there also a complement, like ma.unmasked? I could not find it (very quick search, I admit)... Or can I use !ma.masked?
>>> a
masked_array(data = [[-- 1] [2 3] [4 5]], mask = [[ True False] [False False] [False False]], fill_value=999999)
This value, `masked`, is also useful to check whether one particular element is masked:
>>> a[0,0] is ma.masked
True
>>> a[0,1] is ma.masked
False
You can also force the mask to be full of False with the proper shape this way:
>>> a = ma.array(np.arange(6).reshape(3,2))
>>> # Force the mask to have the proper shape and be full of False:
>>> a.mask = False
>>> a
masked_array(data = [[0 1] [2 3] [4 5]], mask = [[False False] [False False] [False False]], fill_value=999999)

Ah, now the magic starts... (normal user cap on head, beware):
In [9]: am.mask
Out[9]: False
In [10]: am.mask = False
In [11]: am.mask
Out[11]: array([[False, False], [False, False]], dtype=bool)

while (with the same am as before [9], with am.mask == False):

In [15]: am.mask = am.mask
In [16]: am.mask
Out[16]: False

Do you see (and agree with me about) the inconsistency? Setting am.mask with its own value changes that same value of am.mask. While am.mask = am.mask, which on first sight should be the same as am.mask = False, as am.mask==False is True, does *not* change the value of am.mask...
The shrink argument of ma.array collapses a mask full of False to nomask, once again for speed reasons. So no, it won't do what you seem to want.
I already supposed so...
I agree that having to deal with nomask is not completely intuitive. However, it is required for backward compatibility. One day, the class will be ported to C, and then I'll push to have the mask set to the proper shape ab initio, because then speed will be less of an issue.
Glad that we share opinions about the unintuitiveness... Eagerly awaiting the port to C, not (only) for speed, but mainly for consistency.
In the meantime, I hope I answered your question.
Well, yes and no. To summarize: by default, the mask of a masked array (if not given at creation as a bool array) is always 'False'. There is no keyword to force the mask at creation to full shape, and there is no method on a maskedarray to change the mask to full shape. However, one can apply some magic and use 'a.mask = False' directly after creation to force the mask to full shape. This of course only works when the mask already *was* False, otherwise you'll be effectively changing your mask. So we presume ma never by default returns a mask of 'True', and then this works. The obvious trick to work around this remote possibility of a mask of 'True' would be a.mask = a.mask, but that does not work.

Hey, sorry about starting a discussion about this, while I meant to ask just a simple question (and really assumed I had overlooked something, it seemed so simple...). Again, no offence meant, and your work on ma is really appreciated. I hope this discussion will result in more intuitiveness in a future (C?) implementation of ma.

Regards, Vincent.
Vincent, The argument of speed (having a default mask of nomask) applies only to the computations inside MaskedArray. Of course, it is still far faster not to use masks but only ndarrays.
Just for clarity, to rephrase my question: how do I force ma to give me (always/by default/by some method on a maskedarray) a full shaped mask instead of 'False' or nomask? Because I am sure from the beginning that I'll need this mask in full shape, I want it, and I want to be able to treat it like any normal bool array :-)
Easy:
>>> a = ma.array([1,2,3,4], mask=False)
>>> a
masked_array(data = [1 2 3 4], mask = [False False False False], fill_value=999999)
Puzzling? Not so much. See, by default, the value of the mask parameter is `nomask`. nomask is in fact a NumPy boolean scalar (np.bool_) with value 0 (False). At the creation of the masked array, we check whether a value was given to the mask parameter. If no value is given, we default to `nomask`, and we end up with `a.mask is nomask`. If you force the mask parameter to the boolean False, you're not using `nomask`: in that case, the full mask is created.

That won't work with ma.zeros or ma.ones. I could add an extra keyword to deal with that, but is it really needed when you can simply do a.mask=False?

Note:
>>> a=ma.array([1,2,3,])
>>> a.mask is False
False
>>> a.mask is ma.nomask
True
>>> a.mask == False
True
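The two routes described above can be sketched side by side, assuming a recent NumPy: `mask=False` at creation for `ma.array`, and `mask = False` after creation for `ma.zeros`. The array shapes are arbitrary choices for illustration.

```python
import numpy.ma as ma

# Passing the Python boolean False (not ma.nomask) requests a full mask.
a = ma.array([1, 2, 3, 4], mask=False)

# For ma.zeros, this sketch expands the mask right after creation instead.
z = ma.zeros((2, 3))
z.mask = False

print(a.mask.shape, z.mask.shape)
```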
If you need to mask one or several elements, the easiest is not to modify the mask itself, but to use the special value `masked`:
Ah, I did not know that one. Does that always work, I mean, with slices, fancy indexing, etc.? Like 'a[a<0 | a>100] = ma.masked'? It's kind of clean to fiddle with the mask of the array without really interacting with the mask itself, if you understand what I mean... :)
It does work, try it for yourself. It's actually *the* recommended way to set values to the mask.
And is there also a complement, like ma.unmasked? I could not find it (very quick search, I admit)... Or can I use !ma.masked?
No, there isn't and no you can't. ma.masked is actually a constant defined module-wide and independent of any array. That way, you can test whether an element is masked with `a[..] is masked`.
Ah, now the magic starts... (normal user cap on head, beware):
In [9]: am.mask
Out[9]: False
In [10]: am.mask = False
In [11]: am.mask
Out[11]: array([[False, False], [False, False]], dtype=bool)
while (with the same am as before [9], with am.mask == False):
In [15]: am.mask = am.mask
In [16]: am.mask
Out[16]: False
Do you see (and agree with me about) the inconsistency?
No. I'm afraid you're confusing `nomask` and `False`. Once again, nomask is NOT the same thing as False. It's the same value, but not the same object.
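The distinction can be checked directly. This is a sketch assuming a recent NumPy; it uses `ma.getmask` to fetch the mask object, and the variable names are illustrative.

```python
import numpy.ma as ma

am = ma.array([[1, 2], [3, 4]])
m = ma.getmask(am)

# Same value as False, but not the same object.
same_value = bool(m == False)
same_object = m is False
is_nomask = m is ma.nomask

# Assigning nomask back changes nothing: the mask stays unexpanded.
am.mask = m
still_nomask = ma.getmask(am) is ma.nomask

# Assigning the Python False creates a full boolean array.
am.mask = False

print(same_value, same_object, is_nomask, still_nomask, am.mask.shape)
```

This mirrors the In [10] and In [15] steps of the session: the setter compares by identity against `nomask`, not by value against `False`.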
Pierre,

Thanks for your explanations. It still seems a little (too) complicated, but from a backwards-compatibility pov combined with your 'nomask is not False' implementation detail, I can understand mostly :-) I think the idea that when a.mask returns False, that actually means nomask instead of the False I'm used to, is what caused a major part of my confusion.

It might actually be nice to give you some context of why I asked this: during my (satellite image) processing, I use maskedarrays by default for each step in the processing chain, and I need to save the result of each step to disk. That means saving the array and its mask. I save both of them as tiff files (because these can include all other info that is nice for satellite imagery, like coordinates and projection). When saving the mask, I'm creating a tiff file and pushing the .mask array into it. Therefore, I obviously need the .mask not to be nomask, but to be a full shaped array. That's the context you need to see my confusion in.

Because speed and memory don't matter that much to me (well, they do matter, but I'm processing huge amounts of data anyways, and using parallel/clustered processing anyways, so I can take those masks...), I thought it might be easiest to make sure my data always has a full shaped mask. But, of course, performance-wise it would be best to be able to work with nomasks, and only expand these to full shaped masks when writing to disk. That's why I asked for a possible method on an ma to force expanding a mask, or e.g. an ma.mask.as_full_shaped_mask() method that returns either the mask, or (if nomask) a new array of Falses. I just supposed it existed and I could not find it, but now I understand it does not exist. But I could easily write something that checks for nomask and always returns an expanded mask.

The 'trick' to create ma's with the mask=False keyword is neat, I had not thought about that. Same applies for masking values using ma[idx] = ma.masked.
Just for completeness (in case someone else is reading this and wondering how to *unmask* values): just setting ma[idx] to some valid number will unset the mask for that index. No need to do ma[idx] = ma.unmask or whatever, just ma[idx] = v. OK, top posting is bad :) Further comments inline. Pierre GM wrote:
Vincent,
The argument of speed (having a default mask of nomask) applies only to the computations inside MaskedArray. Of course, it is still far faster not to use masks but only ndarrays.
Just for clarity, to rephrase my question: how do I force ma to give me (always/by default/by some method on a maskedarray) a full shaped mask instead of 'False' or nomask? Because I am sure from the beginning that I'll need this mask in full shape, I want it, and I want to be able to treat it like any normal bool array :-)
Easy:
>>> a = ma.array([1,2,3,4], mask=False)
>>> a
masked_array(data = [1 2 3 4], mask = [False False False False], fill_value=999999)
Puzzling? Not so much. See, by default, the value of the mask parameter is `nomask`. nomask is in fact a NumPy boolean scalar (np.bool_) with value 0 (False). At the creation of the masked array, we check whether a value was given to the mask parameter. If no value is given, we default to `nomask`, and we end up with `a.mask is nomask`. If you force the mask parameter to the boolean False, you're not using `nomask`: in that case, the full mask is created.
I understand that now.
That won't work with ma.zeros or ma.ones. I could add an extra keyword to deal with that, but is it really needed when you can simply do a.mask=False?
No real need for that, then. Would be just conveniencewise sugar-addition on (especially my) cake.
Note:
>>> a=ma.array([1,2,3,])
>>> a.mask is False
False
>>> a.mask is ma.nomask
True
>>> a.mask == False
True
Btw, in future versions, would it be an idea to separate 'nomask' and 'False' a little more? I always assumed (correctly?) that in python, True and False are singletons (as far as that is possible in python), just like None. 'False is False' should always compare to True, then. In this case (a.mask is False) it at least *seems* to break that 'rule'...
And is there also a complement, like ma.unmasked? I could not find it (very quick search, I admit)... Or can I use !ma.masked?
One can just set elements to unmask them, I found that out. No need for ma.unmasked.
No, there isn't and no you can't. ma.masked is actually a constant defined module-wide and independent of any array. That way, you can test whether an element is masked with `a[..] is masked`.
Ah, now the magic starts... (normal user cap on head, beware):
In [9]: am.mask
Out[9]: False
In [10]: am.mask = False
In [11]: am.mask
Out[11]: array([[False, False], [False, False]], dtype=bool)
while (with the same am as before [9], with am.mask == False):
In [15]: am.mask = am.mask
In [16]: am.mask
Out[16]: False
Do you see (and agree with me about) the inconsistency?
No. I'm afraid you're confusing `nomask` and `False`. Once again, nomask is NOT the same thing as False. It's the same value, but not the same object.
Exactly. It might not be inconsistent, but imho it makes a lot of effort to /feel/ inconsistent to Joe Average. And that's what was causing a lot of confusion. For example, why doesn't a.mask return 'nomask' instead of 'False'? That would have saved me some head-scratching... See also my comment earlier on assuming False to be a python-wide constant (singleton). "It's the same value, but not the same object" imho breaks this python guarantee (though I admit that I'll have to look that up, this guarantee might just be there only in my head...)

Well, so far so good. My problems have been solved largely. The philosophical discussion could go on...

Vincent.
Vincent,
I think the idea that when a.mask returns False, that actually means nomask instead of the False I'm used to, is what caused a major part of my confusion.
Indeed.
It might actually be nice to give you some context of why I asked this: during my (satellite image) processing, I use maskedarrays by default for each step in the processing chain
Ah. You'll probably notice it might be faster to process the array and its mask in parallel, and then recreate a masked array at the end. If you need to save a mask that hasn't been set yet, you could use the `ma.getmaskarray` command instead of trying to access the mask as an attribute, that is, use:
mask = ma.getmaskarray(a)
instead of
mask = a.mask
or
mask = ma.getmask(a)
`ma.getmaskarray(a)` always returns an array with the same shape as `a`, even if full of False. If you need to create a new mask from the shape and dtype of `a`, you can use `ma.make_mask_none(a.shape, a.dtype)`. Both functions are documented.
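A short sketch of the difference between the two accessors, assuming a recent NumPy. `make_mask_none` is called here with an explicit shape and dtype, which is what its `(newshape, dtype=None)` signature expects in current releases; the file-writing step is left out, since it depends on the I/O library in use.

```python
import numpy as np
import numpy.ma as ma

a = ma.array(np.arange(6).reshape(3, 2))  # the mask is nomask at this point

# getmask returns the mask as stored, which may be the nomask placeholder.
raw = ma.getmask(a)

# getmaskarray always returns a boolean array shaped like `a`.
full = ma.getmaskarray(a)

# A fresh all-False mask built from the shape and dtype of `a`.
blank = ma.make_mask_none(a.shape, a.dtype)

print(raw is ma.nomask, full.shape, full.dtype, blank.shape)
```

`full` is what would be handed to whatever routine writes the mask to disk.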
force expanding a mask, or e.g. an ma.mask.as_full_shaped_mask() method that returns either the mask, or (if nomask) a new array of Falses. I just supposed it existed and I could not find it, but now I understand it does not exist.
Actually, it does, that's the ma.getmaskarray() I was just telling you about.
Just for completeness (in case someone else is reading this and wondering how to *unmask* values): just setting ma[idx] to some valid number will unset the mask for that index. No need to do ma[idx] = ma.unmask or whatever, just ma[idx] = v.
Exactly. Except that for the sake of completeness, there's a little known attribute (_hardmask) that prevents you from unmasking some data by mistake. By default, `_hardmask` is False, which means that if you set a masked value to a non-masked one, you update the mask as well. If you set a._hardmask to True or use the `harden_mask` method, you won't be able to unmask the data that way, you'll have to modify the mask directly. It's useful when the mask cannot or should not be changed.
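The soft and hard behaviours can be compared with a small sketch, assuming a recent NumPy. It uses the `harden_mask` and `soften_mask` methods rather than setting `_hardmask` by hand, and the sample data is made up.

```python
import numpy.ma as ma

# Soft mask (the default): assigning a value unmasks the element.
a = ma.array([1, 2, 3], mask=[True, False, False])
a[0] = 10
unmasked = not ma.getmaskarray(a)[0]

# Hard mask: the same assignment leaves the element masked.
b = ma.array([1, 2, 3], mask=[True, False, False])
b.harden_mask()
b[0] = 10
still_masked = b[0] is ma.masked

# Softening the mask again allows unmasking through assignment.
b.soften_mask()
b[0] = 10

print(unmasked, still_masked, b[0])
```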
Btw, in future versions, would it be an idea to separate 'nomask' and 'False' a little more? I always assumed (correctly?) that in python, True and False are singletons (as far as that is possible in python), just like None. 'False is False' should always compare to True, then. In this case (a.mask is False) it at least *seems* to break that 'rule'...
You mean, returning something that says 'nomask' instead of 'False' ? I'm not sure how I can do that. Sure, it'd be nice.
No. I'm afraid you're confusing `nomask` and `False`. Once again, nomask is NOT the same thing as False. It's the same value, but not the same object.
Exactly. It might not be inconsistent, but imho it does a lot of effort to /feel/ inconsistent to Joe Average.
It's not intentional. I guess most of your problems would never have come up if we had proper documentation. As you're starting to use MaskedArrays, you could take notes, and that would provide us with a basis we could improve on...
Participants (2): Pierre GM, Vincent Schut