On Wednesday 21 June 2006 04:46, Michael Sorich wrote:
When transposing a masked array of dtype '
OK, I see where the problem is: when your fill_value has a type that cannot be converted to the type of your data, the `filled` method (used internally in many functions, such as `transpose`) raises a TypeError, which is caught, and your array is converted to 'O'.
That's what happens here: your fill_value is a string, your data are integers, the types don't match, hence the conversion. So, no, I don't think that's a bug.
Why fill when you don't have any masked values, then? Well, there's a subtle difference between a boolean mask and a mask of booleans. When the mask is a boolean (mask=nomask=False), there's no masked value, and `filled` returns the data. Now, when your mask is an array of booleans (your first case), MA doesn't check whether mask.any()==False to determine whether there are some missing data or not; it just processes the whole array of booleans.
I agree that's a bit confusing here, and there might be some room for improvement (for example, changing the current `if m is nomask` to `if m is nomask or m.any()==False`, or better, forcing mask to nomask if mask.any()==False). But I don't think that qualifies as a bug.
In short: when you have an array of numbers, don't try to fill it with characters.
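The control flow Pierre describes can be modelled in a few lines. This is a minimal pure-Python sketch, not MA's actual code: `NOMASK`, `coerce` and `filled` are hypothetical names, and a "dtype" is reduced to a plain Python type so the sketch runs without numpy. It shows why an all-False mask array and `nomask` end up with different result types.

```python
class _NoMask:
    """Hypothetical sentinel standing in for MA's `nomask`."""

    def __repr__(self):
        return "nomask"


NOMASK = _NoMask()


def coerce(value, dtype):
    """Hypothetical cast that refuses, e.g., a str fill value for int data."""
    if not isinstance(value, dtype):
        raise TypeError(f"cannot cast {value!r} to {dtype.__name__}")
    return value


def filled(data, mask, fill_value, dtype=int):
    """Return (values, dtype_name), following the control flow described above."""
    if mask is NOMASK:
        # "Boolean" mask: nothing can be masked, so the data come back untouched.
        return list(data), dtype.__name__
    try:
        fill, out_dtype = coerce(fill_value, dtype), dtype.__name__
    except TypeError:
        # Incompatible fill value: fall back to a generic "object" result.
        fill, out_dtype = fill_value, "object"
    # The whole mask array is processed; nothing checks any(mask) first.
    return [fill if m else x for x, m in zip(data, mask)], out_dtype


print(filled([1, 2, 3], NOMASK, "NA"))                 # ([1, 2, 3], 'int')
print(filled([1, 2, 3], [False, False, False], "NA"))  # ([1, 2, 3], 'object')
```

In this model, Pierre's suggested improvement amounts to also taking the early return when `not any(mask)`.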
I was setting the fill_value as 'NA' when constructing the array so
the masked values would be printed as 'NA'. It is not a big deal to
avoid doing this.
Nevertheless, the differences between a masked array with a boolean
mask and a mask of booleans have caused me trouble before, especially
when there are hidden in-place conversions of a mask which is an array
of False to a mask which is False, e.g.:
import numpy
print numpy.version.version
ma1 = numpy.ma.array(((1.,2,3),(4,5,6)), mask=((0,0,0),(0,0,0)))
print ma1.mask
a1 = numpy.asarray(ma1)
print ma1.mask
----------------------
0.9.9.2538
[[False False False]
[False False False]]
False
On 6/21/06, Pierre GM
On Wednesday 21 June 2006 04:46, Michael Sorich wrote:
When transposing a masked array of dtype '
OK, I see where the problem is: when your fill_value has a type that cannot be converted to the type of your data, the `filled` method (used internally in many functions, such as `transpose`) raises a TypeError, which is caught, and your array is converted to 'O'.
That's what happens here: your fill_value is a string, your data are integers, the types don't match, hence the conversion. So, no, I don't think that's a bug.
Why fill when you don't have any masked values, then? Well, there's a subtle difference between a boolean mask and a mask of booleans. When the mask is a boolean (mask=nomask=False), there's no masked value, and `filled` returns the data. Now, when your mask is an array of booleans (your first case), MA doesn't check whether mask.any()==False to determine whether there are some missing data or not; it just processes the whole array of booleans.
I agree that's a bit confusing here, and there might be some room for improvement (for example, changing the current `if m is nomask` to `if m is nomask or m.any()==False`, or better, forcing mask to nomask if mask.any()==False). But I don't think that qualifies as a bug.
In short: when you have an array of numbers, don't try to fill it with characters.
On Wednesday 21 June 2006 22:01, Michael Sorich wrote:
I was setting the fill_value as 'NA' when constructing the array so the masked values would be printed as 'NA'. It is not a big deal to avoid doing this.
You can use masked_print_option, as illustrated below, without using a fill_value incompatible with your data type.
import numpy.core.ma as MA
X = MA.array([1,2,3], mask=[0,1,0])
print X
[1 -- 3]
MA.masked_print_option = MA._MaskedPrintOption('N/A')
print X
[1 N/A 3]
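The advantage of a print option over a fill_value is that the display string is only used for printing, so it never has to be cast to the data's type. A hypothetical pure-Python sketch of that separation (`format_masked` is an illustrative name, not part of MA):

```python
def format_masked(data, mask, display="--"):
    """Render data like MA's repr, showing `display` in place of masked entries.

    Hypothetical helper: `display` is used only for output, so it never needs
    to be compatible with the type of `data`.
    """
    items = (display if m else str(x) for x, m in zip(data, mask))
    return "[" + " ".join(items) + "]"


print(format_masked([1, 2, 3], [False, True, False]))         # [1 -- 3]
print(format_masked([1, 2, 3], [False, True, False], "N/A"))  # [1 N/A 3]
```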
Nevertheless, the differences between a masked array with a boolean mask and a mask of booleans have caused me trouble before, especially when there are hidden in-place conversions of a mask which is an array of False to a mask which is False, e.g.:
OK, I'm still using 0.9.8 and I can't help you with this one. In that version, N.asarray transforms the MA into a ndarray, so you lose the mask. But I wonder: if none of your values are masked, the natural behavior would be to have `data.mask==nomask`, which speeds up things a bit. This gain of time is why I was suggesting that `mask` would be forced to `nomask` at the creation, if `mask.any()==False`. Could you give me some examples of cases where you need the mask to stay as an array of False ? If you need to access the mask as an array, you can always use MA.getmaskarray.
Pierre wrote:
I agree that's a bit confusing here, and there might be some room for improvement (for example, changing the current `if m is nomask` to `if m is nomask or m.any()==False`, or better, forcing mask to nomask if mask.any()==False). But I don't think that qualifies as a bug.
In the original MA in Numeric, I decided that to constantly check for masks that didn't actually mask anything was not a good idea. It punishes normal use with a very expensive check that is rarely going to be true.
If you are in a setting where you do not want this behavior, but instead want masks removed whenever possible, you may wish to wrap or replace things like masked_array so that they call make_mask with flag = 1:
y = masked_array(data, make_mask(maskdata, flag=1))
y will have no mask if maskdata is all false.
Thanks to Pierre for pointing out masked_print_option.
Paul
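Paul's `flag=1` trick and Pierre's proposal to force `nomask` when `mask.any()==False` both normalise the mask once, at construction time. A minimal pure-Python sketch under that assumption: `NOMASK`, `make_mask` and `masked_array` here are hypothetical stand-ins that only borrow the names used in the thread, and the real MA signatures may differ. The `any()` scan is O(n) in the mask size, which is the cost Paul does not want to pay on every operation.

```python
NOMASK = None  # hypothetical stand-in for MA's `nomask`


def make_mask(maskdata, flag=0):
    """Convert maskdata to a list of bools; with flag set, collapse all-False to NOMASK."""
    if maskdata is NOMASK:
        return NOMASK
    mask = [bool(m) for m in maskdata]
    if flag and not any(mask):
        # One O(n) scan, paid once here rather than inside every operation.
        return NOMASK
    return mask


def masked_array(data, mask=NOMASK):
    """Hypothetical constructor: a masked array is just a (data, mask) pair here."""
    return list(data), mask


y = masked_array([1, 2, 3], make_mask([0, 0, 0], flag=1))
print(y)  # ([1, 2, 3], None)
z = masked_array([1, 2, 3], make_mask([0, 1, 0], flag=1))
print(z)  # ([1, 2, 3], [False, True, False])
```

Without `flag`, an all-False mask is kept as a list, which reproduces the "mask of booleans" case discussed above.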
On 6/23/06, Pierre GM
On Wednesday 21 June 2006 22:01, Michael Sorich wrote:
Nevertheless, the differences between a masked array with a boolean mask and a mask of booleans have caused me trouble before, especially when there are hidden in-place conversions of a mask which is an array of False to a mask which is False, e.g.:
OK, I'm still using 0.9.8 and I can't help you with this one. In that version, N.asarray transforms the MA into a ndarray, so you lose the mask.
No, the mask of ma1 is converted in place to False; ma1 remains a MaskedArray:
import numpy
ma1 = numpy.ma.array(((1.,2,3),(4,5,6)), mask=((0,0,0),(0,0,0)))
print ma1.mask, type(ma1)
numpy.asarray(ma1)
print ma1.mask, type(ma1)
--output--
[[False False False]
[False False False]]
But I wonder: if none of your values are masked, the natural behavior would be to have `data.mask==nomask`, which speeds up things a bit. This gain of time is why I was suggesting that `mask` would be forced to `nomask` at the creation, if `mask.any()==False`.
Could you give me some examples of cases where you need the mask to stay as an array of False ? If you need to access the mask as an array, you can always use MA.getmaskarray.
If it did not sometimes affect the behaviour of the masked array, I would not be worried about automatic conversions between the two forms of the mask. Is it agreed that there should not be any differences in the behavior of the two forms of masked array, e.g. with a mask of [[False,False],[False,False]] vs False? It is frustrating to track down exceptions when the array has one behavior and then an implicit conversion of the mask changes the behaviour of the array.
Mike
On 6/23/06, Paul Dubois
In the original MA in Numeric, I decided that to constantly check for masks that didn't actually mask anything was not a good idea. It punishes normal use with a very expensive check that is rarely going to be true.
If you are in a setting where you do not want this behavior, but instead want masks removed whenever possible, you may wish to wrap or replace things like masked_array so that they call make_mask with flag = 1:
y = masked_array(data, make_mask(maskdata, flag=1))
y will have no mask if maskdata is all false.
Hi Paul. If this is purely for optimisation, is there perhaps a better way to do so in which the optimisation is hidden? For example, if the optimisation is in regard to the case in which none of the elements is masked, an alternative approach may be to make a subclass of ndarray and cache the result of the any method, e.g.
import numpy as N

def asmask(data):
    if isinstance(data, Mask):
        return data
    else:
        return Mask(data)

class Mask(N.ndarray):
    __array_priority__ = 10.0

    def __new__(subtype, data):
        ret = N.array(data, N.bool_)
        return ret.view(Mask)

    def __init__(self, data):
        self._any = None

    def any(self):
        if self._any is None:
            self._any = N.ndarray.any(self)
        return self._any

    def __setitem__(self, index, value):
        self._any = None
        N.ndarray.__setitem__(self, index, value)

The biggest problem I have with the current setup is the inconsistency between the behaviour of the array when the mask is nomask vs a boolarray with all False. Another example of this is when one changes the mask on an element. This is not possible when the mask is nomask:
print N.version.version
ma1 = N.ma.array([1,2,3], mask=[False, False, False])
print ma1.mask
ma1.mask[2] = True
ma2 = N.ma.array([1,2,3], mask=False)
print ma2.mask
ma2.mask[2] = True
----- output
0.9.8
[False False False]
[False False True]
False
Traceback (most recent call last):
  File "D:\eclipse\Mask\src\mask\__init__.py", line 111, in ?
    ma2.mask[2] = True
TypeError: object does not support item assignment
Michael, I wonder whether the Mask class you suggest is not a bit overkill. There should be enough tools in the existing MA module to do what we want. And I don't wanna think about compatibility or the number of changes in the MA code that'd be required (but I'm lazy)...
For the sake of consistency and optimization, I still think it could be easier (and cleaner) to make `nomask` the default for a MaskedArray without masked values. That could for example be implemented by forcing `nomask` at the creation of the MaskedArray with an extra `if mask and not mask.any(): mask=nomask`, or by using Paul's make_mask(flag=1) trick.
Masking some specific values could still be done when mask is nomask with an intermediary MA.getmaskarray() step.
On a side note, modifying an existing mask is a delicate matter. Everything's OK if you use masks as a way to hide existing data; it's more complex when initially you have some holes in your dataset...
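The "intermediary getmaskarray() step" can be sketched as follows. This is a hypothetical pure-Python model: `NOMASK`, `getmaskarray` and `mask_value` only illustrate the idea and are not MA's implementation. Because a full, fresh list of booleans is always built before writing, it also avoids the `TypeError: object does not support item assignment` seen when assigning into `nomask`.

```python
NOMASK = None  # hypothetical stand-in for MA's `nomask`


def getmaskarray(mask, size):
    """Always return a full list of bools, expanding NOMASK to all False."""
    if mask is NOMASK:
        return [False] * size
    return list(mask)  # a copy, so the caller's mask is never modified


def mask_value(data, mask, index):
    """Return a new mask with data[index] masked, whatever form `mask` had."""
    new_mask = getmaskarray(mask, len(data))
    new_mask[index] = True
    return new_mask


data = [1, 2, 3]
print(mask_value(data, NOMASK, 2))                 # [False, False, True]
print(mask_value(data, [False, False, False], 2))  # [False, False, True]
```

Both forms of "no masked values" now give the same result, which is the consistency Michael asks for.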
Some things to note:
The mask is copy-on-write. Don't mess with that. You can't just poke values
into an existing mask, it may be shared with other arrays.
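Why poking values into an existing mask is unsafe: if two arrays share one mask object, an in-place write changes both. A minimal pure-Python sketch of copying before writing; the `MaskedArray` class and its `set_masked` method are hypothetical illustrations, not MA's API.

```python
class MaskedArray:
    """Hypothetical minimal masked array whose mask may be shared."""

    def __init__(self, data, mask):
        self.data = list(data)
        self.mask = mask  # deliberately not copied: masks can be shared

    def set_masked(self, index):
        # Copy-on-write: build a new mask instead of mutating a shared one.
        new_mask = list(self.mask)
        new_mask[index] = True
        self.mask = new_mask


shared = [False, True, False]
a = MaskedArray([1, 2, 3], shared)
b = MaskedArray([10, 20, 30], shared)  # b shares a's mask object

a.set_masked(0)
print(a.mask)  # [True, True, False]
print(b.mask)  # [False, True, False]  (b is unaffected)

# By contrast, poking into the shared list leaks into every array using it:
a.mask = shared
a.mask[2] = True
print(b.mask)  # [False, True, True]
```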
I do not agree that there is any 'inconsistency'. It may be someone's
concept of the class that if there is a mask then at least one value is on,
but that was not my design. I believe if you try your ideas you'll find it
slows other people down, if not you.
Perhaps with all of Travis' new machinery, subclassing works. It didn't use
to, and I haven't kept up.
On 7/3/06, Pierre GM
Michael, I wonder whether the Mask class you suggest is not a bit overkill. There should be enough tools in the existing MA module to do what we want. And I don't wanna think about compatibility or the number of changes in the MA code that'd be required (but I'm lazy)...
For the sake of consistency and optimization, I still think it could be easier (and cleaner) to make `nomask` the default for a MaskedArray without masked values. That could for example be implemented by forcing `nomask` at the creation of the MaskedArray with an extra `if mask and not mask.any(): mask=nomask`, or by using Paul's make_mask(flag=1) trick.
Masking some specific values could still be done when mask is nomask with an intermediary MA.getmaskarray() step.
On a side note, modifying an existing mask is a delicate matter. Everything's OK if you use masks as a way to hide existing data, it's more complex when initially you have some holes in your dataset...
participants (3)
- Michael Sorich
- Paul Dubois
- Pierre GM