When transposing a masked array of dtype '<f8' I noticed that an ndarray of dtype '|O4' was returned. I found this a little strange as the masked array did not contain any masked values. Upon closer examination (see script below) it seems that if the mask is a boolean array with all false values and the fill_value is a string this will occur. However if the fill_value is a number or the mask is simply False, the dtype stays as '<f8'.

I was going to submit this as a bug but wanted to check that this was not a deliberate feature. I am using the numpy version that comes with a recent version of python enthought edition.

    import numpy
    print numpy.version.version
    ma1 = numpy.ma.array(((1.,2,3),(4,5,6)), mask=((0,0,0),(0,0,0)))
    print ma1.filled('NA').dtype
    print ma1.filled(-999).dtype
    ma2 = numpy.ma.array(((1.,2,3),(4,5,6)), mask=False)
    print ma2.filled('NA').dtype
    print ma2.filled(-999).dtype

---------------
output:

    0.9.9.2538
    '|O4'
    '<f8'
    '<f8'
    '<f8'
On Wednesday 21 June 2006 04:46, Michael Sorich wrote:
When transposing a masked array of dtype '<f8' I noticed that an ndarray of dtype '|O4' was returned.
OK, I see where the problem is: when your fill_value has a type that cannot be converted to the type of your data, the `filled` method (used internally in many functions, such as `transpose`) raises a TypeError, which is caught, and your array is converted to 'O'. That's what happens here: your fill_value is a string, your data are floats, the types don't match, hence the conversion. So, no, I don't think that's a bug.

Why fill when you don't have any masked values, then? Well, there's a subtle difference between a boolean mask and a mask of booleans. When the mask is boolean (mask=nomask=False), there's no masked value, and `filled` returns the data. Now, when your mask is an array of booleans (your first case), MA doesn't check whether mask.any()==False to determine whether there are some missing data or not, it just processes the whole array of booleans.

I agree that's a bit confusing here, and there might be some room for improvement (for example, changing the current `if m is nomask` to `if m is nomask or m.any()==False`, or better, forcing mask to nomask if mask.any()==False). But I don't think that qualifies as a bug.

In short: when you have an array of numbers, don't try to fill it with characters.
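The distinction between the two kinds of mask can be checked directly. What follows is only a minimal sketch, assuming a current NumPy release where the module is `numpy.ma` and the sentinel is exposed as `numpy.ma.nomask`; the variable names are illustrative and do not come from the thread.

```python
import numpy as np

values = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Case 1: the mask is a full array of booleans, all False.
with_array_mask = np.ma.array(values, mask=np.zeros(values.shape, dtype=bool))

# Case 2: the mask is the scalar sentinel nomask.
with_nomask = np.ma.array(values, mask=np.ma.nomask)

# getmask() returns whatever is stored, so the two cases differ here.
array_mask_is_nomask = np.ma.getmask(with_array_mask) is np.ma.nomask
scalar_mask_is_nomask = np.ma.getmask(with_nomask) is np.ma.nomask
print(array_mask_is_nomask, scalar_mask_is_nomask)

# A numeric fill value is compatible with float data in both cases,
# so the result keeps the float dtype.
filled_array_mask = with_array_mask.filled(-999)
filled_nomask = with_nomask.filled(-999)
print(filled_array_mask.dtype, filled_nomask.dtype)
```

Neither array hides any value, yet only the second one stores `nomask`; code that branches on `m is nomask` therefore treats them differently.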
I was setting the fill_value as 'NA' when constructing the array so the masked values would be printed as 'NA'. It is not a big deal to avoid doing this.

Nevertheless, the differences between a masked array with a boolean mask and a mask of booleans have caused me trouble before. Especially when there are hidden in-place conversions of a mask which is an array of False to a mask which is False. e.g.

    import numpy
    print numpy.version.version
    ma1 = numpy.ma.array(((1.,2,3),(4,5,6)), mask=((0,0,0),(0,0,0)))
    print ma1.mask
    a1 = numpy.asarray(ma1)
    print ma1.mask

----------------------

    0.9.9.2538
    [[False False False]
     [False False False]]
    False
On Wednesday 21 June 2006 22:01, Michael Sorich wrote:
I was setting the fill_value as 'NA' when constructing the array so the masked values would be printed as 'NA'. It is not a big deal to avoid doing this.
You can use masked_print_option, as illustrated below, without using a fill_value incompatible with your data type.
    >>> import numpy.core.ma as MA
    >>> X = MA.array([1,2,3], mask=[0,1,0])
    >>> print X
    [1 -- 3]
    >>> MA.masked_print_option=MA._MaskedPrintOption('N/A')
    >>> print X
    [1 N/A 3]
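For readers on a current NumPy release, the same idea can be sketched without touching the private class. This assumes the `numpy.ma.masked_print_option` object and its `display()` / `set_display()` methods; the variable names are illustrative.

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[0, 1, 0])

# Remember the current marker so the global setting can be restored.
previous = np.ma.masked_print_option.display()
np.ma.masked_print_option.set_display('N/A')
try:
    shown = str(x)   # masked entries are rendered with the new marker
    print(shown)
finally:
    np.ma.masked_print_option.set_display(previous)
```

The print marker is a module-wide setting, which is why the sketch restores it afterwards. The data stay integers and no fill_value is involved.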
Nevertheless, the differences between a masked array with a boolean mask and a mask of booleans have caused me trouble before. Especially when there are hidden in-place conversions of a mask which is an array of False to a mask which is False. e.g.
OK, I'm still using 0.9.8 and I can't help you with this one. In that version, N.asarray transforms the MA into a ndarray, so you lose the mask. But I wonder: if none of your values are masked, the natural behavior would be to have `data.mask==nomask`, which speeds up things a bit. This gain of time is why I was suggesting that `mask` would be forced to `nomask` at the creation, if `mask.any()==False`. Could you give me some examples of cases where you need the mask to stay as an array of False ? If you need to access the mask as an array, you can always use MA.getmaskarray.
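A small sketch of the `getmaskarray` suggestion, assuming a current NumPy release (`numpy.ma.getmaskarray`, `numpy.ma.is_masked`); the array names are made up for illustration.

```python
import numpy as np

no_mask = np.ma.array(np.array([1, 2, 3]))  # mask is nomask
array_mask = np.ma.array(np.array([1, 2, 3]), mask=[False, False, False])

# getmaskarray always returns a full boolean array of the data's shape,
# whichever form the stored mask takes.
mask_a = np.ma.getmaskarray(no_mask)
mask_b = np.ma.getmaskarray(array_mask)
print(mask_a, mask_b)

# is_masked answers "is anything hidden?" uniformly for both forms.
any_hidden = [bool(np.ma.is_masked(a)) for a in (no_mask, array_mask)]
print(any_hidden)
```

Code that only reads the mask through these two functions never needs to know which form is stored.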
On 6/23/06, Pierre GM <pgmdevlist@mailcan.com> wrote:
On Wednesday 21 June 2006 22:01, Michael Sorich wrote:
Nevertheless, the differences between a masked array with a boolean mask and a mask of booleans have caused me trouble before. Especially when there are hidden in-place conversions of a mask which is an array of False to a mask which is False. e.g.
OK, I'm still using 0.9.8 and I can't help you with this one. In that version, N.asarray transforms the MA into a ndarray, so you lose the mask.
No, the mask of ma1 is converted in place to False. ma1 remains a MaskedArray.

    import numpy
    ma1 = numpy.ma.array(((1.,2,3),(4,5,6)), mask=((0,0,0),(0,0,0)))
    print ma1.mask, type(ma1)
    numpy.asarray(ma1)
    print ma1.mask, type(ma1)

--output--

    [[False False False]
     [False False False]] <class 'numpy.core.ma.MaskedArray'>
    False <class 'numpy.core.ma.MaskedArray'>
But I wonder: if none of your values are masked, the natural behavior would be to have `data.mask==nomask`, which speeds up things a bit. This gain of time is why I was suggesting that `mask` would be forced to `nomask` at the creation, if `mask.any()==False`.
Could you give me some examples of cases where you need the mask to stay as an array of False ? If you need to access the mask as an array, you can always use MA.getmaskarray.
If it did not sometimes affect the behaviour of the masked array, I would not be worried about automatic conversions between the two forms of the mask. Is it agreed that there should not be any differences in the behaviour of the two forms of masked array, e.g. with a mask of [[False,False],[False,False]] vs False? It is frustrating to track down exceptions when the array has one behaviour, then there is an implicit conversion of the mask which changes the behaviour of the array.

Mike
Pierre wrote:
I agree that's a bit confusing here, and there might be some room for improvement (for example, changing the current `if m is nomask` to `if m is nomask or m.any()==False`, or better, forcing mask to nomask if mask.any()==False). But I don't think that qualifies as a bug.
In the original MA in Numeric, I decided that to constantly check for masks that didn't actually mask anything was not a good idea. It punishes normal use with a very expensive check that is rarely going to be true.

If you are in a setting where you do not want this behavior, but instead want masks removed whenever possible, you may wish to wrap or replace things like masked_array so that they call make_mask with flag = 1:

    y = masked_array(data, make_mask(maskdata, flag=1))

y will have no mask if maskdata is all false.

Thanks to Pierre for pointing out about masked_print_option.

Paul
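In current NumPy the same trick can be sketched as follows. This assumes that the `shrink` argument of `numpy.ma.make_mask` plays the role of the old `flag` argument; `data`, `maskdata`, `shrunk` and `kept` are illustrative names.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])
maskdata = np.array([False, False, False])

# shrink=True collapses an all-False mask to nomask;
# shrink=False keeps it as a boolean array.
shrunk = np.ma.make_mask(maskdata, shrink=True)
kept = np.ma.make_mask(maskdata, shrink=False)

y = np.ma.masked_array(data, mask=shrunk)
y_has_no_mask = np.ma.getmask(y) is np.ma.nomask
print(y_has_no_mask)
print(kept.shape)
```

The cost of the `any()` check is paid once, at construction, and only by callers who opt in.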
On 6/23/06, Paul Dubois <pfdubois@gmail.com> wrote:
In the original MA in Numeric, I decided that to constantly check for masks that didn't actually mask anything was not a good idea. It punishes normal use with a very expensive check that is rarely going to be true.
If you are in a setting where you do not want this behavior, but instead want masks removed whenever possible, you may wish to wrap or replace things like masked_array so that they call make_mask with flag = 1:
y = masked_array(data, make_mask(maskdata, flag=1))
y will have no mask if maskdata is all false.
Hi Paul. If this is purely for optimisation, is there perhaps a better way to do so in which the optimisation is hidden? For example if the optimisation is in regard to the case in which none of the elements is masked, an alternative approach may be to make a subclass of ndarray and cache the result of the any method. e.g.

    import numpy as N

    def asmask(data):
        if isinstance(data, Mask):
            return data
        else:
            return Mask(data)

    class Mask(N.ndarray):
        __array_priority__ = 10.0

        def __new__(subtype, data):
            ret = N.array(data, N.bool_)
            return ret.view(Mask)

        def __init__(self, data):
            self._any = None

        def any(self):
            if self._any is None:
                self._any = N.ndarray.any(self)
            return self._any

        def __setitem__(self, index, value):
            self._any = None
            N.ndarray.__setitem__(self, index, value)

The biggest problem I have with the current setup is the inconsistency between the behaviour of the array when the mask is nomask vs a boolarray with all False. Another example of this is when one changes the mask on an element. This is not possible when the mask is nomask.

    print N.version.version
    ma1 = N.ma.array([1,2,3], mask=[False, False, False])
    print ma1.mask
    ma1.mask[2] = True
    ma2 = N.ma.array([1,2,3], mask=False)
    print ma2.mask
    ma2.mask[2] = True

----- output

    0.9.8
    [False False False]
    [False False True]
    False
    Traceback (most recent call last):
      File "D:\eclipse\Mask\src\mask\__init__.py", line 111, in ?
        ma2.mask[2] = True
    TypeError: object does not support item assignment
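For comparison, here is a sketch of how a single element can be masked when the array starts out with `nomask`. It assumes a current NumPy release, where assigning the `numpy.ma.masked` constant to an element is supported; it does not describe the 0.9.8 behaviour shown above.

```python
import numpy as np

ma2 = np.ma.array(np.array([1, 2, 3]))  # no mask given, so the mask is nomask
started_without_mask = np.ma.getmask(ma2) is np.ma.nomask

# Assigning the masked constant goes through MaskedArray.__setitem__,
# which allocates a full boolean mask on demand.
ma2[2] = np.ma.masked

final_mask = np.ma.getmaskarray(ma2)
print(started_without_mask)
print(final_mask)
```

Going through the array's own item assignment, instead of poking into `ma2.mask`, avoids depending on which form the mask currently has.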
Michael, I wonder whether the Mask class you suggest is not a bit overkill. There should be enough tools in the existing MA module to do what we want. And I don't wanna think about compatibility and the number of changes in the MA code that'd be required (but I'm lazy)...

For the sake of consistency and optimization, I still think it could be easier (and cleaner) to make `nomask` the default for a MaskedArray without masked values. That could for example be implemented by forcing `nomask` at the creation of the MaskedArray with an extra `if mask and not mask.any(): mask=nomask`, or by using Paul's make_mask(flag=1) trick.

Masking some specific values could still be done when mask is nomask with an intermediary MA.getmaskarray() step.

On a side note, modifying an existing mask is a delicate matter. Everything's OK if you use masks as a way to hide existing data, it's more complex when initially you have some holes in your dataset...
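A rough sketch of that workflow on a current NumPy release, assuming the `MaskedArray.shrink_mask()` method and `numpy.ma.getmaskarray`. Truth-testing a multi-element array as in `if mask and ...` raises an error in NumPy, so the sketch relies on `shrink_mask()` for the collapse instead; the variable names are illustrative.

```python
import numpy as np

x = np.ma.array(np.array([1, 2, 3]), mask=[False, False, False])
had_array_mask = np.ma.getmask(x) is not np.ma.nomask

# An all-False mask collapses to nomask.
x.shrink_mask()
collapsed = np.ma.getmask(x) is np.ma.nomask

# To mask a value afterwards, materialise a full mask first
# (the intermediary getmaskarray step), edit it, and assign it back.
full = np.ma.getmaskarray(x).copy()
full[1] = True
x.mask = full

print(had_array_mask, collapsed, np.ma.getmaskarray(x))
```

Working on a copy and assigning it back means a mask that might be shared with another array is never modified in place.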
Some things to note:

The mask is copy-on-write. Don't mess with that. You can't just poke values into an existing mask, it may be shared with other arrays.

I do not agree that there is any 'inconsistency'. It may be someone's concept of the class that if there is a mask then at least one value is on, but that was not my design. I believe if you try your ideas you'll find it slows other people down, if not you.

Perhaps with all of Travis' new machinery, subclassing works. It didn't used to, and I haven't kept up.