[Numpy-discussion] A reimplementation of MaskedArray

Pierre GM pgmdevlist at gmail.com
Tue Nov 21 23:09:17 EST 2006


On Tuesday 21 November 2006 21:11, Michael Sorich wrote:
> I think that the new implementation is making a copy of the data when
> indexing an MA. This is different from both ndarray and the existing
> numpy ma version.

Michael, 
If you check the definition of `MaskedArray.__new__`, you'll see that the `copy` argument is set to True by default. Setting it to False seems to give what you expect. Should I make that the default?
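As a rough illustration of what the `copy` argument controls, here is a sketch against today's `numpy.ma` (which the maskedarray package discussed here later became). It passes `copy` explicitly instead of relying on any particular default; the names `base`, `shared` and `private` are only illustrative.

```python
import numpy as np
import numpy.ma as ma

base = np.arange(5.0)

# copy=False: the masked array wraps the same buffer as `base`
shared = ma.array(base, copy=False)
# copy=True: the masked array gets its own private copy of the data
private = ma.array(base, copy=True)

base[0] = 99.0  # modify the original buffer

print(shared[0])   # 99.0: the change is visible through the shared data
print(private[0])  # 0.0: the copy is unaffected
```

Sharing avoids a copy per construction, at the price of the "getting bitten" effect described further down: writes through one object show up in the other.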

> Having subviews of the mask seems complicated with the mask being
> nomask. 

Why? `nomask` is just a trick to avoid unnecessary computations on a mask full of False that doesn't need updating.
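A small sketch of the `nomask` idea, again using today's `numpy.ma` as a stand-in for maskedarray: an unmasked array carries the `nomask` sentinel instead of a boolean array, a full mask can still be materialized on demand, and masking an element is what switches the mask from `nomask` to a real array.

```python
import numpy as np
import numpy.ma as ma

x = ma.array([1.0, 2.0, 3.0])  # no mask given

# No boolean array is allocated: the mask is the nomask sentinel
print(ma.getmask(x) is ma.nomask)  # True
# A full boolean array can still be produced when it is really needed
print(ma.getmaskarray(x))          # [False False False]

# Masking one element replaces nomask with a real boolean array
x[1] = ma.masked
print(ma.getmask(x))               # [False  True False]
```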

> What happens if the view sets a new masked value and hence
> changes from nomask to a boolean array?
> How does the parent mask get updated?

Both implementations work the same way: the parent mask is not updated.

> I think the numpy implementation gets away with this by 
> returning a view of only the _data part if the ma mask is nomask 

By numpy implementation, you mean numpy.core.ma, right?
If so, then yes:
`self[i]` (that is, `self.__getitem__(i)`) returns `self._data[i]` if the mask is nomask.

In maskedarray, if the mask is nomask, then
`self[i]` returns `self._data[i]` only if `self._data[i].size == 1`; otherwise it returns a masked array.
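For comparison, this is roughly how indexing behaves in today's `numpy.ma`, the descendant of maskedarray; treat it as an illustration of the rule above rather than a description of the 2006 code. A single unmasked element comes back as a plain scalar, a slice comes back as a masked array viewing the parent's data, and a single masked element comes back as the `masked` constant.

```python
import numpy as np
import numpy.ma as ma

x = ma.array([10, 20, 30, 40])  # mask is nomask

item = x[0]    # single element: a plain NumPy scalar, not a MaskedArray
part = x[1:3]  # several elements: a MaskedArray

print(isinstance(item, ma.MaskedArray))     # False
print(isinstance(part, ma.MaskedArray))     # True
# Basic slicing gives a view: the slice shares the parent's data buffer
print(np.shares_memory(part.data, x.data))  # True

y = ma.array([10, 20, 30], mask=[False, True, False])
print(y[1] is ma.masked)  # True: a masked element comes back as `masked`
```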

> I don't like this solution as I would expect a ma to be returned. Also I
> suspect that if the ma is to be a view of another ma, then in __new__
> a mask that is a boolean array of all False cannot be converted to
> nomask.

I'm not following you here: there's no `__new__` in numpy.core.ma (that's one of the reasons why a masked array in numpy.core.ma is fundamentally different from an ndarray...). And in maskedarray, a mask that is an array of `False` is converted to `nomask` by default, but you can control this with the `flag` option; please check the documentation of `maskedarray.masked_array`: flag=True converts the mask, flag=False keeps an array of booleans.
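In today's `numpy.ma`, the same choice appears to be exposed as a `shrink` option rather than `flag`; assuming the semantics match, the sketch below shows both outcomes with `ma.make_mask`, plus `MaskedArray.shrink_mask()`, which reduces an explicit all-False mask to `nomask` after construction.

```python
import numpy as np
import numpy.ma as ma

all_false = np.zeros(3, dtype=bool)

# shrink=True collapses an all-False mask to the nomask sentinel
print(ma.make_mask(all_false, shrink=True) is ma.nomask)  # True
# shrink=False keeps the full boolean array
print(ma.make_mask(all_false, shrink=False))              # [False False False]

# An explicit all-False mask passed to the constructor is kept as an array...
x = ma.array([1, 2, 3], mask=all_false)
print(ma.getmask(x))               # [False False False]
# ...until shrink_mask() reduces it to nomask
x.shrink_mask()
print(ma.getmask(x) is ma.nomask)  # True
```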


One thing to remember is that masks tend to be copied more often than not. And I don't think it's advisable to modify the mask of the parent: it's no longer the same object, as the mask is now different! In other words, you can share data, but you shouldn't share a mask. And I keep getting bitten by data sharing; that's why I set the `copy` flag to True by default.
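To make "share data, don't share the mask" concrete, here is a minimal sketch in today's `numpy.ma`: the child wraps a view of the parent's data but is handed its own copy of the mask. The `parent`/`child` setup is hypothetical, built by hand for illustration, not how either implementation constructs views internally.

```python
import numpy as np
import numpy.ma as ma

data = np.arange(4.0)
mask = np.zeros(4, dtype=bool)

parent = ma.array(data, mask=mask, copy=False)
# The child views the parent's data but gets its own copy of the mask
child = ma.array(data[1:], mask=mask[1:].copy(), copy=False)

child[1] = 42.0       # data is shared, so the parent sees this write
child[0] = ma.masked  # the mask is private, so the parent stays unmasked

print(parent.data.tolist())  # [0.0, 1.0, 42.0, 3.0]
print(parent.mask.tolist())  # [False, False, False, False]
print(child.mask.tolist())   # [True, False, False]
```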


> I like the new implementation of maskedarray, especially the focus on
> simplicity. The only simple solution I see is to have the mask be a
> boolean array at all times....

You haven't yet convinced me that a mask full of False is better than `nomask`.
What don't you like about maskedarray (aka the new implementation)?