[Numpy-discussion] A reimplementation of MaskedArray

Wed Nov 22 01:12:11 EST 2006

If I make some minor changes (below) to MaskedArray's __getitem__ and
__setitem__, a sub-array shares both its data and its mask with the parent:

from numpyext.maskedarray import *

a = array([[1,2,3,4,5],[6,7,8,9,10]], mask=nomask)
suba = a[0]
suba[1] = masked
print a
>[[1 -- 3 4 5]
> [6 7 8 9 10]]
print suba
>[1 -- 3 4 5]

suba = a[1]
suba[1] = 10
print a
>[[1 -- 3 4 5]
> [6 10 8 9 10]]
print suba
>[6 10 8 9 10]

a = array([[1,2,3,4,5],[6,7,8,9,10]], mask=[[0,0,0,0,0],[0,1,0,0,0]])
suba = a[0]
suba[1] = masked
print a
>[[1 -- 3 4 5]
> [6 -- 8 9 10]]
print suba
>[1 -- 3 4 5]

suba = a[1]
suba[1] = 10
print a
>[[1 -- 3 4 5]
> [6 10 8 9 10]]
print suba
>[6 10 8 9 10]
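
The outputs above rely on basic indexing returning views rather than copies. A minimal sketch of that idea, assuming two plain ndarrays standing in for `_data` and `_mask` (the names `data`, `mask`, `sub_data`, `sub_mask` and `copied` are hypothetical, and this is not the numpyext.maskedarray code itself):

```python
import numpy as np

# Hypothetical stand-ins for MaskedArray._data and MaskedArray._mask.
data = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
mask = np.zeros(data.shape, dtype=bool)  # a real boolean mask, not nomask

# Basic (integer or slice) indexing returns views of the parent arrays.
sub_data = data[0]
sub_mask = mask[0]

# Writing through the views changes the parent arrays as well.
sub_mask[1] = True   # "mask" element 1 of the first row
sub_data[2] = 30     # ordinary assignment through the data view

# A copied mask, by contrast, is detached from the parent.
copied = mask[0].copy()
copied[3] = True     # the parent mask does not see this change

print(data.tolist())
print(mask.tolist())
print(np.shares_memory(sub_mask, mask), np.shares_memory(copied, mask))
```

Copying the mask in __getitem__ or __setitem__ corresponds to the `copied` case: the change stays local to the sub-array.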

### in MaskedArray ###
    def __getitem__(self, i):
        """x.__getitem__(y) <==> x[y]
Returns the item described by i. Not a copy as in previous versions.
        """
        dout = self._data[i]
        if self._mask is nomask:
            if numeric.size(dout)==1:
                return dout
            else:
                ### made changes here
                self._mask = numeric.zeros(self.shape, dtype=MaskType)
                mout = self._mask[i]
                return self.__class__(dout, mask=mout,
                                      fill_value=self._fill_value,
                                      copy=False, flag=False)
                ### -------------
        #....
#        m = self._mask.copy()
        m = self._mask
        mi = m[i]
        if mi.size == 1:
            if mi:
                return masked
            else:
                return dout
        else:
            ### made changes here
            return self.__class__(dout, mask=mi,
                                  fill_value=self._fill_value,
                                  copy=False, flag=False)

    def __setitem__(self, index, value):
        """x.__setitem__(i, y) <==> x[i]=y
Sets item described by index. If value is masked, masks those locations.
        """
        d = self._data
        if self is masked:
            raise MAError, 'Cannot alter the masked element.'
        #....
        if value is masked:
            if self._mask is nomask:
                _mask = make_mask_none(d.shape)
            else:
                _mask = self._mask
#===============================================================================
#                #why does the mask need to be copied?
#                _mask = self._mask.copy()
#===============================================================================
            _mask[index] = True
            self._mask = _mask
            return
        #....
        m = getmask(value)
        value = filled(value).astype(d.dtype)
        d[index] = value
        if m is nomask:
            if self._mask is not nomask:
#===============================================================================
#                #why does the mask need to be copied?
#                _mask = self._mask.copy()
#===============================================================================
                _mask = self._mask
                _mask[index] = False
            else:
                _mask = nomask
        else:
            if self._mask is nomask:
                _mask = make_mask_none(d.shape)
            else:
                _mask = self._mask
#===============================================================================
#                #why does the mask need to be copied?
#                _mask = self._mask.copy()
#===============================================================================
            _mask[index] = m
        self._mask = _mask
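
The key step in the modified __getitem__ is replacing nomask with a real all-False array on the parent before slicing, so that the child can hold a view of it. A toy sketch of that step, assuming a hypothetical `TinyMasked` class in which `None` plays the role of nomask (scalar results and fill values are deliberately left out):

```python
import numpy as np


class TinyMasked:
    """Hypothetical toy class; not the numpyext.maskedarray API."""

    def __init__(self, data, mask=None):
        self.data = np.asarray(data)
        self.mask = mask  # None stands in for nomask

    def _ensure_mask(self):
        # Materialise an all-False mask the first time one is needed.
        if self.mask is None:
            self.mask = np.zeros(self.data.shape, dtype=bool)

    def __getitem__(self, index):
        # Create the parent's mask first so the child can share it.
        self._ensure_mask()
        # Basic indexing gives views of both the data and the mask.
        return TinyMasked(self.data[index], self.mask[index])

    def mask_at(self, index):
        # Rough equivalent of `x[index] = masked` in the code above.
        self._ensure_mask()
        self.mask[index] = True


parent = TinyMasked([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
row = parent[0]   # parent.mask is created here and shared with row
row.mask_at(1)    # masking through the child is visible in the parent
print(parent.mask.tolist())
```

The cost of this design is that taking any sub-array of an unmasked parent allocates a full mask, which gives up part of the saving that nomask is meant to provide.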


On 11/22/06, Michael Sorich <michael.sorich at gmail.com> wrote:
> Perhaps an example will help explain what I mean
>
> For the case of an ndarray, if you select a row and then alter the new
> array, the old array is also changed.
>
> from numpy import *
> a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]])
> suba = a[2]
> suba[1] = 10
> print a
> print suba
> --output--
> [[ 1  2  3  4  5]
>  [ 1  2  3  4  5]
>  [ 1 10  3  4  5]]
> [ 1 10  3  4  5]
>
> In the current version of maskedarray in numpy, changes in the row
> array affect the parent array. Here, whenever you select a single row
> or column and the mask is nomask, an ndarray is returned, not a masked array.
>
> from numpy.core.ma import *
> a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=nomask)
> suba = a[2]
> suba[1] = 10
> print a
> print suba
> print type(suba)
> --output--
> [[1 2 3 4 5]
>  [1 2 3 4 5]
>  [1 10 3 4 5]]
> [ 1 10  3  4  5]
> <type 'numpy.ndarray'>
>
> However, if mask is anything other than nomask (even if the mask is a
> boolean array in which all the values are False, which means the same
> thing as nomask), a masked array is returned. Once again the data is
> shared between the arrays.
>
> from numpy.core.ma import *
> a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]],
> mask=[[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]])
> suba = a[2]
> suba[1] = 10
> print a
> print suba
> print type(suba)
> --output--
> [[1 2 3 4 5]
>  [1 2 3 4 5]
>  [1 10 3 4 5]]
> [1 10 3 4 5]
> <class 'numpy.core.ma.MaskedArray'>
>
> Unfortunately, if the value is changed to masked, this is not updated
> in the parent array. This seems very inconsistent. I don't view masked
> values any differently from any other value.
>
> a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]],
> mask=[[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]])
> suba = a[2]
> suba[1] = masked
> print a
> print suba
> print type(suba)
> --output--
> [[1 2 3 4 5]
>  [1 2 3 4 5]
>  [1 2 3 4 5]]
> [1 -- 3 4 5]
> <class 'numpy.core.ma.MaskedArray'>
>
> With the new implementation, the data is not shared for any of the 3
> variations above. I am happy that the arrays act consistently;
> however, this is different from the ndarray. It would be nice if the
> behavior was the same as the ndarray, but it is better than the
> numpy.core.ma implementation. If this is on purpose, then it should be
> documented.
>
> from numpyext.maskedarray import *
> a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=nomask)
> suba = a[2]
> suba[1] = 10
> print a
> print suba
> print type(suba)
> --output--
> [[1 2 3 4 5]
>  [1 2 3 4 5]
>  [1 2 3 4 5]]
> [ 1 10  3  4  5]
> <class 'numpyext.maskedarray.MaskedArray'>
>
>
> On 11/22/06, Pierre GM <pgmdevlist at gmail.com> wrote:
> >
> >
> > On Tuesday 21 November 2006 21:11, Michael Sorich wrote:
> >
> > > I think that the new implementation is making a copy of the data when
> > > indexing a MA. This is different from both ndarray and the existing
> > > numpy ma version.
> >
> >
> >
> > Michael,
> >
> > If you check the definition of MaskedArray.__new__, you'll see that the
> > "copy" argument is set to True by default. Setting it to False seems to
> > give what you expect. Should I make that the default ?
> >
> >
> >
> > > Having subviews of the mask seems complicated with the mask being
> > > nomask.
> >
> >
> >
> > Why ? nomask is just a trick to avoid unnecessary computations on a mask
> > full of False that doesn't need updating.
> >
> >
> >
> > > What happens if the view sets a new masked value and hence
> > > changes from nomask to a boolean array ?
> > > How does the parent mask get updated?
> >
> >
> >
> > Both implementations work the same way: the parent mask is not updated.
> >
> >
> >
> > > I think the numpy implementation gets away with this by
> > > returning a view of only the _data part if the ma mask is nomask
> >
> >
> >
> > By numpy implementation, you mean numpy.core.ma, right ?
> >
> > If so, then yes:
> >
> > `self.__getitem__(i)` returns `self._data[i]` if the mask is nomask.
> >
> >
> >
> > In maskedarray, if the mask is nomask, then
> >
> > `self.__getitem__(i)` returns `self._data[i]` only if
> > `self._data[i].size==1`, else it returns a masked array.
> >
> >
> >
> > > I don't like this solution as I would expect a ma to be returned. Also I
> > > suspect that if the ma is to be a view of another ma, then in __new__
> > > a mask that is a boolean array of all False cannot be converted to
> > > nomask.
> >
> >
> >
> > I'm not following you here: there's no `__new__` in numpy.core.ma (that's
> > one of the reasons why a masked array in numpy.core.ma is basically different
> > from an ndarray...). And in maskedarray, a mask that is an array of `False` is
> > set to `nomask` by default, but you can use the `flag` option: please check the
> > documentation of `maskedarray.masked_array`: flag=True converts the mask,
> > flag=False keeps an array of booleans.
> >
> >
> >
> >
> >
> > One thing to remember is that masks tend to be copied more often than not.
> > And I don't think it's advisable to modify the mask of the parent: it's no
> > longer the same object, as the mask is now different ! In other words, you
> > could share data, but you shouldn't share a mask. And I keep getting bitten by
> > data sharing, that's why I had set the 'copy' flag to True by default.
> >
> >
> >
> >
> >
> > > I like the new implementation of maskedarray, especially the focus on
> > > simplicity. The only simple solution I see is to have the mask be a
> > > boolean array at all times....
> >
> >
> >
> > You haven't convinced me yet of why a mask of False is better than `nomask`.
> >
> > What don't you like in maskedarray (aka the new implementation) ?
> >
> >
>