Re: [Numpy-discussion] new MaskedArray class

June 23, 2019

Hi Tom,

I think a sensible alternative mental model for the MaskedArray class is
...
...
that all it does is forward any operations to the data it holds and
separately propagate a mask,

I'm generally on-board with that mental picture, and agree that the
use-case described by Ben (different layers of satellite imagery) is
important.  Same thing happens in astronomy data, e.g. you have a CCD image
of the sky and there are cosmic rays that contaminate the image.  Those are
not garbage data, just pixels that one wants to ignore in some, but not
all, contexts.

However, it's worth noting that one cannot blindly forward any operations
to the data it holds since the operation may be illegal on that data.  The
simplest example is dividing `a / b` where  `b` has data values of 0 but
they are masked.  That operation should succeed with no exception, and here
the resultant value under the mask is genuinely garbage.

Even in the present implementation, the operation is just forwarded, with
numpy errstate set to ignore all errors. And then after the fact some
half-hearted remediation is done.
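
To make the forwarding concrete, here is a minimal sketch of that pattern.
It is not the actual numpy.ma code; `masked_divide` and its arguments are
hypothetical names used only for illustration, and the data and mask are
assumed to be plain ndarrays of the same shape.

```python
import numpy as np

def masked_divide(a_data, a_mask, b_data, b_mask):
    """Hypothetical helper: forward the division to the data, OR the masks."""
    # Suppress floating-point warnings while the raw operation runs.
    with np.errstate(all="ignore"):
        data = np.true_divide(a_data, b_data)
    # The mask is propagated separately from the data.
    mask = a_mask | b_mask
    return data, mask

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.0, 4.0])
a_mask = np.array([False, False, False])
b_mask = np.array([False, True, False])   # the zero in b is masked

data, mask = masked_divide(a, a_mask, b, b_mask)
# data[1] is inf: nothing was raised, but the value under the mask is garbage.
```

Any remediation of the value under the mask would have to happen after
this step.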
...
The current MaskedArray seems a bit inconsistent in dealing with invalid
calculations.  Dividing by 0 (if masked) is no problem and returns the
numerator.  Taking the log of a masked 0 gives the usual divide by zero
RuntimeWarning and puts a 1.0 under the mask of the output.
Perhaps the expression should not even be evaluated on elements where the
output mask is True, and all the masked output data values should be set to
a predictable value (e.g. zero for numerical, zero-length string for
string, or maybe a default fill value).  That at least provides consistent
and predictable behavior that is simple to explain.  Otherwise the story is
that the data under the mask *might* be OK, unless for a particular element
the computation was invalid in which case it is filled with some arbitrary
value.  I think that is actually an error-prone behavior that should be
avoided.
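
One way to sketch that idea is with the `where=` and `out=` arguments that
ufuncs already accept.  The helper name `divide_skipping_masked` and the
choice of 0.0 as the fill are assumptions for illustration, not an existing
MaskedArray API.

```python
import numpy as np

def divide_skipping_masked(a_data, b_data, out_mask, fill=0.0):
    """Hypothetical helper: only evaluate where the output is unmasked."""
    # Pre-fill the output so masked slots hold a predictable value.
    out = np.full(np.broadcast(a_data, b_data).shape, fill, dtype=float)
    # The ufunc writes only where `where` is True; other slots keep `fill`.
    np.divide(a_data, b_data, out=out, where=~out_mask)
    return out

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.0, 4.0])
out_mask = np.array([False, True, False])   # the zero in b is masked

out = divide_skipping_masked(a, b, out_mask)
# The masked element holds the fill value rather than inf.
```

The masked division by zero is never performed here, so the errstate
setting does not come into play for it.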

I think I agree with Allan here, that after a computation, one generally
simply cannot safely assume anything for masked elements.

But it is reasonable for subclasses to define what they want to do
"post-operation"; e.g., for numerical arrays, it might generally make
sense to do
```
    notok = ~np.isfinite(result)
    mask |= notok
```
and one could then also do
```
    result[notok] = fill_value
```

But I think one might want to leave that to the user.
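
Put together as something runnable, the two steps might look roughly like
the following.  This is only a sketch: `post_operation` is a hypothetical
hook name, not an existing method, and making the fill optional is just one
way of leaving that choice to the user.

```python
import numpy as np

def post_operation(result, mask, fill_value=None):
    """Hypothetical hook: mask non-finite results, optionally fill them."""
    notok = ~np.isfinite(result)
    mask = mask | notok          # new array; the caller's mask is untouched
    if fill_value is not None:
        result = result.copy()   # do not overwrite the caller's data
        result[notok] = fill_value
    return result, mask

with np.errstate(all="ignore"):
    raw = np.log(np.array([1.0, 0.0, -1.0]))   # log(0) and log(-1) are not finite

result, mask = post_operation(raw, np.zeros(3, dtype=bool), fill_value=0.0)
# The two non-finite entries are now masked and hold the fill value.
```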

All the best,

Marten

Marten van Kerkwijk