[Numpy-discussion] new MaskedArray class

Sun Jun 23 11:11:20 EDT 2019

Hi Tom,

>> I think a sensible alternative mental model for the MaskedArray class is
>> that all it does is forward any operations to the data it holds and
>> separately propagate a mask,
>>
>
> I'm generally on-board with that mental picture, and agree that the
> use-case described by Ben (different layers of satellite imagery) is
> important.  Same thing happens in astronomy data, e.g. you have a CCD image
> of the sky and there are cosmic rays that contaminate the image.  Those are
> not garbage data, just pixels that one wants to ignore in some, but not
> all, contexts.
>
> However, it's worth noting that one cannot blindly forward any operations
> to the data it holds since the operation may be illegal on that data.  The
> simplest example is dividing `a / b` where `b` has data values of 0 but
> they are masked.  That operation should succeed with no exception, and here
> the resultant value under the mask is genuinely garbage.
>

Even in the present implementation, the operation is just forwarded, with
numpy errstate set to ignore all errors. And then after the fact some
half-hearted remediation is done.
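
To make "just forwarded" concrete, here is a minimal sketch. It is not the
actual numpy.ma code: `forward_divide` is a hypothetical helper, and taking
the plain OR of the input masks is an assumption made for illustration.

```python
import numpy as np

def forward_divide(a_data, a_mask, b_data, b_mask):
    """Hypothetical helper: divide the data blindly, propagate the mask separately."""
    # Silence all floating-point warnings while operating on the raw data.
    with np.errstate(all="ignore"):
        result = np.true_divide(a_data, b_data)
    # The output is masked wherever either input was masked (an assumption).
    mask = a_mask | b_mask
    return result, mask

a_data = np.array([1.0, 2.0, 3.0])
b_data = np.array([2.0, 0.0, 4.0])
a_mask = np.zeros(3, dtype=bool)
b_mask = np.array([False, True, False])  # the zero in b is masked

result, mask = forward_divide(a_data, a_mask, b_data, b_mask)
```

No exception or warning is raised, but `result[1]` is `inf`, i.e., whatever
the raw division produced for the masked element.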

> The current MaskedArray seems a bit inconsistent in dealing with invalid
> calculations.  Dividing by 0 (if masked) is no problem and returns the
> numerator.  Taking the log of a masked 0 gives the usual divide by zero
> RuntimeWarning and puts a 1.0 under the mask of the output.
>
> Perhaps the expression should not even be evaluated on elements where the
> output mask is True, and all the masked output data values should be set to
> a predictable value (e.g. zero for numerical, zero-length string for
> string, or maybe a default fill value).  That at least provides consistent
> and predictable behavior that is simple to explain.  Otherwise the story is
> that the data under the mask *might* be OK, unless for a particular element
> the computation was invalid in which case it is filled with some arbitrary
> value.  I think that is actually an error-prone behavior that should be
> avoided.
>
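
For concreteness, the "do not evaluate under the mask" suggestion could be
sketched with the `where=` and `out=` arguments that ufuncs already accept.
This is only an illustration under assumptions: `divide_where_unmasked` is a
hypothetical helper, and the default fill of 0.0 is just one of the choices
mentioned above.

```python
import numpy as np

def divide_where_unmasked(a_data, b_data, mask, fill_value=0.0):
    """Hypothetical helper: evaluate a / b only where mask is False."""
    a_data = np.asarray(a_data, dtype=float)
    b_data = np.asarray(b_data, dtype=float)
    # Pre-fill the output so that skipped elements hold a predictable value.
    out = np.full(np.broadcast(a_data, b_data).shape, fill_value, dtype=float)
    # The ufunc writes only where `where` is True; other elements keep the fill.
    np.divide(a_data, b_data, out=out, where=~mask)
    return out

a_data = np.array([1.0, 2.0, 3.0])
b_data = np.array([2.0, 0.0, 4.0])
mask = np.array([False, True, False])  # the zero in b is masked

filled = divide_where_unmasked(a_data, b_data, mask)
```

Here the masked element ends up holding the fill value rather than the
result of `2.0 / 0.0`.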

I think I agree with Allan here, that after a computation, one generally
simply cannot safely assume anything for masked elements.

But it is reasonable for subclasses to define what they want to do
"post-operation"; e.g., for numerical arrays, it might generally make
sense to do
```
    notok = ~np.isfinite(result)
    mask |= notok
```
and one could then also do
```
    result[notok] = fill_value
```

But I think one might want to leave that to the user.
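
Put together as something a subclass could call after each operation, the
two snippets above might look like the sketch below. The name
`post_operation` and its signature are hypothetical, not an existing or
proposed API; filling is optional so that it can indeed be left to the user.

```python
import numpy as np

def post_operation(result, mask, fill_value=None):
    """Hypothetical hook: mask non-finite results and optionally fill them."""
    notok = ~np.isfinite(result)
    # Extend the mask without modifying the caller's array in place.
    mask = mask | notok
    if fill_value is not None:
        # Replace only the non-finite entries, as in the snippet above.
        result = np.where(notok, fill_value, result)
    return result, mask

with np.errstate(all="ignore"):
    raw = np.array([1.0, 0.0, 3.0]) / np.array([2.0, 0.0, 0.0])  # 0.5, nan, inf

no_mask = np.zeros(3, dtype=bool)
kept, kept_mask = post_operation(raw, no_mask)
cleaned, cleaned_mask = post_operation(raw, no_mask, fill_value=0.0)
```

With `fill_value=None` the data are left untouched and only the mask grows.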

All the best,

Marten