[Numpy-discussion] new MaskedArray class

Mon Jun 24 19:54:09 EDT 2019

Hi Allan,

> > The alternative solution in my model would be to replace `np.dot` with a
> > masked-specific implementation of what `np.dot` is supposed to stand for
> > (in your simple example, `np.add.reduce(np.multiply(m, m))` - more
> > generally, add relevant `outer` and `axes`). This would be similar to
> > what I think all implementations do for `.mean()` - we cannot calculate
> > that from the data using any fill value or skipping, so rather use a
> > more easily cared-for `.sum()` and divide by a suitable number of
> > elements. But in both examples the disadvantage is that we took away the
> > option to use the underlying class's `.dot()` or `.mean()`
> > implementations.
>
> Just to note, my current implementation uses the IGNORE style of mask,
> so does not propagate the mask in np.dot:
>
>     >>> a = MaskedArray([[1,1,1], [1,X,1], [1,1,1]])
>     >>> np.dot(a, a)
>
>     MaskedArray([[3, 2, 3],
>                  [2, 2, 2],
>                  [3, 2, 3]])
>
> I'm not at all set on that behavior and we can do something else. For
> now, I chose this way since it seemed to best match the "IGNORE" mask
> behavior.
>

It is a nice example, I think. In terms of action on the data, one would
get this result as well in my pseudo-representation of
`np.add.reduce(np.multiply(m, m))`, as long as the multiply is taken as an
outer product along the relevant axes (which does not ignore the mask,
i.e., if either element is masked, the product is too), followed by a
sum which works like other reductions and skips masked elements.

From the FFT array multiplication analogy, though, it is not clear that,
effectively, replacing masked elements by 0 is reasonable.
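
To make the equivalence concrete, here is a minimal sketch using plain
arrays for data and mask (not the new MaskedArray class; the names `data`,
`mask`, `prod_data` and `prod_mask` are only for illustration). It does the
mask-propagating outer product followed by a mask-skipping sum, and
compares that with simply filling the masked element with 0:

```python
import numpy as np

# The 3x3 example from above: all ones, centre element masked (the X).
data = np.ones((3, 3), dtype=int)
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True

# Outer product along the contracted axis: prod[i, k, j] = a[i, k] * a[k, j].
# The mask propagates: a product is masked if either factor is masked.
prod_data = data[:, :, np.newaxis] * data[np.newaxis, :, :]
prod_mask = mask[:, :, np.newaxis] | mask[np.newaxis, :, :]

# Reduction over k that skips masked products (for a sum, same as adding 0).
result = np.where(prod_mask, 0, prod_data).sum(axis=1)

# For comparison: ordinary np.dot on the data with masked elements set to 0.
zero_filled = np.where(mask, 0, data)
dot_of_filled = np.dot(zero_filled, zero_filled)

print(result)
print(np.array_equal(result, dot_of_filled))
```

For this example the two routes give the same numbers, which is the sense
in which the IGNORE-style `np.dot` treats masked elements as 0.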

Equivalently, thinking of `np.dot` in its 1-D form as presenting the length
of the projection of one vector along another, it is not clear what a
single masked element is supposed to mean. In a way, masking just one
element of a vector or of a matrix makes vector or matrix operations
meaningless.

I thought fitting data with a mask might give a counterexample, but in that
case one usually calculates at some point r = y - A x, so there is no
masking of the matrix; the subtraction y - A x passes on a mask, and summing
over r while ignoring masked elements does just the right thing.
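
As a rough illustration of the fitting case, a small sketch with the
existing `numpy.ma` (an assumption on my part: the mask sits on the data
`y` only, and the numbers in `A`, `x` and `y` are made up for the example):

```python
import numpy as np

# Unmasked design matrix for a straight line, and trial parameters.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
x = np.array([1.0, 2.0])

# Data with one bad point, which is masked.
y = np.ma.array([1.5, 3.0, 100.0, 6.0], mask=[False, False, True, False])

# The matrix product involves no masks; the subtraction passes on y's mask.
r = y - A @ x

# Summing the squared residuals skips the masked element.
chi2 = (r * r).sum()

print(r)
print(chi2)
```

Here the bad point never enters the sum, without any masked element ever
having to take part in a matrix operation.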

All the best,

Marten