[Numpy-discussion] new MaskedArray class

Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Mon Jun 24 15:09:00 EDT 2019


Hi Allan,

Thanks for bringing up the noclobber explicitly (and Stephan for asking for
clarification; I was similarly confused).

It does clarify the difference in mental picture. In mine, the operation
would indeed be guaranteed to be done on the underlying data, without copy
and without `.filled(...)`. I should clarify further that I use `where`
only to skip reading elements (i.e., in reductions), not writing elements
(as you mention, the unwritten element will often be nonsense - e.g., wrong
units - which to me is worse than infinity or something similar; I've not
worried at all about runtime warnings). Note that my main reason here is
not that I'm against filling with numbers for numerical arrays, but rather
that I want to make minimal assumptions about the underlying data itself.
This may well be a mistake (but I want to find out where it breaks).
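A minimal sketch of this model, assuming a masked array is represented simply as a pair of plain ndarrays `data` and `mask` (True meaning masked). These names are illustrative only, not an API from either implementation. Elementwise operations run on all of the underlying data and carry the mask along, while `where` appears only in the reduction, to skip reading masked elements:

```python
import numpy as np

# Illustrative (data, mask) pair; True in `mask` means "masked".
data = np.array([1.0, 2.0, -1.0, 4.0])
mask = np.array([False, False, True, False])

# Elementwise operation: applied to *all* underlying data, with no copy
# and no filling. The masked position still gets a (meaningless) value,
# and the mask is simply carried along.
result_data = np.multiply(data, 2.0)
result_mask = mask.copy()

# Reduction: `where` is used only to skip *reading* masked elements.
# np.add has an identity (0), so no `initial` is needed.
total = np.add.reduce(data, where=~mask)
print(total)  # 7.0
```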

Anyway, in many ways it seems all the better that our models are quite
different. I definitely see the advantages of your choice to let one do
whatever is logical with masked data elements ahead of an operation!
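For contrast, a minimal sketch of a fill-ahead approach, assuming that the logical choice for a reduction is to replace masked elements with the ufunc's identity (0 for `np.add`, 1 for `np.multiply`) before the call. `fill_then_reduce` is a hypothetical helper and only one possible reading of that model, not its actual implementation:

```python
import numpy as np

def fill_then_reduce(ufunc, data, mask):
    """Hypothetical helper: put the ufunc's identity in place of masked
    elements (in a copy), then reduce. Only works for ufuncs that have an
    identity, e.g. np.add or np.multiply (np.maximum.identity is None)."""
    if ufunc.identity is None:
        raise ValueError(f"{ufunc.__name__} has no identity to fill with")
    filled = np.where(mask, ufunc.identity, data)  # new array; `data` untouched
    return ufunc.reduce(filled)

data = np.array([1.0, 2.0, 3.0])
mask = np.array([False, True, False])
print(fill_then_reduce(np.add, data, mask))       # 4.0
print(fill_then_reduce(np.multiply, data, mask))  # 3.0
```

Unlike the `where`-only approach, the operation here sees a filled copy, so it can use any implementation, at the cost of deciding on a fill value per operation.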

Thanks also for bringing up a useful example with `np.dot(m, m)` - clearly,
I didn't yet get beyond overriding ufuncs!

In my mental model, where I'd apply `np.dot` to the data and the mask
separately, the result will be wrong, so the mask has to be set (which it
would be). For your specific example, that might not be the best solution,
but when using `np.dot(matrix_shaped, matrix_shaped)`, I think it does give
the correct masking: any masked element in a matrix had better propagate to
all parts of the result that it influences, even if there is a reduction of
sorts happening. So this is perhaps the price to pay for a function that
tries to do multiple things.
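A sketch of that mask propagation for 2-D operands, again assuming plain `data`/`mask` arrays (the function name is hypothetical). `np.dot` is applied to the data unchanged, and element (i, j) of the result is masked if anything in row i of the first operand or column j of the second is masked, since all of those elements feed into it:

```python
import numpy as np

def masked_dot(data_a, mask_a, data_b, mask_b):
    """Hypothetical sketch: np.dot on data and mask separately (2-D only)."""
    # Data: computed on the raw values, with no filling or skipping.
    result_data = np.dot(data_a, data_b)
    # Mask: result[i, j] reads row i of a and column j of b, so any masked
    # element there taints it, even if it is later summed over.
    result_mask = (mask_a.any(axis=1)[:, np.newaxis]
                   | mask_b.any(axis=0)[np.newaxis, :])
    return result_data, result_mask

a = np.array([[1.0, 2.0], [3.0, 4.0]])
mask_a = np.array([[False, True], [False, False]])
b = np.eye(2)
mask_b = np.zeros((2, 2), dtype=bool)
d, m = masked_dot(a, mask_a, b, mask_b)
print(m)  # row 0 fully masked, row 1 unmasked
```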

The alternative solution in my model would be to replace `np.dot` with a
masked-specific implementation of what `np.dot` is supposed to stand for
(in your simple example, `np.add.reduce(np.multiply(m, m))`; more
generally, with the relevant `outer` and `axes` added). This would be
similar to what I think all implementations do for `.mean()`: we cannot
calculate that from the data using any fill value or skipping, so we
instead use the more easily handled `.sum()` and divide by the number of
unmasked elements. But in both examples the disadvantage is that we take
away the option to use the underlying class's `.dot()` or `.mean()`
implementations.
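A minimal sketch of such masked-specific rewrites, assuming the same `data`/`mask` representation; all function names are hypothetical. Masked elements are skipped on reading via `where`, and the 2-D case uses broadcasting in place of `np.multiply.outer` plus axis bookkeeping. None of these calls the underlying class's `.dot()` or `.mean()`:

```python
import numpy as np

def masked_dot_1d(data, mask):
    """np.dot(m, m) for 1-D m, rewritten as np.add.reduce(np.multiply(m, m)),
    skipping masked elements when reading."""
    return np.add.reduce(np.multiply(data, data), where=~mask)

def masked_matmul(data_a, mask_a, data_b, mask_b):
    """2-D version: form the (i, k, j) products, then reduce over k.
    Sketch only: it does not compute a result mask, so an entry whose
    contributions are all masked simply comes out as 0."""
    prod = np.multiply(data_a[:, :, np.newaxis], data_b[np.newaxis, :, :])
    skip = mask_a[:, :, np.newaxis] | mask_b[np.newaxis, :, :]
    return np.add.reduce(prod, axis=1, where=~skip)

def masked_mean(data, mask, axis=None):
    """mean = sum over unmasked elements / number of unmasked elements."""
    total = np.add.reduce(data, axis=axis, where=~mask)
    count = np.count_nonzero(~mask, axis=axis)
    return total / count

data = np.array([1.0, 2.0, 3.0])
mask = np.array([False, True, False])
print(masked_dot_1d(data, mask))  # 10.0
print(masked_mean(data, mask))    # 2.0
```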

(Aside: considerations such as these underlie my longed-for exposure of
standard implementations of functions in terms of basic ufunc calls.)

Another example of a function for which I think my model is not
particularly insightful (and for which it is difficult to know what to do
generally) is `np.fft.fft`. Since an FFT is equivalent to a sine/cosine
fit to the data points, the answer for masked data is in principle quite
well defined. But it is much less easy to implement!
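One way to make the "sine/cosine fit" view concrete is to fit the inverse-DFT basis to the unmasked samples by least squares. This is a hypothetical illustration, not an implementation proposed in this thread: with no masked elements it reproduces `np.fft.fft` exactly, but with masked samples there are fewer equations than coefficients, so `np.linalg.lstsq` returns the minimum-norm solution, which is only one of many possible choices (fitting fewer frequencies would be another).

```python
import numpy as np

def masked_fft_lstsq(data, mask):
    """Hypothetical sketch: estimate DFT coefficients of 1-D `data` by
    least-squares fitting complex exponentials to unmasked samples only."""
    n = data.shape[-1]
    t = np.arange(n)[~mask]  # sample positions that are not masked
    k = np.arange(n)         # frequency indices, same ordering as np.fft.fft
    # Inverse-DFT model: x[t] = (1/n) * sum_k X[k] * exp(2j*pi*k*t/n)
    basis = np.exp(2j * np.pi * np.outer(t, k) / n) / n
    coeffs, *_ = np.linalg.lstsq(basis, data[~mask], rcond=None)
    return coeffs

x = np.random.default_rng(0).standard_normal(8)
mask = np.zeros(8, dtype=bool)
mask[[2, 5]] = True
coeffs = masked_fft_lstsq(x, mask)  # fits x exactly at the 6 unmasked points
```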

All the best,

Marten