Re: [Numpy-discussion] new MaskedArray class

June 24, 2019

On Mon, Jun 24, 2019 at 8:46 AM Allan Haldane <allanhaldane@gmail.com> wrote:
...
> 1. Making a "no-clobber" guarantee on the underlying data

Hi Allan -- could you kindly clarify what you mean by "no-clobber"?

Is this referring to allowing masked arrays to mutate masked data values
in-place, even on apparently non-in-place operators? If so, that definitely
seems like a bad idea to me. I would much rather do an unnecessary copy
than have surprisingly non-thread-safe operations.
...
> If we agree that masked values will contain nonsense, it seems like a
> bad idea for those values to be easily exposed.
> Further, in all the comments so far I have not seen an example of a need
> for unmasking that is not more easily, efficiently and safely achieved
> by simply creating a new MaskedArray with a different mask.

My understanding is that essentially every low-level MaskedArray function
is implemented by looking at the data and mask separately. If so, we should
definitely expose this API directly to users (as part of the public API for
MaskedArray), so they can write their own efficient algorithms.
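
To illustrate what "looking at the data and mask separately" means in
practice, here is a minimal sketch using the existing numpy.ma module
(not the new MaskedArray class under discussion, whose API may differ).
np.ma.getdata and np.ma.getmaskarray are existing numpy.ma functions; the
variable names are only for illustration.

```python
import numpy as np

# An existing numpy.ma masked array: the middle element is masked.
m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# Pull out the two underlying pieces as plain ndarrays.
raw = np.ma.getdata(m)        # values, including whatever sits under the mask
mask = np.ma.getmaskarray(m)  # boolean array, True where the value is masked

# A hand-written reduction that only consults the unmasked values.
manual_mean = raw[~mask].mean()
```

The hand-written mean agrees with m.mean(), and no new masked array had to
be constructed to compute it.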

As a concrete example, suppose I wanted to implement a low-level "grouped
mean" operation for masked arrays like that found in pandas. This isn't a
builtin NumPy function, so I would need to write this myself. This would be
relatively straightforward to do in Numba or Cython with raw NumPy arrays
(e.g., see my example here for a NaN skipping version:
https://github.com/shoyer/numbagg/blob/v0.1.0/numbagg/grouped.py), but to
do it efficiently you definitely don't want to make an unnecessary copy.
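
A rough sketch of such a grouped mean, written against separate data and
mask arrays, might look like the following. This is not the numbagg code
linked above; the function name, its signature, and the convention that
mask=True means "ignore this value" are assumptions made for illustration.
The explicit loop is the form one would hand to Numba or Cython.

```python
import numpy as np

def grouped_mean(data, mask, labels, num_groups):
    """Hypothetical helper: mean of the unmasked values in each group.

    data   : 1-D float array of values
    mask   : 1-D bool array, True where the value should be ignored
    labels : 1-D int array of group ids in range(num_groups)
    Returns (means, out_mask); out_mask is True for groups that had
    no unmasked values.
    """
    sums = np.zeros(num_groups, dtype=np.float64)
    counts = np.zeros(num_groups, dtype=np.int64)
    # Single pass over the raw arrays; neither input is copied.
    for i in range(data.shape[0]):
        if not mask[i]:
            sums[labels[i]] += data[i]
            counts[labels[i]] += 1
    means = np.full(num_groups, np.nan)
    valid = counts > 0
    means[valid] = sums[valid] / counts[valid]
    return means, ~valid

data = np.array([1.0, 2.0, 99.0, 4.0, 5.0])
mask = np.array([False, False, True, False, True])
labels = np.array([0, 0, 1, 1, 2])
means, out_mask = grouped_mean(data, mask, labels, 3)
```

Here the masked 99.0 never contributes to group 1, and group 2, whose only
value is masked, comes back masked in the result.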

The usual reason for hiding implementation details is when we want to
reserve the right to change them. But if we're sure about the data model
(which I think we are for MaskedArray) then I think there's a lot of value
in exposing it directly to users, even if it's lower level than is
appropriate to use in most cases.


Stephan Hoyer