[Numpy-discussion] new MaskedArray class

Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Tue Jun 18 10:06:17 EDT 2019


Hi Allen,

Thanks for the message and link! In astropy, we've been struggling with
masking a lot, and one of the main conclusions I have reached is that
ideally one has a more abstract `Masked` class that can take any type of
data (including `ndarray`, of course), and behaves like that data as much
as possible, to the extent that if, e.g., I create a `Masked(Quantity(...,
unit), mask=...)`, the instance will have a `.unit` attribute and perhaps
even `isinstance(..., Quantity)` will hold. And similarly for
`Masked(Time(...), mask=...)`, `Masked(SkyCoord(...), mask=...)`, etc. In a
way, `Masked` would be a kind of Mixin-class that just tracks a mask
attribute.

This may be too much to ask from the initializer, but, if so, it still
seems most useful if it is made as easy as possible to do, say, `class
MaskedQuantity(Masked, Quantity): <very few overrides>`.

Even if this impossible, I think it is conceptually useful to think about
what the masking class should do. My sense is that, e.g., it should not
attempt to decide when an operation succeeds or not, but just "or together"
input masks for regular, multiple-input functions, and let the underlying
arrays skip elements for reductions by using `where` (hey, I did implement
that for a reason... ;-). In particular, it suggests one should not have
things like domains and all that (I never understood why `MaskedArray` did
that). If one wants more, the class should provide a method that updates
the mask (a sensible default might be `mask |= ~np.isfinite(result)` -
here, the class being masked should logically support ufuncs and functions,
so it can decide what "isfinite" means).

In any case, I would think that a basic truth should be that everything has
a mask with a shape consistent with the data, so
1. Each complex numbers has just one mask, and setting `a.imag` with a
masked array should definitely propagate the mask.
2. For a masked array with structured dtype, I'd similarly say that the
default is for a mask to have the same shape as the array. But that
something like your collection makes sense for the case where one wants to
mask items in a structure.

All the best,

Marten

p.s. I started trying to implement the above "Mixin" class; will try to
clean that up a bit so that at least it uses `where` and push it up.



On Mon, Jun 17, 2019 at 6:43 PM Allan Haldane <allanhaldane at gmail.com>
wrote:

> Hi all,
>
> Chuck suggested we think about a MaskedArray replacement for 1.18.
>
> A few months ago I did some work on a MaskedArray replacement using
> `__array_function__`, which I got mostly working. It seems like a good
> time to bring it up for discussion now. See it at:
>
> https://github.com/ahaldane/ndarray_ducktypes
>
> It should be very usable, it has docs you can read, and it passes a
> pytest-suite with 1024 tests adapted from numpy's MaskedArray tests.
> What is missing? It needs even more tests for new functionality, and a
> couple numpy-API functions are missing, in particular `np.median`,
> `np.percentile`, `np.cov`, and `np.corrcoef`. I'm sure other devs could
> also find many things to improve too.
>
> Besides fixing many little annoyances from MaskedArray, and simplifying
> the logic by always storing the mask in full, it also has new features.
> For instance it allows the use of a "X" variable to mark masked
> locations during array construction, and I solve the issue of how to
> mask individual fields of a structured array differently.
>
> At this point I would by happy to get some feedback on the design and
> what seems good or bad. If it seems like a good start, I'd be happy to
> move it into a numpy repo of some sort for further collaboration &
> discussion, and maybe into 1.18. At the least I hope it can serve as a
> design study of what we could do.
>
>
>
>
>
> Let me also drop here two more interesting detailed issues:
>
> First, the issue of what to do about .real and .imag of complex arrays,
> and similarly about field-assignment of structured arrays. The problem
> is that we have a single mask bool per element of a complex array, but
> what if someone does `arr.imag = MaskedArray([1,X,1])`? How should the
> mask of the original array change? Should we make .real and .imag readonly?
>
> Second, a more general issue of how to ducktype scalars when using
> `__array_function__` which I think many ducktype implementors will have
> to face. For MaskedArray, I created an associated "MaskedScalar" type.
> However, MaskedScalar has to behave differently from normal numpy
> scalars in a number of ways: It is not part of the numpy scalar
> hierarchy, it fails checks `isinstance(var, np.floating)`, and
> np.isscalar returns false. Numpy scalar types cannot be subclassed. We
> have discussed before the need to have distinction between 0d-arrays and
> scalars, so we shouldn't just use a 0d (in fact, this makes printing
> very difficult). This leads me to think that in future dtype-overhaul
> plans, we should consider creating a subclassable `np.scalar` base type
> to wrap all numpy scalar variables, and code like `isinstance(var,
> np.floating)` should be replaced by `isinstance(var.dtype.type,
> np.floating)` or similar. That is, the numeric dtype of the scalar is no
> longer encoded in `type(var)` but in `var.dtype`: The fact that the
> variable is a numpy scalar is decoupled from its numeric dtype.
>
> This is useful because there are many "associated" properties of scalars
> in common with arrays which have nothing to do with the dtype, which
> ducktype implementors want to touch. I imagine this will come up a lot:
> In that repo I also have an "ArrayCollection" ducktype which required a
> "CollectionScalar" scalar, and similarly I imagine people implementing
> units want the units attached to the scalar, independently of the dtype.
>
> Cheers,
> Allan
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190618/be963059/attachment.html>


More information about the NumPy-Discussion mailing list