[Numpy-discussion] new MaskedArray class

Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Wed Jun 19 22:19:49 EDT 2019


Hi Allan,

This is very impressive! I could get the tests that I wrote for my class
pass with yours using Quantity with what I would consider very minimal
changes. I only could not find a good way to unmask data (I like the idea
of setting the mask on some elements via `ma[item] = X`); is this on
purpose?

Anyway, it would seem easily at the point where I should comment on your
repository rather than in the mailing list!

All the best,

Marten


On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane <allanhaldane at gmail.com>
wrote:

> On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> >
> >
> > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane <allanhaldane at gmail.com
> > <mailto:allanhaldane at gmail.com>> wrote:
> > <snip>
> >
> >     > This may be too much to ask from the initializer, but, if so, it
> still
> >     > seems most useful if it is made as easy as possible to do, say,
> `class
> >     > MaskedQuantity(Masked, Quantity): <very few overrides>`.
> >
> >     Currently MaskedArray does not accept ducktypes as underlying arrays,
> >     but I think it shouldn't be too hard to modify it to do so. Good
> idea!
> >
> >
> > Looking back at my trial, I see that I also never got to duck arrays -
> > only ndarray subclasses - though I tried to make the code as agnostic as
> > possible.
> >
> > (Trial at
> >
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
> )
> >
> >     I already partly navigated this mixin-issue in the
> >     "MaskedArrayCollection" class, which essentially does
> >     ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
> >     boilerplate. That's the backwards encapsulation order from what you
> want
> >     though.
> >
> >
> > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
> > mask=[True, False, False])` does indeed not have a `.unit` attribute
> > (and cannot represent itself...); I'm not at all sure that my method of
> > just creating a mixed class is anything but a recipe for disaster,
> though!
>
> Based on your suggestion I worked on this a little today, and now my
> MaskedArray more easily encapsulates both ducktypes and ndarray
> subclasses (pushed to repo). Here's an example I got working with masked
> units using unyt:
>
> [1]: from MaskedArray import X, MaskedArray, MaskedScalar
>
> [2]: from unyt import m, km
>
> [3]: import numpy as np
>
> [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
>
> [5]: uarr
>
> MaskedArray([1., X , 3.])
> [6]: uarr + 1*m
>
> MaskedArray([1.001, X    , 3.001])
> [7]: uarr.filled()
>
> unyt_array([1., 0., 3.], 'km')
> [8]: np.concatenate([uarr, 2*uarr]).filled()
> unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
>
> The catch is the ducktype/subclass has to rigorously follow numpy's
> indexing rules, including distinguishing 0d arrays from scalars. For now
> only I used unyt in the example above since it happens to be less strict
>  about dimensionless operations than astropy.units which trips up my
> repr code. (see below for example with astropy.units). Note in the last
> line I lost the dimensions, but that is because unyt does not handle
> np.concatenate. To get that to work we need a true ducktype for units.
>
> The example above doesn't expose the ".units" attribute outside the
> MaskedArray, and it doesn't print the units in the repr. But you can
> access them using "filled".
>
> While I could make MaskedArray forward unknown attribute accesses to the
> encapsulated array, that seems a bit dangerous/bug-prone at first
> glance, so probably I want to require the user to make a MaskedArray
> subclass to do so. I've just started playing with that (probably buggy),
> and Ive attached subclass examples for astropy.unit and unyt, with some
> example output below.
>
> Cheers,
> Allan
>
>
>
> Example using the attached astropy unit subclass:
>
>     >>> from astropy.units import m, km, s
>     >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
>     >>> uarr
>     MaskedQ([1., X , 1.], units=km)
>     >>> uarr.units
>     km
>     >>> uarr + (1*m)
>     MaskedQ([1.001, X    , 1.001], units=km)
>     >>> uarr/(1*s)
>     MaskedQ([1., X , 1.], units=km / s)
>     >>> (uarr*(1*m))[1:]
>     MaskedQ([X , 1.], units=km m)
>     >>> np.add.outer(uarr, uarr)
>     MaskedQ([[2., X , 2.],
>              [X , X , X ],
>              [2., X , 2.]], units=km)
>     >>> print(uarr)
>     [1. X  1.] km m
>
> Cheers,
> Allan
>
>
> >     > Even if this impossible, I think it is conceptually useful to think
> >     > about what the masking class should do. My sense is that, e.g., it
> >     > should not attempt to decide when an operation succeeds or not,
> >     but just
> >     > "or together" input masks for regular, multiple-input functions,
> >     and let
> >     > the underlying arrays skip elements for reductions by using `where`
> >     > (hey, I did implement that for a reason... ;-). In particular, it
> >     > suggests one should not have things like domains and all that (I
> never
> >     > understood why `MaskedArray` did that). If one wants more, the
> class
> >     > should provide a method that updates the mask (a sensible default
> >     might
> >     > be `mask |= ~np.isfinite(result)` - here, the class being masked
> >     should
> >     > logically support ufuncs and functions, so it can decide what
> >     "isfinite"
> >     > means).
> >
> >     I agree it would be nice to remove domains. It would make life
> easier,
> >     and I could remove a lot of twiddly code! I kept it in for now to
> >     minimize the behavior changes from the old MaskedArray.
> >
> >
> > That makes sense. Could be separated out to a backwards-compatibility
> > class later.
> >
> >
> >     > In any case, I would think that a basic truth should be that
> >     everything
> >     > has a mask with a shape consistent with the data, so
> >     > 1. Each complex numbers has just one mask, and setting `a.imag`
> with a
> >     > masked array should definitely propagate the mask.
> >     > 2. For a masked array with structured dtype, I'd similarly say
> >     that the
> >     > default is for a mask to have the same shape as the array. But that
> >     > something like your collection makes sense for the case where one
> >     wants
> >     > to mask items in a structure.
> >
> >     Agreed that we should have a single bool per complex or structured
> >     element, and the mask shape is the same as the array shape. That's
> how I
> >     implemented it. But there is still a problem with complex.imag
> >     assignment:
> >
> >         >>> a = MaskedArray([1j, 2, X])
> >         >>> i = a.imag
> >         >>> i[:] = MaskedArray([1, X, 1])
> >
> >     If we make the last line copy the mask to the original array, what
> >     should the real part of a[2] be? Conversely, if we don't copy the
> mask,
> >     what should the imag part of a[1] be? It seems like we might "want"
> the
> >     masks to be OR'd instead, but then should i[2] be masked after we
> just
> >     set it to 1?
> >
> > Ah, I see the issue now... Easiest to implement and closest in analogy
> > to a regular view would be to just let it unmask a[2] (with whatever is
> > in real; user beware!).
> >
> > Perhaps better would be to special-case such that `imag` returns a
> > read-only view of the mask. Making `imag` itself read-only would prevent
> > possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but there is
> > no reason this should update the mask.
> >
> > Still, neither is really satisfactory...
> >
> >
> >
> >     > p.s. I started trying to implement the above "Mixin" class; will
> >     try to
> >     > clean that up a bit so that at least it uses `where` and push it
> up.
> >
> >     I played with "where", but didn't include it since 1.17 is not
> released.
> >     To avoid duplication of effort, I've attached a diff of what I
> tried. I
> >     actually get a slight slowdown of about 10% by using where...
> >
> >
> > Your implementation is indeed quite similar to what I got in
> > __array_ufunc__ (though one should "&" the where with ~mask).
> >
> > I think the main benefit is not to presume that whatever is underneath
> > understands 0 or 1, i.e., avoid filling.
> >
> >
> >     If you make progress with the mixin, a push is welcome. I imagine a
> >     problem is going to be that np.isscalar doesn't work to detect duck
> >     scalars.
> >
> > I fear that in my attempts I've simply decided that only array scalars
> > exist...
> >
> > -- Marten
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190619/2ba2f2a6/attachment-0001.html>


More information about the NumPy-Discussion mailing list