[Numpy-discussion] new MaskedArray class
Marten van Kerkwijk
m.h.vankerkwijk at gmail.com
Wed Jun 19 22:19:49 EDT 2019
Hi Allan,
This is very impressive! I could get the tests that I wrote for my class
pass with yours using Quantity with what I would consider very minimal
changes. I only could not find a good way to unmask data (I like the idea
of setting the mask on some elements via `ma[item] = X`); is this on
purpose?
Anyway, it would seem easily at the point where I should comment on your
repository rather than in the mailing list!
All the best,
Marten
On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane <allanhaldane at gmail.com>
wrote:
> On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> >
> >
> > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane <allanhaldane at gmail.com
> > <mailto:allanhaldane at gmail.com>> wrote:
> > <snip>
> >
> > > This may be too much to ask from the initializer, but, if so, it
> still
> > > seems most useful if it is made as easy as possible to do, say,
> `class
> > > MaskedQuantity(Masked, Quantity): <very few overrides>`.
> >
> > Currently MaskedArray does not accept ducktypes as underlying arrays,
> > but I think it shouldn't be too hard to modify it to do so. Good
> idea!
> >
> >
> > Looking back at my trial, I see that I also never got to duck arrays -
> > only ndarray subclasses - though I tried to make the code as agnostic as
> > possible.
> >
> > (Trial at
> >
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
> )
> >
> > I already partly navigated this mixin-issue in the
> > "MaskedArrayCollection" class, which essentially does
> > ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
> > boilerplate. That's the backwards encapsulation order from what you
> want
> > though.
> >
> >
> > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
> > mask=[True, False, False])` does indeed not have a `.unit` attribute
> > (and cannot represent itself...); I'm not at all sure that my method of
> > just creating a mixed class is anything but a recipe for disaster,
> though!
>
> Based on your suggestion I worked on this a little today, and now my
> MaskedArray more easily encapsulates both ducktypes and ndarray
> subclasses (pushed to repo). Here's an example I got working with masked
> units using unyt:
>
> [1]: from MaskedArray import X, MaskedArray, MaskedScalar
>
> [2]: from unyt import m, km
>
> [3]: import numpy as np
>
> [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
>
> [5]: uarr
>
> MaskedArray([1., X , 3.])
> [6]: uarr + 1*m
>
> MaskedArray([1.001, X , 3.001])
> [7]: uarr.filled()
>
> unyt_array([1., 0., 3.], 'km')
> [8]: np.concatenate([uarr, 2*uarr]).filled()
> unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
>
> The catch is the ducktype/subclass has to rigorously follow numpy's
> indexing rules, including distinguishing 0d arrays from scalars. For now
> only I used unyt in the example above since it happens to be less strict
> about dimensionless operations than astropy.units which trips up my
> repr code. (see below for example with astropy.units). Note in the last
> line I lost the dimensions, but that is because unyt does not handle
> np.concatenate. To get that to work we need a true ducktype for units.
>
> The example above doesn't expose the ".units" attribute outside the
> MaskedArray, and it doesn't print the units in the repr. But you can
> access them using "filled".
>
> While I could make MaskedArray forward unknown attribute accesses to the
> encapsulated array, that seems a bit dangerous/bug-prone at first
> glance, so probably I want to require the user to make a MaskedArray
> subclass to do so. I've just started playing with that (probably buggy),
> and Ive attached subclass examples for astropy.unit and unyt, with some
> example output below.
>
> Cheers,
> Allan
>
>
>
> Example using the attached astropy unit subclass:
>
> >>> from astropy.units import m, km, s
> >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
> >>> uarr
> MaskedQ([1., X , 1.], units=km)
> >>> uarr.units
> km
> >>> uarr + (1*m)
> MaskedQ([1.001, X , 1.001], units=km)
> >>> uarr/(1*s)
> MaskedQ([1., X , 1.], units=km / s)
> >>> (uarr*(1*m))[1:]
> MaskedQ([X , 1.], units=km m)
> >>> np.add.outer(uarr, uarr)
> MaskedQ([[2., X , 2.],
> [X , X , X ],
> [2., X , 2.]], units=km)
> >>> print(uarr)
> [1. X 1.] km m
>
> Cheers,
> Allan
>
>
> > > Even if this impossible, I think it is conceptually useful to think
> > > about what the masking class should do. My sense is that, e.g., it
> > > should not attempt to decide when an operation succeeds or not,
> > but just
> > > "or together" input masks for regular, multiple-input functions,
> > and let
> > > the underlying arrays skip elements for reductions by using `where`
> > > (hey, I did implement that for a reason... ;-). In particular, it
> > > suggests one should not have things like domains and all that (I
> never
> > > understood why `MaskedArray` did that). If one wants more, the
> class
> > > should provide a method that updates the mask (a sensible default
> > might
> > > be `mask |= ~np.isfinite(result)` - here, the class being masked
> > should
> > > logically support ufuncs and functions, so it can decide what
> > "isfinite"
> > > means).
> >
> > I agree it would be nice to remove domains. It would make life
> easier,
> > and I could remove a lot of twiddly code! I kept it in for now to
> > minimize the behavior changes from the old MaskedArray.
> >
> >
> > That makes sense. Could be separated out to a backwards-compatibility
> > class later.
> >
> >
> > > In any case, I would think that a basic truth should be that
> > everything
> > > has a mask with a shape consistent with the data, so
> > > 1. Each complex numbers has just one mask, and setting `a.imag`
> with a
> > > masked array should definitely propagate the mask.
> > > 2. For a masked array with structured dtype, I'd similarly say
> > that the
> > > default is for a mask to have the same shape as the array. But that
> > > something like your collection makes sense for the case where one
> > wants
> > > to mask items in a structure.
> >
> > Agreed that we should have a single bool per complex or structured
> > element, and the mask shape is the same as the array shape. That's
> how I
> > implemented it. But there is still a problem with complex.imag
> > assignment:
> >
> > >>> a = MaskedArray([1j, 2, X])
> > >>> i = a.imag
> > >>> i[:] = MaskedArray([1, X, 1])
> >
> > If we make the last line copy the mask to the original array, what
> > should the real part of a[2] be? Conversely, if we don't copy the
> mask,
> > what should the imag part of a[1] be? It seems like we might "want"
> the
> > masks to be OR'd instead, but then should i[2] be masked after we
> just
> > set it to 1?
> >
> > Ah, I see the issue now... Easiest to implement and closest in analogy
> > to a regular view would be to just let it unmask a[2] (with whatever is
> > in real; user beware!).
> >
> > Perhaps better would be to special-case such that `imag` returns a
> > read-only view of the mask. Making `imag` itself read-only would prevent
> > possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but there is
> > no reason this should update the mask.
> >
> > Still, neither is really satisfactory...
> >
> >
> >
> > > p.s. I started trying to implement the above "Mixin" class; will
> > try to
> > > clean that up a bit so that at least it uses `where` and push it
> up.
> >
> > I played with "where", but didn't include it since 1.17 is not
> released.
> > To avoid duplication of effort, I've attached a diff of what I
> tried. I
> > actually get a slight slowdown of about 10% by using where...
> >
> >
> > Your implementation is indeed quite similar to what I got in
> > __array_ufunc__ (though one should "&" the where with ~mask).
> >
> > I think the main benefit is not to presume that whatever is underneath
> > understands 0 or 1, i.e., avoid filling.
> >
> >
> > If you make progress with the mixin, a push is welcome. I imagine a
> > problem is going to be that np.isscalar doesn't work to detect duck
> > scalars.
> >
> > I fear that in my attempts I've simply decided that only array scalars
> > exist...
> >
> > -- Marten
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190619/2ba2f2a6/attachment-0001.html>
More information about the NumPy-Discussion
mailing list