<div dir="ltr"><div>Hi Allan,</div><div><br></div><div>This is very impressive! I could get the tests that I wrote for my class pass with yours using Quantity with what I would consider very minimal changes. I only could not find a good way to unmask data (I like the idea of setting the mask on some elements via `ma[item] = X`); is this on purpose?<br></div><div><br></div><div>Anyway, it would seem easily at the point where I should comment on your repository rather than in the mailing list!</div><div><br></div><div>All the best,</div><div><br></div><div>Marten</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane <<a href="mailto:allanhaldane@gmail.com">allanhaldane@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:<br>

> <br>

> <br>

> On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane <<a href="mailto:allanhaldane@gmail.com" target="_blank">allanhaldane@gmail.com</a><br>

> <mailto:<a href="mailto:allanhaldane@gmail.com" target="_blank">allanhaldane@gmail.com</a>>> wrote:<br>

> <snip><br>

> <br>

>     > This may be too much to ask from the initializer, but, if so, it still<br>

>     > seems most useful if it is made as easy as possible to do, say, `class<br>

>     > MaskedQuantity(Masked, Quantity): <very few overrides>`.<br>

> <br>

>     Currently MaskedArray does not accept ducktypes as underlying arrays,<br>

>     but I think it shouldn't be too hard to modify it to do so. Good idea!<br>

> <br>

> <br>

> Looking back at my trial, I see that I also never got to duck arrays -<br>

> only ndarray subclasses - though I tried to make the code as agnostic as<br>

> possible.<br>

> <br>

> (Trial at<br>

> <a href="https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1" rel="noreferrer" target="_blank">https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1</a>)<br>

> <br>

>     I already partly navigated this mixin-issue in the<br>

>     "MaskedArrayCollection" class, which essentially does<br>

>     ArrayCollection(MaskedArray(array)), and only takes about 30 lines of<br>

>     boilerplate. That's the backwards encapsulation order from what you want<br>

>     though.<br>

> <br>

> <br>

> Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,<br>

> mask=[True, False, False])` does indeed not have a `.unit` attribute<br>

> (and cannot represent itself...); I'm not at all sure that my method of<br>

> just creating a mixed class is anything but a recipe for disaster, though!<br>

<br>

Based on your suggestion I worked on this a little today, and now my<br>

MaskedArray more easily encapsulates both ducktypes and ndarray<br>

subclasses (pushed to repo). Here's an example I got working with masked<br>

units using unyt:<br>

<br>

[1]: from MaskedArray import X, MaskedArray, MaskedScalar<br>

<br>

[2]: from unyt import m, km<br>

<br>

[3]: import numpy as np<br>

<br>

[4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])<br>

<br>

[5]: uarr<br>

<br>

MaskedArray([1., X , 3.])<br>

[6]: uarr + 1*m<br>

<br>

MaskedArray([1.001, X    , 3.001])<br>

[7]: uarr.filled()<br>

<br>

unyt_array([1., 0., 3.], 'km')<br>

[8]: np.concatenate([uarr, 2*uarr]).filled()<br>

unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')<br>

<br>

The catch is the ducktype/subclass has to rigorously follow numpy's<br>

indexing rules, including distinguishing 0d arrays from scalars. For now<br>

only I used unyt in the example above since it happens to be less strict<br>

 about dimensionless operations than astropy.units which trips up my<br>

repr code. (see below for example with astropy.units). Note in the last<br>

line I lost the dimensions, but that is because unyt does not handle<br>

np.concatenate. To get that to work we need a true ducktype for units.<br>

<br>

The example above doesn't expose the ".units" attribute outside the<br>

MaskedArray, and it doesn't print the units in the repr. But you can<br>

access them using "filled".<br>

<br>

While I could make MaskedArray forward unknown attribute accesses to the<br>

encapsulated array, that seems a bit dangerous/bug-prone at first<br>

glance, so probably I want to require the user to make a MaskedArray<br>

subclass to do so. I've just started playing with that (probably buggy),<br>

and Ive attached subclass examples for astropy.unit and unyt, with some<br>

example output below.<br>

<br>

Cheers,<br>

Allan<br>

<br>

<br>

<br>

Example using the attached astropy unit subclass:<br>

<br>

    >>> from astropy.units import m, km, s<br>

    >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])<br>

    >>> uarr<br>

    MaskedQ([1., X , 1.], units=km)<br>

    >>> uarr.units<br>

    km<br>

    >>> uarr + (1*m)<br>

    MaskedQ([1.001, X    , 1.001], units=km)<br>

    >>> uarr/(1*s)<br>

    MaskedQ([1., X , 1.], units=km / s)<br>

    >>> (uarr*(1*m))[1:]<br>

    MaskedQ([X , 1.], units=km m)<br>

    >>> np.add.outer(uarr, uarr)<br>

    MaskedQ([[2., X , 2.],<br>

             [X , X , X ],<br>

             [2., X , 2.]], units=km)<br>

    >>> print(uarr)<br>

    [1. X  1.] km m<br>

<br>

Cheers,<br>

Allan<br>

<br>

<br>

>     > Even if this impossible, I think it is conceptually useful to think<br>

>     > about what the masking class should do. My sense is that, e.g., it<br>

>     > should not attempt to decide when an operation succeeds or not,<br>

>     but just<br>

>     > "or together" input masks for regular, multiple-input functions,<br>

>     and let<br>

>     > the underlying arrays skip elements for reductions by using `where`<br>

>     > (hey, I did implement that for a reason... ;-). In particular, it<br>

>     > suggests one should not have things like domains and all that (I never<br>

>     > understood why `MaskedArray` did that). If one wants more, the class<br>

>     > should provide a method that updates the mask (a sensible default<br>

>     might<br>

>     > be `mask |= ~np.isfinite(result)` - here, the class being masked<br>

>     should<br>

>     > logically support ufuncs and functions, so it can decide what<br>

>     "isfinite"<br>

>     > means).<br>

> <br>

>     I agree it would be nice to remove domains. It would make life easier,<br>

>     and I could remove a lot of twiddly code! I kept it in for now to<br>

>     minimize the behavior changes from the old MaskedArray.<br>

> <br>

> <br>

> That makes sense. Could be separated out to a backwards-compatibility<br>

> class later.<br>

> <br>

> <br>

>     > In any case, I would think that a basic truth should be that<br>

>     everything<br>

>     > has a mask with a shape consistent with the data, so<br>

>     > 1. Each complex numbers has just one mask, and setting `a.imag` with a<br>

>     > masked array should definitely propagate the mask.<br>

>     > 2. For a masked array with structured dtype, I'd similarly say<br>

>     that the<br>

>     > default is for a mask to have the same shape as the array. But that<br>

>     > something like your collection makes sense for the case where one<br>

>     wants<br>

>     > to mask items in a structure.<br>

> <br>

>     Agreed that we should have a single bool per complex or structured<br>

>     element, and the mask shape is the same as the array shape. That's how I<br>

>     implemented it. But there is still a problem with complex.imag<br>

>     assignment:<br>

> <br>

>         >>> a = MaskedArray([1j, 2, X])<br>

>         >>> i = a.imag<br>

>         >>> i[:] = MaskedArray([1, X, 1])<br>

> <br>

>     If we make the last line copy the mask to the original array, what<br>

>     should the real part of a[2] be? Conversely, if we don't copy the mask,<br>

>     what should the imag part of a[1] be? It seems like we might "want" the<br>

>     masks to be OR'd instead, but then should i[2] be masked after we just<br>

>     set it to 1?<br>

> <br>

> Ah, I see the issue now... Easiest to implement and closest in analogy<br>

> to a regular view would be to just let it unmask a[2] (with whatever is<br>

> in real; user beware!).<br>

> <br>

> Perhaps better would be to special-case such that `imag` returns a<br>

> read-only view of the mask. Making `imag` itself read-only would prevent<br>

> possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but there is<br>

> no reason this should update the mask.<br>

> <br>

> Still, neither is really satisfactory...<br>

>  <br>

> <br>

> <br>

>     > p.s. I started trying to implement the above "Mixin" class; will<br>

>     try to<br>

>     > clean that up a bit so that at least it uses `where` and push it up.<br>

> <br>

>     I played with "where", but didn't include it since 1.17 is not released.<br>

>     To avoid duplication of effort, I've attached a diff of what I tried. I<br>

>     actually get a slight slowdown of about 10% by using where...<br>

> <br>

> <br>

> Your implementation is indeed quite similar to what I got in<br>

> __array_ufunc__ (though one should "&" the where with ~mask).<br>

> <br>

> I think the main benefit is not to presume that whatever is underneath<br>

> understands 0 or 1, i.e., avoid filling.<br>

> <br>

> <br>

>     If you make progress with the mixin, a push is welcome. I imagine a<br>

>     problem is going to be that np.isscalar doesn't work to detect duck<br>

>     scalars.<br>

> <br>

> I fear that in my attempts I've simply decided that only array scalars<br>

> exist...<br>

> <br>

> -- Marten<br>

> <br>

> _______________________________________________<br>

> NumPy-Discussion mailing list<br>

> <a href="mailto:NumPy-Discussion@python.org" target="_blank">NumPy-Discussion@python.org</a><br>

> <a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

> <br>

<br>

_______________________________________________<br>

NumPy-Discussion mailing list<br>

<a href="mailto:NumPy-Discussion@python.org" target="_blank">NumPy-Discussion@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

</blockquote></div>