Re: [Numpy-discussion] new MaskedArray class

June 21, 2019

      Just to note, data that is masked isn't always garbage. There are plenty of
use-cases where one may want to temporarily apply a mask for a set of
computation, or possibly want to apply a series of different masks to the
data. I haven't read through this discussion deeply enough, but is this new
class going to destroy underlying masked data? and will it be possible to
swap out masks?

Cheers!
Ben Root

On Thu, Jun 20, 2019 at 12:44 PM Allan Haldane <allanhaldane@gmail.com>
wrote:
...
On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
...
Hi Allan,
This is very impressive! I could get the tests that I wrote for my class
pass with yours using Quantity with what I would consider very minimal
changes. I only could not find a good way to unmask data (I like the
idea of setting the mask on some elements via `ma[item] = X`); is this
on purpose?
Yes, I want to make it difficult for the user to access the garbage
values under the mask, which are often clobbered values. The only way to
"remove" a masked value is by replacing it with a new non-masked value.
...
Anyway, it would seem easily at the point where I should comment on your
repository rather than in the mailing list!
To make further progress on this encapsulation idea I need a more
complete ducktype to pass into MaskedArray to test, so that's what I'll
work on next, when I have time. I'll either try to finish my
ArrayCollection type, or try making a simple NDunit ducktype
piggybacking on astropy's Unit.
Best,
Allan
...
All the best,
Marten
On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane <allanhaldane@gmail.com
<mailto:allanhaldane@gmail.com>> wrote:
On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
    >
    >
    > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane
    <allanhaldane@gmail.com <mailto:allanhaldane@gmail.com>
    > <mailto:allanhaldane@gmail.com <mailto:allanhaldane@gmail.com>>>
    wrote:
    > <snip>
    >
    >     > This may be too much to ask from the initializer, but, if
    so, it still
    >     > seems most useful if it is made as easy as possible to do,
    say, `class
    >     > MaskedQuantity(Masked, Quantity): <very few overrides>`.
    >
    >     Currently MaskedArray does not accept ducktypes as underlying
    arrays,
    >     but I think it shouldn't be too hard to modify it to do so.
    Good idea!
    >
    >
    > Looking back at my trial, I see that I also never got to duck
arrays -
...
> only ndarray subclasses - though I tried to make the code as
    agnostic as
    > possible.
    >
    > (Trial at
    >
https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?...
)
...
>
    >     I already partly navigated this mixin-issue in the
    >     "MaskedArrayCollection" class, which essentially does
    >     ArrayCollection(MaskedArray(array)), and only takes about 30
    lines of
    >     boilerplate. That's the backwards encapsulation order from
    what you want
    >     though.
    >
    >
    > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
    > mask=[True, False, False])` does indeed not have a `.unit`
attribute
...
> (and cannot represent itself...); I'm not at all sure that my
    method of
    > just creating a mixed class is anything but a recipe for disaster,
    though!
Based on your suggestion I worked on this a little today, and now my
    MaskedArray more easily encapsulates both ducktypes and ndarray
    subclasses (pushed to repo). Here's an example I got working with
masked
...
units using unyt:
[1]: from MaskedArray import X, MaskedArray, MaskedScalar
[2]: from unyt import m, km
[3]: import numpy as np
[4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
[5]: uarr
MaskedArray([1., X , 3.])
    [6]: uarr + 1*m
MaskedArray([1.001, X    , 3.001])
    [7]: uarr.filled()
unyt_array([1., 0., 3.], 'km')
    [8]: np.concatenate([uarr, 2*uarr]).filled()
    unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
The catch is the ducktype/subclass has to rigorously follow numpy's
    indexing rules, including distinguishing 0d arrays from scalars. For
now
...
only I used unyt in the example above since it happens to be less
strict
...
about dimensionless operations than astropy.units which trips up my
    repr code. (see below for example with astropy.units). Note in the
last
...
line I lost the dimensions, but that is because unyt does not handle
    np.concatenate. To get that to work we need a true ducktype for
units.
...
The example above doesn't expose the ".units" attribute outside the
    MaskedArray, and it doesn't print the units in the repr. But you can
    access them using "filled".
While I could make MaskedArray forward unknown attribute accesses to
the
...
encapsulated array, that seems a bit dangerous/bug-prone at first
    glance, so probably I want to require the user to make a MaskedArray
    subclass to do so. I've just started playing with that (probably
buggy),
...
and Ive attached subclass examples for astropy.unit and unyt, with
some
...
example output below.
Cheers,
    Allan
Example using the attached astropy unit subclass:
>>> from astropy.units import m, km, s
        >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
        >>> uarr
        MaskedQ([1., X , 1.], units=km)
        >>> uarr.units
        km
        >>> uarr + (1*m)
        MaskedQ([1.001, X    , 1.001], units=km)
        >>> uarr/(1*s)
        MaskedQ([1., X , 1.], units=km / s)
        >>> (uarr*(1*m))[1:]
        MaskedQ([X , 1.], units=km m)
        >>> np.add.outer(uarr, uarr)
        MaskedQ([[2., X , 2.],
                 [X , X , X ],
                 [2., X , 2.]], units=km)
        >>> print(uarr)
        [1. X  1.] km m
Cheers,
    Allan
>     > Even if this impossible, I think it is conceptually useful
    to think
    >     > about what the masking class should do. My sense is that,
    e.g., it
    >     > should not attempt to decide when an operation succeeds or
not,
...
>     but just
    >     > "or together" input masks for regular, multiple-input
functions,
...
>     and let
    >     > the underlying arrays skip elements for reductions by using
    `where`
    >     > (hey, I did implement that for a reason... ;-). In
    particular, it
    >     > suggests one should not have things like domains and all
    that (I never
    >     > understood why `MaskedArray` did that). If one wants more,
    the class
    >     > should provide a method that updates the mask (a sensible
    default
    >     might
    >     > be `mask |= ~np.isfinite(result)` - here, the class being
masked
...
>     should
    >     > logically support ufuncs and functions, so it can decide what
    >     "isfinite"
    >     > means).
    >
    >     I agree it would be nice to remove domains. It would make life
    easier,
    >     and I could remove a lot of twiddly code! I kept it in for now
to
...
>     minimize the behavior changes from the old MaskedArray.
    >
    >
    > That makes sense. Could be separated out to a
backwards-compatibility
...
> class later.
    >
    >
    >     > In any case, I would think that a basic truth should be that
    >     everything
    >     > has a mask with a shape consistent with the data, so
    >     > 1. Each complex numbers has just one mask, and setting
    `a.imag` with a
    >     > masked array should definitely propagate the mask.
    >     > 2. For a masked array with structured dtype, I'd similarly
say
...
>     that the
    >     > default is for a mask to have the same shape as the array.
    But that
    >     > something like your collection makes sense for the case
    where one
    >     wants
    >     > to mask items in a structure.
    >
    >     Agreed that we should have a single bool per complex or
structured
...
>     element, and the mask shape is the same as the array shape.
    That's how I
    >     implemented it. But there is still a problem with complex.imag
    >     assignment:
    >
    >         >>> a = MaskedArray([1j, 2, X])
    >         >>> i = a.imag
    >         >>> i[:] = MaskedArray([1, X, 1])
    >
    >     If we make the last line copy the mask to the original array,
what
...
>     should the real part of a[2] be? Conversely, if we don't copy
    the mask,
    >     what should the imag part of a[1] be? It seems like we might
    "want" the
    >     masks to be OR'd instead, but then should i[2] be masked after
    we just
    >     set it to 1?
    >
    > Ah, I see the issue now... Easiest to implement and closest in
analogy
...
> to a regular view would be to just let it unmask a[2] (with
    whatever is
    > in real; user beware!).
    >
    > Perhaps better would be to special-case such that `imag` returns a
    > read-only view of the mask. Making `imag` itself read-only would
    prevent
    > possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but
    there is
    > no reason this should update the mask.
    >
    > Still, neither is really satisfactory...
    >
    >
    >
    >     > p.s. I started trying to implement the above "Mixin" class;
will
...
>     try to
    >     > clean that up a bit so that at least it uses `where` and
    push it up.
    >
    >     I played with "where", but didn't include it since 1.17 is not
    released.
    >     To avoid duplication of effort, I've attached a diff of what I
    tried. I
    >     actually get a slight slowdown of about 10% by using where...
    >
    >
    > Your implementation is indeed quite similar to what I got in
    > __array_ufunc__ (though one should "&" the where with ~mask).
    >
    > I think the main benefit is not to presume that whatever is
underneath
...
> understands 0 or 1, i.e., avoid filling.
    >
    >
    >     If you make progress with the mixin, a push is welcome. I
    imagine a
    >     problem is going to be that np.isscalar doesn't work to detect
    duck
    >     scalars.
    >
    > I fear that in my attempts I've simply decided that only array
scalars
...
> exist...
    >
    > -- Marten
    >
    > _______________________________________________
    > NumPy-Discussion mailing list
    > NumPy-Discussion@python.org <mailto:NumPy-Discussion@python.org>
    > https://mail.python.org/mailman/listinfo/numpy-discussion
    >
_______________________________________________
    NumPy-Discussion mailing list
    NumPy-Discussion@python.org <mailto:NumPy-Discussion@python.org>
    https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] new MaskedArray class

Benjamin Root