[Numpy-discussion] new MaskedArray class

Stephan Hoyer shoyer at gmail.com
Sun Jun 23 18:25:29 EDT 2019


On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

>> Your proposal would be something like np.sum(array,
>> where=np.ones_like(array))? This seems rather verbose for a common
>> operation. Perhaps np.sum(array, where=True) would work, making use of
>> broadcasting? (I haven't actually checked whether this is well-defined yet.)
>>
> I think we'd need to consider separately the operation on the mask and on
> the data. In my proposal, the data would always do `np.sum(array,
> where=~mask)`, while how the mask would propagate might depend on the mask
> itself, i.e., we'd have different mask types for `skipna=True` (default)
> and `False` ("contagious") reductions, which differed in doing
> `logical_and.reduce` or `logical_or.reduce` on the mask.
>

OK, I think I finally understand what you're getting at. So suppose this
is how we implement it internally. Would we really insist on a user
creating a new MaskedArray with a new mask object, e.g., with a GreedyMask?
We could add sugar for this, but certainly array.greedy_masked().sum() is
significantly less clear than array.sum(skipna=False).
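
To make the reduction under discussion concrete, here is a minimal sketch
of the keyword-argument version. The helper name masked_sum and its
signature are hypothetical (nothing like it exists in NumPy); only
np.sum(..., where=...), np.logical_and.reduce and np.logical_or.reduce are
real calls, and the where= argument to np.sum needs NumPy 1.17 or later.

```python
import numpy as np


def masked_sum(data, mask, axis=None, skipna=True):
    """Hypothetical helper: return (summed data, reduced mask)."""
    # Data: always sum only the unmasked elements.
    total = np.sum(data, axis=axis, where=~mask)
    if skipna:
        # The result is masked only if every input element was masked.
        out_mask = np.logical_and.reduce(mask, axis=axis)
    else:
        # "Contagious": the result is masked if any input element was masked.
        out_mask = np.logical_or.reduce(mask, axis=axis)
    return total, out_mask


data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
mask = np.array([[False, True, False], [True, True, True]])

# skipna=True: row sums are 4.0 and 0.0, result mask is [False, True]
total, out_mask = masked_sum(data, mask, axis=1)

# skipna=False: same sums, but result mask is [True, True]
total_c, out_mask_c = masked_sum(data, mask, axis=1, skipna=False)
```

With this shape, the choice between the two propagation rules is a single
argument at the call site rather than a property of the mask's type.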

I'm also a little concerned about a proliferation of MaskedArray/Mask
types. New types are significantly harder to understand than new functions
(or new arguments on existing functions). I don't know if we have enough
distinct use cases for this many types.

>> Are there use-cases for propagating masks separately from data? If not, it
>> might make sense to only define mask operations along with data, which
>> could be much simpler.
>>
>
> I had only thought about separating out the concern of mask propagation
> from the "MaskedArray" class to the mask proper, but it might indeed make
> things easier if the mask also did any required preparation for passing
> things on to the data (such as adjusting the "where" argument in a
> reduction). I also like that this way the mask can determine even before
> the data what functionality is available (i.e., it could be the place from
> which to return `NotImplemented` for a ufunc.at call with a masked index
> argument).
>

You're going to have to come up with something more compelling than
"separation of concerns" to convince me that this extra Mask abstraction is
worthwhile. On its own, I think a separate Mask class would only obfuscate
MaskedArray functions.

For example, compare these two implementations of add:

def add1(x, y):
    return MaskedArray(x.data + y.data, x.mask | y.mask)

def add2(x, y):
    return MaskedArray(x.data + y.data, x.mask + y.mask)

The second version requires that you *also* know how Mask classes work, and
how they implement +. So now you need to look in at least twice as many
places to understand add() for MaskedArray objects.
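
As a rough, self-contained illustration of that indirection: the Mask and
MaskedArray classes below are toy stand-ins written for this example, not
the proposed implementation, and the choice to make Mask.__add__ mean
"logical or" is an assumption about how such a class might behave.

```python
import numpy as np


class Mask:
    """Toy mask type (hypothetical) whose `+` combines two masks."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=bool)

    def __add__(self, other):
        # The propagation rule lives here, not in add2().
        return Mask(self.values | other.values)


class MaskedArray:
    """Toy container (hypothetical): just data plus a mask."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = mask


def add1(x, y):
    # The mask is a plain boolean array; the rule is visible right here.
    return MaskedArray(x.data + y.data, x.mask | y.mask)


def add2(x, y):
    # The mask is a Mask object; the rule is whatever Mask.__add__ does.
    return MaskedArray(x.data + y.data, x.mask + y.mask)


a1 = MaskedArray([1, 2, 3], np.array([False, True, False]))
b1 = MaskedArray([10, 20, 30], np.array([False, False, True]))
r1 = add1(a1, b1)

a2 = MaskedArray([1, 2, 3], Mask([False, True, False]))
b2 = MaskedArray([10, 20, 30], Mask([False, False, True]))
r2 = add2(a2, b2)
```

Both calls produce the same data and the same combined mask here, but
reading add2 alone does not tell you that; you have to go and read Mask as
well.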