
On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
where=np.ones_like(array))? This seems rather verbose for a common operation. Perhaps np.sum(array, where=True) would work, making use of broadcasting? (I haven't actually checked whether this is well-defined yet.)
I think we'd need to consider separately the operation on the mask and on
Your proposal would be something like np.sum(array, the data. In my proposal, the data would always do `np.sum(array, where=~mask)`, while how the mask would propagate might depend on the mask itself, i.e., we'd have different mask types for `skipna=True` (default) and `False` ("contagious") reductions, which differed in doing `logical_and.reduce` or `logical_or.reduce` on the mask.
OK, I think I finally understand what you're getting at. So suppose this this how we implement it internally. Would we really insist on a user creating a new MaskedArray with a new mask object, e.g., with a GreedyMask? We could add sugar for this, but certainly array.greedy_masked().sum() is significantly less clear than array.sum(skipna=False). I'm also a little concerned about a proliferation of MaskedArray/Mask types. New types are significantly harder to understand than new functions (or new arguments on existing functions). I don't know if we have enough distinct use cases for this many types. Are there use-cases for propagating masks separately from data? If not, it
might make sense to only define mask operations along with data, which could be much simpler.
I had only thought about separating out the concern of mask propagation from the "MaskedArray" class to the mask proper, but it might indeed make things easier if the mask also did any required preparation for passing things on to the data (such as adjusting the "where" argument in a reduction). I also like that this way the mask can determine even before the data what functionality is available (i.e., it could be the place from which to return `NotImplemented` for a ufunc.at call with a masked index argument).
You're going to have to come up with something more compelling than "separation of concerns" to convince me that this extra Mask abstraction is worthwhile. On its own, I think a separate Mask class would only obfuscate MaskedArray functions. For example, compare these two implementations of add: def add1(x, y): return MaskedArray(x.data + y.data, x.mask | y.mask) def add2(x, y): return MaskedArray(x.data + y.data, x.mask + y.mask) The second version requires that you *also* know how Mask classes work, and how they implement +. So now you need to look in at least twice as many places to understand add() for MaskedArray objects.