
I think we’d need to consider separately the operation on the mask and on the data. In my proposal, the data would always do np.sum(array, where=~mask), while how the mask would propagate might depend on the mask itself,

I quite like this idea, and I think Stephan’s strawman design is actually plausible, where MaskedArray.mask is either an InvalidMask or an IgnoreMask instance to pick between the different propagation types. Both classes could simply have an underlying ._array attribute pointing to a duck-array of some kind that backs their boolean data.

The second version requires that you *also* know how Mask classes work, and how they implement +

I remain unconvinced that Mask classes should behave differently on different ufuncs. I don’t think np.minimum(ignore_na, b) is any different to np.add(ignore_na, b) - either both should produce b, or both should produce ignore_na. I would lean towards producing ignore_na, with propagation behavior differing between “ignore” and “invalid” only for reduce / accumulate operations, where the concept of skipping an application is well-defined.

Some possible follow-up questions that having two distinct masked types raises:

- what if I want my data to support both invalid and skip fields at the same time? sum([invalid, skip, 1]) == invalid
- is there a use case for more than these two types of mask? invalid_due_to_reason_A, invalid_due_to_reason_B would be interesting things to track through a calculation, possibly a dictionary of named masks.

Eric

On Sun, 23 Jun 2019 at 15:28, Stephan Hoyer <shoyer@gmail.com> wrote:
On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Your proposal would be something like np.sum(array, where=np.ones_like(array))? This seems rather verbose for a common operation. Perhaps np.sum(array, where=True) would work, making use of broadcasting? (I haven't actually checked whether this is well-defined yet.)
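As a quick check of the broadcasting question, the sketch below compares a scalar `where=True` with an explicit boolean array. The array values and variable names are made up for illustration, and it assumes a NumPy version in which `np.sum` accepts `where` (1.17 or later).

```python
import numpy as np

array = np.array([[1.0, 2.0], [3.0, 4.0]])

# A scalar True broadcasts against the whole array, so every element is included.
total_scalar = np.sum(array, where=True)

# The explicit equivalent needs a boolean array, hence dtype=bool.
total_full = np.sum(array, where=np.ones_like(array, dtype=bool))

# Masking out one element (True marks "masked") excludes it from the sum.
mask = np.array([[False, True], [False, False]])
total_masked = np.sum(array, where=~mask)
```

Under these assumptions the scalar and explicit forms give the same total, so the shorter spelling appears to be sufficient.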
I think we'd need to consider separately the operation on the mask and on the data. In my proposal, the data would always do `np.sum(array, where=~mask)`, while how the mask would propagate might depend on the mask itself, i.e., we'd have different mask types for `skipna=True` (default) and `False` ("contagious") reductions, which differed in doing `logical_and.reduce` or `logical_or.reduce` on the mask.
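A minimal sketch of that split, using plain NumPy arrays rather than any actual mask class: the data reduction is identical in both cases, and only the reduction applied to the mask changes. The variable names are illustrative, and the convention that `True` marks a masked element is an assumption.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])
mask = np.array([False, True, False])  # assumed convention: True marks a masked element

# The data reduction is the same in both cases: masked elements are skipped.
data_sum = np.sum(data, where=~mask)

# skipna=True style: the result is masked only if every input was masked.
skip_mask = np.logical_and.reduce(mask)

# skipna=False ("contagious") style: any masked input masks the result.
contagious_mask = np.logical_or.reduce(mask)
```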
OK, I think I finally understand what you're getting at. So suppose this is how we implement it internally. Would we really insist on a user creating a new MaskedArray with a new mask object, e.g., with a GreedyMask? We could add sugar for this, but certainly array.greedy_masked().sum() is significantly less clear than array.sum(skipna=False).
I'm also a little concerned about a proliferation of MaskedArray/Mask types. New types are significantly harder to understand than new functions (or new arguments on existing functions). I don't know if we have enough distinct use cases for this many types.
Are there use-cases for propagating masks separately from data? If not, it might make sense to only define mask operations along with data, which could be much simpler.
I had only thought about separating out the concern of mask propagation from the "MaskedArray" class to the mask proper, but it might indeed make things easier if the mask also did any required preparation for passing things on to the data (such as adjusting the "where" argument in a reduction). I also like that this way the mask can determine even before the data what functionality is available (i.e., it could be the place from which to return `NotImplemented` for a ufunc.at call with a masked index argument).
You're going to have to come up with something more compelling than "separation of concerns" to convince me that this extra Mask abstraction is worthwhile. On its own, I think a separate Mask class would only obfuscate MaskedArray functions.
For example, compare these two implementations of add:
def add1(x, y):
    return MaskedArray(x.data + y.data, x.mask | y.mask)

def add2(x, y):
    return MaskedArray(x.data + y.data, x.mask + y.mask)
The second version requires that you *also* know how Mask classes work, and how they implement +. So now you need to look in at least twice as many places to understand add() for MaskedArray objects.
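To make the comparison concrete, here is a runnable sketch of both versions. The `Mask` class and this bare-bones `MaskedArray` are hypothetical stand-ins written only for illustration; they are not `numpy.ma.MaskedArray` or any proposed API.

```python
import numpy as np


class Mask:
    """Hypothetical mask wrapper; the propagation rule is hidden inside __add__."""

    def __init__(self, array):
        self._array = np.asarray(array, dtype=bool)

    def __add__(self, other):
        # A reader of add2 has to find this method to learn that "+" means "or".
        return Mask(self._array | other._array)


class MaskedArray:
    """Hypothetical minimal container holding data and a mask."""

    def __init__(self, data, mask):
        self.data = data
        self.mask = mask


def add1(x, y):
    # The propagation rule is visible right here: boolean "or" on plain arrays.
    return MaskedArray(x.data + y.data, x.mask | y.mask)


def add2(x, y):
    # The propagation rule depends on whatever type x.mask happens to be.
    return MaskedArray(x.data + y.data, x.mask + y.mask)


# add1 with plain boolean arrays as masks.
a = MaskedArray(np.array([1, 2]), np.array([True, False]))
b = MaskedArray(np.array([10, 20]), np.array([False, False]))
plain = add1(a, b)

# add2 with the hypothetical Mask wrapper.
c = MaskedArray(np.array([1, 2]), Mask([True, False]))
d = MaskedArray(np.array([10, 20]), Mask([False, False]))
wrapped = add2(c, d)
```

Both calls produce the same data and the same boolean mask here; the difference is only in how many definitions must be read to see why.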