
On 6/24/19 3:09 PM, Marten van Kerkwijk wrote:
Hi Allan,
Thanks for bringing up noclobber explicitly (and Stephan for asking for clarification; I was similarly confused).
It does clarify the difference in mental picture. In mine, the operation would indeed be guaranteed to be done on the underlying data, without a copy and without `.filled(...)`. I should clarify further that I use `where` only to skip reading elements (i.e., in reductions), not to skip writing them (as you mention, an unwritten element will often be nonsense, e.g., have the wrong units, which to me is worse than infinity or something similar; I've not worried at all about runtime warnings). Note that my main reason here is not that I'm against filling with numbers for numerical arrays, but rather that I want to make minimal assumptions about the underlying data itself. This may well be a mistake (but I want to find out where it breaks).
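To make the "skip reading, not writing" idea concrete, here is a minimal plain-NumPy sketch (the variable names are mine, not from any actual MaskedArray implementation): a reduction with `where=` never reads the masked slots at all, so no fill value is needed.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, False])  # True marks a masked element

# Sum only the unmasked elements; the masked slot is never read.
total = np.add.reduce(data, where=~mask, initial=0.0)
# total == 1.0 + 3.0 + 4.0 == 8.0
```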
Anyway, it would seem in many ways all the better that our models are quite different. I definitely see the advantages of your choice to decide one can do with masked data elements whatever is logical ahead of an operation!
Thanks also for bringing up a useful example with `np.dot(m, m)` - clearly, I didn't yet get beyond overriding ufuncs!
In my mental model, where I'd apply `np.dot` to the data and the mask separately, the result would be wrong, so the mask has to be set (which it would be). For your specific example, that might not be the best solution, but when using `np.dot(matrix_shaped, matrix_shaped)`, I think it does give the correct masking: any masked element in a matrix had better propagate to all parts that it influences, even if a reduction of sorts is happening. So, perhaps a price to pay for a function that tries to do multiple things.
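A small sketch of this propagating model for a matrix product (all names here are mine): element (i, j) of `data @ data` uses row i and column j of the inputs, so it must be masked whenever either contains a masked element.

```python
import numpy as np

data = np.array([[1., 1., 1.],
                 [1., 9., 1.],   # 9. is whatever nonsense sits under the mask
                 [1., 1., 1.]])
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True

# Output (i, j) is masked if row i or column j of the mask has any True:
result_mask = mask.any(axis=1)[:, np.newaxis] | mask.any(axis=0)
# here the whole middle row and middle column end up masked (5 elements)
```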
The alternative solution in my model would be to replace `np.dot` with a masked-specific implementation of what `np.dot` is supposed to stand for (in your simple example, `np.add.reduce(np.multiply(m, m))`; more generally, add relevant `outer` and `axes`). This would be similar to what I think all implementations do for `.mean()`: we cannot calculate that from the data using any fill value or skipping, so instead we use a more easily handled `.sum()` and divide by a suitable number of elements. But in both examples the disadvantage is that we take away the option to use the underlying class's `.dot()` or `.mean()` implementations.
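For the simple 1-D case, the decomposition into basic ufunc calls can be sketched in plain NumPy (no masked class involved; the mask is handled explicitly here):

```python
import numpy as np

data = np.array([1., 2., 3., 4.])
mask = np.array([False, True, False, False])

# np.dot(m, m) as ufunc building blocks: elementwise multiply, then a
# reduction that skips the masked products.
prod = np.multiply(data, data)
dot = np.add.reduce(prod, where=~mask)
# dot == 1 + 9 + 16 == 26

# The same pattern underlies .mean(): a mask-aware sum over the unmasked count.
mean = np.add.reduce(data, where=~mask) / np.count_nonzero(~mask)
# mean == (1 + 3 + 4) / 3
```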
Just to note, my current implementation uses the IGNORE style of mask, so does not propagate the mask in np.dot:

>>> a = MaskedArray([[1, 1, 1], [1, X, 1], [1, 1, 1]])
>>> np.dot(a, a)
MaskedArray([[3, 2, 3],
             [2, 2, 2],
             [3, 2, 3]])

I'm not at all set on that behavior and we can do something else. For now, I chose this way since it seemed to best match the "IGNORE" mask behavior. The behavior you described further above, where the output row/col would be masked, corresponds better to "NA" (propagating) mask behavior, which I am leaving for later implementation.

best,
Allan
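(The IGNORE-style result can be reproduced in plain NumPy: each masked element simply contributes nothing to the sums, i.e. it acts like 0 under the dot. `filled` below is a stand-in for a hypothetical `.filled(0)`, not the actual MaskedArray API.)

```python
import numpy as np

data = np.ones((3, 3), dtype=int)
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True                    # the X in the transcript

filled = np.where(mask, 0, data)     # masked element contributes 0
result = np.dot(filled, filled)
# result == [[3, 2, 3],
#            [2, 2, 2],
#            [3, 2, 3]]
```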
(Aside: considerations such as these underlie my longed-for exposure of standard implementations of functions in terms of basic ufunc calls.)
Another example of a function for which I think my model is not particularly insightful (and for which it is difficult to know what to do generally) is `np.fft.fft`. Since an fft is equivalent to a fit of sines and cosines to the data points, the answer for masked data is in principle quite well-defined. But much less easy to implement!
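As a rough illustration of the "fft as a sine/cosine fit" idea (the approach and names below are mine, and the basis is deliberately truncated so the fit stays well-posed with samples missing): least-squares-fit a real Fourier basis to only the unmasked samples.

```python
import numpy as np

n = 16
t = np.arange(n)
x = np.cos(2 * np.pi * 3 * t / n)    # a pure frequency-3 cosine
mask = np.zeros(n, dtype=bool)
mask[5] = True                        # one masked sample

# Design matrix: constant plus cos/sin pairs for frequencies 1..4.
cols = [np.ones(n)]
for k in range(1, 5):
    cols.append(np.cos(2 * np.pi * k * t / n))
    cols.append(np.sin(2 * np.pi * k * t / n))
A = np.column_stack(cols)

# Fit using only the unmasked rows; the masked sample is never read.
coef, *_ = np.linalg.lstsq(A[~mask], x[~mask], rcond=None)
# the frequency-3 cosine coefficient comes out as 1, despite the mask
```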
All the best,
Marten
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion