[Numpy-discussion] in the NA discussion, what can we agree on?

Pauli Virtanen pav at iki.fi
Fri Nov 4 20:37:40 EDT 2011


04.11.2011 22:29, Nathaniel Smith wrote:
[clip]
> Continuing my theme of looking for consensus first... there are
> obviously a ton of ugly corners in here. But my impression is that at
> least for some simple cases, it's clear what users want:
>
>>>> a = [1, IGNORED(2), 3]
> # array-with-ignored-values + unignored scalar only affects unignored values
>>>> a + 2
> [3, IGNORED(2), 5]
> # reduction operations skip ignored values
>>>> np.sum(a)
> 4

This can break commutativity:

 >>> a = [1, IGNORED(2), 3]
 >>> b = [4, IGNORED(5), 6]
 >>> x = a + b
 >>> y = b + a
 >>> x[1]  # == ???
 >>> y[1]  # == ???

Defining

    unop(IGNORED(a))     == IGNORED(a)
    binop(IGNORED(a), b) == IGNORED(a)
    binop(a, IGNORED(b)) == IGNORED(b)
    binop(IGNORED(a), IGNORED(b)) == IGNORED(binop(a, b))   # or NA

could however get around that. That seems to be pretty much how NaN 
works, except that it now carries a "hidden" value with it.
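
To make that concrete, here is a minimal sketch (purely illustrative; the
IGNORED wrapper and the binop helper below are hypothetical, not existing
NumPy types or part of any concrete proposal). Under a one-sided rule where
an ignored element simply keeps its own hidden value, x[1] and y[1] above
come out as IGNORED(2) and IGNORED(5); under the symmetric rules listed
here, the hidden values agree again:

    from operator import add

    class IGNORED:
        """Hypothetical wrapper for an ignored element with a hidden value."""
        def __init__(self, value):
            self.value = value
        def __repr__(self):
            return "IGNORED(%r)" % (self.value,)

    def binop(op, a, b):
        """Apply op to one element pair under the NaN-like rules listed above."""
        if isinstance(a, IGNORED) and isinstance(b, IGNORED):
            return IGNORED(op(a.value, b.value))  # binop(IGNORED(a), IGNORED(b))
        if isinstance(a, IGNORED):
            return a                              # binop(IGNORED(a), b) == IGNORED(a)
        if isinstance(b, IGNORED):
            return b                              # binop(a, IGNORED(b)) == IGNORED(b)
        return op(a, b)

    a = [1, IGNORED(2), 3]
    b = [4, IGNORED(5), 6]
    x = [binop(add, u, v) for u, v in zip(a, b)]  # a + b
    y = [binop(add, v, u) for u, v in zip(a, b)]  # b + a
    print(x)  # [5, IGNORED(7), 9]
    print(y)  # [5, IGNORED(7), 9] -- same hidden value in the ignored slot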

> For example, Gary mentioned the common idiom of wanting to take an
> array and subtract off its mean, and he wants to do that while leaving
> the masked-out/ignored values unchanged. As long as the above cases
> work the way I wrote, we will have
>
>>>> np.mean(a)
> 2
>>>> a -= np.mean(a)
>>>> a
> [-1, IGNORED(2), 1]

That would be propagating + the above NaN-like rules for binary operators.
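
As a rough check of that combination (elements below are modelled as
(value, is_ignored) pairs; this is only an illustration, not a proposed
API):

    a = [(1, False), (2, True), (3, False)]

    # Reductions skip ignored values -> the mean of the visible entries is 2.0
    visible = [v for v, ign in a if not ign]
    mean = sum(visible) / float(len(visible))

    # binop(IGNORED(x), y) == IGNORED(x): ignored elements keep their hidden value
    a = [(v, ign) if ign else (v - mean, ign) for v, ign in a]
    print(a)  # [(-1.0, False), (2, True), (1.0, False)]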

Whether the reduction methods have skip_IGNORE=True as the default or not 
is, in my opinion, more of an API question than a question of how the 
algebra of ignored values should work.

     ***

If destructive assignment is really needed to avoid problems with 
commutation [see T. J. (2011)], that is then maybe a problem. So, one 
would need to have

 >>> x = [1, IGNORED(2), 3]
 >>> y = [1, IGNORED(2), 3]
 >>> z = [4, IGNORED(5), IGNORED(6)]

 >>> x[:] = z
 >>> x
[4, IGNORED(5), IGNORED(6)]

 >>> y += z
 >>> y
[5, IGNORED(7), IGNORED(6)]

This is not how np.ma works. But if you do otherwise, there doesn't seem 
to be any guarantee that

 >>> a += 42
 >>> a += b

is the same thing as

 >>> a += b
 >>> a += 42
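
Here is a toy illustration of how the two orderings can end up differing.
The rule assumed below -- an in-place op leaves the stored value of an
ignored slot untouched, and a slot becomes ignored whenever either operand
is ignored -- is just one plausible choice, not a description of np.ma or
of any concrete proposal; elements are again (value, is_ignored) pairs:

    def iadd(a, other):
        """In-place addition under the assumed 'leave ignored slots alone' rule."""
        scalar = not isinstance(other, list)
        for i, (av, aign) in enumerate(a):
            ov, oign = (other, False) if scalar else other[i]
            if aign or oign:
                a[i] = (av, True)        # keep the stored value, mark the slot ignored
            else:
                a[i] = (av + ov, False)
        return a

    b = [(4, False), (5, True), (6, True)]

    a1 = [(1, False), (2, True), (3, False)]
    iadd(iadd(a1, 42), b)                # a += 42; then a += b
    a2 = [(1, False), (2, True), (3, False)]
    iadd(iadd(a2, b), 42)                # a += b; then a += 42

    print(a1)  # [(47, False), (2, True), (45, True)]
    print(a2)  # [(47, False), (2, True), (3, True)]

The visible entries agree, but the hidden values in the last slot differ
(45 vs. 3), so unmasking that slot afterwards would give different answers
depending on the order. As far as I can tell, the NaN-like rules above,
applied also to in-place ops, would give [47, IGNORED(7), IGNORED(6)] in
both orders.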

[clip]
> So before we start exploring the whole vast space of possible ways to
> handle masked-out data, does anyone see any reason to consider rules
> that don't have, as a subset, the ones above? Do other rules have any
> use cases or user demand? (I *love* playing with clever mathematics
> and making things consistent, but there's not much point unless the
> end result is something that people will use :-).)

Yep, it's important to keep in mind what people want.

People, however, tend to implicitly expect that simple arithmetic 
operations on arrays, whether they contain ignored values or not, behave 
in a certain way. Actually stating how these operations work with scalars 
gives valuable insight into how you'd like things to work.

Also, if you propose to break the rules of arithmetic in a fundamental 
library meant for scientific computing, you should be aware that you are 
doing so, and of how you are doing so.

I mean, at least to me it was not clear before this formulation that 
there was a reason why binary ops in np.ma are not commutative! Now I 
kind of see that there is an asymmetry in assignment into masked arrays, 
and that there is a conflict between commuting operations and "what you'd 
expect ignored values to do". I'm not sure whether it's possible to get 
rid of this problem, but it could be possible to restrict it to 
assignments and in-place operations rather than having it in binary ops.

-- 
Pauli Virtanen



