[Numpy-discussion] in the NA discussion, what can we agree on?

Fri Nov 4 18:04:49 EDT 2011

On Fri, Nov 4, 2011 at 11:59 AM, Pauli Virtanen <pav at iki.fi> wrote:
> I have a feeling that if you don't start by mathematically defining the
> scalar operations first, and only after that generalize them to arrays,
> some conceptual problems may follow.
>
> On the other hand, I should note that numpy.ma does not work this way,
> and many people still seem happy with how it works.

Yes, my impression is that people who want MISSING just want something
that acts like a special scalar value (PdS, in your scheme), but the
people who want IGNORED want something that *can't* be defined in this
way (see my other recent post). That said...

> There are two options for how to behave with respect to binary/unary
> operations:
>
> (P) Propagating
>
> unop(SPECIAL_1) == SPECIAL_new
> binop(SPECIAL_1, SPECIAL_2) == SPECIAL_new
> binop(a, SPECIAL) == SPECIAL_new
>
> (N) Non-propagating
>
> unop(SPECIAL_1) == SPECIAL_new
> binop(SPECIAL_1, SPECIAL_2) == SPECIAL_new
> binop(a, SPECIAL) == binop(a, binop.identity) == a

SPECIAL_1 means "a special value with payload 1", right? The same thing
that some of us have been writing as IGNORED(1) in other places?

Assuming that, I believe that what people want for IGNORED values is
  unop(SPECIAL_1) == SPECIAL_1
which doesn't seem to be an option in your taxonomy.

There's also the option of binop(a, SPECIAL) -> error.
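
To make the scalar-level options concrete, here is a minimal pure-Python sketch of the rules discussed above. Everything in it is hypothetical and for illustration only: the `Special` class and the helper names (`unop_new`, `unop_keep`, `binop_propagating`, `binop_nonpropagating`, `binop_strict`) are not numpy API, and an empty payload stands in for "SPECIAL_new".

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Special:
    """Hypothetical special scalar with a payload (SPECIAL_1, IGNORED(1), ...)."""
    payload: object = None  # None stands in for the payload of "SPECIAL_new"


def unop_new(op, x):
    # Quoted taxonomy: unop(SPECIAL_1) == SPECIAL_new (the payload is dropped).
    return Special() if isinstance(x, Special) else op(x)


def unop_keep(op, x):
    # Alternative suggested for IGNORED: unop(SPECIAL_1) == SPECIAL_1.
    return x if isinstance(x, Special) else op(x)


def binop_propagating(op, a, b):
    # (P): any special operand makes the result special.
    if isinstance(a, Special) or isinstance(b, Special):
        return Special()
    return op(a, b)


def binop_nonpropagating(op, a, b):
    # (N): a special operand behaves like the identity of the operation,
    # so the ordinary operand passes through unchanged.
    a_special, b_special = isinstance(a, Special), isinstance(b, Special)
    if a_special and b_special:
        return Special()
    if a_special:
        return b
    if b_special:
        return a
    return op(a, b)


def binop_strict(op, a, b):
    # Third option: mixing a special value into a binary operation is an error.
    if isinstance(a, Special) or isinstance(b, Special):
        raise ValueError("special value in binary operation")
    return op(a, b)
```

With these definitions, `binop_propagating(operator.add, 2, Special(1))` gives a fresh `Special()`, `binop_nonpropagating(operator.add, 2, Special(1))` gives `2`, and `unop_keep(operator.neg, Special(1))` hands back `Special(1)` untouched.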

> And three options on what to do on assignment:
>
> (d) Destructive
>
> a := SPECIAL      # -> a == SPECIAL
>
> (n) Non-destructive
>
> a := SPECIAL      # -> a unchanged
>
> (s) Self-destructive
>
> a := SPECIAL_1
> # -> if `a` is SPECIAL-class, then a == SPECIAL_1,
> # otherwise `a` remains unchanged

I'm not sure "assignment" is a useful way to think about what we've
been calling IGNORED values (for MISSING/NA it's fine). I've been
talking about masking/unmasking values or "toggling the IGNORED
state", because my impression is that what people want is something
like:

a[0] = 3
a[0] = SPECIAL
# now a[0] == SPECIAL(3)

This is pretty confusing when written as an assignment (and note that
now I'm assigning into an array, because if I were just assigning to a
Python variable then these semantics would be impossible to
implement!). So we might prefer a syntax like
  a.visible[0] = False
or
  a.ignore(0)
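
As a rough illustration of what such a toggling interface could look like, here is a small pure-Python sketch. The `IgnorableList` container and the `Ignored` wrapper are invented for this example and are not part of numpy; the sketch also assumes that plain assignment of an ordinary value makes the element visible again, which is only one possible choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ignored:
    """Hypothetical marker returned when reading an ignored element."""
    payload: object


class IgnorableList:
    """Hypothetical container with a per-element visibility flag."""

    def __init__(self, values):
        self._data = list(values)
        # visible[i] is False while element i is ignored.
        self.visible = [True] * len(self._data)

    def ignore(self, i):
        # Toggle the IGNORED state on; the stored value is left untouched.
        self.visible[i] = False

    def __setitem__(self, i, value):
        # Assumption: assigning an ordinary value also makes it visible.
        self._data[i] = value
        self.visible[i] = True

    def __getitem__(self, i):
        # An ignored element still carries its old value as the payload.
        return self._data[i] if self.visible[i] else Ignored(self._data[i])


a = IgnorableList([0, 5])
a[0] = 3
a.ignore(0)          # a[0] now reads as Ignored(3)
hidden = a[0]
a.visible[0] = True  # unmasking brings the old value back
restored = a[0]
```

Because ignoring is a separate operation rather than an assignment, nothing ever overwrites the stored value, so the payload survives the round trip.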

> If classified this way, behaviour of items in np.ma arrays is different
> in different operations, but seems roughly PdX, where X stands for
> returning a masked value with the first argument as the payload in
> binary ops if either argument is masked.

No -- np.ma implements the assignment semantics I described above, not
"d" semantics. Trimming some output for readability:

>>> import numpy as np
>>> a = np.ma.masked_array([1, 2, 3])
>>> a[1] = np.ma.masked
>>> a
[1, --, 3]
>>> a.mask[1] = False
>>> a
[1, 2, 3]

So assignment is not destructive -- the old value is retained as the "payload".

-- Nathaniel