[Numpy-discussion] in the NA discussion, what can we agree on?

T J tjhnson at gmail.com
Fri Nov 4 15:49:17 EDT 2011

On Fri, Nov 4, 2011 at 11:59 AM, Pauli Virtanen <pav at iki.fi> wrote:
> I have a feeling that if you don't start by mathematically defining the
> scalar operations first, and only after that generalize them to arrays,
> some conceptual problems may follow.

Yes.  I was going to mention this point as well.

> For shorthand, we can refer to the above choices with the nomenclature
>    <shorthand> ::= <propagation> <destructivity> <payload_type>
>    <propagation> ::= "P" | "N"
>    <destructivity> ::= "d" | "n" | "s"
>    <payload_type> ::= "S" | "E" | "C"
> That makes 2 * 3 * 3 = 18 different ways to construct consistent
> behavior. Some of them might make sense, the problem is to find out which

This is great for the discussion, IMO.  The self-destructive assignment
hasn't come up at all, so I'm guessing we can probably ignore it.
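
As a quick check on the count in the quoted grammar, the shorthands can be enumerated directly. This is only a sketch: the letter codes are copied from the grammar above, and the variable names are mine.

```python
from itertools import product

# Letter codes copied from the quoted grammar; their meanings are defined in
# the message being replied to and are not restated here.
PROPAGATION = ("P", "N")
DESTRUCTIVITY = ("d", "n", "s")
PAYLOAD_TYPE = ("S", "E", "C")

# Every <propagation><destructivity><payload_type> shorthand.
shorthands = ["".join(parts)
              for parts in product(PROPAGATION, DESTRUCTIVITY, PAYLOAD_TYPE)]

# Shorthands left if the self-destructive ("s") choice is set aside.
without_s = [code for code in shorthands if code[1] != "s"]

print(len(shorthands), len(without_s))
print(shorthands)
```

Setting aside the "s" choice leaves 2 * 2 * 3 = 12 combinations.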


Can you be a bit more explicit on the payload types?  Let me try; please
respond with corrections if necessary.

"S" is singleton and in the case of "missing" data, we take it to mean that
we only care that data is missing and not *how* missing the data is.

>>> x = MISSING
>>> -x  # unary
>>> x + 3  # binary

"E" means that we acknowledge that we want to track the "how", but that we
aren't interested in it. So raise an error.
In the case of ignored data, we might have:

>>> x = 2
>>> ignore(x)
>>> x
>>> -x
>>> x + 3

"C" means that we acknowledge that we want to track the "how", and that we
are interested in it.  So do the computations.

>>> x = 2
>>> ignore(x)
>>> -x
>>> x + 3

Did I get that mostly right?
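
To make my reading of the three payload types concrete, here is a rough pure-Python sketch. Everything in it is hypothetical: the class names `Missing`, `IgnoredE` and `IgnoredC` are made up, the choice that MISSING propagates through operations is an assumption, and the in-place `ignore(x)` from the sessions above is replaced by explicitly constructing a wrapper, since a plain Python int cannot be marked in place.

```python
class Missing:
    """Payload type "S": a singleton that carries no payload."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "MISSING"

    # Assumption: the singleton propagates through unary and binary operations.
    def __neg__(self):
        return self

    def __add__(self, other):
        return self

    __radd__ = __add__


MISSING = Missing()


class IgnoredE:
    """Payload type "E": the payload is stored, but computing with it raises."""

    def __init__(self, payload):
        self.payload = payload

    def __repr__(self):
        return f"IGNORED({self.payload!r})"

    def __neg__(self):
        raise TypeError("cannot compute with an ignored value")

    def __add__(self, other):
        raise TypeError("cannot compute with an ignored value")

    __radd__ = __add__


class IgnoredC:
    """Payload type "C": the payload is stored and computations act on it."""

    def __init__(self, payload):
        self.payload = payload

    def __repr__(self):
        return f"IGNORED({self.payload!r})"

    def __neg__(self):
        return IgnoredC(-self.payload)

    def __add__(self, other):
        # Unwrap the other operand if it is also an ignored value.
        other_payload = other.payload if isinstance(other, IgnoredC) else other
        return IgnoredC(self.payload + other_payload)

    __radd__ = __add__


print(-MISSING, MISSING + 3)
print(-IgnoredC(2), IgnoredC(2) + 3)
```

With these definitions, `-IgnoredE(2)` and `IgnoredE(2) + 3` both raise `TypeError`, while the "C" variant keeps computing on the hidden payload.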

> NAN and NA apparently fall into the PdS class.

Here is where I think we need to be a bit more careful.  It is true that we
want NAN and MISSING to propagate, but then we additionally want to ignore
it sometimes.  This is precisely why we have functions like nansum.
Although people are well-aware of this desire, I think this thread has
largely conflated the issues when discussing "propagation".
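
For reference, the existing NaN behaviour already shows both desires side by side: the plain reduction propagates, and the separate `nansum` function skips. A small example using only current NumPy calls:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])

total = np.sum(a)        # NaN propagates through the reduction
skipped = np.nansum(a)   # NaN entries are treated as zero and skipped

print(total, skipped)
```

The first result is NaN and the second is 4.0, so "propagate" and "skip" are two operations on the same data rather than two kinds of data.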

To push this forward a bit, can I propose that IGNORE behave as:   PnC

>>> x = np.array([1, 2, 3])
>>> y = np.array([10, 20, 30])
>>> ignore(x[1])
>>> x
[1, IGNORED(2), 3]
>>> x + 2
[3, IGNORED(4), 5]
>>> x + y
[11, IGNORED(22), 33]
>>> z = x.sum()
>>> z
>>> unignore(z)
>>> z
>>> x.sum(skipIGNORED=True)
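
A minimal sketch of how such PnC behaviour could be prototyped on top of a value array plus a boolean mask. This is not an existing NumPy API: the `IgnoreArray` class, its `ignore` method and the `skip_ignored` keyword are all invented for illustration, and returning a `(value, ignored)` pair from `sum` is just one way to show a propagated IGNORED result with its payload.

```python
import numpy as np


class IgnoreArray:
    """Hypothetical container: payload values plus a boolean 'ignored' mask."""

    def __init__(self, data, ignored=None):
        self.data = np.asarray(data)
        if ignored is None:
            ignored = np.zeros(self.data.shape, dtype=bool)
        self.ignored = np.asarray(ignored, dtype=bool)

    def ignore(self, index):
        # Non-destructive ("n"): only the mask changes, the payload stays.
        self.ignored[index] = True

    def __add__(self, other):
        # Computed payload ("C"): arithmetic still acts on hidden values.
        if isinstance(other, IgnoreArray):
            return IgnoreArray(self.data + other.data,
                               self.ignored | other.ignored)
        return IgnoreArray(self.data + other, self.ignored.copy())

    def sum(self, skip_ignored=False):
        """Return (value, ignored) where 'ignored' flags a propagated result."""
        if skip_ignored:
            return self.data[~self.ignored].sum().item(), False
        # Propagating ("P"): any ignored element marks the whole result.
        return self.data.sum().item(), bool(self.ignored.any())


x = IgnoreArray([1, 2, 3])
y = IgnoreArray([10, 20, 30])
x.ignore(1)  # mark the second element, the one holding the value 2

print((x + 2).data, (x + 2).ignored)
print((x + y).data, (x + y).ignored)
print(x.sum())
print(x.sum(skip_ignored=True))
```

Under these assumptions the plain sum carries the payload 6 together with an ignored flag, while the skipping sum returns 4 with no flag.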

When done in this fashion, I think it is perfectly fine for "masks to be