[Numpy-discussion] in the NA discussion, what can we agree on?
tjhnson at gmail.com
Fri Nov 4 20:26:04 EDT 2011
On Fri, Nov 4, 2011 at 4:29 PM, Pauli Virtanen <pav at iki.fi> wrote:
> 04.11.2011 23:29, Pauli Virtanen wrote:
> > As the definition concerns only what happens on assignment, it does not
> > have problems with commutativity.
> This is of course then not really true in a wider sense, as an example
> from "T J" shows:
> a = 1
> a += IGNORE(3)
> # -> a := a + IGNORE(3)
> # -> a := IGNORE(4)
> # -> a == IGNORE(1)
> which is different from
> a = 1 + IGNORE(3)
> # -> a == IGNORE(4)
> Damn, it seemed so good. Probably anything except destructive assignment
> leads to problems like this with propagating special values.
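To make the asymmetry in the quoted example concrete, here is a small toy model in plain Python. It is only a sketch: `Ignore`, `add`, and `Slot` are made-up names and not NumPy API. The rule that assigning an ignored value marks the slot but keeps its old contents is my reading of the example, not a settled definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ignore:
    """Hypothetical stand-in for IGNORE(n): an ignored value with a payload."""
    payload: int


def add(x, y):
    """Propagating add: ignored if either operand is; payloads combine."""
    ignored = isinstance(x, Ignore) or isinstance(y, Ignore)
    px = x.payload if isinstance(x, Ignore) else x
    py = y.payload if isinstance(y, Ignore) else y
    return Ignore(px + py) if ignored else px + py


class Slot:
    """A storage location with non-destructive assignment (assumed rule)."""

    def __init__(self, value):
        self.value = value
        self.ignored = False

    def assign(self, rhs):
        if isinstance(rhs, Ignore):
            # Mark the slot as ignored, but keep the value already stored.
            self.ignored = True
        else:
            self.value = rhs
            self.ignored = False

    def read(self):
        return Ignore(self.value) if self.ignored else self.value


a = Slot(1)
a.assign(add(a.read(), Ignore(3)))  # a += IGNORE(3)
b = add(1, Ignore(3))               # a = 1 + IGNORE(3), a fresh value

print(a.read())  # Ignore(payload=1)
print(b)         # Ignore(payload=4)
```

The in-place update ends up holding the old payload, while the plain expression holds the combined one, which is exactly the mismatch in the example.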
Ok...with what I understand now, it seems like for almost all operations:
MISSING : PdS
IGNORED : PdC (this gives commutativity when unignoring data points)
For any sort of "reduction", though, we want to change the behavior of
IGNORED so that IGNORED values are skipped by default. Personally, I still
believe that this inconsistent behavior warrants a new method name. What
I mean is:
>>> x = np.array([1, IGNORED(2), 3])
>>> y = x.sum()
>>> z = x + x + x
To say that y != z will only be a source of confusion. To remedy this, we force
people to be explicit, even if they'll need to be explicit 99% of the time:
>>> q = x.sum(skipIGNORED=True)
Then we can have y == z and y != q. To make the 99% use case easier, we
provide a new method which passes the keyword for us.
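As a rough illustration of the proposal, the following plain-Python sketch mimics the three cases. Everything here is hypothetical: `Ignored`, `add`, `total`, and the `skip_ignored` keyword are stand-ins I made up, not anything NumPy provides. The payload arithmetic simply follows the `1 + IGNORE(3) -> IGNORE(4)` rule from the quoted example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ignored:
    """Hypothetical IGNORED(n): an ignored element that still carries n."""
    payload: int


def add(x, y):
    """Propagating add: the result is ignored if either operand is."""
    ignored = isinstance(x, Ignored) or isinstance(y, Ignored)
    px = x.payload if isinstance(x, Ignored) else x
    py = y.payload if isinstance(y, Ignored) else y
    return Ignored(px + py) if ignored else px + py


def total(values, skip_ignored=False):
    """Sum that propagates by default and skips only when asked to."""
    result = 0
    for v in values:
        if skip_ignored and isinstance(v, Ignored):
            continue  # explicit opt-in: leave ignored elements out
        result = add(result, v)
    return result


x = [1, Ignored(2), 3]
y = total(x)                         # propagates, like x + x + x does
z = [add(add(v, v), v) for v in x]   # elementwise x + x + x
q = total(x, skip_ignored=True)      # explicit skipping

print(y)  # Ignored(payload=6)
print(z)  # [3, Ignored(payload=6), 9]
print(q)  # 4
```

Here the default sum and the elementwise addition treat the ignored element the same way, and only the explicit keyword gives the skipping behavior. The separate convenience method would just call `total(values, skip_ignored=True)`.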
With PdS and PdC, it seems rather clear to me why MISSING should be
implemented as a bit pattern and IGNORED implemented using masks. Setting
implementation details aside and going back to Nathaniel's original
"biggest *un*resolved question", I am now convinced that these (IGNORED and
MISSING) should be distinct API concepts, and also distinct from NaN
with floating point dtypes. The NA implementation in NumPy does not seem
to match either of these (IGNORED and MISSING) exactly. One cannot, as far
as I know, unignore an element marked as NA.