[Numpy-discussion] missing data discussion round 2

Wed Jun 29 09:20:57 EDT 2011

Matthew Brett writes:

>> Maybe instead of np.NA, we could say np.IGNORE, which sort of conveys
>> the idea that the entry is still there, but we're just ignoring it.  Of
>> course, that goes against common convention, but it might be easier to
>> explain.

> I think Nathaniel's point is that np.IGNORE is a different idea than
> np.NA, and that is why joining the implementations can lead to
> conceptual confusion.

This is how I see it:

>>> a = np.array([0, 1, 2], dtype=int)
>>> a[0] = np.NA
ValueError
>>> e = np.array([np.NA, 1, 2], dtype=int)
ValueError
>>> b  = np.array([np.NA, 1, 2], dtype=np.maybe(int))
>>> m  = np.array([np.NA, 1, 2], dtype=int, masked=True)
>>> bm = np.array([np.NA, 1, 2], dtype=np.maybe(int), masked=True)
>>> b[1] = np.NA
>>> np.sum(b)
np.NA
>>> np.sum(b, skipna=True)
2
>>> b.mask
None
>>> m[1] = np.NA
>>> np.sum(m)
2
>>> np.sum(m, skipna=True)
2
>>> m.mask
[False, False, True]
>>> bm[1] = np.NA
>>> np.sum(bm)
2
>>> np.sum(bm, skipna=True)
2
>>> bm.mask
[False, False, True]

So:

* The mask takes precedence over the bit pattern on element assignment.
  There's still the question of how to assign a bit-pattern NA when the
  mask is active.

* When using a mask, masked elements are automagically skipped, with or
  without skipna.

* "m[1] = np.NA" is equivalent to "m.mask[1] = False" (in this sketch a
  False mask entry means the element is hidden).

* When using bit pattern + mask, it might make sense to store the NAs
  given at construction as bit-pattern NAs instead of masking them
  (i.e., after the assignment above, "bm.mask == [True, False, True]"
  and "np.sum(bm) == np.NA").
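
For comparison, the three cases can be roughly approximated with what
NumPy already ships. This is only a sketch under loud assumptions: NaN in
a float array stands in for a bit-pattern NA (np.NA and np.maybe do not
exist), numpy.ma stands in for the mask, and numpy.ma uses True to mean
"hidden", which is the opposite of the m.mask convention shown above.

```python
import numpy as np

# Bit-pattern stand-in: NaN is stored in the data itself (float dtype only).
b = np.array([np.nan, 1.0, 2.0])
b[1] = np.nan
sum_b = np.sum(b)          # NaN propagates, like np.sum(b) giving np.NA
sum_b_skip = np.nansum(b)  # NaNs are skipped, like skipna=True

# Mask stand-in: the data is untouched, a separate boolean array hides it.
# numpy.ma uses True = hidden, the opposite of the session above.
m = np.ma.array([0, 1, 2], mask=[True, False, False])
m[1] = np.ma.masked        # plays the role of "m[1] = np.NA"
sum_m = m.sum()            # hidden elements are skipped without any flag

# Both at once: a NaN given at construction stays in the data (unmasked),
# while the later assignment only touches the mask.
bm = np.ma.array([np.nan, 1.0, 2.0])
bm[1] = np.ma.masked
sum_bm = bm.sum()          # the unmasked NaN still propagates
```

Here sum_b and sum_bm come out as NaN, while sum_b_skip and sum_m are 2.
Note that m.data[1] still holds 1 after masking, which is the "entry is
still there, but we're just ignoring it" idea.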

Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth