[Numpy-discussion] missing data discussion round 2
Lluís
xscript at gmx.net
Wed Jun 29 09:20:57 EDT 2011
Matthew Brett writes:
>> Maybe instead of np.NA, we could say np.IGNORE, which sort of conveys
>> the idea that the entry is still there, but we're just ignoring it. Of
>> course, that goes against common convention, but it might be easier to
>> explain.
> I think Nathaniel's point is that np.IGNORE is a different idea than
> np.NA, and that is why joining the implementations can lead to
> conceptual confusion.
This is how I see it:
>>> a = np.array([0, 1, 2], dtype=int)
>>> a[0] = np.NA
ValueError
>>> e = np.array([np.NA, 1, 2], dtype=int)
ValueError
>>> b = np.array([np.NA, 1, 2], dtype=np.maybe(int))
>>> m = np.array([np.NA, 1, 2], dtype=int, masked=True)
>>> bm = np.array([np.NA, 1, 2], dtype=np.maybe(int), masked=True)
>>> b[1] = np.NA
>>> np.sum(b)
np.NA
>>> np.sum(b, skipna=True)
2
>>> b.mask
None
>>> m[1] = np.NA
>>> np.sum(m)
2
>>> np.sum(m, skipna=True)
2
>>> m.mask
[False, False, True]
>>> bm[1] = np.NA
>>> np.sum(bm)
2
>>> np.sum(bm, skipna=True)
2
>>> bm.mask
[False, False, True]
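np.NA, np.maybe() and the masked=True keyword are proposed names from this thread, so the session above cannot be run as written. A rough sketch of the same behaviour using only existing NumPy calls, under two assumptions of mine: NaN in a float array stands in for a bit-pattern NA, and a separate boolean array (True = visible, matching the mask output above) stands in for the mask:

```python
import numpy as np

# Stand-in for "b": bit-pattern NA emulated with NaN (assumption, floats only).
b = np.array([np.nan, 1.0, 2.0])
b[1] = np.nan                      # emulates "b[1] = np.NA"
sum_b = np.sum(b)                  # NaN propagates, like np.NA would
sum_b_skipna = np.nansum(b)        # emulates "np.sum(b, skipna=True)"

# Stand-in for "m": integer data plus a validity mask (True = visible).
# The value stored at a masked position is arbitrary; 0 is a placeholder.
m_data = np.array([0, 1, 2], dtype=int)
m_mask = np.array([False, True, True])
m_mask[1] = False                  # emulates "m[1] = np.NA"
sum_m = m_data[m_mask].sum()       # masked elements are skipped

print(sum_b, sum_b_skipna, sum_m, m_mask.tolist())
```

Note that numpy.ma uses the opposite convention (True = masked), which is why the sketch keeps a plain boolean array instead.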
So:
* Mask takes precedence over bit pattern on element assignment. There's
still the question of how to assign a bit pattern NA when the mask is
active.
* When using a mask, masked elements are automagically skipped.
* "m[1] = np.NA" is equivalent to "m.mask[1] = False" (a False mask
entry hides the element)
* When using bit pattern + mask, it might make sense to have the initial
values as bit-pattern NAs, instead of masked (i.e., "bm.mask == [True,
False, True]" and "np.sum(bm) == np.NA")
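The alternative in the last point can be sketched the same way. This is only an emulation under my own assumptions: NaN stands in for the bit-pattern NA, and a plain boolean array (True = visible) stands in for the mask. The initial NA is stored as a bit pattern, so it stays visible and propagates through the sum:

```python
import numpy as np

# Stand-in for "bm": the initial NA is kept as a bit pattern (NaN here),
# so every element starts out visible in the mask.
bm_data = np.array([np.nan, 1.0, 2.0])
bm_mask = np.array([True, True, True])

# Emulates "bm[1] = np.NA": the mask takes precedence on assignment,
# so the element is hidden and its stored value is left untouched.
bm_mask[1] = False

visible = bm_data[bm_mask]          # elements the reduction actually sees
sum_bm = np.sum(visible)            # NaN propagates, like np.NA
sum_bm_skipna = np.nansum(visible)  # emulates skipna=True

print(bm_mask.tolist(), sum_bm, sum_bm_skipna)
```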
Lluis
--
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Tollbooth