[Numpy-discussion] missing data discussion round 2

Charles R Harris charlesr.harris at gmail.com
Wed Jun 29 16:17:58 EDT 2011


On Wed, Jun 29, 2011 at 1:32 PM, Matthew Brett <matthew.brett at gmail.com> wrote:

> Hi,
>
> On Wed, Jun 29, 2011 at 6:22 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> > On Wed, Jun 29, 2011 at 8:20 AM, Lluís <xscript at gmx.net> wrote:
> >>
> >> Matthew Brett writes:
> >>
> >> >> Maybe instead of np.NA, we could say np.IGNORE, which sort of conveys
> >> >> the idea that the entry is still there, but we're just ignoring it.
> >> >> Of course, that goes against common convention, but it might be easier
> >> >> to explain.
> >>
> >> > I think Nathaniel's point is that np.IGNORE is a different idea than
> >> > np.NA, and that is why joining the implementations can lead to
> >> > conceptual confusion.
> >>
> >> This is how I see it:
> >>
> >> >>> a = np.array([0, 1, 2], dtype=int)
> >> >>> a[0] = np.NA
> >> ValueError
> >> >>> e = np.array([np.NA, 1, 2], dtype=int)
> >> ValueError
> >> >>> b  = np.array([np.NA, 1, 2], dtype=np.maybe(int))
> >> >>> m  = np.array([np.NA, 1, 2], dtype=int, masked=True)
> >> >>> bm = np.array([np.NA, 1, 2], dtype=np.maybe(int), masked=True)
> >> >>> b[1] = np.NA
> >> >>> np.sum(b)
> >> np.NA
> >> >>> np.sum(b, skipna=True)
> >> 2
> >> >>> b.mask
> >> None
> >> >>> m[1] = np.NA
> >> >>> np.sum(m)
> >> 2
> >> >>> np.sum(m, skipna=True)
> >> 2
> >> >>> m.mask
> >> [False, False, True]
> >> >>> bm[1] = np.NA
> >> >>> np.sum(bm)
> >> 2
> >> >>> np.sum(bm, skipna=True)
> >> 2
> >> >>> bm.mask
> >> [False, False, True]
> >>
> >> So:
> >>
> >> * Mask takes precedence over bit pattern on element assignment. There's
> >>  still the question of how to assign a bit pattern NA when the mask is
> >>  active.
> >>
> >> * When using mask, elements are automagically skipped.
> >>
> >> * "m[1] = np.NA" is equivalent to "m.mask[1] = False"
> >>
> >> * When using bit pattern + mask, it might make sense to have the initial
> >>  values as bit-pattern NAs, instead of masked (i.e., "bm.mask == [True,
> >>  False, True]" and "np.sum(bm) == np.NA")
> >
> > There seems to be a general idea that masks and NA bit patterns imply
> > particular differing semantics, something which I think is simply false.
>
> Well - first - it's helpful surely to separate the concepts and the
> implementation.
>
> Concepts / use patterns (as delineated by Nathaniel):
> A) missing values == 'np.NA' in my emails.  Can we call that CMV
> (concept missing values)?
> B) masks == np.IGNORE in my emails.  CMSK (concept masks)?
>
> Implementations
> 1) bit-pattern == na-dtype - how about we call that IBP
> (implementation bit pattern)?
> 2) array.mask.  IM (implementation mask)?
>
>
Remember that the masks are invisible: you can't see them, because they are
an implementation detail. A good reason to hide the implementation is so that
it can be changed without affecting software that depends on the API.
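
As a rough illustration of that point, here is a minimal sketch in which the
same calling code runs against either a mask or a bit-pattern-style store.
Everything named here is an assumption made for the example: `MissingArray`
and its `backend` argument are hypothetical and are not part of NumPy or of
the proposal, NaN merely stands in for a bit-pattern NA, and the mask follows
the existing numpy.ma convention (True means hidden), which is the opposite
of the mask polarity in the quoted example.

```python
import numpy as np


class MissingArray:
    """Hypothetical wrapper: callers use sum(), never the storage itself."""

    def __init__(self, values, missing, backend="mask"):
        values = np.asarray(values, dtype=float)
        missing = np.asarray(missing, dtype=bool)
        if backend == "mask":
            # Separate boolean mask; numpy.ma treats True as "hidden".
            self._data = np.ma.masked_array(values, mask=missing)
        elif backend == "nan":
            # NaN used as a stand-in for a bit-pattern NA.
            self._data = np.where(missing, np.nan, values)
        else:
            raise ValueError("backend must be 'mask' or 'nan'")
        self._backend = backend

    def has_missing(self):
        """Report whether any element is missing, whatever the storage."""
        if self._backend == "mask":
            return bool(np.ma.getmaskarray(self._data).any())
        return bool(np.isnan(self._data).any())

    def sum(self, skipna=False):
        """Propagate missingness unless skipna=True, for either backend."""
        if self.has_missing() and not skipna:
            return float("nan")
        if self._backend == "mask":
            return float(self._data.sum())
        return float(np.nansum(self._data))


# The calling code is identical for both implementations.
for backend in ("mask", "nan"):
    a = MissingArray([0, 1, 2], [True, True, False], backend=backend)
    print(backend, a.sum(), a.sum(skipna=True))
```

Nothing outside the class can tell which storage is in use, so the storage
could be swapped later without touching callers. Whether the real design
should expose the mask at all is exactly what is under discussion here.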

<snip>

Chuck