[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe at gmail.com
Fri Jun 24 13:55:51 EDT 2011


On Fri, Jun 24, 2011 at 10:02 AM, Pierre GM <pgmdevlist at gmail.com> wrote:

> On Jun 24, 2011, at 4:44 PM, Robert Kern wrote:
>
> > On Fri, Jun 24, 2011 at 09:35, Robert Kern <robert.kern at gmail.com>
> wrote:
> >> On Fri, Jun 24, 2011 at 09:24, Keith Goodman <kwgoodman at gmail.com>
> wrote:
> >>> On Fri, Jun 24, 2011 at 7:06 AM, Robert Kern <robert.kern at gmail.com>
> wrote:
> >>>
> >>>> The alternative proposal would be to add a few new dtypes that are
> >>>> NA-aware. E.g. an nafloat64 would reserve a particular NaN value
> >>>> (there are lots of different NaN bit patterns, we'd just reserve one)
> >>>> that would represent NA. An naint32 would probably reserve the most
> >>>> negative int32 value (like R does). Using the NA-aware dtypes signals
> >>>> that you are using NA values; there is no need for an additional flag.
> >>>
> >>> I don't understand the numpy design and maintainability issues, but from
> >>> a user perspective (mine) nafloat64, etc. sounds nice.
> >>
> >> It's worth noting that this is not a replacement for masked arrays,
> >> nor is it intended to be the be-all, end-all solution to missing data
> >> problems. It's mostly just intended to be a focused tool to fill in
> >> the gaps where masked arrays are less convenient for whatever reason;
> >> e.g. where you're tempted to (ab)use NaNs for the purpose and the
> >> limitations on the range of values are acceptable. Not every dtype
> >> would have an NA-aware counterpart. I would suggest just nabool,
> >> nafloat64, naint32, nastring (a little tricky due to the flexible
> >> size, but doable), and naobject. Maybe a couple more, if we get
> >> requests, like naint64 and nacomplex128.
> >
> > Oh, and nadatetime64 and natimedelta64.
>
> So, if I understand correctly:
> if my array has a nafloat type, it's an array that supports missing values
> and it will always have a mask, right ? And just viewing an array as a
> nafloat dtyped one would make it an 'array-with-missing-values' ? That's
> pretty elegant. I like that.
>

My understanding is a little bit different:

The na* discussion is about implementing a full or partial set of "shadow
types" which are like their regular types, but have a signal value
indicating they are "NA".
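
As a rough sketch of the signal-value idea, assuming a float64 "shadow type" reserves one particular NaN bit pattern and an int32 one reserves its most negative value (the pattern `NA_BITS` and the helper `is_na` are made up for illustration, not part of any NumPy API):

```python
import numpy as np

# Hypothetical reserved pattern: a quiet NaN with a nonzero payload.
NA_BITS = np.uint64(0x7FF80000000007A2)

def is_na(arr):
    """True where the float64 bits equal the reserved NA pattern."""
    return np.asarray(arr, dtype=np.float64).view(np.uint64) == NA_BITS

a = np.array([1.0, np.nan, 3.0])
a.view(np.uint64)[2] = NA_BITS   # store NA by writing the raw bits

na_mask = is_na(a)       # only the reserved pattern counts as NA
nan_mask = np.isnan(a)   # a plain NaN test cannot tell NA from NaN

# Integer analogue: reserve the most negative int32 value as NA.
INT32_NA = np.iinfo(np.int32).min
b = np.array([7, INT32_NA, 9], dtype=np.int32)
int_na_mask = b == INT32_NA
```

No extra storage is needed, but each type gives up one value from its range.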

There's another idea, to create a parameterized type mechanism with types
like "NA[int32]", adding a missing-value flag to the int32 and growing its
size by up to the dtype's alignment.
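
The size growth can be approximated today with an aligned structured dtype. This is only a sketch of the layout, with hypothetical field names `value` and `isna`; it is not the proposed parameterized type mechanism itself:

```python
import numpy as np

# Hypothetical layout for "NA[int32]": the value plus a one-byte missing flag.
# align=True pads the record to the int32 alignment.
na_int32 = np.dtype([("value", np.int32), ("isna", np.bool_)], align=True)

arr = np.zeros(3, dtype=na_int32)
arr["value"] = [10, 20, 30]
arr["isna"][1] = True            # mark the middle element as missing

itemsize = na_int32.itemsize     # 4 bytes of value + 1 flag byte + padding
present = arr["value"][~arr["isna"]]
```

Here the full int32 range stays available, at the cost of doubling the element size from 4 to 8 bytes.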

Using a mask to implement the missing-value semantics at the array level
instead of the dtype level is my proposal; neither of the other two ideas
involves a separate mask.
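
The existing numpy.ma module gives a feel for array-level masking, although it is a separate subclass rather than the core ndarray feature under discussion. A minimal illustration, using numpy.ma only as a stand-in:

```python
import numpy as np

# The data keeps the full int32 range; missingness lives in a separate mask.
data = np.array([1, np.iinfo(np.int32).min, 3], dtype=np.int32)
mask = np.array([False, True, False])

m = np.ma.masked_array(data, mask=mask)

mean_visible = m.mean()   # masked elements are skipped
n_visible = m.count()     # number of unmasked elements
hidden_value = m.data[1]  # the underlying value is still stored
```

The dtype is unchanged and no value is reserved, unlike the two dtype-level approaches.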


> Now, how will masked values be represented? Different masked values from one
> dtype to another? What would be the equivalent of something like `if a[0]
> is masked` that we have now?
>

If there's a global np.NA singleton, `if a[0] is np.NA` would work
equivalently. That's a strike against storing the dtype with the NA object.
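
For comparison, this is how the identity test behaves with the existing `np.ma.masked` singleton. `np.NA` is a proposal here, so the sketch sticks to what numpy.ma already provides:

```python
import numpy as np

floats = np.ma.array([1.5, 2.5, 3.5], mask=[True, False, False])
ints = np.ma.array([1, 2, 3], mask=[True, False, False])

# Indexing a masked element returns the one global singleton,
# whatever the dtype of the array it came from.
float_is_masked = floats[0] is np.ma.masked
int_is_masked = ints[0] is np.ma.masked
unmasked_is_masked = floats[1] is np.ma.masked
```

An `is` test like this only works if one object stands for "missing" across all dtypes.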

-Mark

