[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Fri Jun 24 10:33:18 EDT 2011

On Fri, Jun 24, 2011 at 8:06 AM, Robert Kern <robert.kern at gmail.com> wrote:

> On Fri, Jun 24, 2011 at 07:30, Laurent Gautier <lgautier at gmail.com> wrote:
> > On 2011-06-24 13:59, Nathaniel Smith <njs at pobox.com> wrote:
> >> On Thu, Jun 23, 2011 at 5:56 PM, Benjamin Root <ben.root at ou.edu> wrote:
> >>> Lastly, I am not entirely familiar with R, so I am also very curious about
> >>> what this magical "NA" value is, and how it compares to how NaNs work.
> >>> Although, Pierre brought up the very good point that NaNs wouldn't work
> >>> anyway with integer arrays (and object arrays, etc.).
> >> Since R is designed for statistics, they made the interesting decision
> >> that *all* of their core types have a special designated "missing"
> >> value. At the R level this is just called "NA". Internally, there are
> >> a bunch of different NA values -- for floats it's a particular NaN,
> >> for integers it's INT_MIN, for booleans it's 2 (IIRC), etc. (You never
> >> notice this, because R will silently cast a NA of one type into NA of
> >> another type whenever needed, and they all print the same.)
> >>
> >> Because any array can contain NA's, all R functions then have to have
> >> some way of handling this -- all their integer arithmetic knows that
> >> INT_MIN is special, for instance. The rules are basically the same as
> >> for NaN's, but NA and NaN are different from each other (because one
> >> means "I don't know, could be anything" and the other means "you tried
> >> to divide by 0, I *know* that's meaningless").
> >>
> >> That's basically it.
> >>
> >> -- Nathaniel
> >
> > Would the use of R's system for expressing "missing values" be possible
> > in numpy through a special flag ?
> >
> > Any given numpy array could have a boolean flag (say "na_aware")
> > indicating that some of the values are representing a missing cell.
> >
> > If the exact same system is used, interaction with R (through something
> > like rpy2) would be simplified and more robust.
>
> The alternative proposal would be to add a few new dtypes that are
> NA-aware. E.g. an nafloat64 would reserve a particular NaN value
> (there are lots of different NaN bit patterns, we'd just reserve one)
> that would represent NA. An naint32 would probably reserve the most
> negative int32 value (like R does). Using the NA-aware dtypes signals
> that you are using NA values; there is no need for an additional flag.
>
>
Definitely better names than r-int32. Going this way has the advantage of
reducing the friction between R and numpy, and since R has pretty much
become the standard software for statistics, that is an important
consideration.
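
For concreteness, here is a rough sketch of what the sentinel approach
could look like using only existing numpy dtypes. Everything named here
(NA_INT32, NA_FLOAT64_BITS, is_na_int32, is_na_float64) is hypothetical
and not part of numpy, and the reserved NaN bit pattern is an arbitrary
choice for illustration, not necessarily the one R or a future nafloat64
would use.

```python
import numpy as np

# Hypothetical sentinels for illustration only; none of this is a numpy API.
# Integer NA: the most negative int32, mirroring what R does with INT_MIN.
NA_INT32 = np.iinfo(np.int32).min

# Float NA: one reserved NaN bit pattern out of the many that exist.
# This is a quiet NaN with an arbitrary payload, picked for the sketch.
NA_FLOAT64_BITS = np.uint64(0x7FF80000000007A2)


def is_na_int32(values):
    """True where an int32 array holds the reserved NA sentinel."""
    return np.asarray(values, dtype=np.int32) == NA_INT32


def is_na_float64(values):
    """True where a float64 array holds the reserved NA bit pattern.

    The comparison is done on the raw 64-bit patterns, because NaN != NaN
    under ordinary floating-point comparison.
    """
    arr = np.ascontiguousarray(values, dtype=np.float64)
    return arr.view(np.uint64) == NA_FLOAT64_BITS


# Build a float array holding an ordinary value, an NA, and a plain NaN.
x = np.array([1.0, 0.0, np.nan])
x.view(np.uint64)[1] = NA_FLOAT64_BITS  # write the NA bit pattern in place

print(np.isnan(x))       # NA still looks like a NaN to code that is not NA-aware
print(is_na_float64(x))  # but it can be told apart from an ordinary NaN
print(is_na_int32([1, NA_INT32, 3]))
```

The point of the sketch is only that the NA information lives in the data
itself, so no extra flag or mask is needed. The cost is that every
operation has to know about the sentinel, which is what an NA-aware dtype
would take care of.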

Chuck