[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Fri Jun 24 11:07:06 EDT 2011

On 2011-06-24 16:43, Robert Kern <robert.kern at gmail.com> wrote:
> On Fri, Jun 24, 2011 at 09:33, Charles R Harris 
> <charlesr.harris at gmail.com> wrote:
>>
>> On Fri, Jun 24, 2011 at 8:06 AM, Robert Kern <robert.kern at gmail.com> wrote:
>>> The alternative proposal would be to add a few new dtypes that are
>>> NA-aware. E.g. an nafloat64 would reserve a particular NaN value
>>> (there are lots of different NaN bit patterns, we'd just reserve one)
>>> that would represent NA. An naint32 would probably reserve the most
>>> negative int32 value (like R does). Using the NA-aware dtypes signals
>>> that you are using NA values; there is no need for an additional flag.
>>
>> Definitely better names than r-int32. Going this way has the advantage of
>> reducing the friction between R and numpy, and since R has pretty much
>> become the standard software for statistics that is an important
>> consideration.
> I would definitely steal their choices of NA value for naint32 and
> nafloat64. I have reservations about their string NA value (i.e. 'NA')
> as anyone doing business in North America and other continents may
> have issues with that....

Maybe there is less need for reservations about the string NA once we
distinguish between:
a/ the internal representation of a "missing string" (what is stored in
   memory, and what C-level code would need to be aware of)
b/ the 'external' representation of a missing string (in Python, what
   would be returned by repr())
c/ what is assumed to be a missing string value when reading from a file.

a/ is not 'NA'; b/ can be configured as a module-level, class-level, or
instance-level variable; c/ should be a parameter of the relevant
functions.
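
To make the three levels concrete, here is a rough pure-Python sketch.
Every name in it (MissingString, display, read_strings, na_values) is
made up for illustration and is not an existing or proposed numpy API;
a real implementation would live at the C/dtype level.

```python
# Hypothetical sketch: none of these names exist in NumPy.

class MissingString:
    """Stand-in for a missing string value."""

    # (b) External representation. Set here at class level; assigning
    # `display` on an instance would override it for that instance only.
    display = "<NA>"

    def __repr__(self):
        return self.display


# (a) Internal representation: a dedicated sentinel object,
# which is never equal to the ordinary string "NA".
NA = MissingString()


def read_strings(lines, na_values=()):
    """(c) The caller decides, per read, which markers mean 'missing'."""
    markers = frozenset(na_values)
    return [NA if line in markers else line for line in lines]


# "NA" (e.g. North America) stays a real string here, because only the
# empty string was declared to be a missing-value marker.
regions = read_strings(["EU", "NA", ""], na_values=[""])
```

With this split, a file where 'NA' is real data is handled by passing a
different na_values, without touching either the stored sentinel or what
repr() shows.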