[Numpy-discussion] using the same vocabulary for missing value ideas

Pierre GM pgmdevlist at gmail.com
Wed Jul 6 13:41:24 EDT 2011


 Ah, semantics...

On Jul 6, 2011, at 5:40 PM, Mark Wiebe wrote:
> 
> NA (Not Available)
>     A placeholder for a value which is unknown to computations. That
>     value may be temporarily hidden with a mask, may have been lost
>     due to hard drive corruption, or gone for any number of reasons.
>     This is the same as NA in the R project.

I have a problem with 'temporarily hidden with a mask'. In my mind, the concept of NA carries a notion of permanence: the data is simply not available, just as a NaN is simply not a number.

> IGNORE (Skip/Ignore)
>     A placeholder which should be treated by computations as if no value does
>     or could exist there. For sums, this means act as if the value
>     were zero, and for products, this means act as if the value were one.
>     It's as if the array were compressed in some fashion to not include
>     that element.

A value temporarily hidden by a mask becomes np.IGNORE.
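The difference between the two behaviours can be sketched with what NumPy already ships. This is only an illustration under stated assumptions: `np.NA` and `np.IGNORE` are names from this discussion and are not used here; NaN stands in for NA (it propagates through reductions), and a `numpy.ma` masked element stands in for IGNORE (reductions skip it). The array contents are made up.

```python
import numpy as np

data = np.array([2.0, 3.0, 4.0])
hidden = np.array([False, True, False])  # hide the middle element

# IGNORE-like stand-in: numpy.ma reductions skip masked elements,
# so the sum acts as if the value were 0 and the product as if it were 1.
masked = np.ma.array(data, mask=hidden)
ignore_sum = masked.sum()    # 2 + 4
ignore_prod = masked.prod()  # 2 * 4

# The hidden value is only hidden: the underlying data is still there.
still_there = masked.data[1]

# NA-like stand-in: NaN replaces the value and propagates through the sum.
with_nan = np.where(hidden, np.nan, data)
na_sum = with_nan.sum()      # NaN: the result is itself unknown
```

With the mask the original 3.0 can be recovered; once it has been replaced by NaN it cannot, which is the sense of permanence argued for above.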


> bitpattern
>     A technique for implementing either NA or IGNORE, where a particular
>     set of bit patterns is chosen from all the possible bit patterns of the
>     value's data type to signal that the element is NA or IGNORE.
> 
> mask
>     A technique for implementing either NA or IGNORE, where a
>     boolean or enum array parallel to the data array is used to signal
>     which elements are NA or IGNORE.
> 
> numpy.ma
>     The existing implementation of a particular form of masked arrays,
>     which is part of the NumPy codebase.

OK with that.
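As a minimal sketch of the two storage techniques defined above, assuming an int32 array and a hypothetical choice of the smallest int32 as the reserved bit pattern (the `SENTINEL` name and the sample values are illustrative, not an existing NumPy convention):

```python
import numpy as np

values = np.array([10, 20, 30], dtype=np.int32)

# bitpattern: reserve one value of the dtype as the "missing" marker.
# Assumption for this sketch: the smallest int32 is the reserved pattern.
SENTINEL = np.iinfo(np.int32).min
bitpattern = values.copy()
bitpattern[1] = SENTINEL  # the original 20 is overwritten

# mask: a parallel boolean array flags the element; the payload is untouched.
mask = np.array([False, True, False])

# Either representation identifies the same flagged positions.
flagged_from_bitpattern = bitpattern == SENTINEL
flagged_from_mask = mask

# An IGNORE-style sum computed from each representation.
sum_from_bitpattern = bitpattern[~flagged_from_bitpattern].sum()
sum_from_mask = values[~flagged_from_mask].sum()
```

Both sums come out the same; the practical difference is that the bitpattern costs one representable value and overwrites the payload, while the mask costs extra storage and keeps the payload intact.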



> 
> The most important distinctions I'm trying to draw are:
> 
> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and IGNORE as mask is reasonable.

OK with that.
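The independence of the two axes can be sketched as follows. Everything here is hypothetical and only for illustration: the helper names, the `SENTINEL` choice, and the use of `None` to represent an NA result are assumptions of this sketch, not a proposed API.

```python
import numpy as np

SENTINEL = np.iinfo(np.int64).min  # hypothetical reserved bit pattern

def flags_from_bitpattern(stored):
    """Implementation axis, option 1: flagged where the sentinel is stored."""
    return stored == SENTINEL

def flags_from_mask(mask):
    """Implementation axis, option 2: flagged where the parallel mask is True."""
    return np.asarray(mask, dtype=bool)

def missing_aware_sum(payload, flagged, semantics):
    """Semantics axis: decide what a flagged element means for a sum."""
    if semantics == "IGNORE":
        # Act as if the flagged elements were not in the array at all.
        return int(payload[~flagged].sum())
    if semantics == "NA":
        # Any unknown input makes the result unknown (None stands in for NA).
        return None if flagged.any() else int(payload.sum())
    raise ValueError(f"unknown semantics: {semantics!r}")

values = np.array([1, 2, 3], dtype=np.int64)
stored = np.array([1, SENTINEL, 3], dtype=np.int64)
mask = [False, True, False]

cases = [
    ("bitpattern", stored, flags_from_bitpattern(stored)),
    ("mask", values, flags_from_mask(mask)),
]
results = {
    (impl, sem): missing_aware_sum(payload, flagged, sem)
    for impl, payload, flagged in cases
    for sem in ("NA", "IGNORE")
}
```

The reduction only ever sees a boolean array of flagged positions, so the choice of semantics does not depend on how those flags were stored.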



> 2) The idea of masking and the numpy.ma implementation are different. The numpy.ma object makes particular choices about how to interpret the mask, but while backwards compatibility is important, a fresh evaluation of all the design choices going into a mask implementation is worthwhile.

Indeed. 

