[Numpy-discussion] using the same vocabulary for missing value ideas
Pierre GM
pgmdevlist at gmail.com
Wed Jul 6 13:41:24 EDT 2011
Ah, semantics...
On Jul 6, 2011, at 5:40 PM, Mark Wiebe wrote:
>
> NA (Not Available)
> A placeholder for a value which is unknown to computations. That
> value may be temporarily hidden with a mask, may have been lost
> due to hard drive corruption, or gone for any number of reasons.
> This is the same as NA in the R project.
I have a problem with 'temporarily hidden with a mask'. In my mind, the concept of NA carries a notion of permanence: the data is simply not available, just as a NaN is simply not a number.
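To make the "not available" reading concrete, here is a minimal sketch of how an unknown value contaminates anything computed from it. It uses NaN purely as a stand-in for NA, which is an assumption for illustration only (NumPy has no np.NA object here, and NaN and NA are not the same concept).

```python
import numpy as np

# NaN is used only as a stand-in for NA (an illustrative assumption).
data = np.array([1.0, np.nan, 3.0])

# A reduction over an unknown value is itself unknown.
total = data.sum()

# Element-wise results stay unknown exactly where the input was unknown.
doubled = data * 2

print(np.isnan(total))
print(np.isnan(doubled))
```

Nothing in this sketch lets the missing element come back later, which is the sense of permanence meant above.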
> IGNORE (Skip/Ignore)
> A placeholder which should be treated by computations as if no value does
> or could exist there. For sums, this means act as if the value
> were zero, and for products, this means act as if the value were one.
> It's as if the array were compressed in some fashion to not include
> that element.
Data temporarily hidden by a mask becomes np.IGNORE.
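As a rough illustration of the skip behaviour, the existing numpy.ma reductions already act this way: a masked element contributes nothing to a sum or a product, and the underlying value is still there behind the mask. This is only a sketch of the semantics with today's numpy.ma, not a proposal for how np.IGNORE (a name that exists only in this discussion) would be implemented.

```python
import numpy as np

data = np.array([2.0, 5.0, 4.0])
hidden = np.array([False, True, False])  # True marks the element to skip

masked = np.ma.masked_array(data, mask=hidden)

total = masked.sum()              # 2 + 4: the hidden element acts like 0
product = masked.prod()           # 2 * 4: the hidden element acts like 1
compressed = masked.compressed()  # the array as if the element were absent

# The hidden value has not been lost; it is still in the underlying data.
underlying_total = masked.data.sum()

print(total, product, compressed, underlying_total)
```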
> bitpattern
> A technique for implementing either NA or IGNORE, where a particular
> set of bit patterns are chosen from all the possible bit patterns of the
> value's data type to signal that the element is NA or IGNORE.
>
> mask
> A technique for implementing either NA or IGNORE, where a
> boolean or enum array parallel to the data array is used to signal
> which elements are NA or IGNORE.
>
> numpy.ma
> The existing implementation of a particular form of masked arrays,
> which is part of the NumPy codebase.
OK with that.
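For reference, the two storage techniques can be sketched as follows. Using NaN as the reserved bit pattern and a plain boolean array as the mask are both illustrative assumptions, and the variable names are made up for this example.

```python
import numpy as np

# Bitpattern: the marker lives inside the data itself. NaN is used as the
# reserved pattern here, which only works for floating-point dtypes.
float_data = np.array([1.0, np.nan, 3.0])
flags_from_bitpattern = np.isnan(float_data)

# Mask: the marker lives in a separate boolean array parallel to the data,
# so it also works for dtypes with no spare bit pattern, such as integers.
int_data = np.array([1, 2, 3])
flags_from_mask = np.array([False, True, False])

# Both techniques identify the same position as flagged.
print(flags_from_bitpattern.tolist())
print(flags_from_mask.tolist())

# With a mask, the original value is still stored under the flag.
print(int_data[1])
```

Neither technique says what a computation should do with a flagged element; that is the separate NA versus IGNORE question.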
>
> The most important distinctions I'm trying to draw are:
>
> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and IGNORE as mask is reasonable.
OK with that.
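A small sketch of that independence: once the flagged positions are known, the choice of semantics does not care where the flags came from. The helper flagged_sum and its "NA" / "IGNORE" strings are hypothetical names invented for this example, and NaN again stands in for a reserved bit pattern.

```python
import numpy as np

def flagged_sum(values, flags, semantics):
    """Sum values, treating flagged elements as NA or IGNORE (hypothetical helper)."""
    if semantics == "NA":
        # One unknown input makes the whole result unknown.
        return np.nan if flags.any() else values[~flags].sum()
    if semantics == "IGNORE":
        # Flagged inputs are skipped, as if they were not in the array.
        return values[~flags].sum()
    raise ValueError(f"unknown semantics: {semantics!r}")

# Flags obtained from a bit pattern stored in the data (NaN as the marker).
float_data = np.array([1.0, np.nan, 3.0])
bit_flags = np.isnan(float_data)

# Flags obtained from a separate boolean mask.
int_data = np.array([1, 2, 3])
mask_flags = np.array([False, True, False])

results = {
    ("NA", "bitpattern"): flagged_sum(float_data, bit_flags, "NA"),
    ("NA", "mask"): flagged_sum(int_data, mask_flags, "NA"),
    ("IGNORE", "bitpattern"): flagged_sum(float_data, bit_flags, "IGNORE"),
    ("IGNORE", "mask"): flagged_sum(int_data, mask_flags, "IGNORE"),
}

for combination, value in results.items():
    print(combination, value)
```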
> 2) The idea of masking and the numpy.ma implementation are different. The numpy.ma object makes particular choices about how to interpret the mask, but while backwards compatibility is important, a fresh evaluation of all the design choices going into a mask implementation is worthwhile.
Indeed.