Ah, semantics...

On Jul 6, 2011, at 5:40 PM, Mark Wiebe wrote:
NA (Not Available) A placeholder for a value which is unknown to computations. That value may be temporarily hidden with a mask, may have been lost due to hard drive corruption, or gone for any number of reasons. This is the same as NA in the R project.
I have a problem with 'temporarily hidden with a mask'. In my mind, the concept of NA carries a notion of permanence. The data is simply not available, just as a NaN is simply not a number.
IGNORE (Skip/Ignore) A placeholder which should be treated by computations as if no value does or could exist there. For sums, this means act as if the value were zero, and for products, this means act as if the value were one. It's as if the array were compressed in some fashion to not include that element.
A value that is temporarily hidden by a mask becomes np.IGNORE.
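To make the NA versus IGNORE distinction concrete, here is a minimal sketch using what NumPy already offers. It assumes NaN as a stand-in for NA (the unknown value propagates) and a numpy.ma masked array as a stand-in for IGNORE (the element is skipped). Neither np.NA nor np.IGNORE exists in NumPy; those names are only the ones under discussion in this thread.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
hidden = np.array([False, True, False, False])

# NA-like behaviour (assumption: NaN stands in for NA).
# An unknown input makes the result unknown.
na_like = data.copy()
na_like[hidden] = np.nan
na_sum = na_like.sum()
na_prod = na_like.prod()

# IGNORE-like behaviour (assumption: a numpy.ma mask stands in for IGNORE).
# The hidden element acts as 0 in the sum and as 1 in the product.
ignore_like = np.ma.masked_array(data, mask=hidden)
ignore_sum = ignore_like.sum()    # 1 + 3 + 4
ignore_prod = ignore_like.prod()  # 1 * 3 * 4

print(na_sum, na_prod)
print(ignore_sum, ignore_prod)
```

With the NA-like array both reductions come out as NaN, while the IGNORE-like array gives the sum and product of the three visible elements.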
bitpattern A technique for implementing either NA or IGNORE, where a particular set of bit patterns is chosen from all the possible bit patterns of the value's data type to signal that the element is NA or IGNORE.
mask A technique for implementing either NA or IGNORE, where a boolean or enum array parallel to the data array is used to signal which elements are NA or IGNORE.
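The two implementation techniques can be sketched side by side. This is only an illustration: the sentinel INT_NA below is a hypothetical choice (the smallest int32 value), not something NumPy defines, and the mask is a plain boolean array kept next to the data.

```python
import numpy as np

# Bitpattern approach: reserve one bit pattern of the dtype as the marker.
# INT_NA is a hypothetical sentinel chosen for this sketch.
INT_NA = np.iinfo(np.int32).min
values = np.array([10, INT_NA, 30], dtype=np.int32)
bitpattern_missing = values == INT_NA

# Mask approach: a parallel boolean array marks the same element.
data = np.array([10, 20, 30], dtype=np.int32)
mask = np.array([False, True, False])

# Both techniques flag the same position, but only the mask approach
# still holds the underlying value (20) behind the flag.
same_positions = bool((bitpattern_missing == mask).all())
print(same_positions, data[mask])
```

The trade-off visible here is that the bitpattern costs no extra memory but overwrites the payload and removes one value from the dtype's range, whereas the mask needs a second array but leaves the data untouched.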
numpy.ma The existing implementation of a particular form of masked arrays, which is part of the NumPy codebase.
OK with that.
The most important distinctions I'm trying to draw are:
1) NA vs IGNORE and bitpattern vs mask are completely independent. Any combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and IGNORE as mask are reasonable.
OK with that.
2) The idea of masking and the numpy.ma implementation are different. The numpy.ma object makes particular choices about how to interpret the mask, but while backwards compatibility is important, a fresh evaluation of all the design choices going into a mask implementation is worthwhile.
Indeed.
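As an illustration of the 'particular choices' numpy.ma makes, here is a short sketch of its current behaviour: masked elements stay masked through elementwise operations, but reductions skip them. A fresh mask implementation could reasonably decide either point differently.

```python
import numpy as np

a = np.ma.masked_array([1, 2, 3], mask=[False, True, False])

# Elementwise operation: the masked element stays masked in the result.
shifted = a + 1
shifted_mask = np.ma.getmaskarray(shifted).tolist()

# Reductions: the masked element is skipped rather than propagated.
total = a.sum()      # 1 + 3
average = a.mean()   # (1 + 3) / 2

print(shifted_mask, total, average)
```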