On Wed, Jul 6, 2011 at 12:41 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
 Ah, semantics...

On Jul 6, 2011, at 5:40 PM, Mark Wiebe wrote:
>
> NA (Not Available)
>     A placeholder for a value which is unknown to computations. That
>     value may be temporarily hidden with a mask, may have been lost
>     due to hard drive corruption, or gone for any number of reasons.
>     This is the same as NA in the R project.

I have a problem with 'temporarily hidden with a mask'. In my mind, the concept of NA carries a notion of permanence. The data is just not available, just as a NaN is just not a number.

Yes, this gets directly to what I've meant when I say NA vs IGNORE is independent of mask vs bitpattern. The way I'm trying to structure things, NA vs IGNORE only affects the semantic meaning, i.e. the outputs produced by computations. This is precisely why I put 'temporarily hidden with a mask' first, to make that clearer.
 
> IGNORE (Skip/Ignore)
>     A placeholder which should be treated by computations as if no value does
>     or could exist there. For sums, this means act as if the value
>     were zero, and for products, this means act as if the value were one.
>     It's as if the array were compressed in some fashion to not include
>     that element.
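
To make the two definitions above concrete, here is a minimal sketch using only features that exist in NumPy today. It rests on two assumptions that are not part of the proposal: a floating-point NaN stands in for NA (the result of a computation becomes unknown), and a numpy.ma masked array stands in for IGNORE (the element is skipped). The np.NA and np.IGNORE names under discussion are not used here.

```python
import numpy as np

# NA-like semantics (assumption: NaN stands in for NA).
# An unknown input makes the result unknown.
na_like = np.array([1.0, np.nan, 3.0])
na_sum = na_like.sum()
na_prod = na_like.prod()

# IGNORE-like semantics (assumption: numpy.ma stands in for IGNORE).
# The masked element is skipped, which is equivalent to treating it
# as 0 in a sum and as 1 in a product.
ignore_like = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
ignore_sum = ignore_like.sum()
ignore_prod = ignore_like.prod()

print(na_sum, na_prod)          # both unknown (nan)
print(ignore_sum, ignore_prod)  # 1 + 3 and 1 * 3
```

The same three stored values give different answers purely because of the semantics chosen, not because of how the missing element is stored.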

Data temporarily hidden by a mask becomes np.IGNORE.

Are you willing to suspend the idea of that implication for the purposes of the present discussion? If not, do you see a way to amend things so that masked NAs and bitpattern-based IGNOREs make sense? Would renaming IGNORE to SKIP be more clear, perhaps?

Thanks,
Mark
 


> bitpattern
>     A technique for implementing either NA or IGNORE, where a particular
>     set of bit patterns are chosen from all the possible bit patterns of the
>     value's data type to signal that the element is NA or IGNORE.
>
> mask
>     A technique for implementing either NA or IGNORE, where a
>     boolean or enum array parallel to the data array is used to signal
>     which elements are NA or IGNORE.
>
> numpy.ma
>     The existing implementation of a particular form of masked arrays,
>     which is part of the NumPy codebase.

OK with that.
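
As a rough illustration of the two storage techniques defined above, the sketch below flags the same element as missing in both ways. It assumes NaN as the reserved bit pattern for a float array and a plain boolean array as the mask; both choices are illustrative and not a statement about what the final implementation would use.

```python
import numpy as np

# Bitpattern technique (assumption: NaN is the reserved pattern).
# The missing marker lives inside the data, so the old value is overwritten.
bitpattern_data = np.array([1.0, 2.0, 3.0])
bitpattern_data[1] = np.nan
bitpattern_missing = np.isnan(bitpattern_data)

# Mask technique: a boolean array parallel to the data array.
# The data array itself is left untouched.
mask_data = np.array([1.0, 2.0, 3.0])
mask = np.zeros(mask_data.shape, dtype=bool)
mask[1] = True

# Both techniques flag exactly the same element.
same_flags = np.array_equal(bitpattern_missing, mask)

# Clearing the mask exposes the original value again; the bitpattern
# version has no such value left to recover.
mask[1] = False
recovered = mask_data[1]
```

One practical difference the sketch shows: a mask costs extra memory but keeps the underlying value, while a bitpattern costs nothing extra but destroys it.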



>
> The most important distinctions I'm trying to draw are:
>
> 1) NA vs IGNORE and bitpattern vs mask are completely independent. All four combinations (NA as bitpattern, NA as mask, IGNORE as bitpattern, and IGNORE as mask) are reasonable.

OK with that.
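
The independence claimed in point 1 can be sketched by separating "how do we find the missing elements" from "what does a computation do with them". Every name below (missing_from_bitpattern, missing_from_mask, total) is hypothetical and written only for this illustration; NaN again stands in for the reserved bit pattern.

```python
import numpy as np

def missing_from_bitpattern(data):
    """Storage choice 1 (assumption: NaN is the reserved pattern)."""
    return np.isnan(data)

def missing_from_mask(mask):
    """Storage choice 2: a parallel boolean array."""
    return np.asarray(mask, dtype=bool)

def total(data, missing, semantics):
    """Sum `data` under the chosen semantics, given a missing-flag array."""
    if semantics == "NA":
        # Any unknown input makes the result unknown.
        return np.nan if missing.any() else data.sum()
    if semantics == "IGNORE":
        # Act as if the missing elements were not in the array at all.
        return data[~missing].sum()
    raise ValueError("semantics must be 'NA' or 'IGNORE'")

bp_data = np.array([1.0, np.nan, 3.0])
bp_missing = missing_from_bitpattern(bp_data)

mk_data = np.array([1.0, 2.0, 3.0])
mk_missing = missing_from_mask([False, True, False])

# All four combinations: the result depends on the semantics only.
na_bitpattern = total(bp_data, bp_missing, "NA")
na_mask = total(mk_data, mk_missing, "NA")
ignore_bitpattern = total(bp_data, bp_missing, "IGNORE")
ignore_mask = total(mk_data, mk_missing, "IGNORE")
```

In this sketch the storage technique only determines how the missing flags are obtained, and the semantics only determine what total returns, which is the separation point 1 describes.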



> 2) The idea of masking and the numpy.ma implementation are different. The numpy.ma object makes particular choices about how to interpret the mask, but while backwards compatibility is important, a fresh evaluation of all the design choices going into a mask implementation is worthwhile.

Indeed.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion