On Wed, Jul 6, 2011 at 11:38 AM, Matthew Brett <matthew.brett@gmail.com> wrote:

Hi,

On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:

> It appears to me that one of the biggest reasons some of us have been talking
> past each other in the discussions is that different people have different
> definitions for the terms being used. Until this is thoroughly cleared up, I
> feel the design process is tilting at windmills.
> In the interests of clarity in our discussions, here is a starting point
> which is consistent with the NEP. These definitions have been added in a
> glossary within the NEP. If there are any ideas for amendments to these
> definitions that we can agree on, I will update the NEP with those
> amendments. Also, if I missed any important terms which need to be added,
> please propose definitions for them.
> NA (Not Available)
>     A placeholder for a value which is unknown to computations. That
>     value may be temporarily hidden with a mask, may have been lost
>     due to hard drive corruption, or gone for any number of reasons.
>     This is the same as NA in the R project.

Really? Can one implement NA with a mask in R? I thought an NA was
always a bitpattern in R?

> IGNORE (Skip/Ignore)
>     A placeholder which should be treated by computations as if no value
>     does or could exist there. For sums, this means act as if the value
>     were zero, and for products, this means act as if the value were one.
>     It's as if the array were compressed in some fashion to not include
>     that element.
> bitpattern
>     A technique for implementing either NA or IGNORE, where a particular
>     set of bit patterns is chosen from all the possible bit patterns of the
>     value's data type to signal that the element is NA or IGNORE.
> mask
>     A technique for implementing either NA or IGNORE, where a
>     boolean or enum array parallel to the data array is used to signal
>     which elements are NA or IGNORE.
> numpy.ma
>     The existing implementation of a particular form of masked arrays,
>     which is part of the NumPy codebase.
>
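As a rough illustration of the NA and IGNORE definitions above, here is a small sketch using today's NumPy. It assumes that a floating-point NaN can stand in for a missing element, which is only an approximation: NaN is not the NA described in the NEP, and the variable names are made up for this example.

```python
import numpy as np

# Assumption: NaN stands in for a missing element. This is only a
# stand-in; it is not the NA proposed in the NEP.
data = np.array([1.0, np.nan, 3.0])

# NA-like semantics: an unknown input makes the result unknown.
na_sum = np.sum(data)
na_prod = np.prod(data)

# IGNORE-like semantics: the element is skipped, as if the array had
# been compressed to leave it out (acts as 0 in sums, 1 in products).
ignore_sum = np.nansum(data)
ignore_prod = np.nanprod(data)

print(na_sum, na_prod)          # nan nan
print(ignore_sum, ignore_prod)  # 4.0 3.0
```

The same data gives different answers depending only on which semantics the reduction applies, which is the distinction the two definitions are meant to capture.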
> The most important distinctions I'm trying to draw are:
> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any
> combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and
> IGNORE as mask are reasonable.
> 2) The idea of masking and the numpy.ma implementation are different. The
> numpy.ma object makes particular choices about how to interpret the mask,
> but while backwards compatibility is important, a fresh evaluation of all
> the design choices going into a mask implementation is worthwhile.
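To make the two distinctions above concrete, the following sketch separates "how missing elements are found" (bitpattern or mask) from "what a sum does with them" (NA or IGNORE). The helper `total` and the use of `None` for an unknown result are hypothetical, chosen only for this illustration; they are not part of the NEP or of NumPy. The last lines show the choice numpy.ma makes for `sum`.

```python
import numpy as np

def total(values, missing, semantics):
    """Hypothetical helper: sum `values` given a boolean `missing` array."""
    if semantics == "NA":
        # Any unknown input makes the result unknown (None stands in for NA).
        return None if missing.any() else float(values.sum())
    if semantics == "IGNORE":
        # Missing elements are skipped, as if compressed out of the array.
        return float(values[~missing].sum())
    raise ValueError("semantics must be 'NA' or 'IGNORE'")

# bitpattern: missingness is encoded in the data itself (here, NaN).
bit_values = np.array([1.0, np.nan, 3.0])
bit_missing = np.isnan(bit_values)

# mask: missingness lives in a parallel boolean array; the data is intact.
mask_values = np.array([1.0, 2.0, 3.0])
mask = np.array([False, True, False])

results = {
    ("NA", "bitpattern"): total(bit_values, bit_missing, "NA"),
    ("NA", "mask"): total(mask_values, mask, "NA"),
    ("IGNORE", "bitpattern"): total(bit_values, bit_missing, "IGNORE"),
    ("IGNORE", "mask"): total(mask_values, mask, "IGNORE"),
}

# numpy.ma makes one particular choice: masked elements are skipped by sum().
ma_total = np.ma.array(mask_values, mask=mask).sum()
```

In this sketch all four combinations work, and the result depends only on the semantics, not on whether a bitpattern or a mask records the missing element. With a mask the underlying value (2.0 here) is still stored, whereas with a bitpattern it is gone.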

I agree that there has been some confusion due to the terms.

However, I continue to believe that the discussion is substantial and
not due to confusion.

I believe this is true as well, but the confusion due to the terms appears to be one of the root causes preventing the ideas from getting across. Without first clearing up this aspect of the discussion, things will stay confusing.

Let us then characterize the substantial discussion as this:

NEP: bitpattern and masked out values should be made nearly impossible
to distinguish in the API
alterNEP: bitpattern and masked out values should be distinct in the
API so that it can be made clear which is meant (and therefore,
implicitly, how they are implemented).

Do you agree that this is the discussion?

I'd like to get agreement on the definitions before moving to any of the points of contention that are being raised.

Thanks,

-Mark

See you,

Matthew

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion