[Numpy-discussion] Masked arrays: Rationale for "False convention"

Mon Sep 30 23:05:00 EDT 2013

On 2013/09/30 4:57 PM, Ondřej Čertík wrote:
> On Mon, Sep 30, 2013 at 8:29 PM, Eric Firing <efiring at hawaii.edu> wrote:
>> On 2013/09/30 4:05 PM, josef.pktd at gmail.com wrote:
>>> On Mon, Sep 30, 2013 at 9:38 PM, Charles R Harris
>>> <charlesr.harris at gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On Mon, Sep 30, 2013 at 7:05 PM, Ondřej Čertík <ondrej.certik at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> What is the rationale for using False in 'mask' for elements that
>>>>> should be included?
>>>>>
>>>>> http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html
>>>>>
>>>>> As opposed to using True for elements that should be included, which
>>>>> is what I was intuitively expecting when I started using the masked
>>>>> arrays. This "True convention" also happens to be the one used in
>>>>> Fortran, see e.g.:
>>>>>
>>>>> http://gcc.gnu.org/onlinedocs/gfortran/SUM.html
>>>>>
>>>>> So it's confusing why NumPy would choose a "False convention". Could it
>>>>> be that NumPy views 'mask' as opacity? Then it would make sense to
>>>>> use True to make a value 'opaque'.
>>>>
>>>>
>>>> There was a lengthy discussion of this point back when the NA work was done.
>>>> You might be able to find the thread with a search.
>>>>
>>>> As to why it is as it is, I suspect it is historical consistency. Pierre
>>>> wrote the masked array package for numpy, but it may very well go back to
>>>> the masked array package implemented for Numeric.
>>>
>>> I don't know ancient history, but I thought it's "natural". (Actually,
>>> I never thought about it.)
>>>
>>> I always thought `mask` indicates the "masked" (invalid, hidden)
>>> values, and masked arrays mask the missing values.
>>
>> Exactly.  It is also consistent with the C and Unix convention of
>> returning 0 on success and 1, or a non-zero error code on failure.  In a
>> similar vein, it works nicely with bit-mapped quality control flags,
>> etc.  When nothing is flagged, the value is good, and consequently not
>> masked out.
>
> I see, that makes sense. So to remember this, the rule is:
>
> "Specify elements that you want to get masked using True in 'mask'".
>
> But why do I need to invert the mask when I want to see the valid elements:
>
> In [1]: from numpy import ma
>
> In [2]: a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])
>
> In [3]: a
> Out[3]:
> masked_array(data = [1 2 -- 4],
>               mask = [False False  True False],
>         fill_value = 999999)
>
>
> In [4]: a[~a.mask]
> Out[4]:
> masked_array(data = [1 2 4],
>               mask = [False False False],
>         fill_value = 999999)
>
>
> I would find it natural to write [4] as a[a.mask]. This is when it gets confusing.

There is no getting around it; each of the two possible conventions has 
its advantages.  But try this instead:

In [2]: a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])

In [3]: a.compressed()
Out[3]: array([1, 2, 4])

I do occasionally need a "goodmask" which is the inverse of a.mask, but 
not very often; and when I do, needing to invert a.mask doesn't bother me.
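
As a rough sketch of the two approaches side by side (the names valid,
goodmask, and packed are illustrative only, and using ma.getmaskarray
rather than a.mask directly is an assumption about what is safest when
nothing is masked):

```python
import numpy as np
from numpy import ma

a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])

# compressed() returns the unmasked values as a plain ndarray.
valid = a.compressed()

# "goodmask" is the inverse of the mask: True marks valid elements.
# ma.getmaskarray always returns a full boolean array, even when no
# element is masked (a.mask can then be the scalar ma.nomask).
goodmask = ~ma.getmaskarray(a)

# Indexing the underlying data with goodmask selects the same values,
# which mirrors the Fortran pack(a, mask) convention quoted below.
packed = a.data[goodmask]

print(valid)
print(goodmask)
print(packed)
```

Both routes select the same three elements; compressed() just saves
building the inverted mask by hand.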

Eric

>
> For example in Fortran, one does:
>
> integer :: a(4) = [1, 2, 3, 4]
> logical :: mask(4) = [.true., .true., .false., .true.]
> print *, a
> print *, pack(a, mask)
>
> and it prints:
>
>             1           2           3           4
>             1           2           4
>
> So the behavior of mask when used as an index to select elements from
> an array is identical to NumPy --- True means include the element,
> False means exclude it.
>
> Ondrej