[Numpy-discussion] Masked arrays: Rationale for "False convention"

Ondřej Čertík ondrej.certik at gmail.com
Mon Sep 30 22:57:18 EDT 2013


On Mon, Sep 30, 2013 at 8:29 PM, Eric Firing <efiring at hawaii.edu> wrote:
> On 2013/09/30 4:05 PM, josef.pktd at gmail.com wrote:
>> On Mon, Sep 30, 2013 at 9:38 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>>>
>>>
>>>
>>> On Mon, Sep 30, 2013 at 7:05 PM, Ondřej Čertík <ondrej.certik at gmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> What is the rationale for using False in 'mask' for elements that
>>>> should be included?
>>>>
>>>> http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html
>>>>
>>>> As opposed to using True for elements that should be included, which
>>>> is what I was intuitively expecting when I started using the masked
>>>> arrays. This "True convention" also happens to be the one used in
>>>> Fortran, see e.g.:
>>>>
>>>> http://gcc.gnu.org/onlinedocs/gfortran/SUM.html
>>>>
>>>> So it's confusing why NumPy would choose a "False convention". Could it
>>>> be that NumPy views 'mask' as opacity? Then it would make sense to
>>>> use True to make a value 'opaque'.
>>>
>>>
>>> There was a lengthy discussion of this point back when the NA work was done.
>>> You might be able to find the thread with a search.
>>>
>>> As to why it is as it is, I suspect it is historical consistency. Pierre
>>> wrote the masked array package for numpy, but it may very well go back to
>>> the masked array package implemented for Numeric.
>>
>> I don't know ancient history, but I thought it's "natural". (Actually,
>> I never thought about it.)
>>
>> I always thought `mask` indicates the "masked" (invalid, hidden)
>> values, and masked arrays mask the missing values.
>
> Exactly.  It is also consistent with the C and Unix convention of
> returning 0 on success and a non-zero error code (often 1) on failure.  In a
> similar vein, it works nicely with bit-mapped quality control flags,
> etc.  When nothing is flagged, the value is good, and consequently not
> masked out.

I see, that makes sense. So to remember this, the rule is:

"Specify elements that you want to get masked using True in 'mask'".
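
To make the rule and Eric's quality-control point concrete, here is a minimal sketch. The flag bit names and the data values are made up for illustration; only ma.array and its mean() method are real NumPy API.

```python
import numpy as np
from numpy import ma

# Hypothetical flag bits, invented for this illustration
FLAG_SPIKE = 0b01
FLAG_OUT_OF_RANGE = 0b10

values = np.array([10.0, 11.0, 250.0, 12.0])
flags = np.array([0, 0, FLAG_SPIKE | FLAG_OUT_OF_RANGE, 0])

# Any set bit means "bad", so a non-zero flag maps directly to mask=True
a = ma.array(values, mask=(flags != 0))

# Reductions skip the masked (flagged) element
mean_good = a.mean()
```

With this convention an all-zero flag array gives an all-False mask, so nothing is hidden unless something was explicitly flagged.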

But why do I need to invert the mask when I want to see the valid elements:

In [1]: from numpy import ma

In [2]: a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])

In [3]: a
Out[3]:
masked_array(data = [1 2 -- 4],
             mask = [False False  True False],
       fill_value = 999999)


In [4]: a[~a.mask]
Out[4]:
masked_array(data = [1 2 4],
             mask = [False False False],
       fill_value = 999999)


I would find it natural to write [4] as a[a.mask]. This is where it gets confusing.
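
For reference, a short sketch of how the valid elements can be obtained without spelling out the inversion, assuming the standard numpy.ma methods compressed() and count() and the helper ma.getmaskarray(). It sidesteps the question rather than answering it.

```python
import numpy as np
from numpy import ma

a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])

# compressed() returns the unmasked data as a plain 1-D ndarray
valid = a.compressed()

# The same thing by hand: invert the mask, then index the underlying data
valid_by_index = a.data[~ma.getmaskarray(a)]

# count() gives the number of unmasked elements
n_valid = a.count()
```

ma.getmaskarray() is used instead of a.mask because it always returns a full boolean array, even when the array has no masked elements at all.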

For example in Fortran, one does:

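program pack_demo
implicit none
integer :: a(4) = [1, 2, 3, 4]
logical :: mask(4) = [.true., .true., .false., .true.]
print *, a
! pack keeps the elements where mask is .true.
print *, pack(a, mask)
end program pack_demo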

and it prints:

           1           2           3           4
           1           2           4

So the behavior of a Fortran mask when used to select elements from
an array is identical to boolean indexing in NumPy: True means include
the element, False means exclude it. Only the 'mask' of a masked array
has the opposite sense.

Ondrej


