[Numpy-discussion] Masked arrays: Rationale for "False convention"
ondrej.certik at gmail.com
Mon Sep 30 22:57:18 EDT 2013
On Mon, Sep 30, 2013 at 8:29 PM, Eric Firing <efiring at hawaii.edu> wrote:
> On 2013/09/30 4:05 PM, josef.pktd at gmail.com wrote:
>> On Mon, Sep 30, 2013 at 9:38 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>>> On Mon, Sep 30, 2013 at 7:05 PM, Ondřej Čertík <ondrej.certik at gmail.com>
>>> wrote:
>>>> What is the rationale for using False in 'mask' for elements that
>>>> should be included?
>>>> As opposed to using True for elements that should be included, which
>>>> is what I was intuitively expecting when I started using the masked
>>>> arrays. This "True convention" also happens to be the one used in
>>>> Fortran, see e.g.:
>>>> So it's confusing why NumPy would choose a "False convention". Could it
>>>> be, that NumPy views 'mask' as opacity? Then it would make sense to
>>>> use True to make a value 'opaque'.
>>> There was a lengthy discussion of this point back when the NA work was done.
>>> You might be able to find the thread with a search.
>>> As to why it is as it is, I suspect it is historical consistency. Pierre
>>> wrote the masked array package for numpy, but it may very well go back to
>>> the masked array package implemented for Numeric.
>> I don't know ancient history, but I thought it's "natural". (Actually,
>> I never thought about it.)
>> I always thought `mask` indicates the "masked" (invalid, hidden)
>> values, and masked arrays mask the missing values.
> Exactly. It is also consistent with the C and Unix convention of
> returning 0 on success and 1, or a non-zero error code on failure. In a
> similar vein, it works nicely with bit-mapped quality control flags,
> etc. When nothing is flagged, the value is good, and consequently not
> masked out.
I see, that makes sense. So to remember this, the rule is:
"Specify elements that you want to get masked using True in 'mask'".
But why do I need to invert the mask when I want to see the valid elements:
In : from numpy import ma
In : a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])
In : a
masked_array(data = [1 2 -- 4],
             mask = [False False True False],
       fill_value = 999999)
In : a[~a.mask]
masked_array(data = [1 2 4],
             mask = [False False False],
       fill_value = 999999)
I would find it natural to write this as a[a.mask]. This is when it gets confusing.
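To make the inversion concrete, here is a minimal sketch of two ways to get at the valid elements of the array above. The variable names `masked` and `valid` are illustrative only and are not part of numpy.ma; `ma.getmaskarray` and `compressed()` are existing numpy.ma calls.

```python
import numpy as np
from numpy import ma

a = ma.array([1, 2, 3, 4], mask=[False, False, True, False])

# getmaskarray always returns a full boolean array, even if a.mask is nomask.
masked = ma.getmaskarray(a)
valid = ~masked  # illustrative name: True where the element is usable

kept_by_index = a[valid]          # still a masked array, nothing masked in it
kept_compressed = a.compressed()  # plain ndarray holding only unmasked data

print(kept_by_index.tolist())
print(kept_compressed.tolist())
```

If the only goal is to see the unmasked values, `compressed()` avoids writing the inversion by hand.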
For example in Fortran, one does:
integer :: a(4) = [1, 2, 3, 4]
logical :: mask(4) = [.true., .true., .false., .true.]
print *, a
print *, pack(a, mask)
and it prints:
1 2 3 4
1 2 4
So the behavior of the mask in Fortran's pack, where it is used to select
elements from an array, is identical to boolean indexing in NumPy: True
means include the element, False means exclude it.
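For comparison, a rough NumPy counterpart of the Fortran snippet, as a sketch only. The name `include` is hypothetical and stands for a mask in the "True convention"; it has to be inverted once when it is handed to numpy.ma.

```python
import numpy as np
from numpy import ma

data = np.array([1, 2, 3, 4])
include = np.array([True, True, False, True])  # True means "keep", as in Fortran

# Plain boolean indexing plays the role of pack(a, mask).
packed = data[include]

# numpy.ma uses the opposite sense (True means "masked out"),
# so the include-mask is inverted on construction.
m = ma.array(data, mask=~include)

print(packed.tolist())
print(m.compressed().tolist())
print(m.sum())
```

Both selections give the same three elements; only the meaning of True in the mask differs between the two conventions.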