[Numpy-discussion] Clarifications in numpy.ma module (Benjamin Root)

George Trojan george.trojan at noaa.gov
Wed Dec 31 12:27:25 EST 2014


Yet another example of unexpected behaviour:

 >>> import numpy as np
 >>> a = np.ma.array([], mask=0)
 >>> b = np.ma.array([])
 >>> np.ma.allequal(a,b)
True
 >>> a.mean()
masked
 >>> b.mean()
nan

But

 >>> a
masked_array(data = [],
              mask = [],
        fill_value = 1e+20)
 >>> b
masked_array(data = [],
              mask = False,
        fill_value = 1e+20)

After some googling I found an explanation on Stack Overflow:
http://stackoverflow.com/questions/13354295/python-numpy-masked-array-initialization
(this is not clearly explained on the numpy doc page
http://docs.scipy.org/doc/numpy/reference/maskedarray.baseclass.html#the-maskedarray-class)

 >>> d=np.ma.array([], mask=np.ma.nomask)
 >>> d
masked_array(data = [],
              mask = False,
        fill_value = 1e+20)

I suspect the reason is that mask defaults to np.ma.nomask and the 
rationale for that decision was performance. It follows that a masked 
array with the default nomask attribute behaves like a regular array 
(hence the nan), with a placeholder for a mask to be set later, if 
needed. That tripped me up recently: I had Cython code which relied on 
the shapes of the data and mask parts being equal.
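
A minimal sketch of one way to guard against this, assuming the calling 
code only needs a boolean mask with the same shape as the data: 
np.ma.getmaskarray always returns a full array, while np.ma.getmask 
returns np.ma.nomask when no mask was set. The helper name 
data_and_mask is made up for illustration.

```python
import numpy as np


def data_and_mask(x):
    """Return (data, mask) with identical shapes (hypothetical helper)."""
    x = np.ma.asarray(x)
    # getmaskarray expands nomask into a full array of False values.
    return np.ma.getdata(x), np.ma.getmaskarray(x)


b = np.ma.array([1.0, 2.0, 3.0])          # no mask given, so the mask is nomask
print(np.ma.getmask(b) is np.ma.nomask)   # True

data, mask = data_and_mask(b)
print(data.shape == mask.shape)           # True

empty = np.ma.array([])                   # the empty case from above
print(np.ma.getmaskarray(empty).shape)    # (0,)
```

This does not modify b itself, so its mask stays nomask; only the copy 
handed to the shape-sensitive code is expanded.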

George

On 12/30/2014 11:17 PM, numpy-discussion-request at scipy.org wrote:
> Message: 1
> Date: Tue, 30 Dec 2014 16:04:36 -0500
> From: Benjamin Root <ben.root at ou.edu>
> Subject: Re: [Numpy-discussion] Clarifications in numpy.ma module
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Message-ID:
> 	<CANNq6Fk4XTMcXeb64C9FWWnjWsVVK=Ri7CsGLsbE2wr=z-rBJQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Tue, Dec 30, 2014 at 3:29 PM, Alexander Belopolsky <ndarray at mac.com>
> wrote:
>
>> On Tue, Dec 30, 2014 at 2:49 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>
>>> Where does it say that operations on masked arrays should not produce
>>> NaNs?
>>
>> Masked arrays were invented with the specific goal to avoid carrying NaNs
>> in computations.  Back in the days, NaNs were not available on some
>> platforms and had significant performance issues on others.  These days NaN
>> support for floating point types is nearly universal, but numpy types are
>> not limited by floating point.
>>
>>
> From the numpy.ma docstring:
> "Arrays sometimes contain invalid or missing data.  When doing operations
>      on such arrays, we wish to suppress invalid values, which is the
> purpose masked
>      arrays fulfill (an example of typical use is given below)."
>
> A few lines down:
> "Here, we construct a masked array that suppress all ``NaN`` values.  We
>      may now proceed to calculate the mean of the other values"
>
> Note the repeated usage of the term "suppress" in the context of the input
> arrays. The phrase "We may now proceed to calculate the mean of the other
> values" implies that the mean of a masked array is taken to be the mean of
> everything but the masked values. If there are no values remaining, then I
> expect it to give me the equivalent of np.mean([]).
>
>
>
>>> Having np.mean([]) return the same thing as np.ma.mean([]) makes
>> complete sense.
>>
>> Does the following make sense as well?
>>
>>>>> import numpy
>>>>> numpy.ma.masked_values([0, 0], 0).mean()
>> masked
>>>>> numpy.ma.masked_values([0], 0).mean()
>> masked
>>>>> numpy.ma.masked_values([], 0).mean()
>> * Two warnings *
>> masked_array(data = nan,
>>               mask = False,
>>         fill_value = 0.0)
>>
>>
> No, I would consider the first two to be bugs. And actually, returning a
> masked array in the third one is also incorrect in this case. The result
> should be a scalar. This is now veering to the same issues discussed in the
> np.nanmean([]) vs. np.nanmean([np.nan]) discussion.
>
> Cheers!
> Ben Root
>



