[Numpy-discussion] Clarifications in numpy.ma module (Benjamin Root)
George Trojan
george.trojan at noaa.gov
Wed Dec 31 12:27:25 EST 2014
Yet another example of unexpected behaviour:
>>> a=np.ma.array([], mask=0)
>>> b=np.ma.array([])
>>> np.ma.allequal(a,b)
True
>>> a.mean()
masked
>>> b.mean()
nan
But
>>> a
masked_array(data = [],
mask = [],
fill_value = 1e+20)
>>> b
masked_array(data = [],
mask = False,
fill_value = 1e+20)
After some googling I found an explanation on Stack Overflow:
http://stackoverflow.com/questions/13354295/python-numpy-masked-array-initialization
(this is not clearly explained on the numpy doc page
http://docs.scipy.org/doc/numpy/reference/maskedarray.baseclass.html#the-maskedarray-class):
>>> d=np.ma.array([], mask=np.ma.nomask)
>>> d
masked_array(data = [],
mask = False,
fill_value = 1e+20)
I suspect the reason is that mask defaults to np.ma.nomask, and that the
rationale for that decision was performance. It follows that a masked
array with the default nomask attribute behaves like a regular array
(hence the nan), with a placeholder for a mask to be set later, if needed.
That tripped me up recently: I had Cython code that relied on the shapes
of the data and mask parts being equal.
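
For code that needs the data and mask parts to have the same shape, a
minimal sketch of one defensive pattern follows. It assumes that
np.ma.getmask and np.ma.getmaskarray are acceptable at the call site;
the variable names are only illustrative.

```python
import numpy as np

a = np.ma.array([], mask=0)   # mask given explicitly
b = np.ma.array([])           # mask omitted, so it defaults to np.ma.nomask

# getmask returns the stored mask as-is, which may be the nomask
# placeholder rather than an array shaped like the data.
b_uses_nomask = np.ma.getmask(b) is np.ma.nomask

# getmaskarray always returns a boolean array with the same shape as
# the data, expanding nomask to an all-False array when necessary.
a_full = np.ma.getmaskarray(a)
b_full = np.ma.getmaskarray(b)

# The same holds for a non-empty array created without a mask.
c = np.ma.array([1.0, 2.0, 3.0])
c_full = np.ma.getmaskarray(c)

print(b_uses_nomask, a_full.shape, b_full.shape, c_full.shape)
```

Passing the result of getmaskarray, instead of the mask attribute, to
shape-sensitive code avoids depending on how the array was constructed.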
George
On 12/30/2014 11:17 PM, numpy-discussion-request at scipy.org wrote:
> Message: 1
> Date: Tue, 30 Dec 2014 16:04:36 -0500
> From: Benjamin Root <ben.root at ou.edu>
> Subject: Re: [Numpy-discussion] Clarifications in numpy.ma module
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Message-ID:
> <CANNq6Fk4XTMcXeb64C9FWWnjWsVVK=Ri7CsGLsbE2wr=z-rBJQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Tue, Dec 30, 2014 at 3:29 PM, Alexander Belopolsky <ndarray at mac.com>
> wrote:
>
>> On Tue, Dec 30, 2014 at 2:49 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>
>>> Where does it say that operations on masked arrays should not produce
>>> NaNs?
>>
>> Masked arrays were invented with the specific goal to avoid carrying NaNs
>> in computations. Back in the days, NaNs were not available on some
>> platforms and had significant performance issues on others. These days NaN
>> support for floating point types is nearly universal, but numpy types are
>> not limited by floating point.
>>
>>
> From the numpy.ma docstring:
> "Arrays sometimes contain invalid or missing data. When doing operations
> on such arrays, we wish to suppress invalid values, which is the
> purpose masked
> arrays fulfill (an example of typical use is given below)."
>
> A few lines down:
> "Here, we construct a masked array that suppress all ``NaN`` values. We
> may now proceed to calculate the mean of the other values"
>
> Note the repeated usage of the term "suppress" in the context of the input
> arrays. The phrase "We may now proceed to calculate the mean of the other
> values" implies that the mean of a masked array is taken to be the mean of
> everything but the masked values. If there are no values remaining, then I
> expect it to give me the equivalent of np.mean([]).
>
>
>
>>> Having np.mean([]) return the same thing as np.ma.mean([]) makes
>>> complete sense.
>>
>> Does the following make sense as well?
>>
>> >>> import numpy
>> >>> numpy.ma.masked_values([0, 0], 0).mean()
>> masked
>> >>> numpy.ma.masked_values([0], 0).mean()
>> masked
>> >>> numpy.ma.masked_values([], 0).mean()
>> * Two warnings *
>> masked_array(data = nan,
>> mask = False,
>> fill_value = 0.0)
>>
>>
> No, I would consider the first two to be bugs. And actually, returning a
> masked array in the third one is also incorrect in this case. The result
> should be a scalar. This is now veering to the same issues discussed in the
> np.nanmean([]) vs. np.nanmean([np.nan]) discussion.
>
> Cheers!
> Ben Root
>