[Numpy-discussion] [Cdat-discussion] Arrays containing NaNs

Fri Jul 25 14:43:41 EDT 2008

Hi Pierre,

Thanks for the answer, I'm ccing cdat's discussion list.

It makes sense; that's also the way we develop things here: NEVER assume 
what the user is going to do with the data, BUT give the user the 
necessary tools to do what you're assuming he/she wants to do (as simply 
as possible).

Thanks again for the answer.

C.

Pierre GM wrote:
> Oh, I guess this one's for me...
>
> On Thursday 01 January 1970 04:21:03 Charles Doutriaux wrote:
>
>   
>> Basically it was suggested to automatically mask NaN (and Inf ?) when
>> creating ma.
>> I'm sure you already thought of this on this list and I was curious to
>> know why you decided not to do it.
>>     
>
> Because it's always best to let the user decide what to do with his/her data 
> and not impose anything ?
>
> Masking a point doesn't necessarily mean that the point is invalid (in the 
> sense of NaNs/Infs), just that it doesn't satisfy some particular condition. 
> In that sense, masks act as selecting tools.
>
> By forcing invalid data to be masked at the creation of an array, you run the 
> risk of tampering with the (potential) physical meaning of the mask you have 
> given as input, and/or of missing the fact that some data are actually invalid 
> when you don't expect them to be.
>
> Let's take an example: 
> I want to analyze sea surface temperatures at the world scale. The data comes 
> as a regular 2D ndarray, with NaNs for missing or invalid data. In a first 
> step, I create a masked array of this data, filtering out the land masses by 
> a predefined geographical mask. The remaining NaNs in the masked array 
> indicate areas where the sensor failed... That's important information I 
> would probably have missed by masking all the NaNs at first...
>
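A minimal sketch of that sea surface temperature scenario, in case it helps on the cdat side. The 3x3 grid, the `land` mask and the variable names are made up for illustration; only `ma.masked_array` and `ma.getmaskarray` come from numpy.ma.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical 3x3 sea surface temperature grid (degrees C); NaN = missing/invalid.
sst = np.array([[20.1, np.nan, 18.7],
                [np.nan, 21.3, 19.9],
                [22.0, 20.5, np.nan]])

# Hypothetical predefined geographical mask: True where the cell is land.
land = np.array([[False, True, False],
                 [False, False, False],
                 [False, False, True]])

# Mask only the land masses; NaNs over the ocean are deliberately left unmasked.
ocean = ma.masked_array(sst, mask=land)

# NaNs that are still unmasked are ocean cells where the sensor failed.
sensor_failed = np.isnan(ocean.data) & ~ma.getmaskarray(ocean)
print(sensor_failed)
```

Here only the cell at row 1, column 0 is flagged: the other two NaNs sit under the land mask. Masking every NaN at creation time would have made the three cases indistinguishable.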
>
> As Eric F. suggested, you can use numpy.ma.masked_invalid to create a masked 
> array with NaNs/Infs filtered out:
>
>   
> >>> import numpy as np, numpy.ma as ma
> >>> x = np.array([1,2,None,4], dtype=float)
> >>> x
> array([  1.,   2.,  NaN,   4.])
>   
> >>> mx = ma.masked_invalid(x)
> >>> mx
> masked_array(data = [1.0 2.0 -- 4.0],
>       mask = [False False  True False],
>       fill_value=1e+20)
>
> Note that the underlying data still has NaNs/Infs:
>   
> >>> mx._data
> array([  1.,   2.,  NaN,   4.])
>
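A small sketch of what that means in practice, assuming the same four-element array as above (written with `np.nan` directly, and using the public `mx.data` attribute rather than `mx._data`): reductions skip the masked entry, while the NaN itself stays in the underlying data.

```python
import numpy as np
import numpy.ma as ma

x = np.array([1.0, 2.0, np.nan, 4.0])
mx = ma.masked_invalid(x)

# Reductions ignore the masked entry, so the NaN does not propagate.
total = mx.sum()        # 1 + 2 + 4
n_valid = mx.count()    # number of unmasked entries

# The NaN is still present in the underlying data.
still_nan = bool(np.isnan(mx.data[2]))
print(total, n_valid, still_nan)
```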
> You can also use the ma.fix_invalid function: it creates a mask where the data 
> is not finite (NaNs/Infs), and sets the corresponding points to fill_value.
>   
> >>> mx = ma.fix_invalid(x, fill_value=999)
> >>> mx
> masked_array(data = [1.0 2.0 -- 4.0],
>       mask = [False False  True False],
>       fill_value=1e+20)
>   
> >>> mx._data
> array([   1.,    2.,  999.,    4.])
>
>
> The advantage of the second approach is that you no longer have NaNs/Infs in 
> the underlying data, which speeds things up during computation. The obvious 
> disadvantage is that you no longer know where the data was invalid...
>
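One possible way around that disadvantage, sketched under my own assumptions (the `user_mask` and `was_invalid` names are hypothetical, not part of numpy.ma): record the non-finite positions with `np.isfinite` before calling `ma.fix_invalid`. This matters mostly when the array already carries a mask for another reason, since the invalid positions are then merged into it.

```python
import numpy as np
import numpy.ma as ma

x = np.array([1.0, 2.0, np.nan, 4.0, np.inf])

# Hypothetical mask supplied by the user for an unrelated reason.
user_mask = [True, False, False, False, False]

# Keep a separate record of where the data was NaN/Inf.
was_invalid = ~np.isfinite(x)

# fix_invalid works on a copy by default: it masks the non-finite entries
# (in addition to user_mask) and writes fill_value into the data there.
mx = ma.fix_invalid(x, mask=user_mask, fill_value=999)

print(mx.data)               # no NaN/Inf left in the underlying data
print(ma.getmaskarray(mx))   # user mask and invalid positions combined
print(was_invalid)           # invalid positions only
```

The combined mask alone no longer says which points were invalid and which were masked on purpose; the separate boolean array does.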
>