[Numpy-discussion] min() of array containing NaN

Tue Aug 12 10:18:22 EDT 2008

Anne Archibald wrote:
> 2008/8/12 Joe Harrington <jh at physics.ucf.edu>:
>
>   
>> So, I endorse extending min() and all other statistical routines to
>> handle NaNs, possibly with a switch to turn it on if a suitably fast
>> algorithm cannot be found (which is competitor IDL's solution).
>> Certainly without a switch the default behavior should be to return
>> NaN, not to return some random value, if a NaN is present.  Otherwise
>> the user may never know a NaN is present, and therefore has to check
>> every use for NaNs.  That constant manual NaN checking is slower and
>> more error-prone than any numerical speed advantage.
>>
>> So to sum, proposed for statistical routines:
>> if NaN is not present, return value
>> if NaN is present, return NaN
>> if NaN is present and nan=True, return value ignoring all NaNs
>>
>> OR:
>> if NaN is not present, return value
>> if NaN is present, return value ignoring all NaNs
>> if NaN is present and nan=True, return NaN
>>
>> I'd prefer the latter.  IDL does the former and it is a pain to do
>> /nan all the time.  However, the latter might trip up the unwary,
>> whereas the former never does.
>>
>> This would apply at least to:
>> min
>> max
>> sum
>> prod
>> mean
>> median
>> std
>> and possibly many others.
>>     
>
> For almost all of these the current behaviour is to propagate NaNs
> arithmetically. For example, the sum of anything with a NaN is NaN. I
> think this is perfectly sufficient, given how easy it is to strip out
> NaNs if that's what you want. The issue that started this thread (and
> the many other threads that have come up as users stub their toes on
> this behaviour) is that min (and other functions based on comparisons)
> do not propagate NaNs. If you do np.amin(A) and A contains NaNs, you
> can't count on getting a NaN back, unlike np.mean or np.std. The fact
> that you get some random value not the minimum just adds insult to
> injury. (It is probably also true that the value you get back depends
> on how the array is stored in memory.)
>
> It really isn't very hard to replace
> np.sum(A)
> with
> np.sum(A[~np.isnan(A)])
> if you want to ignore NaNs instead of propagating them. So I don't
> feel a need for special code in sum() that treats NaN as 0. I would be
> content if the comparison-based functions propagated NaNs
> appropriately.
>
> If you did decide it was essential to make versions of the functions
> that removed NaNs, it would get you most of the way there to add an
> optional keyword argument to ufuncs' reduce method that skipped NaNs.
>
> Anne
>
>   
Actually, you probably need to use isfinite rather than isnan, because
NumPy follows IEEE 754, in which NaN is distinct from infinity.
Also, doesn't this require an additional temporary copy of A?
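
To make the two points above concrete, here is a small sketch. The array
values are made up for illustration, and the `where=` keyword on np.sum is
an assumption about the reader's NumPy version (it exists in recent
releases, not in the version this thread is about).

```python
import numpy as np

# Illustrative data: one NaN and one infinity.
A = np.array([1.0, np.nan, np.inf, 4.0])

not_nan = ~np.isnan(A)    # keeps the infinity
finite = np.isfinite(A)   # drops both the NaN and the infinity

sum_not_nan = np.sum(A[not_nan])   # infinity is still in the sum
sum_finite = np.sum(A[finite])     # only 1.0 and 4.0 remain

# Boolean-mask indexing builds a new array rather than a view of A.
filtered = A[finite]
shares = np.shares_memory(A, filtered)

# Assumes a recent NumPy: reduce in place over the selected elements,
# without materialising the filtered copy (the mask itself is still built).
sum_where = np.sum(A, where=finite)

print(sum_not_nan, sum_finite, shares, sum_where)
```

So the isnan filter leaves infinities in the result, and the masked
indexing does allocate a second array.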

The problem I have with this is that you must always know in advance
that NaNs or infinities are present, and it assumes you want to ignore them.

Alternatively, something simple like a new function would do.

Bruce

import numpy as np

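def minnan(x, axis=None, out=None, hasnan=False):
    # hasnan=True: the caller expects NaNs and wants them ignored.
    if hasnan:
        return np.nanmin(x, axis)
    # No NaN or infinity present: the ordinary minimum is safe.
    elif np.isfinite(x).all():
        return np.min(x, axis, out)
    # NaN or infinity found but not announced by the caller.
    else:
        return np.nan # actually should be something else here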

x = np.array([1,2,np.nan,4,5,6])
y = np.array([1,2,3,4,5,6])

print('NumPy Min:', np.min(x))
print('NumPy NaNMin:', np.nanmin(x))
print('NumPy MinNaN:', minnan(x))
print('NumPy MinNaN T:', minnan(x, hasnan=True))
print('NumPy Min:', np.min(y))
print('NumPy NaNMin:', np.nanmin(y))
print('NumPy MinNaN:', minnan(y))
print('NumPy MinNaN T:', minnan(y, hasnan=True))
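
One possible shape for the "something else" branch, as a rough sketch only:
propagate NaN per slice when an axis is given, following the first of the
quoted proposals (return NaN if one is present, ignore NaNs only on
request). The name propagating_min and its nan keyword are hypothetical,
not existing NumPy API; np.fmin, np.nanmin and np.where are real.

```python
import numpy as np

def propagating_min(x, axis=None, nan=False):
    """Hypothetical helper: minimum that returns NaN if any NaN is present,
    unless nan=True, in which case NaNs are ignored."""
    x = np.asarray(x, dtype=float)
    if nan:
        return np.nanmin(x, axis=axis)
    # Which output positions had at least one NaN in their slice?
    has_nan = np.isnan(x).any(axis=axis)
    # np.fmin prefers the non-NaN operand, so this is a NaN-skipping minimum.
    lowest = np.fmin.reduce(x, axis=axis)
    # Put NaN back wherever one was seen; keep the minimum elsewhere.
    return np.where(has_nan, np.nan, lowest)

x = np.array([1, 2, np.nan, 4, 5, 6])
grid = np.array([[1.0, np.nan], [3.0, 4.0]])

print(propagating_min(x))            # NaN is reported
print(propagating_min(x, nan=True))  # NaN is ignored
print(propagating_min(grid, axis=0)) # only the column holding a NaN is NaN
```

Unlike minnan above, this never needs to be told in advance whether NaNs
are present, at the cost of an extra pass over the data for the isnan check.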