[Numpy-discussion] nan_to_num and bool arrays

Fri Dec 11 18:44:16 EST 2009

On Fri, Dec 11, 2009 at 2:22 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Fri, Dec 11, 2009 at 16:09, Keith Goodman <kwgoodman at gmail.com> wrote:
>> On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern <robert.kern at gmail.com> wrote:
>>> On Fri, Dec 11, 2009 at 14:41, Keith Goodman <kwgoodman at gmail.com> wrote:
>>>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>>
>>>>> So I agree that it should leave the input untouched when a non-float
>>>>> dtype is used for some array-like input.
>>>>
>>>> Would only one line need to be changed? Would changing
>>>>
>>>> if not issubclass(t, _nx.integer):
>>>>
>>>> to
>>>>
>>>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>>>>
>>>> do the trick?
>>>
>>> That still leaves strings, voids, and objects. I recommend:
>>>
>>>  if issubclass(t, _nx.inexact):
>>>
>>> Arguably, one should handle nan float objects in object arrays and
>>> float columns in structured arrays, but the current code does not
>>> handle either of those anyways.
>>
>> Without your change both
>>
>>>> np.nan_to_num(np.array([True, False]))
>>>> np.nan_to_num([1])
>>
>> raise exceptions. With your change:
>>
>>>> np.nan_to_num(np.array([True, False]))
>>   array([ True, False], dtype=bool)
>>>> np.nan_to_num([1])
>>   array([1])
>
> I think this is correct, though the latter one happens by accident.
> Lists don't have a .dtype attribute so obj2sctype(type([1])) is
> checked and happens to be object_. The latter line is intended to
> handle scalars, not sequences. I think that sequences should be
> coerced to arrays for output and this check should be more explicit
> about what it handles. [1.0] will have a problem if you don't.

That makes sense. But I'm not smart enough to implement it.
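
For what it's worth, here is a rough sketch of why coercing first would
matter. It only uses public NumPy names; kind_after_coercion is a made-up
helper for illustration, not anything that exists in numpy/lib.

```python
import numpy as np

def kind_after_coercion(x):
    """Return the scalar type of x once it has been coerced to an array."""
    return np.asarray(x).dtype.type

# bool is not a subclass of integer, so it needs its own test unless the
# check is turned around to ask "is this inexact?" instead.
print(issubclass(np.bool_, np.integer))       # False
print(issubclass(np.float64, np.inexact))     # True
print(issubclass(np.complex128, np.inexact))  # True

# After coercion a list of floats is seen as inexact, while lists of ints
# and bool arrays are not and could be passed through untouched.
print(issubclass(kind_after_coercion([1.0]), np.inexact))                    # True
print(issubclass(kind_after_coercion([1]), np.inexact))                      # False
print(issubclass(kind_after_coercion(np.array([True, False])), np.inexact))  # False
```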

>> On a separate note, this seems a little awkward:
>>
>>>> np.nan_to_num(1.0)
>>   1.0
>>>> np.nan_to_num(1)
>>   array(1)
>>>> x = np.ones(1, dtype=np.int)
>>>> np.nan_to_num(x[0])
>>   1
>
> Worth fixing.

Would this work?

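def nan_to_num(x):
    # Lists and Python scalars have no .dtype, so fall back to the
    # scalar type that corresponds to the Python type.
    try:
        t = x.dtype.type
    except AttributeError:
        t = obj2sctype(type(x))
    if issubclass(t, _nx.complexfloating):
        # Handle the real and imaginary parts separately.
        return nan_to_num(x.real) + 1j * nan_to_num(x.imag)
    else:
        # Work on a copy; array(x) covers input without a .copy() method.
        try:
            y = x.copy()
        except AttributeError:
            y = array(x)
    # 0-d input: wrap it in a 1-element array so mask assignment works.
    if not y.shape:
        y = array([x])
        scalar = True
    else:
        scalar = False
    # Only inexact dtypes can hold nan/inf; everything else passes through.
    if issubclass(t, _nx.inexact):
        are_inf = isposinf(y)
        are_neg_inf = isneginf(y)
        are_nan = isnan(y)
        maxf, minf = _getmaxmin(y.dtype.type)
        y[are_nan] = 0
        y[are_inf] = maxf
        y[are_neg_inf] = minf
    # Unwrap the temporary 1-element array for scalar input.
    if scalar:
        y = y[0]
    return y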

Instead of

>> nan_to_num(1.0)
   1.0
>> nan_to_num(1)
   array(1)
>> nan_to_num(np.array(1.0))
   1.0
>> nan_to_num(np.array(1))
   array(1)

it gives

>> nan_to_num(1.0)
   1.0
>> nan_to_num(1)
   1
>> nan_to_num(np.array(1.0))
   1.0
>> nan_to_num(np.array(1))
   1
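
A self-contained variant that coerces the input first, along the lines
Robert suggested, might look like the sketch below. It has not been run
against the NumPy test suite. nan_to_num_sketch is a hypothetical name, and
np.finfo is my stand-in for the private _getmaxmin helper.

```python
import numpy as np

def nan_to_num_sketch(x):
    """Replace nan with 0 and +/-inf with the largest/smallest finite float.

    Non-float input (bool, int, str, object) comes back as an unchanged copy.
    """
    y = np.array(x)  # copies arrays; coerces lists and Python scalars
    t = y.dtype.type
    if issubclass(t, np.complexfloating):
        # Handle the real and imaginary parts separately.
        return nan_to_num_sketch(y.real) + 1j * nan_to_num_sketch(y.imag)
    scalar = (y.ndim == 0)
    if scalar:
        y = y.reshape(1)  # 1-element view so mask assignment works
    if issubclass(t, np.floating):
        info = np.finfo(t)
        y[np.isnan(y)] = 0
        y[np.isposinf(y)] = info.max
        y[np.isneginf(y)] = info.min
    return y[0] if scalar else y
```

Because the dtype is read after coercion, [1.0] is treated as float input
rather than falling through as object_.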

I guess a lot of unit tests need to be written before nan_to_num can
be fixed. But for now, your bool fix is an improvement.
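
A first handful of such tests, taken from the cases in this thread, could
be sketched as below. check_nan_to_num is a hypothetical helper that
accepts any candidate implementation. Running it against np.nan_to_num
assumes a NumPy recent enough to include the bool fix.

```python
import numpy as np

def check_nan_to_num(func):
    """Run the cases from this thread against a candidate implementation."""
    # bool and int input must come back unchanged, with the same kind of dtype
    b = func(np.array([True, False]))
    assert b.dtype == np.bool_ and b.tolist() == [True, False]
    i = func(np.array([1, 2]))
    assert i.dtype.kind == "i" and i.tolist() == [1, 2]
    # float input: nan -> 0, +/-inf -> largest/smallest finite value
    f = func(np.array([1.0, np.nan, np.inf, -np.inf]))
    info = np.finfo(f.dtype)
    assert f.tolist() == [1.0, 0.0, float(info.max), float(info.min)]
    # the input array itself must not be modified
    x = np.array([np.nan])
    func(x)
    assert np.isnan(x[0])
    return True

check_nan_to_num(np.nan_to_num)
```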