[Numpy-discussion] NA masks for NumPy are ready to test

Fri Aug 19 16:05:23 EDT 2011

On Fri, Aug 19, 2011 at 11:44 AM, Charles R Harris <charlesr.harris at gmail.com> wrote:

>
>
> On Fri, Aug 19, 2011 at 12:37 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>
>> Hi,
>> Just some immediate minor observations that are really about trying to
>> be consistent:
>>
>> 1) Could you keep the displayed dtype of NA the same as the array's?
>> For example, the NA dtype is displayed as '<f8' but should be displayed as
>> 'float64', as that is the array dtype.
>> >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]])
>> >>> a
>> array([[  1.,   2.,   3., NA],
>>       [  3.,   4.,  nan,   5.]])
>> >>> a.dtype
>> dtype('float64')
>> >>> a.sum()
>> NA(dtype='<f8')
>>
>> 2) Can the 'skipna' flag be added to the methods?
>> >>> a.sum(skipna=True)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: 'skipna' is an invalid keyword argument for this function
>> >>> np.sum(a,skipna=True)
>> nan
>>
>> 3) Can the skipna flag be extended to exclude other non-finite cases like
>> NaN?
>>
>> 4) Assigning np.NA to an unmasked array needs a better error message;
>> the integer array case is more informative:
>> >>> b=np.array([1,2,3,4], dtype=np.float128)
>> >>> b[0]=np.NA
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: float() argument must be a string or a number
>>
>> >>> j=np.array([1,2,3])
>> >>> j
>> array([1, 2, 3])
>> >>> j[0]=ina
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: int() argument must be a string or a number, not 'numpy.NAType'
>>
>> But it is nice that np.NA 'adjusts' to the insertion array:
>> >>> b.flags.maskna = True
>> >>> ana
>> NA(dtype='<f8')
>> >>> b[0]=ana
>> >>> b[0]
>> NA(dtype='<f16')
>>
>> 5) The display differs depending on masked state. That is, I think
>> 'maskna=True' should always be displayed when flags.maskna is True:
>> >>> j=np.array([1,2,3], dtype=np.int8)
>> >>> j
>> array([1, 2, 3], dtype=int8)
>> >>> j.flags.maskna=True
>> >>> j
>> array([1, 2, 3], maskna=True, dtype=int8)
>> >>> j[0]=np.NA
>> >>> j
>> array([NA, 2, 3], dtype=int8) # I think it should still display 'maskna=True'.
>>
>>
> My main peeve is that NA is upper case ;) I suppose that could use some
> discussion.
>

There is already some proliferation of capitalizations for NaN:

>>> np.nan
nan
>>> np.NAN
nan
>>> np.NaN
nan

The pros I see for NA over na are:

* less confusion between NA and nan (should this carry over to the np.isna
function, i.e. should it be np.isNA by this argument?)
* more comfortable for switching between NumPy and R when people have to use
both at the same time

The main con is:

* Inconsistent with current nan, inf printing. Here's a hackish workaround:

>>> np.na = np.NA
>>> np.set_printoptions(nastr='na')
>>> np.array([np.na, 2.0])
array([na,  2.])
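
The nastr option mirrors the nanstr and infstr print options that NumPy
already has for nan and inf. For comparison, here is a minimal sketch of the
same kind of override applied to nan. It assumes a NumPy recent enough to
provide the np.printoptions context manager (1.15 or later), and the helper
name format_with_nanstr is made up for illustration:

```python
import numpy as np


def format_with_nanstr(values, nanstr):
    """Return the repr of a float array with NaN printed as `nanstr`.

    Hypothetical helper for illustration; not part of NumPy.
    """
    arr = np.asarray(values, dtype=float)
    # np.printoptions restores the previous print settings on exit,
    # so the override does not leak into the rest of the session.
    with np.printoptions(nanstr=nanstr):
        return repr(arr)


# Default printing uses the lower-case spelling.
default_text = repr(np.array([np.nan, 2.0]))
# Same data, printed with a different spelling of NaN.
upper_text = format_with_nanstr([np.nan, 2.0], "NaN")

print(default_text)
print(upper_text)
```

Scoping the override with a context manager avoids changing how nan prints
for the rest of the session, which a global np.set_printoptions call would do.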

What's your list of pros and cons?

-Mark

>
> Chuck