[Numpy-discussion] NA masks for NumPy are ready to test

Mark Wiebe mwwiebe at gmail.com
Fri Aug 19 18:46:50 EDT 2011


On Thu, Aug 18, 2011 at 2:43 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:

> It's taken a lot of changes to get the NA mask support to its current
> point, but the code is ready for some testing now. You can read the
> work-in-progress release notes here:
>
>
> https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst
>
> To try it out, check out the missingdata branch from my github account,
> here, and build in the standard way:
>
> https://github.com/m-paradox/numpy
>
> The things most important to test are:
>
> * Confirm that existing code still works correctly. I've tested against
> SciPy and matplotlib.
> * Confirm that the performance of code not using NA masks is the same or
> better.
> * Try to do computations with the NA values, find places they don't work
> yet, and nominate unimplemented functionality important to you to be next on
> the development list. The release notes have a preliminary list of
> implemented/unimplemented functions.
> * Report any crashes, build problems, or unexpected behaviors.
>
> In addition to adding the NA mask, I've also added features and made a few
> performance changes here and there, such as letting reductions like sum take
> a list of axes instead of only a single axis or all of them. These changes
> affect various bugs like http://projects.scipy.org/numpy/ticket/1143 and
> http://projects.scipy.org/numpy/ticket/533.
>
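
As a rough illustration of the multi-axis reductions mentioned in the quoted
message, here is a minimal sketch. It assumes a NumPy build where the axis
argument of sum accepts a tuple of ints; the array contents are made up for
the example and are not taken from the branch's tests.

```python
import numpy as np

# Illustrative data only: a 2x3x4 block holding the values 0..23.
a = np.arange(24, dtype=float).reshape(2, 3, 4)

# Reduce over the last two axes in a single call, leaving one value
# per index along axis 0.
total = np.sum(a, axis=(1, 2))

# The same result obtained the older way, one axis at a time.
chained = a.sum(axis=2).sum(axis=1)

print(total.shape)                 # (2,)
print(np.allclose(total, chained)) # True
```

Passing several axes at once avoids building the intermediate array that the
chained form creates.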

With a new fix to the unitless reduction logic I just committed, the
situation for bug http://projects.scipy.org/numpy/ticket/450 is also
improved.

Cheers,
Mark


> Thanks!
> Mark
>
> Here's a small example run using NAs:
>
> >>> import numpy as np
> >>> np.__version__
> '2.0.0.dev-8a5e2a1'
> >>> a = np.random.rand(3,3,3)
> >>> a.flags.maskna = True
> >>> a[np.random.rand(3,3,3) < 0.5] = np.NA
> >>> a
> array([[[NA, NA,  0.11511708],
>         [ 0.46661454,  0.47565512, NA],
>         [NA, NA, NA]],
>
>        [[NA,  0.57860351, NA],
>         [NA, NA,  0.72012669],
>         [ 0.36582123, NA,  0.76289794]],
>
>        [[ 0.65322748,  0.92794386, NA],
>         [ 0.53745165,  0.97520989,  0.17515083],
>         [ 0.71219688,  0.5184328 ,  0.75802805]]])
> >>> np.mean(a, axis=-1)
> array([[NA, NA, NA],
>        [NA, NA, NA],
>        [NA,  0.56260412,  0.66288591]])
> >>> np.std(a, axis=-1)
> array([[NA, NA, NA],
>        [NA, NA, NA],
>        [NA,  0.32710662,  0.10384331]])
> >>> np.mean(a, axis=-1, skipna=True)
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> array([[ 0.11511708,  0.47113483,         nan],
>        [ 0.57860351,  0.72012669,  0.56435958],
>        [ 0.79058567,  0.56260412,  0.66288591]])
> >>> np.std(a, axis=-1, skipna=True)
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730:
> RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> array([[ 0.        ,  0.00452029,         nan],
>        [ 0.        ,  0.        ,  0.19853835],
>        [ 0.13735819,  0.32710662,  0.10384331]])
> >>> np.std(a, axis=(1,2), skipna=True)
> array([ 0.16786895,  0.15498008,  0.23811937])
>
>
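
The session above relies on np.NA, the maskna flag and the skipna parameter,
which exist only in the missingdata branch. For comparison, the sketch below
shows the closest analogue I am aware of in a stock NumPy build, using NaN as
the missing-value marker. This is an assumption-laden stand-in, not the NA
mask API: NaN works only for floating-point data, and the small array is
invented for the example.

```python
import warnings
import numpy as np

# Illustrative data only; NaN plays the role of a missing value here.
a = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, 6.0],
              [np.nan, np.nan, np.nan]])

# Plain mean propagates NaN, loosely like the default NA behaviour:
# any row containing a missing value yields a missing result.
propagated = np.mean(a, axis=-1)

# nanmean ignores NaN entries, loosely like skipna=True. A row with no
# valid entries still yields nan and triggers a RuntimeWarning, which
# is silenced here to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    skipped = np.nanmean(a, axis=-1)

print(propagated)
print(skipped)
```

As in the skipna=True output above, a slice with no valid values still
produces nan rather than raising an error.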