[Numpy-discussion] NA masks for NumPy are ready to test
Mark Wiebe
mwwiebe at gmail.com
Fri Aug 19 18:46:50 EDT 2011
On Thu, Aug 18, 2011 at 2:43 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> It's taken a lot of changes to get the NA mask support to its current
> point, but the code is ready for some testing now. You can read the
> work-in-progress release notes here:
>
> https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst
>
> To try it out, check out the missingdata branch from my GitHub account
> and build it in the standard way:
>
> https://github.com/m-paradox/numpy
>
> The things most important to test are:
>
> * Confirm that existing code still works correctly. I've tested against
> SciPy and matplotlib.
> * Confirm that the performance of code not using NA masks is the same or
> better.
> * Try to do computations with the NA values, find places they don't work
> yet, and nominate unimplemented functionality important to you to be next on
> the development list. The release notes have a preliminary list of
> implemented/unimplemented functions.
> * Report any crashes, build problems, or unexpected behaviors.
>
> In addition to adding the NA mask, I've also added some features and made a
> few performance changes here and there, such as letting reductions like sum
> take a list of axes instead of only a single axis or all of them. These
> changes address bugs including http://projects.scipy.org/numpy/ticket/1143
> and http://projects.scipy.org/numpy/ticket/533.
>
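Passing a sequence of axes, as in np.std(a, axis=(1,2), skipna=True) near the end of the quoted session, means the reduction runs over all of those axes at once and keeps the others. The pure-Python sketch below models that behavior on nested lists. It illustrates the semantics under that assumption and is not the branch's implementation; shape_of, element and sum_over_axes are hypothetical helpers.

```python
from itertools import product


def shape_of(nested):
    """Shape of a non-empty, regular (rectangular) nested list, e.g. (2, 2, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def element(nested, index):
    """Fetch nested[i][j][k]... for an index tuple (i, j, k, ...)."""
    for i in index:
        nested = nested[i]
    return nested


def sum_over_axes(nested, axes):
    """Sum over every axis in `axes` in a single pass.

    Negative axes count from the end. The kept axes stay in their
    original order; reducing every axis returns a scalar.
    """
    shape = shape_of(nested)
    ndim = len(shape)
    for ax in axes:
        if not -ndim <= ax < ndim:
            raise ValueError(f"axis {ax} out of range for {ndim}-d input")
    reduced = {ax % ndim for ax in axes}
    kept = [ax for ax in range(ndim) if ax not in reduced]

    # Accumulate each element into the bucket named by its kept indices.
    totals = {}
    for index in product(*(range(n) for n in shape)):
        key = tuple(index[ax] for ax in kept)
        totals[key] = totals.get(key, 0) + element(nested, index)

    # Rebuild a nested list whose shape is that of the kept axes.
    def build(depth, prefix):
        if depth == len(kept):
            return totals[prefix]
        return [build(depth + 1, prefix + (i,)) for i in range(shape[kept[depth]])]

    return build(0, ())


a = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]
print(sum_over_axes(a, (1, 2)))     # [10, 26]
print(sum_over_axes(a, (-1,)))      # [[3, 7], [11, 15]]
print(sum_over_axes(a, (0, 1, 2)))  # 36
```

For a sum, the one-pass result equals reducing one axis after another. For std it does not, because the std of per-row stds is not the std of all the elements, which is one reason a combined multi-axis reduction is useful.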
With a fix to the unitless reduction logic that I just committed, the
situation for bug http://projects.scipy.org/numpy/ticket/450 has also
improved.
Cheers,
Mark
> Thanks!
> Mark
>
> Here's a small example run using NAs:
>
> >>> import numpy as np
> >>> np.__version__
> '2.0.0.dev-8a5e2a1'
> >>> a = np.random.rand(3,3,3)
> >>> a.flags.maskna = True
> >>> a[np.random.rand(3,3,3) < 0.5] = np.NA
> >>> a
> array([[[NA, NA, 0.11511708],
>         [ 0.46661454, 0.47565512, NA],
>         [NA, NA, NA]],
>
>        [[NA, 0.57860351, NA],
>         [NA, NA, 0.72012669],
>         [ 0.36582123, NA, 0.76289794]],
>
>        [[ 0.65322748, 0.92794386, NA],
>         [ 0.53745165, 0.97520989, 0.17515083],
>         [ 0.71219688, 0.5184328 , 0.75802805]]])
> >>> np.mean(a, axis=-1)
> array([[NA, NA, NA],
>        [NA, NA, NA],
>        [NA, 0.56260412, 0.66288591]])
> >>> np.std(a, axis=-1)
> array([[NA, NA, NA],
>        [NA, NA, NA],
>        [NA, 0.32710662, 0.10384331]])
> >>> np.mean(a, axis=-1, skipna=True)
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> array([[ 0.11511708, 0.47113483, nan],
>        [ 0.57860351, 0.72012669, 0.56435958],
>        [ 0.79058567, 0.56260412, 0.66288591]])
> >>> np.std(a, axis=-1, skipna=True)
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: RuntimeWarning: invalid value encountered in true_divide
>   um.true_divide(ret, rcount, out=ret, casting='unsafe')
> array([[ 0. , 0.00452029, nan],
>        [ 0. , 0. , 0.19853835],
>        [ 0.13735819, 0.32710662, 0.10384331]])
> >>> np.std(a, axis=(1,2), skipna=True)
> array([ 0.16786895, 0.15498008, 0.23811937])
>
>
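The session above follows two rules. Without skipna, any NA in a row makes that row's result NA. With skipna=True, NAs are ignored, and a row that is entirely NA divides zero by zero, giving nan along with a RuntimeWarning. The following minimal pure-Python model of a single masked row follows the same rules. It is a sketch under stated assumptions, not the branch's code: NA, MaskedRow and its methods are hypothetical names, the mask is kept as a separate list of booleans with True meaning NA, and std divides by the number of values (ddof=0). That last choice matches the session's numbers, for example 0.00452029 for the row [0.46661454, 0.47565512, NA].

```python
import math


class _NAType:
    """Hypothetical stand-in for a missing-value marker (not NumPy's np.NA)."""

    def __repr__(self):
        return "NA"


NA = _NAType()


class MaskedRow:
    """A 1-D row of floats stored alongside a boolean NA mask.

    is_na[i] is True where element i is NA; the placeholder stored in
    data[i] at those positions is never read by the reductions.
    """

    def __init__(self, values):
        self.is_na = [v is NA for v in values]
        self.data = [0.0 if v is NA else float(v) for v in values]

    def _valid(self):
        return [x for x, na in zip(self.data, self.is_na) if not na]

    def mean(self, skipna=False):
        if not skipna and any(self.is_na):
            return NA  # a single NA makes the whole reduction NA
        vals = self._valid()
        if not vals:
            return math.nan  # 0/0: nothing left after skipping NAs
        return sum(vals) / len(vals)

    def std(self, skipna=False):
        """Population standard deviation (divides by the count of values)."""
        m = self.mean(skipna)
        if m is NA or math.isnan(m):
            return m
        vals = self._valid()
        return math.sqrt(sum((x - m) ** 2 for x in vals) / len(vals))


row = MaskedRow([0.46661454, 0.47565512, NA])  # values copied from the session
print(row.mean())                # NA
print(row.mean(skipna=True))     # about 0.47113483
print(row.std(skipna=True))      # about 0.00452029
print(MaskedRow([NA, NA, NA]).mean(skipna=True))  # nan
```

Unlike the session, this sketch returns nan silently for an all-NA row instead of emitting a warning.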
More information about the NumPy-Discussion
mailing list