[Numpy-discussion] Faster NaN functions

Fri Dec 31 12:20:45 EST 2010

On Fri, Dec 31, 2010 at 8:21 AM, Lev Givon <lev at columbia.edu> wrote:
> Received from Erik Rigtorp on Fri, Dec 31, 2010 at 08:52:53AM EST:
>> Hi,
>>
>> I just sent a pull request for some faster NaN functions,
>> https://github.com/rigtorp/numpy.
>>
>> I implemented the following generalized ufuncs: nansum(), nancumsum(),
>> nanmean(), nanstd() and for fun mean() and std(). It turns out that
>> the generalized ufunc mean() and std() are faster than the current
>> numpy functions. I'm also going to add nanprod(), nancumprod(),
>> nanmax(), nanmin(), nanargmax(), nanargmin().
>>
>> The current implementation is not optimized in any way and there are
>> probably some speedups possible.
>>
>> I hope we can get this into numpy 2.0; the people around me and I seem
>> to have a need for these functions.
>>
>> Erik
>>
>
> How does this compare to Bottleneck?
>
> http://pypi.python.org/pypi/Bottleneck/

I ran into all sorts of problems with ABI differences (this is the first
time I've tried numpy 2.0), so I couldn't get ipython, etc. to work
with Erik's new NaN functions. That's why the speed comparison below
might be hard to follow and only tests one example.

For timing I used bottleneck's autotimeit function:

>>> from bottleneck.benchmark.autotimeit import autotimeit
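
For anyone without bottleneck installed, here is a minimal sketch of an
auto-scaling timer built on the standard library's timeit. It is not
bottleneck's code: the name auto_time, its defaults, and the scaling rule
are assumptions for illustration. It grows the loop count until one batch
runs long enough to measure, then reports the best time per call.

```python
import timeit


def auto_time(stmt, setup="pass", repeat=3, mintime=0.2):
    """Return the best per-call time in seconds for stmt (hypothetical helper)."""
    timer = timeit.Timer(stmt, setup)
    number = 1
    # Grow the loop count by 10x until one batch takes at least mintime seconds.
    for _ in range(12):
        elapsed = timer.timeit(number)
        if elapsed >= mintime:
            break
        number *= 10
    else:
        raise RuntimeError("statement is too fast to time reliably")
    # Run the batch a few more times and keep the fastest measurement.
    best = min([elapsed] + timer.repeat(repeat=repeat - 1, number=number))
    return best / number
```

It takes the same stmt and setup strings as the calls below, e.g.
auto_time("mean(a)", "from numpy import mean; import numpy as np; a = np.ones((100, 100))").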

First, Erik's new nanmean:

>>> stmt = "nanmean2(a.flat)"
>>> setup = "import numpy as np; from numpy.core.umath_tests import nanmean as nanmean2;  rs=np.random.RandomState([1,2,3]); a = rs.rand(100,100)"
>>> autotimeit(stmt, setup)
5.1356482505798338e-05

Bottleneck's low level nanmean:

>>> stmt = "nanmean(a)"
>>> setup = "import numpy as np; from bottleneck.func import nanmean_2d_float64_axisNone as nanmean; rs=np.random.RandomState([1,2,3]); a = rs.rand(100,100)"
>>> autotimeit(stmt, setup)
1.5422070026397704e-05

Bottleneck's high level nanmean:

>>> setup = "import numpy as np; from bottleneck.func import nanmean; rs=np.random.RandomState([1,2,3]); a = rs.rand(100,100)"
>>> autotimeit(stmt, setup)
1.7850480079650879e-05

Numpy's mean:

>>> setup = "import numpy as np; from numpy import mean; rs=np.random.RandomState([1,2,3]); a = rs.rand(100,100)"
>>> stmt = "mean(a)"
>>> autotimeit(stmt, setup)
1.6718170642852782e-05

Scipy's nanmean:

>>> setup = "import numpy as np; from scipy.stats import nanmean; rs=np.random.RandomState([1,2,3]); a = rs.rand(100,100)"
>>> stmt = "nanmean(a)"
>>> autotimeit(stmt, setup)
0.00024667191505432128

The tests above should be repeated for arrays that contain NaNs, for
different array sizes, and for different axes. Bottleneck's benchmark
suite can be modified to do all that, but for now I can't import Erik's
new numpy and bottleneck at the same time.
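
A rough sketch of what such a sweep could look like, using only timeit and
numpy. Everything here is an assumption for illustration: the helper names
time_call and sweep are hypothetical, the shapes and NaN fractions are
arbitrary, and np.nanmean (available in recent NumPy releases) stands in
for whichever nanmean implementation is being tested.

```python
import timeit

import numpy as np


def time_call(func, arr, axis, number=20, repeat=3):
    """Best per-call time in seconds for func(arr, axis=axis)."""
    timer = timeit.Timer(lambda: func(arr, axis=axis))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def sweep(shapes=((10, 10), (100, 100)),
          nan_fractions=(0.0, 0.1),
          axes=(None, 0, 1),
          func=np.nanmean):
    """Time func over every combination of shape, NaN fraction and axis."""
    rs = np.random.RandomState([1, 2, 3])  # same seed as the runs above
    results = []
    for shape in shapes:
        for frac in nan_fractions:
            a = rs.rand(*shape)
            # Replace roughly a fraction `frac` of the elements with NaN.
            a[rs.rand(*shape) < frac] = np.nan
            for axis in axes:
                results.append((shape, frac, axis, time_call(func, a, axis)))
    return results


if __name__ == "__main__":
    for shape, frac, axis, seconds in sweep():
        print(shape, frac, axis, "%.3g s" % seconds)
```

Passing a different callable as func (any function that accepts an array
and an axis keyword) would give a side-by-side comparison on the same inputs.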