On Fri, Nov 19, 2010 at 12:10 PM,
What's the speed advantage of nanny compared to np.nansum if the arrays are larger, say (1000, 10) or (10000, 100) with axis=0?
Good point. In the small examples I've shown so far, the speedup might have been mostly overhead. Fortunately, that's not the case:
>> arr = np.random.rand(1000, 1000)
>> timeit np.nansum(arr)
100 loops, best of 3: 4.79 ms per loop
>> timeit ny.nansum(arr)
1000 loops, best of 3: 1.53 ms per loop
>> arr[arr > 0.5] = np.nan
>> timeit np.nansum(arr)
10 loops, best of 3: 44.5 ms per loop
>> timeit ny.nansum(arr)
100 loops, best of 3: 6.18 ms per loop
>> timeit np.nansum(arr, axis=0)
10 loops, best of 3: 52.3 ms per loop
>> timeit ny.nansum(arr, axis=0)
100 loops, best of 3: 12.2 ms per loop
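If anyone wants to reproduce these timings outside of IPython, here's a minimal standalone sketch (it assumes nanny is importable as ny; the exact numbers will of course depend on your machine):

import timeit

import numpy as np
import nanny as ny

arr = np.random.rand(1000, 1000)
arr[arr > 0.5] = np.nan  # set roughly half the elements to NaN

for name, func in [("np.nansum", np.nansum), ("ny.nansum", ny.nansum)]:
    # Best of 3 repeats of 10 calls each, like IPython's timeit.
    best = min(timeit.repeat(lambda: func(arr, axis=0), number=10, repeat=3))
    print("%s: %.2f ms per loop" % (name, best / 10 * 1e3))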
np.nansum makes a copy of the input array, builds a NaN mask (another temporary the same size as the input), and then uses the mask to set the NaNs in the copy to zero before summing. So not only is nanny faster, it also uses less memory.
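In other words, np.nansum does something roughly like this (a simplified sketch, not the actual NumPy source):

import numpy as np

def nansum_numpy_style(arr, axis=None):
    # Copy the input so the caller's data isn't modified.
    y = np.array(arr)
    # Build a boolean mask of the NaNs: a second temporary
    # the same size as the input.
    mask = np.isnan(y)
    # Zero out the NaNs in the copy, then do an ordinary sum.
    y[mask] = 0
    return y.sum(axis)

nanny avoids both temporaries by accumulating the sum in a single pass over the input, treating NaNs as zero as it encounters them, which is where the speed and memory savings come from.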