Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions

20 Nov 2010


On Fri, Nov 19, 2010 at 8:19 PM, Charles R Harris wrote:
On Fri, Nov 19, 2010 at 1:50 PM, Keith Goodman wrote:
...
...
On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen  wrote:
...
Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote:
[clip]
...
My guess is that having separate underlying functions for each dtype,
ndim, and axis would be a nightmare for a large project like Numpy. But
manageable for a focused project like nanny.
Might be easier to migrate the nan* functions to using Ufuncs.
Unless I'm missing something,
    np.nanmax -> np.fmax.reduce
    np.nanmin -> np.fmin.reduce
For `nansum`, we'd need to add a ufunc `nanadd`, and for
`nanargmax/min`, we'd need `argfmin/fmax`.
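A small sketch of the mapping suggested above, on a toy array. `np.fmax` and `np.fmin` are existing ufuncs. The proposed `nanadd` is not an existing ufunc, so the NaN-ignoring sum here is emulated with `np.where`, which is an assumption made for illustration only.

```python
import numpy as np

a = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 2.0]])

# fmax/fmin return the non-NaN operand when exactly one input is NaN,
# so reducing with them skips NaNs much like nanmax/nanmin do.
col_max = np.fmax.reduce(a, axis=0)
col_min = np.fmin.reduce(a, axis=0)
assert np.array_equal(col_max, np.nanmax(a, axis=0))
assert np.array_equal(col_min, np.nanmin(a, axis=0))

# np.maximum, by contrast, propagates NaN through the reduction.
plain_max = np.maximum.reduce(a, axis=0)

# Stand-in for the proposed `nanadd`: replace NaNs with 0, then add-reduce.
nan_sum = np.add.reduce(np.where(np.isnan(a), 0.0, a), axis=0)
assert np.array_equal(nan_sum, np.nansum(a, axis=0))
```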
On Fri, Nov 19, 2010 at 12:29 PM, Keith Goodman wrote:
How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd,
please.
...
...
...
arr = np.random.rand(1000, 1000)
arr[arr > 0.5] = np.nan
np.nanmax(arr)
  0.49999625409581072
np.fmax.reduce(arr, axis=None)
<snip>
TypeError: an integer is required
np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0)
  0.49999625409581072
...
...
timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0)
100 loops, best of 3: 12.7 ms per loop
timeit np.nanmax(arr)
10 loops, best of 3: 39.6 ms per loop
...
...
timeit np.nanmax(arr, axis=0)
10 loops, best of 3: 46.5 ms per loop
timeit np.fmax.reduce(arr, axis=0)
100 loops, best of 3: 12.7 ms per loop
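The session above reduces a 2d array to a scalar by calling `reduce` twice, because `axis=None` raised a `TypeError` there. As a sketch, two ways to get the full-array result without relying on `axis=None` are the nested reduce and a reduce over a flattened view. The array shape and seed below are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.RandomState(0)      # fixed seed so the example is repeatable
arr = rng.rand(200, 300)
arr[arr > 0.5] = np.nan             # same NaN pattern as in the session above

# Option 1: reduce along axis 0 twice (2d -> 1d -> scalar).
nested = np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0)

# Option 2: flatten first, then do a single 1d reduction.
flat = np.fmax.reduce(arr.ravel())

# Both should agree with np.nanmax over the whole array.
assert nested == flat == np.nanmax(arr)
```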
Cython is faster than np.fmax.reduce.
I wrote a Cython version of np.nanmax, called nanmax below. (It only
handles the 2d, float64, axis=None case, but since the array is large
I don't think that explains the time difference.)
Note that fmax.reduce is slower than np.nanmax when there are no NaNs:
...
...
arr = np.random.rand(1000, 1000)
timeit np.nanmax(arr)
100 loops, best of 3: 5.82 ms per loop
timeit np.fmax.reduce(np.fmax.reduce(arr))
100 loops, best of 3: 9.14 ms per loop
timeit nanmax(arr)
1000 loops, best of 3: 1.17 ms per loop
...
...
arr[arr > 0.5] = np.nan
...
...
timeit np.nanmax(arr)
10 loops, best of 3: 45.5 ms per loop
timeit np.fmax.reduce(np.fmax.reduce(arr))
100 loops, best of 3: 12.7 ms per loop
timeit nanmax(arr)
1000 loops, best of 3: 1.17 ms per loop
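The Cython source for `nanmax` is not shown in this thread. The following is only a plain-Python sketch of the kind of single-pass, NaN-skipping loop such a function might use for the 2d, axis=None case. The name `nanmax_2d_sketch` is hypothetical, and as pure Python it will be far slower than a compiled version.

```python
import math
import numpy as np

def nanmax_2d_sketch(arr):
    """Hypothetical single-pass max over a 2d float array, skipping NaNs."""
    best = -math.inf
    all_nan = True
    n_rows, n_cols = arr.shape
    for i in range(n_rows):
        for j in range(n_cols):
            value = arr[i, j]
            # Any comparison with NaN is False, so NaNs never update `best`.
            if value >= best:
                best = value
                all_nan = False
    # Return NaN if no non-NaN element was found (all-NaN or empty input).
    return math.nan if all_nan else best

example = np.array([[np.nan, 1.0],
                    [0.25, np.nan]])
result = nanmax_2d_sketch(example)
```

Each element is visited once with a single comparison whether or not it is NaN, which is one possible reason a loop like this would show similar timings with and without NaNs.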
There seem to be some odd hardware/compiler dependencies. I get quite a
different pattern of times:
In [1]: arr = np.random.rand(1000, 1000)
In [2]: timeit np.nanmax(arr)
100 loops, best of 3: 10.4 ms per loop
In [3]: timeit np.fmax.reduce(arr.flat)
100 loops, best of 3: 2.09 ms per loop
In [4]: arr[arr > 0.5] = np.nan
In [5]: timeit np.nanmax(arr)
100 loops, best of 3: 12.9 ms per loop
In [6]: timeit np.fmax.reduce(arr.flat)
100 loops, best of 3: 7.09 ms per loop
I've tweaked fmax with the reduce loop option, but the nanmax times don't
look like yours at all. I'm also a bit surprised that you don't see any
difference in times when the array contains a lot of NaNs. I'm running on
an AMD Phenom, gcc 4.4.5.
However, I noticed that the build wants to be -O1 by default. I have my own
CFLAGS that make it -O2, but it looks like Ubuntu's Python might be built
with -O1. Hmm. That could certainly cause some odd timings.

Chuck
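One way to check which optimization flags a given Python was built with is the standard-library `sysconfig` module. This is a minimal sketch: the helper name `python_build_flags` is made up for the example, and on some platforms (Windows, for instance) these variables may be `None`. Whether an extension build actually inherits these flags depends on the build setup, so treat the output as a hint only.

```python
import sysconfig

def python_build_flags():
    """Return the compiler-flag variables recorded when Python was built."""
    names = ("CFLAGS", "OPT")
    return {name: sysconfig.get_config_var(name) for name in names}

flags = python_build_flags()
for name, value in flags.items():
    # Look for -O1 / -O2 / -O3 in the printed values.
    print(name, "=", value)
```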