On Mon, Nov 22, 2010 at 10:51 AM, <josef.pktd@gmail.com> wrote:
On Mon, Nov 22, 2010 at 1:39 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
On Mon, Nov 22, 2010 at 10:32 AM, <josef.pktd@gmail.com> wrote:
On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
@cython.boundscheck(False) @cython.wraparound(False) def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): "nanstd of 1d numpy array with dtype=np.float64 along axis=0." cdef Py_ssize_t i cdef int a0 = a.shape[0], count = 0 cdef np.float64_t asum = 0, a2sum=0, amean, ai, da for i in range(a0): ai = a[i] if ai == ai: asum += ai count += 1 if count > 0: amean = asum / count asum = 0 for i in range(a0): ai = a[i] if ai == ai: da = ai - amean asum += da a2sum += (da * da) asum = asum * asum return sqrt((a2sum - asum / count) / (count - ddof)) else: return np.float64(NAN)
This is 5% faster:
@cython.boundscheck(False) @cython.wraparound(False) def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof): "nanstd of 1d numpy array with dtype=np.float64 along axis=0." cdef Py_ssize_t i cdef int a0 = a.shape[0], count = 0 cdef np.float64_t asum = 0, amean, ai for i in range(a0): ai = a[i] if ai == ai: asum += ai count += 1 if count > 0: amean = asum / count asum = 0 for i in range(a0): ai = a[i] if ai == ai: ai -= amean asum += (ai * ai) return sqrt(asum / (count - ddof)) else: return np.float64(NAN)
I think it would be better to write nanvar instead of nanstd and take the square root only in a delegating nanstd, instead of the other way around. (Also a change that should be made in scipy.stats)
Yeah, I noticed that numpy does that. I was planning to have separate var and std functions. Here's why (from the readme file, but maybe I should template it, the sqrt automatically converts large ddof to NaN):
I'm not sure what you are saying, dropping the squareroot in the function doesn't require nan handling in the inner loop. If you want to return nan when count-ddof<=0, then you could just replace
if count > 0: ...
by if count -ddof > 0: ...
Or am I missing the point?
Yes, sorry. Ignore the sqrt/nan comment. The point is that I want to be able to return the underlying, low-level cython function (see below). So I need separate var and std versions to do that (unless I can modify default kwargs on the fly).
Under the hood Nanny uses a separate Cython function for each combination of ndim, dtype, and axis. A lot of the overhead in ny.nanmax, for example, is in checking that your axis is within range, converting non-array data to an array, and selecting the function to use to calculate nanmax.
You can get rid of the overhead by doing all this before you, say, enter an inner loop:
arr = np.random.rand(10,10) axis = 0 func, a = ny.func.nanmax_selector(arr, axis) func.__name__ 'nanmax_2d_float64_axis0'
Let's see how much faster than runs:
timeit np.nanmax(arr, axis=0) 10000 loops, best of 3: 25.7 us per loop timeit ny.nanmax(arr, axis=0) 100000 loops, best of 3: 5.25 us per loop timeit func(a) 100000 loops, best of 3: 2.5 us per loop
Note that func is faster than the Numpy's non-nan version of max:
timeit arr.max(axis=0) 100000 loops, best of 3: 3.28 us per loop
So adding NaN protection to your inner loops has a negative cost!