Does np.std() make two passes through the data?
Numpy:
>>> arr = np.random.rand(10)
>>> arr.std()
0.3008736260967052
Looks like an algorithm that makes one pass through the data (one for loop) wouldn't match arr.std():
>>> np.sqrt((arr*arr).mean() - arr.mean()**2)
0.30087362609670526
But a slower two-pass algorithm would match arr.std():
>>> np.sqrt(((arr - arr.mean())**2).mean())
0.3008736260967052
Is there a way to get the same result as arr.std() in one pass (cython for loop) of the data?
On Sun, Nov 21, 2010 at 6:43 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> Is there a way to get the same result as arr.std() in one pass (cython
> for loop) of the data?
The reference pointed to several times on the list is the Wikipedia page, e.g.
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_alg...

I don't know about the actual implementation.

Josef
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Sun, Nov 21, 2010 at 4:18 PM, <josef.pktd@gmail.com> wrote:
> The reference pointed to several times on the list is the wikipedia page, e.g.
> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_alg...
Unfortunately it doesn't give the same result as numpy's two pass algorithm.
From wikipedia:
def var(arr):
    n = 0
    mean = 0
    M2 = 0
    for x in arr:
        n = n + 1
        delta = x - mean
        mean = mean + delta / n
        M2 = M2 + delta * (x - mean)
    return M2 / n

This set of random samples gives matching variance:
>>> a = np.random.rand(100)
>>> a.var()
0.07867478939716277
>>> var(a)
0.07867478939716277
But this sample gives a difference:
>>> a = np.random.rand(100)
>>> a.var()
0.080232196646619805
>>> var(a)
0.080232196646619791
As you know, I'm trying to make a drop-in replacement for scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in part. Either that, or suck it up and store the damn mean.
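A quick sketch (not part of the original exchange) of why the naive one-pass formula is distrusted in general: shifting the data by a large constant leaves the true variance unchanged, but E[x^2] - E[x]^2 then subtracts two huge, nearly equal numbers and loses everything to catastrophic cancellation, while the two-pass formula survives.

```python
import numpy as np

np.random.seed(0)
a = np.random.rand(100)
shifted = a + 1e8  # same variance as a, much larger mean

# two-pass: subtract the mean first, then average the squared deviations
two_pass = ((shifted - shifted.mean())**2).mean()

# naive one-pass: difference of two terms that are each ~1e16
one_pass = (shifted*shifted).mean() - shifted.mean()**2

print(abs(two_pass - a.var()))  # tiny
print(abs(one_pass - a.var()))  # large; one_pass may even come out negative
```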
On Sun, Nov 21, 2010 at 19:49, Keith Goodman <kwgoodman@gmail.com> wrote:
> But this sample gives a difference:
>
> >>> a.var()
> 0.080232196646619805
> >>> var(a)
> 0.080232196646619791
The difference is less than eps. Quite possibly, the one-pass version is even closer to the true value than the two-pass version.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
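A sanity check on that point (my addition, not part of the original exchange): the two printed values differ by one unit in the last place, a relative difference below double-precision machine epsilon.

```python
import numpy as np

x = 0.080232196646619805  # a.var(), two-pass
y = 0.080232196646619791  # var(a), one-pass online
rel = abs(x - y) / x

print(rel)                             # roughly 1.7e-16
print(rel < np.finfo(np.float64).eps)  # True: below one eps (~2.22e-16)
```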
On Sun, Nov 21, 2010 at 5:56 PM, Robert Kern <robert.kern@gmail.com> wrote:
> The difference is less than eps. Quite possibly, the one-pass version is
> even closer to the true value than the two-pass version.
Good, it passes the Kern test. Here's an even more robust estimate:
>>> var(a - a.mean())
0.080232196646619819
Which is better, numpy's two pass or the one pass on-line method?
def test(n=100000):
    numpy = 0
    nanny = 0
    for i in range(n):
        a = np.random.rand(10)
        truth = var(a - a.mean())
        numpy += np.absolute(truth - a.var())
        nanny += np.absolute(truth - var(a))
    print 'NumPy error: %g' % (numpy / n)
    print 'Nanny error: %g' % (nanny / n)

>>> test()
NumPy error: 9.31135e-18
Nanny error: 6.5745e-18  <-- One pass wins!
On Sun, Nov 21, 2010 at 5:56 PM, Robert Kern <robert.kern@gmail.com> wrote:
> The difference is less than eps. Quite possibly, the one-pass version is
> even closer to the true value than the two-pass version.
I wrote 3 cython prototype implementations of nanstd for 1d float64 arrays:
>>> a = np.random.rand(1000000)

# numpy; doesn't take care of NaNs
>>> a.std()
0.28852169850186793

# cython of np.sqrt(((arr - arr.mean())**2).mean())
>>> nanstd_twopass(a, ddof=0)
0.28852169850186798

# cython of np.sqrt((arr*arr).mean() - arr.mean()**2)
>>> nanstd_simple(a, ddof=0)
0.28852169850187437

# http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_alg...
>>> nanstd_online(a, ddof=0)
0.28852169850187243

# My target, scipy version
>>> from scipy.stats import nanstd
>>> nanstd(a, bias=True)
0.28852169850186798

Timing:

>>> timeit nanstd(a, bias=True)
10 loops, best of 3: 27.8 ms per loop
>>> timeit a.std()
100 loops, best of 3: 11.5 ms per loop
>>> timeit nanstd_twopass(a, ddof=0)
100 loops, best of 3: 3.24 ms per loop
>>> timeit nanstd_simple(a, ddof=0)
1000 loops, best of 3: 1.6 ms per loop
>>> timeit nanstd_online(a, ddof=0)
100 loops, best of 3: 10.8 ms per loop

nanstd_simple is the fastest but I assume the algorithm is no good for general use?

I think I'll go with nanstd_twopass. It will most closely match numpy/scipy, is more robust than nanstd_simple, and is the second fastest.

Here's the code. Improvements welcomed.

@cython.boundscheck(False)
@cython.wraparound(False)
def nanstd_simple(np.ndarray[np.float64_t, ndim=1] a, int ddof):
    "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
    cdef Py_ssize_t i
    cdef int a0 = a.shape[0], count = 0
    cdef np.float64_t asum = 0, a2sum = 0, ai
    for i in range(a0):
        ai = a[i]
        if ai == ai:
            asum += ai
            a2sum += ai * ai
            count += 1
    if count > 0:
        asum = asum * asum
        return sqrt((a2sum - asum / count) / (count - ddof))
    else:
        return np.float64(NAN)

@cython.boundscheck(False)
@cython.wraparound(False)
def nanstd_online(np.ndarray[np.float64_t, ndim=1] a, int ddof):
    "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
    cdef Py_ssize_t i
    cdef int a0 = a.shape[0], n = 0
    cdef np.float64_t mean = 0, M2 = 0, delta, x
    for i in range(a0):
        x = a[i]
        if x == x:
            n += 1
            delta = x - mean
            mean = mean + delta / n
            M2 = M2 + delta * (x - mean)
    if n > 0:
        return np.float64(sqrt(M2 / (n - ddof)))
    else:
        return np.float64(NAN)

@cython.boundscheck(False)
@cython.wraparound(False)
def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof):
    "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
    cdef Py_ssize_t i
    cdef int a0 = a.shape[0], count = 0
    cdef np.float64_t asum = 0, a2sum = 0, amean, ai, da
    for i in range(a0):
        ai = a[i]
        if ai == ai:
            asum += ai
            count += 1
    if count > 0:
        amean = asum / count
        asum = 0
        for i in range(a0):
            ai = a[i]
            if ai == ai:
                da = ai - amean
                asum += da
                a2sum += (da * da)
        asum = asum * asum
        return sqrt((a2sum - asum / count) / (count - ddof))
    else:
        return np.float64(NAN)
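For anyone who wants to sanity-check the cython versions without compiling anything, here is a plain-NumPy reference (my sketch, not part of the original post) that computes the same two-pass quantity, with the NaN masking and ddof handling done with array operations.

```python
import numpy as np

def nanstd_ref(a, ddof=0):
    # Keep only the non-NaN elements, mirroring the `ai == ai`
    # test in the cython inner loops.
    x = a[~np.isnan(a)]
    count = x.size
    if count == 0 or count - ddof <= 0:
        return np.float64(np.nan)
    # Two-pass: compute the mean first, then the mean squared deviation.
    return np.sqrt(((x - x.mean())**2).sum() / (count - ddof))

a = np.array([1.0, 2.0, np.nan, 4.0])
print(nanstd_ref(a))  # std of [1, 2, 4] with ddof=0
```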
On Mon, Nov 22, 2010 at 11:03 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
> nanstd_simple is the fastest but I assume the algorithm is no good for
> general use?
>
> I think I'll go with nanstd_twopass. It will most closely match
> numpy/scipy, is more robust than nanstd_simple, and is the second fastest.
I wonder how the results would change if the size of the array were larger than the processor cache? I still can't wrap my head around the idea that a two-pass algorithm would be faster than a single-pass one. Is this just a big-O thing, where one algorithm is faster than the other depending on the size of the problem?

Ben Root
On Mon, Nov 22, 2010 at 12:07 PM, Benjamin Root <ben.root@ou.edu> wrote:
> I wonder how the results would change if the size of the array was larger
> than the processor cache? I still can't seem to wrap my head around the
> idea that a two-pass algorithm would be faster than a single-pass. Is this
> just a big-O thing where sometimes one algorithm will be faster than the
> other based on the size of the problem?
According to the Wikipedia article, nanstd_online requires too many divisions, and it is useful mainly if the array doesn't fit in memory. Two-pass would provide the precision that we would expect in numpy, but I don't know if anyone has ever tested the NIST problems for basic statistics.

Josef
On Mon, Nov 22, 2010 at 9:13 AM, <josef.pktd@gmail.com> wrote:
Two pass would provide precision that we would expect in numpy, but I don't know if anyone ever tested the NIST problems for basic statistics.
Here are the results for their most difficult dataset. But I guess running one test doesn't mean anything.

http://www.itl.nist.gov/div898/strd/univ/addinfo/numacc4.html
>>> np.absolute(a.std(ddof=1) - 0.1)
5.5884095961911129e-10
>>> np.absolute(nanstd_online(a, ddof=1) - 0.1)
5.5890501948763216e-10
>>> np.absolute(nanstd_simple(a, ddof=1) - 0.1)
nan  # Ha!
>>> np.absolute(nanstd_twopass(a, ddof=1) - 0.1)
5.5879308125117433e-10
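For reference, the NumAcc4 data can be reconstructed from the description on that NIST page (my reading of it: first observation 10000000.2, then 10000000.1 and 10000000.3 alternating, N = 1001, certified sample standard deviation 0.1), so the errors above are easy to reproduce.

```python
import numpy as np

# NumAcc4 as described on the NIST page linked above (my reconstruction):
# 10000000.2 followed by alternating 10000000.1 and 10000000.3, N = 1001.
a = np.empty(1001)
a[0] = 10000000.2
a[1::2] = 10000000.1
a[2::2] = 10000000.3

# Certified sample standard deviation (ddof=1) is 0.1; print the error.
print(abs(a.std(ddof=1) - 0.1))
```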
On Mon, Nov 22, 2010 at 12:28 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> Here are the results for their most difficult dataset. But I guess
> running one test doesn't mean anything.
Thanks, e-10 is better than I expected for a tough test, but it confirms that I don't trust any statistics to more than 6 to 10 digits.

Josef
On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
> [snip nanstd_twopass]
This is 5% faster:

@cython.boundscheck(False)
@cython.wraparound(False)
def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof):
    "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
    cdef Py_ssize_t i
    cdef int a0 = a.shape[0], count = 0
    cdef np.float64_t asum = 0, amean, ai
    for i in range(a0):
        ai = a[i]
        if ai == ai:
            asum += ai
            count += 1
    if count > 0:
        amean = asum / count
        asum = 0
        for i in range(a0):
            ai = a[i]
            if ai == ai:
                ai -= amean
                asum += (ai * ai)
        return sqrt(asum / (count - ddof))
    else:
        return np.float64(NAN)
On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> [snip: nanstd_twopass and the 5% faster variant]
I think it would be better to write nanvar instead of nanstd and take the square root only in a delegating nanstd, instead of the other way around. (Also a change that should be made in scipy.stats.)

Josef
On Mon, Nov 22, 2010 at 10:32 AM, <josef.pktd@gmail.com> wrote:
> I think it would be better to write nanvar instead of nanstd and take the
> square root only in a delegating nanstd, instead of the other way around.
> (Also a change that should be made in scipy.stats)
Yeah, I noticed that numpy does that. I was planning to have separate var and std functions anyway. Here's why (from the readme file, though maybe I should template it; the sqrt automatically converts a too-large ddof to NaN):

Under the hood Nanny uses a separate Cython function for each combination of ndim, dtype, and axis. A lot of the overhead in ny.nanmax, for example, is in checking that your axis is within range, converting non-array data to an array, and selecting the function to use to calculate nanmax. You can get rid of the overhead by doing all this before you, say, enter an inner loop:
>>> arr = np.random.rand(10,10)
>>> axis = 0
>>> func, a = ny.func.nanmax_selector(arr, axis)
>>> func.__name__
'nanmax_2d_float64_axis0'
Let's see how much faster that runs:
>>> timeit np.nanmax(arr, axis=0)
10000 loops, best of 3: 25.7 us per loop
>>> timeit ny.nanmax(arr, axis=0)
100000 loops, best of 3: 5.25 us per loop
>>> timeit func(a)
100000 loops, best of 3: 2.5 us per loop
Note that func is faster than the Numpy's non-nan version of max:
>>> timeit arr.max(axis=0)
100000 loops, best of 3: 3.28 us per loop
So adding NaN protection to your inner loops has a negative cost!
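The selector idea can be sketched in pure Python (my sketch with hypothetical names, reduced to a one-entry table; the real ny.func.nanmax_selector presumably maps every supported ndim/dtype/axis combination to a compiled function): do the argument checking and lookup once, outside the inner loop, so repeated calls pay no dispatch cost.

```python
import numpy as np

# One-entry dispatch table; the real table would hold one specialized
# compiled function per (ndim, dtype, axis) combination.
_nanmax_funcs = {
    (2, np.dtype(np.float64), 0): lambda a: np.nanmax(a, axis=0),
}

def nanmax_selector(arr, axis):
    # Validate and convert once, up front...
    a = np.asarray(arr)
    if not 0 <= axis < a.ndim:
        raise ValueError("axis out of range")
    # ...then hand back the specialized function for use in a tight loop.
    return _nanmax_funcs[(a.ndim, a.dtype, axis)], a

arr = np.random.rand(10, 10)
func, a = nanmax_selector(arr, 0)
print(func(a).shape)  # (10,)
```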
On Mon, Nov 22, 2010 at 1:39 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> Yeah, I noticed that numpy does that. I was planning to have separate var
> and std functions. Here's why (from the readme file, but maybe I should
> template it, the sqrt automatically converts large ddof to NaN):
I'm not sure what you are saying; dropping the square root in the function doesn't require nan handling in the inner loop. If you want to return nan when count - ddof <= 0, then you could just replace

    if count > 0: ...

by

    if count - ddof > 0: ...

Or am I missing the point?

Josef
On Mon, Nov 22, 2010 at 10:51 AM, <josef.pktd@gmail.com> wrote:
> I'm not sure what you are saying, dropping the squareroot in the function
> doesn't require nan handling in the inner loop.
>
> Or am I missing the point?
Yes, sorry. Ignore the sqrt/nan comment. The point is that I want to be able to return the underlying, low-level cython function, as in the readme excerpt above. So I need separate var and std versions to do that (unless I can modify default kwargs on the fly).
On Mon, Nov 22, 2010 at 1:59 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> Yes, sorry. Ignore the sqrt/nan comment. The point is that I want to be
> able to return the underlying, low-level cython function. So I need
> separate var and std versions to do that (unless I can modify default
> kwargs on the fly).
I think you could still delegate at the cython level, but given the amount of code duplication that is required, I don't expect it to make much difference for staying DRY. Josef
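The delegation josef suggests, plus the count - ddof guard, can be sketched in plain Python (this is an illustration of the structure being discussed, not the actual Nanny or scipy.stats code):

```python
import numpy as np

def nanvar_1d(a, ddof=0):
    # Keep only non-NaN entries (x == x is False exactly for NaN).
    vals = [x for x in a if x == x]
    count = len(vals)
    if count - ddof <= 0:
        return np.float64('nan')  # the guard suggested in the thread
    amean = sum(vals) / count
    return np.float64(sum((x - amean) ** 2 for x in vals) / (count - ddof))

def nanstd_1d(a, ddof=0):
    # nanstd just delegates: sqrt(nan) is nan, so no extra NaN handling
    return np.sqrt(nanvar_1d(a, ddof))
```

With this layout only nanvar carries the loop logic; nanstd stays a one-liner.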
Under the hood Nanny uses a separate Cython function for each combination of ndim, dtype, and axis. A lot of the overhead in ny.nanmax, for example, is in checking that your axis is within range, converting non-array data to an array, and selecting the function to use to calculate nanmax.
You can get rid of the overhead by doing all this before you, say, enter an inner loop:
arr = np.random.rand(10,10)
axis = 0
func, a = ny.func.nanmax_selector(arr, axis)
func.__name__
'nanmax_2d_float64_axis0'
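The dispatch idea behind nanmax_selector can be sketched in plain Python — look up a specialized kernel once, keyed on (ndim, dtype, axis), and hand back both the function and the validated array. The internals here are made up for illustration (Nanny's real kernels are compiled Cython; np.nanmax stands in for one):

```python
import numpy as np

def nanmax_1d_float64_axis0(a):
    # Stand-in for a compiled Cython kernel specialized to this case.
    return np.nanmax(a)

# One entry per (ndim, dtype, axis) combination.
_FUNCS = {(1, np.dtype('float64'), 0): nanmax_1d_float64_axis0}

def nanmax_selector(arr, axis):
    # Do the argument checking and conversion once, up front,
    # so the caller can invoke func(a) repeatedly with no overhead.
    a = np.asarray(arr)
    return _FUNCS[(a.ndim, a.dtype, axis)], a

func, a = nanmax_selector([1.0, np.nan, 3.0], 0)
func(a)  # 3.0
```

All the per-call overhead (axis checks, array conversion, function selection) is paid once in the selector instead of on every call.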
Let's see how much faster that runs:
timeit np.nanmax(arr, axis=0)
10000 loops, best of 3: 25.7 us per loop
timeit ny.nanmax(arr, axis=0)
100000 loops, best of 3: 5.25 us per loop
timeit func(a)
100000 loops, best of 3: 2.5 us per loop
Note that func is faster than NumPy's non-nan version of max:
timeit arr.max(axis=0)
100000 loops, best of 3: 3.28 us per loop
So adding NaN protection to your inner loops has a negative cost!
On Mon, Nov 22, 2010 at 1:51 PM, <josef.pktd@gmail.com> wrote:
On Mon, Nov 22, 2010 at 1:39 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
On Mon, Nov 22, 2010 at 10:32 AM, <josef.pktd@gmail.com> wrote:
On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
I don't think that works for complex numbers. (statsmodels now has a preference that calculations also work for complex numbers.) Josef
Josef
On Mon, Nov 22, 2010 at 11:00 AM, <josef.pktd@gmail.com> wrote:
I don't think that works for complex numbers. (statsmodels has now a preference that calculations work also for complex numbers)
I'm only supporting int32, int64, float64 for now. Getting the other ints and floats should be easy. I don't have plans for complex numbers. If it's not a big change then someone can add support later.
On Mon, Nov 22, 2010 at 2:04 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
On Mon, Nov 22, 2010 at 11:00 AM, <josef.pktd@gmail.com> wrote:
I don't think that works for complex numbers. (statsmodels has now a preference that calculations work also for complex numbers)
I'm only supporting int32, int64, float64 for now. Getting the other ints and floats should be easy. I don't have plans for complex numbers. If it's not a big change then someone can add support later.
Fair enough, but if you need numerical derivatives, then complex support looks useful. Josef
On Sun, Nov 21, 2010 at 17:43, Keith Goodman <kwgoodman@gmail.com> wrote:
Does np.std() make two passes through the data?
Yes. See PyArray_Std() in numpy/core/src/calculation.c

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
participants (4):
- Benjamin Root
- josef.pktd@gmail.com
- Keith Goodman
- Robert Kern