On Sun, Nov 21, 2010 at 5:56 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Sun, Nov 21, 2010 at 19:49, Keith Goodman <kwgoodman@gmail.com> wrote:
But this sample gives a difference:
>>> a = np.random.rand(100)
>>> a.var()
0.080232196646619805
>>> var(a)
0.080232196646619791
As you know, I'm trying to make a drop-in replacement for scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in part. Either that, or suck it up and store the damn mean.
The difference is less than eps. Quite possibly, the one-pass version is even closer to the true value than the two-pass version.
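[The one-pass `var` under discussion isn't shown in the thread. As a point of reference, here is a minimal sketch of a Welford-style on-line variance in plain Python; Nanny's actual implementation may differ, so treat this as illustrative only:]

```python
import numpy as np

def var_1pass(a):
    # Welford's on-line algorithm: update a running mean and a running
    # sum of squared deviations in a single pass over the data.
    mean = 0.0
    m2 = 0.0
    for n, x in enumerate(a, 1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses both old and new mean
    return m2 / len(a)            # population variance, like a.var()
```

[On random data this agrees with NumPy's two-pass `a.var()` to within a few ulps, which is the level of disagreement being discussed here.]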
Good, it passes the Kern test. Here's an even more robust estimate:
>>> var(a - a.mean())
0.080232196646619819
Which is better, NumPy's two-pass method or the one-pass on-line method?
>>> test()
NumPy error: 9.31135e-18
Nanny error: 6.5745e-18  <-- One pass wins!
def test(n=100000):
    numpy = 0
    nanny = 0
    for i in range(n):
        a = np.random.rand(10)
        truth = var(a - a.mean())
        numpy += np.absolute(truth - a.var())
        nanny += np.absolute(truth - var(a))
    print 'NumPy error: %g' % (numpy / n)
    print 'Nanny error: %g' % (nanny / n)
    print
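[For scale: both average errors above sit below machine epsilon times the variance itself, which is about the best double precision can deliver. A quick check of that bound, assuming the variance of uniform [0, 1) data, 1/12, as the typical value:]

```python
import numpy as np

eps = np.finfo(np.float64).eps   # ~2.22e-16 for double precision
typical_var = 1.0 / 12.0         # variance of U(0, 1), ~0.0833
bound = eps * typical_var        # ~1.85e-17
# The measured errors (9.3e-18 and 6.6e-18) both fall under this bound,
# so both methods are accurate to the last representable bit or so.
print(bound)
```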