[Numpy-discussion] non-standard standard deviation
Colin J. Williams
cjw at ncf.ca
Sun Nov 29 19:30:49 EST 2009
On 29-Nov-09 17:13 PM, Dr. Phillip M. Feldman wrote:
> All of the statistical packages that I am currently using and have used in
> the past (Matlab, Minitab, R, S-plus) calculate standard deviation using the
> sqrt(1/(n-1)) normalization, which gives a result that is unbiased when
> sampling from a normally-distributed population. NumPy uses the sqrt(1/n)
> normalization. I'm currently using the following code to calculate standard
> deviations, but would much prefer if this could be fixed in NumPy itself:
> def mystd(x=numpy.array(), axis=None):
> """This function calculates the standard deviation of the input using the
> definition of standard deviation that gives an unbiased result for
> from a normally-distributed population."""
> xd= x - x.mean(axis=axis)
> return sqrt( (xd*xd).sum(axis=axis) / (numpy.size(x,axis=axis)-1.0) )
Anne Archibald has suggested a work-around. Perhaps ddof could be set,
by default to
1 as other values are rarely required.
Where the distribution of a variate is not known a priori, then I
believe that it can be shown
that the n-1 divisor provides the best estimate of the variance.
More information about the NumPy-Discussion