[Numpy-discussion] non-standard standard deviation

Colin J. Williams cjw at ncf.ca
Sun Nov 29 19:30:49 EST 2009



On 29-Nov-09 17:13 PM, Dr. Phillip M. Feldman wrote:
> All of the statistical packages that I am currently using and have used in
> the past (Matlab, Minitab, R, S-plus) calculate standard deviation using the
> sqrt(1/(n-1)) normalization, which gives a result that is unbiased when
> sampling from a normally-distributed population.  NumPy uses the sqrt(1/n)
> normalization.  I'm currently using the following code to calculate standard
> deviations, but would much prefer if this could be fixed in NumPy itself:
>
> import numpy
>
> def mystd(x=numpy.array([]), axis=None):
>     """This function calculates the standard deviation of the input using the
>     definition of standard deviation that gives an unbiased result for samples
>     from a normally-distributed population."""
>
>     # keepdims=True lets the mean broadcast against x for any axis
>     xd = x - x.mean(axis=axis, keepdims=True)
>     return numpy.sqrt( (xd*xd).sum(axis=axis) / (numpy.size(x,axis=axis)-1.0) )
>    
Anne Archibald has suggested a work-around.  Perhaps ddof could be set to 1
by default, as other values are rarely required.
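
For reference, a minimal sketch of how the existing ddof argument already
gives the n-1 normalization without a helper function.  The sample values
and variable names are made up for illustration.

```python
import numpy as np

# Illustrative data only; any 1-D sample would do.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

std_n = np.std(x)           # default ddof=0: divides by n
std_n1 = np.std(x, ddof=1)  # ddof=1: divides by n - 1

# The same n-1 result computed by hand, for comparison.
xd = x - x.mean()
manual_n1 = np.sqrt((xd * xd).sum() / (x.size - 1))

print(std_n, std_n1, manual_n1)
```

The divisor is n - ddof, so ddof=1 reproduces the sqrt(1/(n-1)) convention
of the packages mentioned in the quoted message.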

Where the distribution of a variate is not known a priori, I believe it can
be shown that the n-1 divisor provides an unbiased estimate of the variance.
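
A rough simulation, not a proof, is one way to check this for a
non-normal variate.  The exponential distribution, the sample size of 5,
the trial count and the seed are all arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Many small samples from an exponential distribution with scale 1,
# whose true variance is 1.0.
n, trials = 5, 200_000
samples = rng.exponential(scale=1.0, size=(trials, n))

# Average the per-sample variance estimates over all trials.
mean_var_n = samples.var(axis=1, ddof=0).mean()   # divisor n
mean_var_n1 = samples.var(axis=1, ddof=1).mean()  # divisor n - 1

print(mean_var_n, mean_var_n1)
```

With the n divisor the average should settle near (n-1)/n times the true
variance, and with the n-1 divisor near the true variance itself.  Note that
this concerns the variance; the square root of the n-1 estimate is still a
slightly biased estimate of the standard deviation.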

Colin W.


