[Numpy-discussion] non-standard standard deviation

Robin robince at gmail.com
Sun Nov 29 20:15:35 EST 2009


On Mon, Nov 30, 2009 at 12:30 AM, Colin J. Williams <cjw at ncf.ca> wrote:
> On 29-Nov-09 17:13, Dr. Phillip M. Feldman wrote:
>> All of the statistical packages that I am currently using and have used in
>> the past (Matlab, Minitab, R, S-plus) calculate standard deviation using the
>> sqrt(1/(n-1)) normalization, which gives an unbiased variance
>> estimate when sampling from a normally-distributed population.  NumPy
>> uses the sqrt(1/n) normalization.  I'm currently using the following
>> code to calculate standard deviations, but would much prefer that this
>> be fixed in NumPy itself:
>>
>> import numpy
>>
>> def mystd(x=numpy.array([]), axis=None):
>>     """Calculate the standard deviation of the input using the
>>     n - 1 normalization, which gives an unbiased variance estimate
>>     for samples from a normally-distributed population."""
>>     # Deviations from the mean, then the n - 1 normalization.
>>     xd = x - x.mean(axis=axis)
>>     return numpy.sqrt((xd * xd).sum(axis=axis) /
>>                       (numpy.size(x, axis=axis) - 1.0))
>>
> Anne Archibald has suggested a work-around.  Perhaps ddof could
> default to 1, since other values are rarely required.
>
> Where the distribution of a variate is not known a priori, I believe
> it can be shown that the n - 1 divisor provides the best estimate of
> the variance.
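
For what it's worth, the work-around doesn't need a custom function:
numpy.std and numpy.var already accept a ddof ("delta degrees of
freedom") keyword, so the n - 1 normalization is one argument away.  A
minimal sketch (the array here is just made-up example data):

import numpy

x = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Default ddof=0 divides by n (numpy's current behaviour).
print(numpy.std(x))

# ddof=1 divides by n - 1, matching Matlab/Minitab/R/S-plus.
print(numpy.std(x, ddof=1))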

There have been previous discussions on this (though I can't find them
now), and I believe the current default was chosen deliberately.  I
think the view of the numpy developers is that the n divisor has more
desirable properties in most cases than the traditional n - 1; see
this paper by Travis Oliphant for details:
http://hdl.handle.net/1877/438
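
If I remember right, one concrete sense in which the n divisor is
"more desirable" is mean squared error: for normally-distributed data
it trades a small downward bias for lower variance, and ends up with a
lower MSE than the unbiased n - 1 estimator.  A quick illustrative
simulation (parameters chosen arbitrarily; this is my sketch, not code
from the paper):

import numpy

rng = numpy.random.RandomState(0)  # fixed seed for reproducibility
sigma2 = 4.0                       # true variance
n, trials = 10, 100000

samples = rng.normal(0.0, numpy.sqrt(sigma2), size=(trials, n))
var_n = samples.var(axis=1, ddof=0)   # divide by n
var_n1 = samples.var(axis=1, ddof=1)  # divide by n - 1

# ddof=1 is unbiased in expectation, but ddof=0 has the lower MSE here.
print("bias n  :", var_n.mean() - sigma2)
print("bias n-1:", var_n1.mean() - sigma2)
print("MSE  n  :", ((var_n - sigma2) ** 2).mean())
print("MSE  n-1:", ((var_n1 - sigma2) ** 2).mean())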

Cheers

Robin


