[Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

Nathaniel Smith njs at pobox.com
Tue Apr 1 16:08:49 EDT 2014


On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden <sturla.molden at gmail.com> wrote:
> Haslwanter Thomas <Thomas.Haslwanter at fh-linz.at> wrote:
>
>> Personally I cannot think of many applications where it would be desired
>> to calculate the standard deviation with ddof=0. In addition, I feel that
>> there should be consistency between standard modules such as numpy, scipy, and pandas.
>
> ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
> estimation.

It's true, but the counter-arguments are also strong. And regardless
of whether ddof=1 or ddof=0 is better, surely the same one is better
for both numpy and scipy.
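
For anyone following along, here is a minimal sketch of the difference
being discussed. The sample values are made up for illustration and are
not from this thread; the only library calls used are np.std with and
without the ddof argument.

```python
import numpy as np

# Illustrative data only (an assumption, not taken from the thread).
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

n = x.size
sum_sq_dev = np.sum((x - x.mean()) ** 2)  # sum of squared deviations from the mean

# ddof=0 (NumPy's default): divide by N.
pop_std = np.std(x)

# ddof=1: divide by N - 1, the usual sample estimate.
sample_std = np.std(x, ddof=1)

# Both results match the textbook formulas computed by hand.
manual_pop_std = np.sqrt(sum_sq_dev / n)
manual_sample_std = np.sqrt(sum_sq_dev / (n - 1))

print(pop_std, sample_std)
```

The two results differ by a factor of sqrt(N / (N - 1)), so the gap
shrinks as N grows but is noticeable for small samples.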

> If you are not estimating from a sample, but rather calculating for the
> whole population, you always want ddof=0.
>
> What does Matlab do by default? (Yes, it is a rhetorical question.)

R (which is probably a more relevant comparison) uses ddof=1 by default.

>> I am wondering if there is a good reason to stick to "ddof=0" as the
>> default for "std", or if others would agree with my suggestion to change
>> the default to "ddof=1"?
>
> It is a bad idea to suddenly break everyone's code.

It would be a disruptive transition, but OTOH having inconsistencies
like this guarantees the ongoing creation of new broken code.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


