[Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

Nathaniel Smith njs at pobox.com
Tue Apr 1 17:11:19 EDT 2014

On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
> On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden <sturla.molden at gmail.com> wrote:
>> > Haslwanter Thomas <Thomas.Haslwanter at fh-linz.at> wrote:
>> >
>> >> Personally I cannot think of many applications where it would be desired
>> >> to calculate the standard deviation with ddof=0. In addition, I feel that
>> >> there should be consistency between standard modules such as numpy,
>> >> scipy, and pandas.
>> >
>> > ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
>> > estimation.
>> It's true, but the counter-arguments are also strong. And regardless
>> of whether ddof=1 or ddof=0 is better, surely the same one is better
>> for both numpy and scipy.
> If we could still choose here without any costs, obviously that's true. This
> particular ship sailed a long time ago though. By the way, there isn't even
> a `scipy.stats.std`, so we're comparing with differently named functions
> (nanstd for example).
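For readers following the thread: numpy.std documents its divisor as N - ddof and defaults to ddof=0, while pandas' std methods default to ddof=1. The sketch below reproduces that formula in plain Python so the two conventions can be compared side by side. The helper name `std_ddof` is hypothetical, and the sample data is arbitrary; the stdlib `statistics.pstdev` and `statistics.stdev` are used only as a cross-check of the two divisors.

```python
import math
import statistics

def std_ddof(xs, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - ddof))

data = [2, 4, 4, 4, 5, 5, 7, 9]
# ddof=0: divide by N (the maximum likelihood estimate under a normal model)
print(std_ddof(data, ddof=0))  # 2.0
# ddof=1: divide by N - 1 (Bessel's correction; unbiased for the variance, not the std)
print(std_ddof(data, ddof=1))  # about 2.138

# The stdlib exposes the same two conventions under different names
assert math.isclose(std_ddof(data, 0), statistics.pstdev(data))
assert math.isclose(std_ddof(data, 1), statistics.stdev(data))
```

The gap between the two shrinks as N grows (the ratio of the variances is N / (N - 1)), which is one reason the choice of default matters most for small samples.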

Presumably nanstd is a lot less heavily used than std, and presumably
people expect 'nanstd' to be a 'nan' version of 'std' -- what do you
think of changing nanstd to ddof=0 to match numpy? (With appropriate
FutureWarning transition, etc.)
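One common way to stage such a default change is a sentinel default: if the caller never passes ddof, emit a FutureWarning and keep the old behaviour; if they pass it explicitly, stay silent. The sketch below illustrates that pattern only. `nanstd_transitional` is a hypothetical name, not an actual scipy or numpy function, and it works on plain Python lists rather than arrays.

```python
import math
import warnings

# Sentinel that distinguishes "ddof not passed" from any real value
_NO_VALUE = object()

def nanstd_transitional(values, ddof=_NO_VALUE):
    """Hypothetical NaN-ignoring std whose default ddof is moving from 1 to 0."""
    if ddof is _NO_VALUE:
        warnings.warn(
            "the default ddof will change from 1 to 0 to match numpy.std; "
            "pass ddof explicitly to silence this warning",
            FutureWarning,
            stacklevel=2,
        )
        ddof = 1  # keep the old default during the transition period
    # Drop NaNs before computing, as a nan-aware function would
    xs = [x for x in values if not math.isnan(x)]
    n = len(xs)
    if n - ddof <= 0:
        return float("nan")  # not enough non-NaN values for this divisor
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - ddof))
```

Callers who pass ddof explicitly are unaffected by the eventual switch, so the warning only reaches code that silently relies on the old default.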

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
