[Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value

josef.pktd at gmail.com josef.pktd at gmail.com
Fri Apr 4 09:04:05 EDT 2014


On Fri, Apr 4, 2014 at 8:50 AM, Daπid <davidmenhur at gmail.com> wrote:
>
> On 2 April 2014 16:06, Sturla Molden <sturla.molden at gmail.com> wrote:
>>
>> <josef.pktd at gmail.com> wrote:
>>
>> > pandas came later and thought ddof=1 is worth more than consistency.
>>
>> Pandas is a data analysis package. NumPy is a numerical array package.
>>
>> I think ddof=1 is justified for Pandas, for consistency with statistical
>> software (SPSS et al.)
>>
>> For NumPy, there are many computational tasks where the Bessel correction
>> is not wanted, so providing an uncorrected result is the correct thing to
>> do. NumPy should be a low-level array library that does very little magic.
>
>
> All this discussion reminds me of the book "Numerical Recipes":
>
> "if the difference between N and N − 1 ever matters to you, then you
> are probably up to no good anyway — e.g., trying to substantiate a
> questionable hypothesis with marginal data."
>
> For any reasonably sized data set, it is a correction in the second
> significant figure.

I fully agree, but sometimes you don't have much choice.
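
To put rough numbers on the size of the correction, here is a minimal
sketch. The helper name `bessel_ratio` and the sample sizes are only
illustrative assumptions, not anything from NumPy itself. The ratio
between the ddof=1 and ddof=0 standard deviations is sqrt(N / (N - 1)),
which depends only on N and not on the data.

```python
import numpy as np


def bessel_ratio(n):
    """Ratio std(ddof=1) / std(ddof=0) for a sample of size n (hypothetical helper)."""
    return np.sqrt(n / (n - 1))


# Arbitrary example data; the seed and sample size are assumptions for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=100)

std_uncorrected = np.std(x)           # NumPy's default, ddof=0
std_corrected = np.std(x, ddof=1)     # Bessel-corrected, as pandas does by default

print(std_uncorrected, std_corrected, std_corrected / std_uncorrected)

# The relative correction shrinks quickly as N grows.
for n in (5, 10, 100, 1000):
    print(n, bessel_ratio(n))
```

For N = 100 the ratio is about 1.005, while for N = 5 it is about 1.118,
so the choice of ddof only really matters for small samples.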

`big data` == `statistics with negative degrees of freedom` ?

or maybe

`machine learning` == `statistics with negative degrees of freedom` ?

Josef
