[Numpy-discussion] Standard Deviation (std): Suggested change for "ddof" default value
Sturla Molden
sturla.molden at gmail.com
Wed Apr 2 10:06:08 EDT 2014
<josef.pktd at gmail.com> wrote:
> pandas came later and thought ddof=1 is worth more than consistency.
Pandas is a data analysis package. NumPy is a numerical array package.
I think ddof=1 is justified for Pandas, for consistency with statistical
software (SPSS et al.)
For NumPy, there are many computational tasks where the Bessel correction
is not wanted, so providing an uncorrected result is the correct thing to
do. NumPy should be a low-level array library that does very little magic.
Those who need the Bessel correction can multiply with sqrt(n/float(n-1))
or specify ddof. But that belongs in the docs.
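
For illustration, here is a minimal sketch of the two options just mentioned.
The sample values are made up for the example; the only NumPy calls used are
np.std (with its ddof keyword) and np.sqrt.

```python
import numpy as np

# Made-up sample data, chosen only so the numbers come out round.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

# NumPy default: ddof=0, i.e. divide by n (no Bessel correction).
uncorrected = np.std(x)

# Option 1: ask for the correction explicitly, i.e. divide by n - 1.
corrected = np.std(x, ddof=1)

# Option 2: rescale the uncorrected result by sqrt(n / (n - 1)).
rescaled = uncorrected * np.sqrt(n / float(n - 1))

print(uncorrected)                     # 2.0
print(np.isclose(corrected, rescaled)) # True
```

Both options give the same number, so the choice of default only decides
which case needs the extra keyword.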
Sturla
P.S. Personally I am not convinced "unbiased" is ever a valid argument, as
the biased estimator has smaller error. This is from experience in
marksmanship: I'd rather shoot a tight series with small systematic error
than scatter my bullets wildly but "unbiased" on the target. It is the
total error that counts. The series with smallest total error gets the best
score. It is better to shoot two series and calibrate the sight in between
than use a calibration-free sight that doesn't allow us to aim. That's why I
think classical statistics got this one wrong. Unbiased is never a virtue,
but the smallest error is. Thus, if we are to repeat an experiment, we
should calibrate our estimator just like a marksman calibrates his sight.
But the aim should always be calibrated to give the smallest error, not an
unbiased scatter. No one in their right mind would claim a shotgun is more
precise than a rifle because it has smaller bias. But that is what applying
the Bessel correction implies.
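
The "smaller total error" point can be checked numerically. The sketch below
is not from the original post and rests on assumptions: it draws normally
distributed samples, compares variance estimates rather than standard
deviations, and takes "total error" to mean mean squared error. The sample
size, trial count, and seed are arbitrary.

```python
import numpy as np

# Assumptions for this sketch: normal data, known true variance,
# small samples, and mean squared error as the measure of total error.
rng = np.random.default_rng(0)
true_var = 1.0
n, trials = 5, 100_000

samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))

# One variance estimate per simulated sample, with and without
# the Bessel correction.
var_biased = samples.var(axis=1, ddof=0)
var_unbiased = samples.var(axis=1, ddof=1)

def bias(estimates):
    """Average estimate minus the true value."""
    return np.mean(estimates) - true_var

def mse(estimates):
    """Mean squared error around the true value (bias**2 + variance)."""
    return np.mean((estimates - true_var) ** 2)

bias_biased, bias_unbiased = bias(var_biased), bias(var_unbiased)
mse_biased, mse_unbiased = mse(var_biased), mse(var_unbiased)

print("ddof=0: bias", bias_biased, "mse", mse_biased)
print("ddof=1: bias", bias_unbiased, "mse", mse_unbiased)
```

Under these assumptions the ddof=1 estimate has the smaller bias while the
ddof=0 estimate has the smaller mean squared error. Other distributions or
error measures may behave differently.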