[Python-ideas] Pre-PEP: adding a statistics module to Python

Wed Aug 7 06:45:50 CEST 2013

On 7 August 2013 05:25, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Steven D'Aprano writes:
>
>  > >   >       Consequently, the above naive mean fails this
>  > >   >       "torture test" with an error of 100%:
>  > >   >
>  > >   >           assert mean([1e30, 1, 3, -1e30]) == 1
>  > >
>  > > 100%?  This is a relative error of sqrt(2)*1e-30.
>  >
>  > I don't understand your calculation here. Where are you getting the
>  > values 2 and 1e-30 from?
>
> The standard deviation of the example data.
>
> Your calculation of relative error is statistically irrelevant, unless
> you can assert 30 decimal places of accuracy in the measurements 1e30
> and -1e30.  If you just have data and no theory about where it came
> from, the relevant unit is the standard deviation.
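
For anyone else wondering where sqrt(2)*1e-30 comes from, here is a rough
sketch of the arithmetic. It assumes the population standard deviation is
meant (dividing by n; the sample version would give a slightly different
constant) and that the naive mean returns 0.0, which is what the quoted
"error of 100%" implies. The variable names are mine.

```python
import math

data = [1e30, 1, 3, -1e30]
true_mean = 1.0      # exact mean: (1 + 3) / 4
naive_result = 0.0   # assumed output of the naive mean on this data

# Population variance: mean squared deviation from the true mean.
# The two huge values dominate, so this is very nearly 2e60 / 4.
variance = math.fsum((x - true_mean) ** 2 for x in data) / len(data)
sd = math.sqrt(variance)

# Absolute error of the naive result, expressed in standard deviations.
error_in_sd_units = abs(naive_result - true_mean) / sd
print(error_in_sd_units)
```

The standard deviation is about 1e30/sqrt(2), so an absolute error of 1
works out to about sqrt(2)*1e-30 standard deviations.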

It depends what you're using the mean for. If you divide the data by its
mean (to rescale it so that the new mean is 1), an error like this can be
the difference between dividing by 0 and dividing by 1, and those give very
different results. Hence error relative to the mathematically true value
*is* relevant.
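
A minimal sketch of what I mean, using the torture-test data from upthread.
The function names are made up for illustration, and math.fsum just stands in
for "some accurate summation", not for whatever the proposed module would
actually do. The naive version uses an explicit loop so that the result does
not depend on how the builtin sum() handles floats.

```python
import math

data = [1e30, 1, 3, -1e30]

def naive_mean(xs):
    # Plain left-to-right float accumulation: 1 and 3 are absorbed by 1e30.
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)

def accurate_mean(xs):
    # math.fsum tracks the lost low-order bits and returns a correctly
    # rounded sum.
    return math.fsum(xs) / len(xs)

def rescale(xs, m):
    # Divide every value by m, so that a correct mean m maps to 1.
    return [x / m for x in xs]

bad = naive_mean(data)      # comes out as 0.0 for this data
good = accurate_mean(data)  # comes out as 1.0

rescaled = rescale(data, good)  # fine: dividing by 1.0
# rescale(data, bad) would raise ZeroDivisionError instead.
```

So the same "tiny" error in standard-deviation terms is the difference
between a working normalisation and an exception.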

Not being a statistics person, I'm not able to say how often this¹ would
be the case, but I wouldn't ignore it entirely either.

¹ Error relative to the true value being more significant than error
relative to the standard deviation.