[issue20499] Rounding errors with statistics.variance

Oscar Benjamin report at bugs.python.org
Mon Feb 3 13:46:19 CET 2014


New submission from Oscar Benjamin:

The mean/variance functions in the statistics module don't quite round correctly.

The reasons for this are that although exact rational arithmetic is used internally in the _sum function it is not used throughout the module. In particular the _sum function should be changed to return an exact result and exact arithmetic should be used right up to the point before returning to the user at which point a rounding/coercion should be used to give the user their answer in the appropriate type correctly rounded once.

Using exact arithmetic everywhere makes it possible to replace all of the variance* functions with single-pass algorithms based on the computational formula for variance which should be more efficient as well.

For example the following implements pvariance so that it returns a perfectly rounded result for float input and output in a single pass:

def pvariance(data):
    sx = 0
    sx2 = 0
    for n, x in enumerate(map(Fraction, data), 1):
        sx += x
        sx2 += x ** 2
    Ex = sx / n
    Ex2 = sx2 / n
    var = Ex2 - Ex ** 2
    return float(var)

Comparing the above with the statistics module:

>>> pvariance([0, 0, 1])
0.2222222222222222
>>> statistics.pvariance([0, 0, 1])
0.22222222222222224

The true answer is:

>>> from fractions import Fraction as F
>>> float(statistics.pvariance([F(0), F(0), F(1)]))
0.2222222222222222

The logic in the _sum function for computing exact integer ratios and coercing back to the output type could be moved into utility functions so that it does not need to be duplicated.

Some examples of rounding issues:

>>> from statistics import variance, mean
>>> from decimal import Decimal as D, getcontext
>>> from fractions import Fraction as F

Variance with ints or floats returns a float but the float is not quite the nearest possible float:

>>> variance([0, 0, 2])
1.3333333333333335
>>> float(variance([F(0), F(0), F(2)]))  # true result rounded once
1.3333333333333333

Another example with Decimal:

>>> getcontext().prec = 5
>>> getcontext()
Context(prec=5, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999, capitals=1, clamp=0, flags=[Rounded, Inexact], traps=[DivisionByZero, Overflow, InvalidOperation])

>>> variance([D(0), D(0), D(2)] * 2)  # Rounded down instead of up
Decimal('1.0666')
>>> r = (variance([F(0), F(0), F(2)] * 2))
>>> D(r.numerator) / r.denominator  # Correctly rounded
Decimal('1.0667')

The mean function may also not be correctly rounded:

>>> getcontext().prec = 2
>>> r = mean((F('1.2'), F('1.3'), F('1.55')))
>>> r
Fraction(27, 20)
>>> D(r.numerator) / r.denominator # Correctly rounded
Decimal('1.4')
>>> mean([D('1.2'), D('1.3'), D('1.55')])
Decimal('1.3')

----------
components: Library (Lib)
messages: 210121
nosy: oscarbenjamin, stevenjd
priority: normal
severity: normal
status: open
title: Rounding errors with statistics.variance
versions: Python 3.5

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20499>
_______________________________________


More information about the Python-bugs-list mailing list