[issue20499] Rounding errors with statistics.variance

Oscar Benjamin report at bugs.python.org
Fri Feb 7 16:43:39 CET 2014


Oscar Benjamin added the comment:

A fast Decimal.as_integer_ratio() would be useful in any case.
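
As a rough illustration of what such a method would return, here is a
minimal pure-Python sketch built on Decimal.as_tuple(). The helper name
decimal_as_integer_ratio is hypothetical, it only handles finite values,
and it makes no attempt to be fast:

```python
from decimal import Decimal
from math import gcd

def decimal_as_integer_ratio(d):
    """Return (numerator, denominator) in lowest terms for a finite Decimal.

    Hypothetical helper for illustration only; not optimised.
    """
    sign, digits, exponent = d.as_tuple()
    # For NaN and infinities the exponent is a string ('n', 'N' or 'F').
    if not isinstance(exponent, int):
        raise ValueError("cannot convert a non-finite Decimal to a ratio")
    numerator = int(''.join(map(str, digits)))
    if sign:
        numerator = -numerator
    if exponent >= 0:
        # The value is an integer: scale up by the power of ten.
        return numerator * 10 ** exponent, 1
    denominator = 10 ** -exponent
    common = gcd(numerator, denominator)
    return numerator // common, denominator // common

ratio = decimal_as_integer_ratio(Decimal("1.25"))  # (5, 4)
```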

If you're going to use decimals, though, you can trap Inexact and keep
increasing the precision until the result is exact. The problem is
with rationals that cannot be expressed in a finite number of decimal
digits; these need to be handled separately. I've attached
decimalsum.py, which shows how to compute an exact sum of any mix of
int, float and Decimal, but not Fraction.
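
To illustrate why such rationals need separate handling, here is a small
sketch that applies the same trap-and-retry idea to division. The name
exact_divide and the max_prec cut-off are assumptions made for this
example; 1/8 terminates in decimal, whereas 1/3 stays inexact however
far the precision is raised:

```python
from decimal import Decimal, Inexact, getcontext

def exact_divide(a, b, max_prec=1000):
    """Return a / b exactly, or None if it is still inexact at max_prec.

    Hypothetical helper: max_prec is an arbitrary cut-off so that the
    retry loop terminates for non-terminating decimal expansions.
    """
    ctx = getcontext().copy()
    ctx.traps[Inexact] = True
    while ctx.prec <= max_prec:
        try:
            return ctx.divide(a, b)
        except Inexact:
            # Not representable at this precision; try with more digits.
            ctx.prec *= 2
    return None

terminating = exact_divide(Decimal(1), Decimal(8))      # Decimal('0.125')
non_terminating = exact_divide(Decimal(1), Decimal(3))  # None
```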

When I looked at this before, I found that having special cases for
everything from int to float to Decimal to Fraction makes the code
really complicated. The common cases are int and float, and for these
sum() and fsum() are much faster. However, you also need code that
checks everything in the iterable.

One option is to do something like:

import math
import itertools
from decimal import Decimal
from decimalsum import decimalsum

def _sum(numbers):
    subtotals = []
    # groupby() yields consecutive runs of items that share a type.
    for T, nums in itertools.groupby(numbers, type):
        if T is int:
            subtotals.append(sum(nums))
        elif T is float:
            subtotals.append(math.fsum(nums))
        elif T is Decimal:
            subtotals.append(decimalsum(nums))
        else:
            raise NotImplementedError(T)
    # Combine the per-run subtotals exactly.
    return decimalsum(subtotals)

The main problem here is that fsum rounds every time it returns,
meaning that this sum is order-dependent if there is a mix of floats
and other types (see issue19086, where I asked for a way to change that).
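
A small sketch of that order dependence, assuming a cut-down variant of
_sum that only handles int and float and uses Fraction as a stand-in for
decimalsum when adding the subtotals (grouped_sum is a hypothetical
name). Both lists hold the same four values, whose exact sum is 1.5:

```python
import itertools
import math
from fractions import Fraction

def grouped_sum(numbers):
    """Sum runs of ints with sum() and runs of floats with math.fsum(),
    then add the subtotals exactly (Fraction stands in for decimalsum)."""
    subtotals = []
    for T, nums in itertools.groupby(numbers, type):
        if T is int:
            subtotals.append(sum(nums))
        elif T is float:
            # fsum() rounds its result to a float for each run.
            subtotals.append(math.fsum(nums))
        else:
            raise NotImplementedError(T)
    return sum((Fraction(s) for s in subtotals), Fraction(0))

# One float run [1e16, -1e16, 0.5]: nothing is lost before the int.
result_a = grouped_sum([1e16, -1e16, 0.5, 1])   # Fraction(3, 2)

# The float run [1e16, 0.5] is rounded to 1e16, so the 0.5 is lost.
result_b = grouped_sum([1e16, 0.5, 1, -1e16])   # Fraction(1, 1)
```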

Also, having separate code blocks to manage all the different types
internally in, e.g., the less trivial variance calculations is tedious.

----------
Added file: http://bugs.python.org/file33960/decimalsum.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20499>
_______________________________________
-------------- next part --------------
from decimal import getcontext, Inexact, Decimal

def decimalsum(iterable, start=Decimal('0')):
    '''Exact sum of Decimal/int/float mix; Result is *unrounded*'''
    if not isinstance(start, Decimal):
        start = Decimal(start)
    # We need our own context and we can't just set it once because
    # the loop could be over a generator/iterator/coroutine
    ctx = getcontext().copy()
    ctx.traps[Inexact] = True
    one = Decimal(1)

    total = start
    for x in iterable:
        if not isinstance(x, Decimal):
            x = Decimal(x)
        # Increase the precision until we get an exact result.
        while True:
            try:
                total = total.fma(one, x, ctx)
                break
            except Inexact:
                ctx.prec *= 2

    # Result is exact and unrounded.
    return total


D = Decimal
assert decimalsum([D("1.02"), 3e100, D("0.98"), -3e100]) == 2

