[Python-Dev] sum()

Sat Mar 12 19:06:29 CET 2005

[Raymond Hettinger]
> Computing an error term can get the bit back and putting that term back
> in the input queue restores the overall sum.

Right!

>  Once the inputs are exhausted, the components of exp2sum should be exact.

Ditto.  I'll cover some subtleties below:

> from math import frexp
> from itertools import chain
>
> def summer2(iterable):
>    exp2sum = {}
>    queue = []
>    for x in chain(iterable, queue):
>        mant, exp = frexp(x)
>        while exp in exp2sum:
>            y = exp2sum.pop(exp)
>            z = x + y
>            d = x - z + y   # error term
>            assert z + d == x + y

Much more is true there, but hard to assert in Python:  the
mathematical sum z+d is exactly equal to the mathematical sum x+y
there, and the floating-point |d| is less than 1 unit in the last
place relative to the floating-point |z|.  It would be clearer to name
"z" as "hi" and "d" as "lo".

More, that's not _generally_ true:  it relies on that x and y have the
same binary exponent.  For example, pick x=1 and y=1e100, and you end
up with hi=1e100 and lo=0.  The mathematical x+y isn't equal to the
mathematical hi+lo then.  It's a real weakness of "Kahan summation"
that most spellings of it don't bother to account for this; it's
sufficient to normalize first, swapping x and y if necessary so that
abs(x) >= abs(y) (although that isn't needed _here_ because we know
they have the same binary exponent).  There's another way to handle
the general case that doesn't require test, branch, or abs(), but at
the cost of several more addition/subtractions.

>            if d:
>                queue.append(d)

If x and y have different signs, this part doesn't trigger.  If all
inputs are positive, then we expect it to trigger about half the time
(the cases where exactly one of x's and y's least-significant bits is
set).  So the queue can be expected to grow to about half the length
of the original iterable by the time the original iterable is
exhausted.

>            x = z
>            mant, exp = frexp(x)
>        exp2sum[exp] = x
>    return sum(sorted(exp2sum.itervalues(), key=abs), 0.0)
>
> The implementation can be tweaked to consume the error term right away
> so the queue won't grow by more than few pending items.

Theorem 10 in Shewchuk's "Adaptive Precision Floating-Point Arithmetic
and Fast Robust Geometric Predicates" gives the obvious <wink> correct
way to do that, although as a practical matter it greatly benifits
from a bit more logic to eliminate zero entries as they're produced
(Shewchuk doesn't because it's not his goal to add a bunch of
same-precision floats).  BTW, extracting binary exponents isn't
necessary in his approach (largely specializations to "perfect" 754
arithmetic of Priest's earlier less-demanding methods).

>  Also, the speed can be boosted by localizing frexp, exp2sum.pop, and
> queue.append.

I'm very glad you quit while it was still interesting <wink>.

People who are paranoid about fp sums should use something like this. 
Even pairing is prone to disaster, given sufficiently nasty input. 
For example:

>>> xs = [1, 1e100, 1, -1e100] * 10000
>>> sum(xs)  # and the obvious pairing method gives the same
0.0
>>> sum(sorted(xs)) # the result is nearly incomprehensible
-8.0076811737544552e+087
>>> sum(sorted(xs, reverse=True))
8.0076811737544552e+087
>>> summer2(xs)  # exactly right
20000.0