
[Raymond Hettinger]
Computing an error term can get the bit back, and putting that term back in the input queue restores the overall sum.
Right!
Once the inputs are exhausted, the components of exp2sum should be exact.
Ditto. I'll cover some subtleties below:
    from math import frexp
    from itertools import chain

    def summer2(iterable):
        exp2sum = {}
        queue = []
        for x in chain(iterable, queue):
            mant, exp = frexp(x)
            while exp in exp2sum:
                y = exp2sum.pop(exp)
                z = x + y
                d = x - z + y        # error term
                assert z + d == x + y
Much more is true there, but hard to assert in Python: the mathematical sum z+d is exactly equal to the mathematical sum x+y there, and the floating-point |d| is less than 1 unit in the last place relative to the floating-point |z|. It would be clearer to name "z" as "hi" and "d" as "lo".

More, that's not _generally_ true: it relies on x and y having the same binary exponent. For example, pick x=1 and y=1e100, and you end up with hi=1e100 and lo=0. The mathematical x+y isn't equal to the mathematical hi+lo then. It's a real weakness of "Kahan summation" that most spellings of it don't bother to account for this; it's sufficient to normalize first, swapping x and y if necessary so that abs(x) >= abs(y) (although that isn't needed _here_, because we know they have the same binary exponent). There's another way to handle the general case that doesn't require test, branch, or abs(), but at the cost of several more additions/subtractions.
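Both spellings can be sketched like so (the names fast_two_sum and two_sum follow Shewchuk's paper; this is illustration, not part of the code under discussion):

```python
def fast_two_sum(x, y):
    # Dekker's version: valid only when abs(x) >= abs(y), so the
    # general case needs a test-and-swap first.
    hi = x + y
    lo = y - (hi - x)
    return hi, lo

def two_sum(x, y):
    # Knuth's branch-free version: no test, no abs(), but six
    # floating-point operations instead of three.
    hi = x + y
    yy = hi - x
    lo = (x - (hi - yy)) + (y - yy)
    return hi, lo

print(fast_two_sum(1e100, 1.0))  # (1e+100, 1.0) -- correct, big input first
print(two_sum(1.0, 1e100))       # (1e+100, 1.0) -- correct in either order
```

In both cases hi+lo is mathematically equal to x+y, which is exactly what the naive `d = x - z + y` loses when the magnitudes differ and x is the small one.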
                if d:
                    queue.append(d)
If x and y have different signs, this part doesn't trigger. If all inputs are positive, then we expect it to trigger about half the time (the cases where exactly one of x's and y's least-significant bits is set). So the queue can be expected to grow to about half the length of the original iterable by the time the original iterable is exhausted.
                x = z
                mant, exp = frexp(x)
            exp2sum[exp] = x
        return sum(sorted(exp2sum.itervalues(), key=abs), 0.0)
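To illustrate when the error term triggers, here's a hypothetical little helper (err_term isn't part of summer2; it just repeats the loop's arithmetic for two same-exponent inputs):

```python
def err_term(x, y):
    # Assumes x and y share a binary exponent, as in the loop above.
    z = x + y
    return x - z + y

# Low-order bits differ: the sum rounds, and the lost bit shows up in d.
print(err_term(1.0, 1.0 + 2**-52))   # 2.220446049250313e-16
# Low-order bits agree: the sum is exact, d is zero, nothing is queued.
print(err_term(1.0, 1.0))            # 0.0
```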
The implementation can be tweaked to consume the error term right away, so the queue won't grow by more than a few pending items.
Theorem 10 in Shewchuk's "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates" gives the obvious <wink> correct way to do that, although as a practical matter it benefits greatly from a bit more logic to eliminate zero entries as they're produced (Shewchuk doesn't, because adding a bunch of same-precision floats isn't his goal). BTW, extracting binary exponents isn't necessary in his approach (which is largely a specialization, to "perfect" 754 arithmetic, of Priest's earlier, less-demanding methods).
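A sketch along those lines, essentially Shewchuk's expansion growing specialized to 754 doubles, with zero partials dropped as they're produced (the name msum is illustrative, not Shewchuk's):

```python
def msum(iterable):
    # Maintain a list of nonoverlapping partial sums, in increasing
    # magnitude.  Each input is two-summed against every partial, so
    # error terms are absorbed immediately instead of being queued.
    partials = []
    for x in iterable:
        i = 0
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x            # normalize: abs(x) >= abs(y)
            hi = x + y
            lo = y - (hi - x)          # hi + lo == x + y, exactly
            if lo:                     # drop zero entries on the spot
                partials[i] = lo
                i += 1
            x = hi
        partials[i:] = [x]
    return sum(partials, 0.0)

# msum([1, 1e100, 1, -1e100] * 10000) returns exactly 20000.0
```

No exponent extraction is needed: the test-and-swap establishes the precondition fast_two_sum requires, and the nonoverlapping invariant bounds the number of partials by the exponent range of the inputs.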
Also, the speed can be boosted by localizing frexp, exp2sum.pop, and queue.append.
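Concretely, something like this (summer2_fast is a hypothetical name; it's the same algorithm with the lookups hoisted out of the loop, and .values() substituted so it runs on modern Pythons):

```python
from math import frexp
from itertools import chain

def summer2_fast(iterable):
    exp2sum = {}
    queue = []
    # Bind frequently used names to locals once, so the loop does
    # cheap local lookups instead of global/attribute lookups.
    _frexp = frexp
    pop = exp2sum.pop
    append = queue.append
    for x in chain(iterable, queue):
        mant, exp = _frexp(x)
        while exp in exp2sum:
            y = pop(exp)
            z = x + y
            d = x - z + y        # error term
            if d:
                append(d)        # feed the lost bit back in
            x = z
            mant, exp = _frexp(x)
        exp2sum[exp] = x
    # .values() here; the original's itervalues() is Python-2-only.
    return sum(sorted(exp2sum.values(), key=abs), 0.0)
```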
I'm very glad you quit while it was still interesting <wink>. People who are paranoid about fp sums should use something like this. Even pairing is prone to disaster, given sufficiently nasty input. For example:
    >>> xs = [1, 1e100, 1, -1e100] * 10000
    >>> sum(xs)        # and the obvious pairing method gives the same
    0.0
    >>> sum(sorted(xs))        # the result is nearly incomprehensible
    -8.0076811737544552e+087
    >>> sum(sorted(xs, reverse=True))
    8.0076811737544552e+087
    >>> summer2(xs)        # exactly right
    20000.0