[Alex]
FWIW, when accuracy is an issue, I use:
sum(sorted(data, key=abs))
...and you may still lose a LOT of accuracy that way:-(.
Raymond, you technically reviewed the 2nd ed Cookbook -- don't you recall the sidebar about why partials are the RIGHT way to do summations? I was SO proud of that one, thought I'd made the issue clearer than anywhere previously in print...
No doubt about it. Partials are superior. My little snippet meets my needs with no fuss -- my usual situation is many small values whose accuracy gets clobbered upon encountering a handful of large values. In the draft statistics module, nondist/sandbox/statistics/statistics.py, I used yet another approach and tracked a separate error term. The advantage is added precision while still keeping O(n) summation speed. Of course, we can always use the decimal module to add as much precision as needed.

Raymond
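For readers who haven't seen it, the "separate error term" idea Raymond mentions is essentially compensated (Kahan) summation. Here is a minimal sketch of that technique -- an illustration, not the actual code from the draft statistics module:

    def kahan_sum(iterable):
        # Compensated (Kahan) summation: carry a running correction `c`
        # that recaptures the low-order bits lost by each addition.
        total = 0.0
        c = 0.0                  # running compensation for lost bits
        for x in iterable:
            y = x - c            # apply the correction from the last step
            t = total + y        # low-order bits of y may be lost here...
            c = (t - total) - y  # ...but are recovered algebraically here
            total = t
        return total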
FYI, there are a lot of ways to do accurate fp summation, but in general people worry about it too much (except for those who don't worry about it at all -- they're _really_ in trouble <0.5 wink>).

One clever way is to build on the fact that whenever |x| and |y| are within a factor of 2 of each other, x+y is exact in 754 arithmetic. So you never lose information (not even one bit) when adding two floats with the same binary exponent. That leads directly to this kind of code:

    from math import frexp

    class Summer:
        def __init__(self):
            self.exp2sum = {}

        def add(self, x):
            while 1:
                exp = frexp(x)[1]
                if exp in self.exp2sum:
                    x += self.exp2sum.pop(exp)  # exact!
                else:
                    break
            self.exp2sum[exp] = x               # trivially exact

        def total(self):
            items = self.exp2sum.items()
            items.sort()
            return sum((x for dummy, x in items), 0.0)

exp2sum maps a binary exponent to a float having that exponent. If you pass a sequence of fp numbers to .add(), then, ignoring underflow/overflow endcases, the key invariant is that the exact (mathematical) sum of all the numbers passed to add() so far is equal to the exact (mathematical) sum of exp2sum.values().

While it's not obvious at first, the total number of additions performed inside add() is typically a bit _less_ than the total number of times add() is called. More importantly, len(exp2sum) can never be more than about 2K. The worst case for that is having one input with each possible binary exponent, like 2.**-1000 + 2.**-999 + ... + 2.**999 + 2.**1000. No inputs are like that in real life, and exp2sum typically has no more than a few dozen entries.

total() then adds those, in order of increasing exponent == in order of increasing absolute value. This can lose information, but there is no bad case for it, in part because there are typically so few addends, and in part because the fact that no two addends have the same binary exponent implies massive cancellation can never occur.

Playing with this can show why most fp apps shouldn't worry most often. Example: Get a million floats of about the same magnitude:
    >>> import random
    >>> xs = [random.random() for dummy in xrange(1000000)]
Sum them accurately:
    >>> s = Summer()
    >>> for x in xs:
    ...     s.add(x)
    ...
No information has been lost yet (if you could look at your FPU's "inexact result" flag, you'd see that it hasn't been set yet), and the million inputs have been squashed into just a few dozen buckets:
    >>> len(s.exp2sum)
    22
    >>> from pprint import pprint
    >>> pprint(s.exp2sum)
    {-20: 8.8332388070710977e-007,
     -19: 1.4206079529399673e-006,
     -16: 1.0065260162672729e-005,
     -15: 2.4398649189794064e-005,
     -14: 5.3980784313178987e-005,
     -10: 0.00074737138777436485,
     -9: 0.0014605198999595448,
     -8: 0.003361820812962546,
     -7: 0.0063811680318408559,
     -5: 0.016214300821313588,
     -4: 0.044836286041944229,
     -2: 0.17325355843673518,
     -1: 0.46194788522906305,
     3: 6.4590200674982423,
     4: 11.684394209886134,
     5: 24.715676913177944,
     6: 49.056084672323166,
     10: 767.69329043309051,
     11: 1531.1560084859361,
     13: 6155.484212371357,
     17: 98286.760386143636,
     19: 393290.34884990752}
Add those 22, smallest to largest:
    >>> s.total()
    500124.06621686369
Add the originals, "left to right":
    >>> sum(xs)
    500124.06621685845
So they're the same to about 14 significant decimal digits. No good if exact pennies are important, but far more precise than real-world measurements. How much does sorting first help?
    >>> xs.sort()
    >>> sum(xs)
    500124.06621685764
Not much! It actually moved a bit in the wrong direction (which is unusual, but them's the breaks). Using decimal as a (much more expensive) sanity check:
    >>> import decimal
    >>> t = decimal.Decimal(0)
    >>> for x in xs:
    ...     t += decimal.Decimal(repr(x))
    ...
    >>> print t
    500124.0662168636972127329455
Of course <wink> Summer's result is very close to that.
[Tim Peters] ...
One clever way is to build on the fact that whenever |x| and |y| are within a factor of 2 of each other, x+y is exact in 754 arithmetic.
Ack, I'm fried. Forget that, it's wrong. The correct statement is that x-y is always exact whenever x and y are within a factor of two of each other. Summer.add() _can_ lose info, because x+y can lose (at most) one bit when x and y have the same binary exponent -- it needs additional trickery to make it loss-free. Exercise for the abused reader <wink>.
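To see the one-bit loss concretely, here is a worked example (added for illustration, not from the original post):

    >>> x = 1.0 + 2.0**-52     # one ulp above 1.0; same binary exponent as y
    >>> y = 1.0
    >>> x + y                  # true sum 2 + 2**-52 rounds (ties-to-even) to 2.0
    2.0
    >>> d = x - (x + y) + y    # the error term recovers the lost bit exactly
    >>> d == 2.0**-52
    True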
[Tim Peters] Summer.add() _can_ lose info, because x+y can lose (at most) one bit when x and y have the same binary exponent -- it needs additional trickery to make it loss-free.
Computing an error term can get the bit back, and putting that term back in the input queue restores the overall sum. Once the inputs are exhausted, the components of exp2sum should be exact.

    from math import frexp
    from itertools import chain

    def summer2(iterable):
        exp2sum = {}
        queue = []
        for x in chain(iterable, queue):
            mant, exp = frexp(x)
            while exp in exp2sum:
                y = exp2sum.pop(exp)
                z = x + y
                d = x - z + y           # error term
                assert z + d == x + y
                if d:
                    queue.append(d)
                x = z
                mant, exp = frexp(x)
            exp2sum[exp] = x
        return sum(sorted(exp2sum.itervalues(), key=abs), 0.0)

The implementation can be tweaked to consume the error term right away, so the queue won't grow by more than a few pending items. Also, the speed can be boosted by localizing frexp, exp2sum.pop, and queue.append.

Raymond Hettinger
[Raymond Hettinger]
Computing an error term can get the bit back and putting that term back in the input queue restores the overall sum.
Right!
Once the inputs are exhausted, the components of exp2sum should be exact.
Ditto. I'll cover some subtleties below:
    from math import frexp
    from itertools import chain
    def summer2(iterable):
        exp2sum = {}
        queue = []
        for x in chain(iterable, queue):
            mant, exp = frexp(x)
            while exp in exp2sum:
                y = exp2sum.pop(exp)
                z = x + y
                d = x - z + y           # error term
                assert z + d == x + y
Much more is true there, but hard to assert in Python: the mathematical sum z+d is exactly equal to the mathematical sum x+y there, and the floating-point |d| is less than 1 unit in the last place relative to the floating-point |z|. It would be clearer to name "z" as "hi" and "d" as "lo".

More, that's not _generally_ true: it relies on the fact that x and y have the same binary exponent. For example, pick x=1 and y=1e100, and you end up with hi=1e100 and lo=0. The mathematical x+y isn't equal to the mathematical hi+lo then. It's a real weakness of "Kahan summation" that most spellings of it don't bother to account for this; it's sufficient to normalize first, swapping x and y if necessary so that abs(x) >= abs(y) (although that isn't needed _here_, because we know they have the same binary exponent). There's another way to handle the general case that doesn't require test, branch, or abs(), but at the cost of several more additions/subtractions.
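For reference, here is a sketch of that branch-free variant (the classic "two-sum" error-free transformation; the function name is an editorial choice, not from the thread):

    def twosum(x, y):
        # hi + lo == x + y exactly, for any finite x and y, with no
        # test, branch, or abs() -- at the cost of three extra add/subs.
        hi = x + y
        t = hi - x
        lo = (x - (hi - t)) + (y - t)
        return hi, lo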
                if d:
                    queue.append(d)
If x and y have different signs, this part doesn't trigger. If all inputs are positive, then we expect it to trigger about half the time (the cases where exactly one of x's and y's least-significant bits is set). So the queue can be expected to grow to about half the length of the original iterable by the time the original iterable is exhausted.
                x = z
                mant, exp = frexp(x)
            exp2sum[exp] = x
        return sum(sorted(exp2sum.itervalues(), key=abs), 0.0)
The implementation can be tweaked to consume the error term right away, so the queue won't grow by more than a few pending items.
Theorem 10 in Shewchuk's "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates" gives the obvious <wink> correct way to do that, although as a practical matter it greatly benefits from a bit more logic to eliminate zero entries as they're produced (Shewchuk doesn't, because it's not his goal to add a bunch of same-precision floats). BTW, extracting binary exponents isn't necessary in his approach (largely specializations to "perfect" 754 arithmetic of Priest's earlier less-demanding methods).
Also, the speed can be boosted by localizing frexp, exp2sum.pop, and queue.append.
I'm very glad you quit while it was still interesting <wink>. People who are paranoid about fp sums should use something like this. Even pairing is prone to disaster, given sufficiently nasty input. For example:
    >>> xs = [1, 1e100, 1, -1e100] * 10000
    >>> sum(xs)     # and the obvious pairing method gives the same
    0.0
    >>> sum(sorted(xs))     # the result is nearly incomprehensible
    -8.0076811737544552e+087
    >>> sum(sorted(xs, reverse=True))
    8.0076811737544552e+087
    >>> summer2(xs)     # exactly right
    20000.0
[Tim Peters]
Theorem 10 in Shewchuk's "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates" gives the obvious <wink> correct way to do that...
The approach I'm using relies on being able to exactly multiply the 0 or 1 bit error term mantissas by an integer (a count of how many times the term occurs). With a Python dictionary keeping the counts, many steps are saved and the tool becomes much more memory friendly than with the previous queue approach.

    from math import frexp

    def summer3(iterable):
        exp2sum = {}        # map an exponent to a value with that exponent
        errdict = {}        # map error terms to an occurrence count

        def addone(x, exppop=exp2sum.pop, errget=errdict.get, frexp=frexp):
            mant, exp = frexp(x)        # extract exponent from float representation
            while exp in exp2sum:
                y = exppop(exp)         # pair with a value having the same exp
                z = x + y               # sum may be inexact by less than 1 ulp of z
                d = x - z + y           # difference is the error term
                assert z + d == x + y   # sum plus error term is exact!
                errdict[d] = errget(d, 0) + 1   # count occurrences of each term
                x = z                   # continue with the new sum
                mant, exp = frexp(x)
            exp2sum[exp] = x

        for x in iterable:
            addone(x)
        while errdict:
            d, q = errdict.popitem()
            assert frexp(d)[0] in [-0.5, 0.0, 0.5]  # error terms are 1 bit
            # the product of a 1 bit error term with an integer count is exact
            addone(d * q)
            dummy = errdict.pop(0.0, None)  # discard zero terms as they appear

        # At this point, the errdict is empty, all partial sums are exact,
        # and each partial sum has a different exponent.  They can now be
        # added smallest absolute value to largest and only lose bits that
        # cannot fit in the final result.  IOW, the final result is accurate
        # to full precision (assuming no representation error in the inputs).
        return sum(sorted(exp2sum.itervalues(), key=abs), 0.0)

Aside from being a nice recipe, this was a good exercise in seeing what pure Python can do with IEEE-754 guarantees. The modest memory consumption and the typical O(n) runtime are a nice plus (no pun intended).

Raymond
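A quick check on the pathological input from earlier (an editorial usage sketch, not from the original post; since the exact sum 20000 is representable, a full-precision method must return it):

    >>> xs = [1, 1e100, 1, -1e100] * 10000
    >>> summer3(xs)
    20000.0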
You guys have way too much time on your hands and neurons devoted to this stuff. I'm glad that means that I can spend the afternoon playing w/ my kids and searching python-dev when I need to add numbers =).
[Raymond Hettinger]
The approach I'm using relies on being able to exactly multiply the 0 or 1 bit error term mantissas by an integer (a count of how many times the term occurs). With a Python dictionary keeping the counts, many steps are saved and the tool becomes much more memory friendly than with the previous queue approach.
Cool! That's a nice approach.

For contrast, here's one that doesn't use frexp(), and is probably faster because of that; internally, len(sums) probably won't exceed 5 in real life (but could get as large as 2K for pathological inputs, spanning fp's full dynamic range):

    def summer4(iterable):
        sums = [0.0]
        for x in iterable:
            sums.append(x)
            for i in xrange(len(sums)-2, -1, -1):
                y = sums[i]
                if abs(x) < abs(y):
                    x, y = y, x
                hi = x + y
                lo = y - (hi - x)
                if lo:
                    sums[i+1] = lo
                else:
                    del sums[i+1]
                x = hi
            sums[0] = x
        return sum(reversed(sums), 0.0)

In effect, that keeps an arbitrarily long list of "error terms" so that no info is lost before the final sum(). I think it's surprising at first how short that list usually is.
... Aside from being a nice recipe, this was a good exercise in seeing what pure Python can do with IEEE-754 guarantees.
Now you know how I feel every time someone proclaims that fp arithmetic is inherently unreliable <0.3 wink>. Learning how to use it correctly is indeed the blackest of obscure arts, though.
The modest memory consumption and the typical O(n) runtime are a nice plus (no pun intended).
Yup, they're all eminently practical if you don't care too much about speed. The one above is the fastest I tried, and it still takes some 40x longer than plain sum() over a million-element random.random() list.
[Tim]
For contrast, here's one that doesn't use frexp(), and is probably faster because of that; internally, len(sums) probably won't exceed 5 in real life (but could get as large as 2K for pathological inputs, spanning fp's full dynamic range):
    def summer4(iterable):
        sums = [0.0]
        for x in iterable:
            sums.append(x)
            for i in xrange(len(sums)-2, -1, -1):
                y = sums[i]
                if abs(x) < abs(y):
                    x, y = y, x
                hi = x + y
                lo = y - (hi - x)
                if lo:
                    sums[i+1] = lo
                else:
                    del sums[i+1]
                x = hi
            sums[0] = x
        return sum(reversed(sums), 0.0)
Here's a rewrite with the partial sums ordered by increasing magnitude. A cursor is used to write out new, non-zero terms. That makes the list growth or shrinkage occur at the end of the list, and it avoids reversed indexing. Those two changes made the code easier for me to follow.

Leaving off the 0.0 makes the routine generic. It works equally well with int, long, Decimal, float, and complex.

    def summer5(iterable, start=0):
        "Binary or Decimal floating point summation accurate to full precision."
        partials = []               # sorted, non-overlapping partial sums
        for x in iterable:
            i = 0                   # cursor for writing out new partial sums
            for y in partials:
                if abs(x) < abs(y):
                    x, y = y, x
                hi = x + y
                lo = y - (hi - x)
                if lo:
                    partials[i] = lo
                    i += 1
                x = hi
            partials[i:] = [x]
        return sum(partials, start)

The loop invariants for the list of partial sums are:

    assert partials == sorted(partials, key=abs)
    assert nonoverlapping(partials)

where a rough check for overlaps is:

    from math import frexp

    def nonoverlapping(seqn, offset=100):
        """Verify that sequence elements have no overlapping bits.

        Set offset to -Emin to handle the full range of floats.
        """
        cumv = 0L
        for elem in seqn:
            m, exp = frexp(abs(elem))
            v = int(m * 2 ** 53) * 2L ** (exp + offset)
            if cumv & v:
                return False
            cumv |= v
        return True

Raymond
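A quick sanity check of the generic behavior (an editorial usage sketch, not from the original post; exact reprs may vary by Python version):

    >>> summer5([1, 1e100, 1, -1e100] * 10000)
    20000.0
    >>> from decimal import Decimal
    >>> summer5([Decimal("0.1")] * 10, Decimal(0))
    Decimal("1.0")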