More accurate summations [was Re: This math scares me]

Fri Mar 16 22:43:57 EST 2001

And <jurgen.defurne at philips.com> writes:
 - 
 - Try summing all financial operations of one day together, or
 - better, all operations involving percentages and then adding them
 - together : cumulative errors.

More concretely, look at the Vancouver Stock Exchange index's 
problems of the late 70s-early 80s.  They had chosen to truncate 
floating point values rather than round them, causing a downward 
drift of around 20 points a month[1].  

You could say that's because of poor handling of floating-point 
arithmetic, and you'd be right.  However, just about every inaccuracy 
in floating-point financial calculations can be attributed to poor 
handling of fp arithmetic.

That doesn't mean floating point is a poor choice for all financial
programs.  Indeed, the HP financial calculators _all_ use floating
point, although decimal.  Those calculators were designed twenty-odd
years ago, and they're still selling for almost as much as a Pilot.

If you want a _very_ accurate sum for all the operations in one day, 
use the following algorithm.  It is often attributed to Kahan, but, 
if I recall his comments correctly, he's not entirely sure of the 
origins of the embodied idea.  Other names around this style of
algorithm are Moller (with a backwards slash through the o), Dekker,
Pichat, Knuth, Bohlender, and Linnainmaa in no particular order.
That's a surprising number of people for just adding numbers.  ;)

class DistilledSum:
  "Keep an accurate running sum."

  def __init__ (self):
    self.s = 0.0
    self.c = 0.0

  def add (self, x):
    "Incorporate x into the running sum."
    y = x - self.c
    t = self.s + y
    self.c = (t - self.s) - y
    self.s = t

  def value (self):
    "Return the sum itself."
    return self.s

  def asTuple (self):
    "Return both the sum and the error term."
    return (self.s, self.c)

The accumulated error in s from floating-point calculations is at 
most one unit in the last place of each summand x, and the value of
c gives you a tighter bound on that error.  With not entirely 
horrible amounts of extra work, you can have however much precision 
you need[2].  Unless your numbers are horribly scaled (perhaps the
debt and GNPs of the entire globe along with a person's), this in 
double plus rounding the stored answer to float works wonders for 
financial accumulations.

You can find an explanation of how this works either in [2] or [3].  
Essentially, the self.c term serves as a small correction.  The
trick with (t - self.s) - y first zeros the high-order bits that
are shared between the old and new sum, then zeros the bits shared
with the pre-corrected x, giving a new correction.  

Jason, unsure of how to create other languages' characters from
this particular keyboard...

[1] Q20 at http://catless.ncl.ac.uk/Risks/3.41.html#subj4.1
[2] http://citeseer.nj.nec.com/priest92propertie.html
[3] http://www.validlab.com/goldberg/paper.ps
--