# Microbenchmark: Summing over array of doubles

Yaroslav Bulatov bulatov at engr.orst.edu
Tue Aug 3 01:32:21 CEST 2004

```
Duncan Booth <me at privacy.net> wrote in message news:<Xns95395FD0EF85duncanrcpcouk at 127.0.0.1>...
> Christopher T King <squirrel at WPI.EDU> wrote in
> news:Pine.LNX.4.44.0408011840050.21160-100000 at ccc4.wpi.edu:
>
> > On 1 Aug 2004, Duncan Booth wrote:
> >
> >> I just had a brief look at the code you posted. Are you not concerned
> >> about accuracy in any of your calculations? Summing a 10 million
> >> element array by simply accumulating each element into a running
> >> total is guaranteed to give a lousy result.
> >
> > Lousy or not, I believe that's how numarray is implemented internally,
> > so at least all the benchmarks are the same.  If you want accuracy
> > summing that many numbers, you're going to have to do it in software
> > (likely by summing each mantissa bit individually and reconstructing
> > the float afterward), so it will be abysmally slow (obviously not what
> > the OP wants).
> >
>
> My point being that speed isn't everything. Most applications doing large
> floating point calculations should be concerned about accuracy, and how not
> to add up a large array of floating point numbers was one of the first
> things I was taught in computer science classes. The fastest way to sum 10
> million numbers (albeit at some considerable loss of accuracy):
>
>     	return 0
>
>
> The 'correct' way to sum a large array of floats is, of course, to
> calculate a lot of partial sums and sum them. For example, naively, we
> might say that to sum the array we sum the first half, sum the second half
> and add the results together. This definition being recursive ensures that
> if the inputs are all of a similar size then all the intermediate
> calculations involve summing similarly sized values. A more sophisticated
> technique might also try to handle the case where not all values are a
> similar size.
>
> If you are worried about speed, then calculating partial sums is also the
> way to go: the naive technique used by the original poster leaves no scope
> for parallel calculation. It should be possible to write a slightly more
> complex loop in the C version that runs a lot faster on systems that are
> capable of performing multiple floating point instructions in parallel.

You are right: naive summing produces significant accuracy losses.
I estimated the error by summing [0.123456789012345]*10000000 and
comparing the result to 1234567.89012345. All methods have an error
of about 1e-4.

The method below sums the array at the same speed as a regular Python
sum loop, but its error is below 1e-15.

from math import ceil, log

def sum2(arr):
    # Pairwise summation in place: pass itr adds elements 2**itr apart,
    # so every addition combines two similarly sized partial sums.
    size = len(arr)
    for itr in range(int(ceil(log(size) / log(2)))):
        for i in xrange(0, size, 2 ** (itr + 1)):
            next_i = i + 2 ** itr
            if next_i < size:
                arr[i] += arr[next_i]
    return arr[0]

When arbitrary-precision numbers are used, this method costs O(nd)
versus O(n(d + log n)) for regular summing, where n is the size of the
array and d is the number of digits per element. In practice I got
about a 20% performance increase when summing an array of 1000
Decimals using the method above.

A better estimate of the error difference would use random digits
rather than [x]*10000000, but I don't know how I would calculate the
exact answer in any reasonable amount of time. (With Decimals, a
4000-element array takes over a second, including conversion.)

Yaroslav

```
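Duncan's suggestion about partial sums and parallelism can be sketched with independent accumulators (a hypothetical illustration in modern Python; `striped_sum` and its `stripes` parameter are my names, not from the thread — in C, a compiler can keep such accumulators in separate registers or SIMD lanes):

```python
def striped_sum(arr, stripes=8):
    # Maintain several independent running totals. Breaking the single
    # dependency chain lets hardware overlap the additions, and the
    # shorter chains also accumulate less rounding error.
    parts = [0.0] * stripes
    for i, x in enumerate(arr):
        parts[i % stripes] += x
    return sum(parts)
```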
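The accuracy difference can also be checked with a recursive version of the same pairwise idea (a sketch in modern Python, using `math.fsum` as a correctly rounded reference; this reproduces the spirit of the benchmark above, not its exact numbers):

```python
from math import fsum

def pairwise_sum(arr):
    # Sum each half recursively so intermediate operands stay similar in size.
    n = len(arr)
    if n <= 2:
        return sum(arr)
    mid = n // 2
    return pairwise_sum(arr[:mid]) + pairwise_sum(arr[mid:])

xs = [0.123456789012345] * 10**6
exact = fsum(xs)  # correctly rounded reference sum

naive = 0.0
for x in xs:
    naive += x

print(abs(naive - exact))             # error of the running-total loop
print(abs(pairwise_sum(xs) - exact))  # pairwise error, typically far smaller
```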