[Numpy-discussion] Reducing the rounding error of np.sum
jtaylor.debian at googlemail.com
Thu Jul 4 14:15:12 EDT 2013
NumPy's implementation of sum is essentially just a simple loop:

for i in range(d.size):
    sum += d[i]
This suffers from rather high rounding error, on the order of
d.size * epsilon. Consider:
(np.ones(50000) / 10.).sum()
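To see the error concretely, here is a small pure-Python sketch comparing the naive left-to-right loop against math.fsum (using 0.1 repeated 50000 times, matching the example above):

```python
import math

def naive_sum(values):
    # simple left-to-right accumulation, like the loop above
    total = 0.0
    for v in values:
        total += v
    return total

vals = [0.1] * 50000        # exact mathematical sum is 5000.0
print(naive_sum(vals))      # drifts away from 5000.0
print(math.fsum(vals))      # correctly rounded: exactly 5000.0
```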
There are numerous algorithms which reduce the error of this operation.
E.g. Python implements one in math.fsum, which is accurate but slow
compared to np.sum.
NumPy currently lacks a precise summation, but I think it would
make sense to add one.
The question is whether we go with the Python approach of adding a new
function that is slower but more precise, or whether we change the
default summation algorithm (or do nothing :) ).
For a new function I guess the method used in Python itself makes sense;
it was probably well chosen by the Python developers (though I have not
looked up the rationale for the choice yet).
For replacing the default, two algorithms come to mind: pairwise
summation and Kahan summation (compensated summation).
Pairwise summation adds elements in pairs, so the two operands of each
addition usually have similar magnitude; this gives an error of
O(log n * epsilon) in the common case.
This algorithm has the advantage that it is almost as fast as the naive
sum while keeping the error reasonable.
A problem might be the buffering NumPy does when reducing; this would
limit the error reduction to the buffer size.
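A minimal recursive sketch of pairwise summation (hypothetical illustration, not the actual NumPy or prototype code; the block size of 8 is an arbitrary choice):

```python
def pairwise_sum(a, lo, hi, block=8):
    # below a small block size, fall back to a naive loop
    if hi - lo <= block:
        total = 0.0
        for i in range(lo, hi):
            total += a[i]
        return total
    # split in half, so the operands of the final addition are
    # partial sums of similar magnitude
    mid = (lo + hi) // 2
    return pairwise_sum(a, lo, mid, block) + pairwise_sum(a, mid, hi, block)
```

For the [0.1] * 50000 example this stays within O(log n * epsilon) of the exact sum, far closer than the naive loop.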
Kahan summation adds some extra operations to recover the rounding
errors. This results in an error of O(epsilon), independent of n.
It is about four times slower than the naive summation, but this can be
(partially) compensated by vectorizing it.
It has the advantages of lower error and a simpler implementation, and
buffering does not interfere.
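A sketch of Kahan (compensated) summation in pure Python, to illustrate the extra operations involved:

```python
def kahan_sum(values):
    total = 0.0
    c = 0.0                  # running compensation for lost low-order bits
    for v in values:
        y = v - c            # apply the correction from the previous step
        t = total + y        # low-order bits of y may be lost here...
        c = (t - total) - y  # ...and are recovered into c
        total = t
    return total
```

Note that the compensation step relies on the additions not being algebraically simplified away; a C implementation would need to avoid aggressive floating-point optimizations such as -ffast-math.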
I did prototype implementations (unit strides only) of both in [3, 4].
Any thoughts on this?