Feature suggestion: sum() ought to use a compensated summation algorithm
Szabolcs Horvát
szhorvat at gmail.com
Mon May 5 03:31:14 EDT 2008
Duncan Booth wrote:
> Szabolcs Horvát <szhorvat at gmail.com> wrote:
>
>> I thought that it would be very nice if the built-in sum() function used
>> this algorithm by default. Has this been brought up before? Would this
>> have any disadvantages (apart from a slight performance impact, but
>> Python is a high-level language anyway ...)?
>
> There's a thread you might find interesting:
>
> http://groups.google.com/group/comp.lang.python/browse_thread/thread/9eda29faf92f532e/027cef7d4429aa3a
>
> In that thread I suggested that Python ought to implement sum by adding
> together each pair of values, then each pair of results and so on. This
> means that for input values of a similar magnitude the intermediate results
> stay balanced. The implementation doesn't actually do all the first level
> sums first, it builds up a stack containing the sum of 2^k, 2^(k-1)...2
> values. Also it works for other types as well, and for strings or lists has
> the advantage of minimising the number of large string or list concatenations.
>
> If you follow that thread far enough you'll find a post by Jussi Salmela
> which shows that summing adjacent pairs performs as well as Kahan summation
> (he says 'almost as good' but in fact the only time they get different
> results the 'correct' answer is 500000.000005 and Kahan and sumpairs get
> the two nearest representable results either side: 500000.00000499998 and
> 500000.00000500004 respectively).
>
> I never did get round to re-coding my sumpairs() function in C, I would be
> interested to see just how much overhead it introduces (and I suspect it
> would be faster than the existing sum in some cases such as for lists).
Thanks for this link! Though some of the thread is missing, it was an
interesting read.
This sounds like a very good idea: it is more generally applicable than
the Kahan summation because it does not require subtraction/negation, so
it would work with user-defined types, too.
While now I am convinced that using Kahan summation by default is not
such a good idea, I cannot see any serious disadvantages/problems with
pairwise summation!
More information about the Python-list
mailing list