On Wed, Aug 13, 2014 at 9:37 AM, Steven D'Aprano <steve@pearwood.info> wrote:
In the statistics module, I have a private _sum() function which tried
really hard to deal with high-accuracy sums of mixed arbitrary numeric
types without compromising too badly on speed, and it's much harder than
it seems. Trying to handle non-numeric types too increases the
complexity significantly. If you're smarter than me (I expect that many
of you probably are) and believe that you can write a version of sum()
which:

(1) is fast
(2) has better than O(N**2) performance
(3) is correct
(4) is accurate
(5) handles INFs and NANs (for those types which have them)
(6) handles mixed types (for those types which allow mixing)
(7) honours subclasses with custom __add__ and __radd__ methods
(8) and keeps the expected semantics that sum() is like repeated
    addition (or concatenation)

and does so for *both* numeric and non-numeric cases (like strings,
bytes, tuples, lists), then PLEASE write some code and publish it. I for
one would love to see it or use it for the statistics module. But until
you have tried writing such a thing, whether in C or Python, I think
you're probably underestimating how hard it is and how fragile the
result will be.

The basic form I would use for an updated sum, which would often, but not always, perform better would be something like (preferably, written in C, not pure Python):

def sum(items, start=0):
    value = copy.copy(start) # Make sure that start is not mutated.
    for item in items:
        value += item
    return value

To avoid needing to access copy.copy, something like the following would also work, but with a bit more local complexity:

def sum(items, start=0):
    count = len(items)
    if count > 0:
        value = start + items[0] # Make sure not to mutate start.
    else
        return start

    for i in range(1, count):
        value += items[i]
    return value

Either of these would likely perform better when summing items which support an optimized __iadd__, which allows many types to perform better. For those which do not support __iadd__, it would fallback to the slower __add__ but still function. Note that the first version may be slower than existing some in some cases, namely if copy.copy(start) is very slow for the type. The second case should never be slower than the existing sum, presuming __iadd__ is not implemented in a slower manner than __add__.

Due to the string add optimization CPython, this should make strings sum better than O(N**2), many custom types will also sum faster, and, for all well-behaved types, should produce the same result as the existing sum.

Chris