I'm going to cut straight to the chase here because this thread, and its related ones, on Python-Dev are giving me a headache and overloading my inbox. So I'm going to make a probably futile :-( attempt to cut off yet another huge thread before it starts by explaining why I think sum() ought to stay exactly as it is. Built-in sum() is already quite complex. It has a fast path and a slow path, and that's just for numbers. While it's tempting to imagine sum() being even more clever and able to handle more cases, that increases the complexity and makes it more likely to end up slower rather than faster, or buggy, or both. Better to let the caller choose a specialist function (like numpy.array.sum, or math.fsum, or str.join) that handles the caller's specialist needs, than to try to make sum() master of everything. The more special cases sum() has, the more the pressure to add even more. In the statistics module, I have a private _sum() function which tried really hard to deal with high-accuracy sums of mixed arbitrary numeric types without compromising too badly on speed, and it's much harder than it seems. Trying to handle non-numeric types too increases the complexity significantly. If you're smarter than me (I expect that many of you probably are) and believe that you can write a version of sum() which: (1) is fast (2) has better than O(N**2) performance (3) is correct (4) is accurate (5) handles INFs and NANs (for those types which have them) (6) handles mixed types (for those types which allow mixing) (7) honours subclasses with custom __add__ and __radd__ methods (8) and keeps the expected semantics that sum() is like repeated addition (or concatenation) and does so for *both* numeric and non-numeric cases (like strings, bytes, tuples, lists), then PLEASE write some code and publish it. I for one would love to see it or use it for the statistics module. But until you have tried writing such a thing, whether in C or Python, I think you're probably underestimating how hard it is and how fragile the result will be. So, a plea: please stop trying to overloaded poor ol' built-in sum. sum() is *not* the One Obvious Way to add arbitrary objects in every domain. sum() is intended for simple cases of adding numbers, it is not intended as a specialist summation function for everything under the sun that can be added or concatenated. A bit of history, as I remember it: sum() exists because for half of Python's lifetime, people were regularly defining this: def sum(numbers): return reduce(lambda a, b: a+b, numbers) so they could easy add up a bunch of numbers: num_pages = sum([ch.pages() for ch in self.chapters]) sort of thing. Since this was a common need, it was decided to add it to the built-ins. But sum() was never intended to replace str.join or list.extend, let alone even more exotic cases. Built-in sum is aimed at sequences of numbers, not strings, lists, tuples, or Widgets for that matter. Perhaps giving it a start parameter was a mistake, but it is there and backwards compatibility says it isn't going to be removed. But that doesn't mean that the use of sum() on arbitrary types ought to be *encouraged*, even if it is *allowed*. Conceptually, sum() is intended to behave like: for value in sequence: start = start + value That means calling custom __add__ or __radd__ methods if they exist. It also means that sum() cannot delegate to (say) str.join() without changing the semantics. Given: class Special(str): def __radd__(self, other): print("I'm special!") return other + str(self) s = Special('y') the sum 'x' + s is *not* the same as ''.join(['x', s]). A similar argument applies to list.extend, and the source code in bltinmodule.c already makes that point. Replying to a couple of points from Stephen: On Wed, Aug 13, 2014 at 03:21:42PM +0900, Stephen J. Turnbull wrote:
sum() can be used for any type that has an __add__ defined.
I'd like to see that be mutable types with __iadd__.
Surely you don't mean that. That would mean that sum([1, 2, 3]) would no longer work, since ints are not mutable types with __iadd__. [...]
Summing tuples works (with appropriate start=tuple()). Haven't benchmarked, but I bet that's O(N^2).
Correct: increasing the number of tuples being added by a factor of 10 requires almost a factor of 100 more time: py> t = tuple([(i,) for i in range(1000)]) py> with Stopwatch(): ... _ = sum(t, ()) ... time taken: 0.003805 seconds py> t *= 10 py> with Stopwatch(): ... _ = sum(t, ()) ... time taken: 0.230225 seconds py> t *= 10 py> with Stopwatch(): ... _ = sum(t, ()) ... time taken: 32.206471 seconds
My argument is that in practical use sum() is a bad idea, period, until you book up on the types and applications where it *does* work. N.B. It doesn't even work properly for numbers (inaccurate for floats).
sum() works fine for its intended uses, especially: - summing ints exactly - "low precision" sums of floats I put "low precision" in scare quotes because, for many purposes, that precision is plenty high enough. For the use-case of adding together a dozen or a hundred positive floats of similar magnitude, sum() is fine. It's only advanced and high-precision uses where it falls short. To put it another way: if you want to add the mass of Jupiter to the mass of a flea, you probably want math.fsum(). If you want to add the weight of an apple to the weight of a banana, both measured on a typical kitchen scale, sum() will do the job perfectly adequately.
while we are at it, having the default sum() for floats be fsum() would be nice
How do you propose to implement that, given math.fsum is perfectly happy to sum integers?
And you really don't want sum(integers) to convert to float by default: py> from math import fsum py> fsum([10**30, 1234]) 1e+30 py> sum([10**30, 1234]) 1000000000000000000000000001234 Insisting that there ought to be one and only one way to sum up is, in my opinion, foolhardy, no matter how attractive it might seem. I believe that the right way is what Python already has: a simple sum() for simple cases, a few specialist sums (including ''.join) for the most common or important specialist cases, and leave the rest to third-party libraries or code you write yourself. That way the caller can then decide exactly what trade-offs between time, memory, convenience, accuracy and behaviour they wish, instead of invariably being surprised or disappointed by whatever trade-offs the one Über-sum() made. -- Steven