[Python-Dev] sum(...) limitation

Chris Barker chris.barker at noaa.gov
Tue Aug 12 21:11:35 CEST 2014

On Mon, Aug 11, 2014 at 11:07 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> I'm referring to removing the unnecessary information that there's a
>  better way to do it, and simply raising an error (as in Python 3.2,
> say) which is all a RealProgrammer[tm] should ever need!

I can't imagine anyone is suggesting that -- disallow it, but don't tell
anyone why?

The only thing that is remotely on the table here is:

1) remove the special case for strings -- buyer beware -- but consistent
and less "ugly"

2) add a special case for strings that is fast and efficient -- may be as
simple as calling "".join() under the hood -- no more code than the
exception check.
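
For illustration, option (2) could look roughly like the sketch below. The
name sum_with_str_fast_path is hypothetical, and this is pure Python rather
than the C implementation of the builtin; it only shows that the string
branch is about the same size as the existing exception check.

```python
def sum_with_str_fast_path(iterable, start=0):
    """Hypothetical sum() variant: strings are joined, everything else is added."""
    if isinstance(start, str):
        # Fast path: a single linear-time join instead of repeated concatenation.
        return start + "".join(iterable)
    total = start
    for item in iterable:
        # Generic path: rely on the type's own __add__ / __radd__.
        total = total + item
    return total
```

With this sketch, sum_with_str_fast_path(["a", "b", "c"], "") returns "abc",
while numbers and lists go through the generic loop unchanged.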

And I doubt anyone really is pushing for anything but (2).

Stephen Turnbull wrote:

>   IMO we'd also want a homogeneous_iterable ABC

Actually, I've thought for years that that would open the door to a lot of
optimizations -- but that's a much broader question than sum(). I even
brought it up probably over ten years ago -- but no one was the least bit
interested -- nor are they now -- I know this was a rhetorical suggestion
to make the point about what not to do....

> Because obviously we'd want the
> attractive nuisance of "if you have __add__, there's a default
> definition of __sum__"

now I'm confused -- isn't that exactly what we have now?

> It's possible that Python could provide some kind of feature that
> would allow an optimized sum function for every type that has __add__,
> but I think this will take a lot of thinking.

Does it need to be every type? As it is, the common ones work fine already,
except for strings -- so if we add an optimized string sum() then we're

> *Somebody* will do it
> (I don't think anybody is +1 on restricting sum() to a subset of types
> with __add__).

uhm, that's exactly what we have now -- you can use sum() with anything
that has an __add__, except strings. And by that logic, if we thought there
were other inefficient use cases, we'd restrict those too.
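
The current behavior described above can be checked directly. This is a
small demonstration against CPython; the variable names are only for
illustration.

```python
# sum() accepts any start value whose type supports +, for example lists,
# even though each step copies the accumulated list (quadratic overall).
flat = sum([[1, 2], [3], [4, 5]], [])

# A str start value, however, is rejected explicitly by the builtin.
try:
    sum(["a", "b", "c"], "")
except TypeError as exc:
    message = str(exc)  # the message points the user at ''.join(seq)
```

Here flat ends up as [1, 2, 3, 4, 5], and message holds the hint to use
''.join(seq) instead.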

But users can always define their own classes that have an __add__ and are
really inefficient -- so unless sum() becomes just for a certain subset of
built-in types -- does anyone want that? Then we are back to the current

sum() can be used for any type that has an __add__ defined.

But naive users are likely to try it with strings, and that's bad, so we
want to prevent that, and have a special case check for strings.

What I fail to see is why it's better to raise an exception and point users
to a better way, than to simply provide an optimization so that it's a moot

The only justification offered here is that it will teach people that summing
strings (and some other objects?) is O(N^2) and a bad idea. But:

a) Python's primary purpose is practical, not pedagogical (not that it
isn't great for that)

b) I doubt any naive users learn anything other than "I can't use sum() for
strings, I should use "".join()". Will they make the leap to "I shouldn't
use string concatenation in a loop, either"? Oh, wait, you can use string
concatenation in a loop -- that's been optimized. So will they learn: "some
types of objects have poor performance with repeated concatenation and
shouldn't be used with sum(). So if I write such a class, and want to sum
them up, I'll need to write an optimized version of that code"?

I submit that no naive user is going to get any closer to a proper
understanding of algorithmic Order behavior from this small hint. Which
leaves no reason to prefer an Exception to an optimization.

One other point: perhaps this will lead a naive user into thinking --
"sum() raises an exception if I try to use it inefficiently, so it must be
OK to use for anything that doesn't raise an exception" -- that would be a
bad lesson to mis-learn....


Armin Rigo wrote:

> It also improves a
> lot the precision of sum(list_of_floats) (though not reaching the same
> precision levels of math.fsum()).

while we are at it, having the default sum() for floats be fsum() would be
nice -- I'd rather the default was better accuracy, lower performance. Folks
that really care about performance could call a hypothetical
math.fastsum(), or really, use numpy...
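
To make the accuracy difference concrete, here is a small comparison of
naive left-to-right float addition with math.fsum(). An explicit loop stands
in for naive summation, because the precision of the builtin sum() on floats
can differ between Python versions and implementations.

```python
import math

values = [0.1] * 10

# Naive accumulation: rounding error builds up with every addition.
naive_total = 0.0
for value in values:
    naive_total += value  # ends at 0.9999999999999999 on IEEE 754 doubles

# math.fsum() tracks the lost low-order bits and rounds only once.
accurate_total = math.fsum(values)  # 1.0
```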

This does turn sum() into a function that does type-based dispatch, but
isn't Python full of those already? Do something special for the types you
know about, call the generic dunder method for the rest.
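
As a rough sketch of that dispatch idea, assuming a hypothetical pure-Python
function named dispatching_sum (not a proposal for the actual builtin): known
types get a specialized path, and everything else falls back to +.

```python
import math


def dispatching_sum(iterable, start=0):
    """Hypothetical sum() that special-cases str and float, else uses +."""
    items = list(iterable)  # materialized so the element types can be inspected
    if isinstance(start, str):
        # Known type: strings are joined in linear time.
        return start + "".join(items)
    if (items and isinstance(start, (int, float))
            and all(isinstance(item, float) for item in items)):
        # Known type: floats get the more accurate math.fsum().
        return math.fsum([start, *items])
    total = start
    for item in items:
        # Everything else: the generic dunder method (__add__ / __radd__).
        total = total + item
    return total
```

One cost of this design is visible in the sketch: inspecting element types
means consuming the iterable up front, which a real implementation would
probably want to avoid.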


Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov