[Python-ideas] Fast sum() for non-numbers - why so much worries?

Andrew Barnert abarnert at yahoo.com
Wed Jul 10 22:14:20 CEST 2013


On Jul 10, 2013, at 10:20, Steven D'Aprano <steve at pearwood.info> wrote:

> On 11/07/13 02:50, Ethan Furman wrote:
> 
>> Currently, sum() does not modify its arguments.
>> 
>> You (or whoever) are suggesting that it should modify one of them.
>> 
>> That makes it a semantic change, and a bad one.
>> 
>> -1
> 
> 
> Actually, Sergey's suggestion is a bit more clever than that. I haven't tested his C version, but the intention of his pure-Python demo code is to make a temporary list, modify the temporary list in place for speed, and then convert to whatever type is needed. That will avoid modifying any of the arguments[1]. So credit to Sergey for avoiding that trap.

Actually, he has two versions.

The first applies + once and then applies += repeatedly to the result. This solves the problem neatly (except with empty iterables, but that's trivial to fix, and I think his C code actually doesn't have that problem...). There's no overhead, it automatically falls back to __add__ if __iadd__ is missing, and the only possible semantic differences are for types that are already broken.
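
As a rough sketch of that first approach (the name sum_iadd and the details are illustrative assumptions, not Sergey's actual patch or his C code), with the empty-iterable case handled:

```python
def sum_iadd(iterable, start=0):
    """Hypothetical sketch: one +, then += on the fresh result."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        # Empty iterable: nothing to add, so hand back start unchanged.
        return start
    # A single + creates a new object for well-behaved types, so neither
    # start nor the first item is the object that += mutates below.
    result = start + first
    for item in it:
        # Uses __iadd__ when the type defines it; otherwise Python
        # falls back to __add__ automatically.
        result += item
    return result


start = []
parts = [[1], [2], [3]]
total = sum_iadd(parts, start)
# total is [1, 2, 3]; start is still [] and parts is unchanged.
```

Tuples have no __iadd__, so sum_iadd([(1,), (2,)], ()) goes through __add__ each time and still gives (1, 2).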

The second makes a list of the argument (which means copying it if it's already a list), then calls extend repeatedly on the result, then converts back. This doesn't solve the problem in many cases, does the wrong thing in many others, and always adds overhead.
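
Read literally, that second approach looks something like the sketch below (again, sum_via_list is a made-up name and this is a reconstruction from the description above, not Sergey's code):

```python
def sum_via_list(iterable, start):
    """Hypothetical sketch: copy to a list, extend, convert back."""
    # list() copies start even when it is already a list.
    result = list(start)
    for item in iterable:
        result.extend(item)
    # Convert back by calling the type of start on the list.
    return type(start)(result)


ok = sum_via_list([(1,), (2, 3)], ())
# ok is (1, 2, 3)

wrong = sum_via_list(["ab", "cd"], "")
# wrong is the repr of a list of characters, not "abcd",
# because str() of a list does not join its elements.
```

It works for tuples, but a number as start fails outright (list(0) raises TypeError), and any type whose constructor does not accept a list of its elements gives a wrong answer rather than an error.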

And that's exactly why I think it's worth splitting into separate pieces. It's very easy for people to see problems with the second version and wrongly assume they also apply to the first (and the way he presents and argues for his ideas doesn't help).

As far as I know, nobody has yet found any problem with the first version, except for the fact that it would encourage people to use sum on lists. I don't think that's a serious problem--the docs already say not to do it--and if it's a useful optimization for any number-like types, I think it's worth having.

It's the second version, together with all of the attempts to make it fully general for any concatenable type--or, alternatively, to argue that only builtin concatenable types matter--that I have a problem with.


