[Python-ideas] Fast sum() for non-numbers

Steven D'Aprano steve at pearwood.info
Tue Jul 16 09:22:38 CEST 2013


On Tue, Jul 16, 2013 at 08:36:05AM +0300, Sergey wrote:

> Anyway, I've got your point. You want sum() to be O(N) for numbers
> and some rare/custom containers, but want it to stay O(N*N) for
> the most popular container types for some reasons, that are too
> hard for me to understand.

Right now, sum() behaves in a certain way. There are certain things 
which people expect sum() to do. Some of those things are documented 
explicitly. Some of them are implied. Some of them have regression 
tests. Some of them don't. But regardless, we can tell how sum() behaves 
right now by running it and seeing what it does.

Your suggested optimizations change that behaviour. It does not just 
speed sum() up, they lead to an actual semantic change. So we are not 
just arguing about speed, we are arguing about behaviour as well.

You are worried about sum() being slow for people who call it with list 
arguments. That is a valid concern. Nobody here *wants* sum() to be 
slow. If it was a pure speed optimization, then we would all be 100% 
behind it. But it is not a pure speed optimization, it also changes 
behaviour, sometimes in subtle, hard to see ways.

So there are three approaches we can take:

- Do nothing. sum() continues to work exactly the same way as it 
  currently works, even if that means sometimes it is slow.

- Accept your patches. sum() changes its behaviour, which will break 
  somebody's working code, but it will be fast, at least for some 
  objects.

- Accept a compromise position. We can make sum() faster for built-in 
  lists, and maybe built-in tuples, while keeping the same behaviour. 
  Everything else, including subclasses of list and tuple, keep the 
  same behaviour, which may mean it remains slow.

They are the only choices.

You are concerned more about sum() being slow than you are about 
breaking code that today works. Some of us here disagree, and think that 
breaking code is worse than slow code, especially for something as 
uncommon as sum(list_of_lists).

It's not that we want sum() to be slow. But if we have a choice between 
accepting your patch:

# this code works now, but your patch will break it
result = sum(list_of_objects)

and rejecting it:

# this code works now, but is slow, and will remain slow
result = sum(list_of_lists)

I believe that the decision is simple. Breaking code that works now for 
a mere optimization is unacceptable.

But, a compromise patch that speeds up some code without breaking any 
code may be acceptable.


> Same applies to sum(): even if it's impossible to make if fast for
> all collection types, it does not mean that it should not be fast for
> some of them, e.g. lists and tuples.

That is change from your previous posts where you said you could make it 
fast for "everything". I am glad to see you have accepted this.



-- 
Steven


More information about the Python-ideas mailing list