[Python-ideas] Fast sum summary [was Re: Fast sum() for non-numbers - why so much worries?]
Steven D'Aprano
steve at pearwood.info
Fri Jul 12 02:52:56 CEST 2013
This started out as a response to Ronald, but turned into a summary and pseudo-code implementation, so I've changed the subject line.
On 11/07/13 21:15, Ronald Oussoren wrote:
[...]
> That, and Paul's example about using += for something else than addition,
> pretty much kills the idea of using += in the implementation of sum() as that
> would break too much code.
Not quite :-)
After spending so much time explaining why I don't think sum *should* optimize the list case, I'm now going to suggest a way in which (I think) sum *could* optimize the list case without changing behaviour.
[puts on Devil's Advocate cap]
What we have to give up is Sergey's hope to make sum fast for "everything". I don't think that is possible, but even if it is, we should not let the Perfect be the enemy of the Good. Can we make sum fast for lists, without changing behaviour? I think we can.
We know that list.__iadd__ is a fast, inplace version of __add__. We can't make that assumption about every class, not even list subclasses, but we don't have to speed up every class. Let's not be greedy: incremental improvements are better than no improvements.
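For instance (a quick illustration, not part of the proposal), += on a plain list reuses the same object, while + always builds a new one:

    a = [1, 2]
    b = a
    a += [3]           # list.__iadd__: extends the existing list in place
    assert a is b      # still the same object, now [1, 2, 3]
    c = a + [4]        # list.__add__: always builds a new list
    assert c is not a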
sum already has special cases for ints, floats and strings (the string case just prohibits the use of sum). We could add a special case for lists: if the start parameter is an actual built-in list, not a subclass (since subclasses may override __iadd__), then we begin with an accumulator list and call __iadd__ (or extend) on it to concatenate each item. As soon as we find an item which is not a built-in list, we drop out of the fast path and fall back to the normal, non-fast path. This doesn't make sum "fast for everything", but it is still an optimization.
Here's some pseudo-code, modified from the code Sergey posted earlier:
# untested
    def sum(values, start=0):
        values = iter(values)
        # [... special cases for int, string, float, etc. skipped ...]
        if type(start) is list:   # an exact list, never a subclass
            result = []
            result += start
            for value in values:
                if type(value) is not list:
                    # Hit something that isn't a built-in list: save
                    # the partial sum and drop to the non-fast path.
                    start = result + value
                    break
                result += value
            else:  # No break: every item was a built-in list.
                return result
        # If we get here, fall back to the non-fast path.
        result = start
        for value in values:
            result = result + value
        return result
(Credit to Sergey for coming up with the original code.)
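To see how the two paths interact, here's an equally untested sanity check against the sketch above; MyList is just a stand-in subclass I've invented for illustration:

    class MyList(list):
        pass   # any list subclass pushes sum off the fast path

    print(sum([[1, 2], [3, 4]], []))         # fast path all the way: [1, 2, 3, 4]
    print(sum([[1], MyList([2]), [3]], []))  # falls back mid-stream: [1, 2, 3]

The second call starts on the fast path, hits the MyList instance, and finishes on the ordinary __add__ path, so the result is the same either way.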
Since we cannot *guarantee* that sum is fast for all classes, we should not promise what we can't guarantee. Better to deliver more than you promise, than to make grandiose promises of "fast for everything" that you cannot live up to.
Arguments in favour of this suggestion:
- The simple use-case of summing a list of lists will speed up.
- The behaviour of sum doesn't change (except for speed):
* sum still returns a new list, it doesn't modify start in place (a quick check follows this list);
* officially, sum still relies on __add__ only, not __iadd__.
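A tiny check of that guarantee, using the sketch above (untested, like the rest):

    start = [0]
    total = sum([[1], [2]], start)
    assert total == [0, 1, 2]
    assert start == [0]    # the start list was not mutated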
Arguments against:
- Somebody has to write this code, and debug it, and maintain it. It won't be me.
- Optimized code is complex code, and perhaps this extra complication is too much complication. After all, the Zen says "Simple is better than complex, complex is better than complicated."
- If sum of lists is fast, it will lull people into a false sense of security. sum remains an attractive nuisance, since you cannot rely on it being fast. As soon as a single list subclass turns up among the items, it falls back to the slow code again.
- But on the other hand, that's not much worse than the situation now. People are already lulled into a false sense of security, because "Everything is fast for small enough N".
Further questions:
1) Can we be less conservative about subclasses? E.g. if a subclass does not override __add__ or __iadd__, can we safely use the fast path, and only fall back on the slow path for those that do override them? (A sketch of one possible check follows below.)
- In favour: that would make sum more useful, rather than less.
- Against: lots more complications...
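For what it's worth, the check might look something like this (my own sketch, not part of any patch, and the helper name is made up):

    def safe_for_fast_path(obj):
        # True for a built-in list, or for a subclass that inherits
        # both __add__ and __iadd__ unchanged from list.
        cls = type(obj)
        return (isinstance(obj, list)
                and cls.__add__ is list.__add__
                and cls.__iadd__ is list.__iadd__)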
2) Sergey wants to do something similar for tuples, using a temporary list accumulator. I've raised the possibility that if the accumulated list is huge enough, you might get a MemoryError trying to copy it back to a tuple. Does anyone care about this hypothetical case? If not, then we could extend the optimization to tuples (at least those that don't override __add__).
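As I picture it, the tuple version would follow the same shape as the list sketch above. Purely illustrative and untested, with a made-up name so it stands alone:

    def sum_tuples(values, start=()):
        # Hypothetical fast path for tuples: accumulate in a temporary
        # list, converting back to a tuple only once at the end.
        values = iter(values)
        acc = list(start)            # one up-front copy of the start tuple
        for value in values:
            if type(value) is not tuple:
                start = tuple(acc) + value   # partial sum, then slow path
                break
            acc += value             # list += accepts any sequence cheaply
        else:  # No break.
            # This final copy is where the hypothetical MemoryError could
            # strike, since the list and the tuple briefly coexist.
            return tuple(acc)
        # Fall back to the ordinary __add__ loop.
        result = start
        for value in values:
            result = result + value
        return result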
3) And the critical question: with this (theoretical) patch in place, do the test suites still pass? I asked Sergey this same question about his patch some days ago, but haven't seen an answer.
--
Steven