[Python-ideas] All ideas together: Fast sum() for non-numbers

Steven D'Aprano steve at pearwood.info
Fri Jul 5 10:36:20 CEST 2013


On 05/07/13 17:59, Paul Moore wrote:
> On 5 July 2013 08:41, Sergey <sergemp at mail.ru> wrote:
>
>> 2. Fast sum() for strings. [3]
>>     This patch is a small enhancement of #1, that makes sum() O(N) for
>>     strings. Obviously it introduces a behavior change and I do NOT
>>     suggest it for inclusion. I believe that ''.join() alternative is
>>     simple enough and don't have to be replaces with sum() for normal
>>     use cases. It is just to shows that sum() can be O(N) for strings.
>>
>
> Why does this "obviously" introduce a behaviour change? If it is just a
> performance improvement, that's not a behaviour change.

Have you tried summing strings?


py> sum(['a', 'b'], '')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]



> It's hard to see why there would be any argument over *pure* performance
> improvements. But behaviour changes, especially ones which are "needed" to
> get performance benefits, are a different matter.

Even pure performance improvements don't come for free. Adding all these extra special cases to sum increases the complexity of the code. More complex code, more documentation, more tests means more code to contain bugs, more documentation to be wrong, more tests which can fail. Simplicity is a virtue in and of itself. What benefit does this extra complexity give you?

If you *only* look at the advantages, then *every* optimization looks like an obvious win. And if you *only* look at the risks, then every new piece of code looks like a problem to be avoided. The trick is to balance the two, and that relies on some sense of how often you will receive the benefit versus how often you pay the cost.

Sergey's suggestion also has a more subtle behavioural change. In order to guarantee that sum is fast, i.e. O(N) instead of O(N**2), the semantics of sum have to change from "support anything which supports __add__" to "support anything which supports fast __iadd__".

I don't believe that optimizing sum for non-numbers gives enough benefit to make up for the increased complexity. Who, apart from Sergey, uses sum() on large numbers of lists or tuples? Both have had quadratic behaviour for a decade, and I've never known anyone who noticed or cared. So it seems to me that optimizing these special cases will have very little benefit.

I've already made it clear that I'm -0 on this: on balance, I believe the disadvantage of more complex code just slightly outweighs the benefit. But Sergey doesn't have to convince *me*, he just has to convince those who will accept or reject the patch.



-- 
Steven


More information about the Python-ideas mailing list