[Python-ideas] All ideas together: Fast sum() for non-numbers

Paul Moore p.f.moore at gmail.com
Fri Jul 5 11:20:47 CEST 2013


On 5 July 2013 09:36, Steven D'Aprano <steve at pearwood.info> wrote:

> On 05/07/13 17:59, Paul Moore wrote:
>
>> On 5 July 2013 08:41, Sergey <sergemp at mail.ru> wrote:
>>
>>  2. Fast sum() for strings. [3]
>>>     This patch is a small enhancement of #1, that makes sum() O(N) for
>>>     strings. Obviously it introduces a behavior change and I do NOT
>>>     suggest it for inclusion. I believe that ''.join() alternative is
>>>     simple enough and don't have to be replaces with sum() for normal
>>>     use cases. It is just to shows that sum() can be O(N) for strings.
>>>
>>>
>> Why does this "obviously" introduce a behaviour change? If it is just a
>> performance improvement, that's not a behaviour change.
>>
>
> Have you tried summing strings?
>
>
> py> sum(['a', 'b'], '')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: sum() can't sum strings [use ''.join(seq) instead]


Ah, yes. I'd forgotten there was an explicit rejection of strings, rather
than just the obvious performance trap.

It's hard to see why there would be any argument over *pure* performance
>> improvements. But behaviour changes, especially ones which are "needed" to
>> get performance benefits, are a different matter.
>>
>
> Even pure performance improvements don't come for free. Adding all these
> extra special cases to sum increases the complexity of the code. More
> complex code, more documentation, more tests means more code to contain
> bugs, more documentation to be wrong, more tests which can fail. Simplicity
> is a virtue in and of itself. What benefit does this extra complexity give
> you?
>

Good point - that was to an extent what I was trying to express by
emphasizing *pure* - that very few things in practice are unqualified
improvements. I didn't put it well, though.

The history of all this is somewhere in the archives. I recall the original
discussion (although not the details). The explicit rejection of str is
clearly deliberate, and implemented by people who certainly knew how to
special-case str to use ''.join(), so has Sergey researched the original
discussion and explained why his proposal invalidates the conclusions
reached at the time? (Again, I apologise for jumping in late - if this is
all old news, I'll shut up :-))


> If you *only* look at the advantages, then *every* optimization looks like
> an obvious win. And if you *only* look at the risks, then every new piece
> of code looks like a problem to be avoided. The trick is to balance the
> two, and that relies on some sense of how often you will receive the
> benefit versus how often you pay the cost.
>
> Sergey's suggestion also has a more subtle behavioural change. In order to
> guarantee that sum is fast, i.e. O(N) instead of O(N**2), the semantics of
> sum have to change from "support anything which supports __add__" to
> "support anything which supports fast __iadd__".
>

Whoa. That's a non-trivial change. I thought the proposal was "guarantee
O(N) for types with a constant-time __iadd__ and don't impact performance
for other types". But that of course begs the question of what happens in
pathological cases where __iadd__ is (say) O(N^3). There's no way to
programatically detect whether __iadd__ or __add__ gives the best
performance, so what I had thought of course cannot be the case :-(


> I don't believe that optimizing sum for non-numbers gives enough benefit
> to make up for the increased complexity. Who, apart from Sergey, uses sum()
> on large numbers of lists or tuples? Both have had quadratic behaviour for
> a decade, and I've never known anyone who noticed or cared. So it seems to
> me that optimizing these special cases will have very little benefit.
>
> I've already made it clear that I'm -0 on this: on balance, I believe the
> disadvantage of more complex code just slightly outweighs the benefit. But
> Sergey doesn't have to convince *me*, he just has to convince those who
> will accept or reject the patch.


Your points are good ones. Personally, I have no need for this change -
like you, I never use sum() on anything other than numbers.

Given the subtle implications of the change, and the fact that it
explicitly reverses a deliberate decision by the core devs, I think this
proposal would require a PEP to have any chance of being accepted.

Paul.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130705/45592ed5/attachment.html>


More information about the Python-ideas mailing list