[Python-ideas] All ideas together: Fast sum() for non-numbers
Haoyi Li
haoyi.sg at gmail.com
Fri Jul 5 11:28:35 CEST 2013
Is there a better way to flatten lists than [item for sublist in l for item
in sublist]? I naturally don't use sum() for much, but if I could,
sum(my_list) looks much better than [item for sublist in my_list for item
in sublist]. Looking at
http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python,
all the alternatives seem equally obtuse and verbose. I have a lot of list
flattenings I would love to use sum(my_list) for if I had the chance.
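
For reference, the usual alternatives look something like this (a
minimal sketch; nested_list is just an illustrative name):

    import itertools

    nested_list = [[1, 2], [3, 4], [5]]

    # Nested comprehension: flat, but hard to read at a glance.
    flat1 = [item for sublist in nested_list for item in sublist]

    # itertools.chain.from_iterable: explicit and O(N).
    flat2 = list(itertools.chain.from_iterable(nested_list))

    # sum() already works with a [] start value, but it is O(N**2),
    # since each + builds a brand-new list.
    flat3 = sum(nested_list, [])

    assert flat1 == flat2 == flat3 == [1, 2, 3, 4, 5]
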
On Fri, Jul 5, 2013 at 5:20 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 5 July 2013 09:36, Steven D'Aprano <steve at pearwood.info> wrote:
>
>> On 05/07/13 17:59, Paul Moore wrote:
>>
>>> On 5 July 2013 08:41, Sergey <sergemp at mail.ru> wrote:
>>>
>>>> 2. Fast sum() for strings. [3]
>>>> This patch is a small enhancement of #1, that makes sum() O(N) for
>>>> strings. Obviously it introduces a behavior change and I do NOT
>>>> suggest it for inclusion. I believe that the ''.join() alternative
>>>> is simple enough and doesn't have to be replaced with sum() for
>>>> normal use cases. It is just to show that sum() can be O(N) for
>>>> strings.
>>>>
>>>>
>>> Why does this "obviously" introduce a behaviour change? If it is just a
>>> performance improvement, that's not a behaviour change.
>>>
>>
>> Have you tried summing strings?
>>
>>
>> py> sum(['a', 'b'], '')
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> TypeError: sum() can't sum strings [use ''.join(seq) instead]
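
The error message points at the linear-time idiom; a minimal sketch:

    parts = ['a', 'b', 'c']
    # ''.join builds the result in a single pass, so it is O(N)
    # in the total length of the strings.
    joined = ''.join(parts)   # 'abc'
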
>
>
> Ah, yes. I'd forgotten there was an explicit rejection of strings, rather
> than just the obvious performance trap.
>
>>> It's hard to see why there would be any argument over *pure*
>>> performance improvements. But behaviour changes, especially ones
>>> which are "needed" to get performance benefits, are a different
>>> matter.
>>>
>>
>> Even pure performance improvements don't come for free. Adding all these
>> extra special cases to sum increases the complexity of the code. More
>> complex code, more documentation, more tests mean more code to contain
>> bugs, more documentation to be wrong, more tests which can fail. Simplicity
>> is a virtue in and of itself. What benefit does this extra complexity give
>> you?
>>
>
> Good point - that was to an extent what I was trying to express by
> emphasizing *pure* - that very few things in practice are unqualified
> improvements. I didn't put it well, though.
>
> The history of all this is somewhere in the archives. I recall the
> original discussion (although not the details). The explicit rejection of
> str is clearly deliberate, and implemented by people who certainly knew how
> to special-case str to use ''.join(), so has Sergey researched the original
> discussion and explained why his proposal invalidates the conclusions
> reached at the time? (Again, I apologise for jumping in late - if this is
> all old news, I'll shut up :-))
>
>
>> If you *only* look at the advantages, then *every* optimization looks
>> like an obvious win. And if you *only* look at the risks, then every new
>> piece of code looks like a problem to be avoided. The trick is to balance
>> the two, and that relies on some sense of how often you will receive the
>> benefit versus how often you pay the cost.
>>
>> Sergey's suggestion also has a more subtle behavioural change. In order
>> to guarantee that sum is fast, i.e. O(N) instead of O(N**2), the semantics
>> of sum have to change from "support anything which supports __add__" to
>> "support anything which supports fast __iadd__".
>>
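
A minimal sketch of why that shift is observable (assuming a
hypothetical sum() built on +=; the names here are illustrative):

    start = []
    data = [[1], [2]]

    # Today's sum() uses repeated __add__, so start is untouched:
    result = sum(data, start)
    assert start == [] and result == [1, 2]

    # A += based sum() would do the moral equivalent of:
    acc = start
    for item in data:
        acc += item          # list.__iadd__ extends in place
    # ... and now the caller's start value has been mutated:
    assert start == [1, 2]
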
>
> Whoa. That's a non-trivial change. I thought the proposal was "guarantee
> O(N) for types with a constant-time __iadd__ and don't impact performance
> for other types". But that of course begs the question of what happens in
> pathological cases where __iadd__ is (say) O(N^3). There's no way to
> programmatically detect whether __iadd__ or __add__ gives the best
> performance, so what I had thought of course cannot be the case :-(
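
For concreteness, a rough sketch of the quadratic behaviour being
discussed (timings are machine-dependent, so treat the numbers as
illustrative only):

    import timeit

    # Each + copies the entire accumulated list, so summing k
    # one-element lists copies on the order of k**2/2 elements.
    for k in (1000, 2000, 4000):
        data = [[0]] * k
        t = timeit.timeit(lambda: sum(data, []), number=10)
        print(k, t)   # time roughly quadruples as k doubles
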
>
>
>> I don't believe that optimizing sum for non-numbers gives enough benefit
>> to make up for the increased complexity. Who, apart from Sergey, uses sum()
>> on large numbers of lists or tuples? Both have had quadratic behaviour for
>> a decade, and I've never known anyone who noticed or cared. So it seems to
>> me that optimizing these special cases will have very little benefit.
>>
>> I've already made it clear that I'm -0 on this: on balance, I believe the
>> disadvantage of more complex code just slightly outweighs the benefit. But
>> Sergey doesn't have to convince *me*, he just has to convince those who
>> will accept or reject the patch.
>
>
> Your points are good ones. Personally, I have no need for this change -
> like you, I never use sum() on anything other than numbers.
>
> Given the subtle implications of the change, and the fact that it
> explicitly reverses a deliberate decision by the core devs, I think this
> proposal would require a PEP to have any chance of being accepted.
>
> Paul.
>