[Python-ideas] Fast sum() for non-numbers - why so much worries?

Joshua Landau joshua at landau.ws
Wed Jul 10 08:09:53 CEST 2013


On 9 July 2013 17:13, Steven D'Aprano <steve at pearwood.info> wrote:
> On 09/07/13 19:35, Sergey wrote:
>>
>> On Jul 5, 2013 Stefan Behnel wrote:
>>
>>> No, please. Using sum() on lists is not more than a hack that
>>> seems to be a cool idea but isn't. Seriously - what's the sum of
>>> lists? Intuitively, it makes no sense at all to say sum(lists).
>>
>>
>> It's the same as it is now. What else can you think about when you
>> see: [1, 2, 3] + [4, 5] ?
>
>
> Some of us think that using + for concatenation is an abuse of terminology,
> or at least an unfortunate choice of operator, and are wary of anything that
> encourages that terminology.
>
> Nevertheless, you are right, in Python 3 both + and sum of lists is
> well-defined. At the moment sum is defined in terms of __add__. You want to
> change it to be defined in terms of __iadd__. That is a semantic change that
> needs to be considered carefully, it is not just an optimization.

I agree it's not totally backward-compatible, but AFAICT only for
broken code. __iadd__ should always just be a faster, in-place
__add__, so this change should never cause problems in properly
written code. That makes it anything but a semantic change. It's the
same as when people discuss the order of __hash__ calls after a code
change: no-one calls that a *semantics* change.

> I am uncomfortable about changing the semantics to use
> __iadd__ instead of __add__, because I expect that this will change the
> behaviour of sum() for non-builtins.

Other than broken stuff, any guesses as to what? I'm trying to think
of something like an IO object (say, directories where __add__ makes a
new "directory viewer" and __iadd__ does a "cd"), but none of them
actually *change* behaviour.
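One reason well-behaved types shouldn't notice the switch: when a class defines no __iadd__, Python falls back to __add__ for `+=` and simply rebinds the name. The helper `kept_identity` below is a hypothetical name used only for this check.

```python
def kept_identity(obj, other):
    """Return True if `obj += other` updated obj in place rather than rebinding it."""
    before = obj
    # list.__iadd__ extends obj in place; tuple has no __iadd__, so this
    # falls back to tuple.__add__ and rebinds the local name to a new tuple.
    obj += other
    return obj is before

# list defines __iadd__ (it behaves like extend), so += mutates in place.
assert kept_identity([1], [2])
# tuple defines no __iadd__, so += falls back to __add__ and makes a new tuple.
assert not hasattr(tuple, "__iadd__")
assert not kept_identity((1,), (2,))
```

So for immutable types an __iadd__-based sum does exactly what the __add__-based one does today.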

> I worry about increased complexity
> making maintenance harder for no good reason. It's the "for no good reason"
> that concerns me: you could answer some of my objections if you showed:

The move to __iadd__ is, in my opinion, such a trivial change that
maintainability shouldn't be a concern. Special-casing multiple types
would definitely be a hazard, but this adds about one line to the
codebase.

> - bug reports or other public complaints by people (other than you)
> complaining that sum(lists) is slow;

I don't think that is a good measure. I've personally found cases
where "sum" looks nicer but isn't the best algorithm, yet I've never
complained, because 2-3 lines is really not that big a deal and it
*felt* like sum *had* to be O(n**2).
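The O(n**2) feeling can be quantified with a rough model that counts element copies when flattening lists, assuming a start value of [] and ignoring the amortized cost of list over-allocation. `copies_with_add` and `copies_with_iadd` are hypothetical helpers for this back-of-the-envelope count, not measurements.

```python
def copies_with_add(lengths):
    """Elements copied when every step does `total = total + item`."""
    copied = running = 0
    for n in lengths:
        running += n
        copied += running  # the new list holds everything accumulated so far
    return copied

def copies_with_iadd(lengths):
    """Elements copied when every step does `total += item` (list.extend)."""
    return sum(lengths)  # each element is copied into the result once

# 1000 one-element lists: n*(n+1)/2 copies versus n copies.
print(copies_with_add([1] * 1000))   # 500500
print(copies_with_iadd([1] * 1000))  # 1000
```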

I largely don't think of sum(list_of_lists) as a nice-looking
construct, but that could just be a learnt opinion, and I'd never think
of "sum(list_of_lists, [])" as counterintuitive. I might think "OMG
INEFFICIENCY" for a long time to come, but I find it hard to agree
with those of you who say it doesn't make sense.
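For reference, the explicit [] start value matters: sum() starts from 0 by default, and `0 + [1, 2]` raises TypeError. The list contents here are only illustrative.

```python
list_of_lists = [[1, 2], [3], [4, 5]]

try:
    sum(list_of_lists)          # default start is 0; 0 + [1, 2] is a TypeError
except TypeError:
    print("sum(list_of_lists) needs a start value")

flat = sum(list_of_lists, [])   # evaluates [] + [1, 2] + [3] + [4, 5]
print(flat)                     # [1, 2, 3, 4, 5]
```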

I also think that holding back potential cases where __iadd__ is
better (which is every __iadd__) because you think a fast
"sum(list_of_lists, [])" would encourage that construct is a bit
silly. If you really feel that way, just say "that's not the best way
to do it because it's not generic enough¹, whereas chain.from_iterable
is". This is especially true if others agree that
"chain.from_iterable" deserves a place in __builtins__.
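The generality argument can be checked directly: itertools.chain.from_iterable accepts any iterable of iterables, while sum(..., []) needs every item to support `+` with a list. The range, tuple and set inputs are purely illustrative.

```python
from itertools import chain

list_of_lists = [[1, 2], [3], [4, 5]]
assert list(chain.from_iterable(list_of_lists)) == sum(list_of_lists, []) == [1, 2, 3, 4, 5]

# Items that are iterable but not addable to a list: chain works, sum() fails.
mixed = [range(2), (2, 3), {4}]
print(list(chain.from_iterable(mixed)))  # [0, 1, 2, 3, 4]
try:
    sum(mixed, [])                       # [] + range(2) raises TypeError
except TypeError as exc:
    print("sum failed:", exc)
```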

> Earlier in this discussion, you posted benchmarks for the patched sum using
> Python 2.7. Would you be willing to do it again for 3.3? And confirm that
> the Python test suite continues to pass?

Seconded.
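If the numbers are rerun on 3.3, a harness along these lines would make the comparison repeatable. The list sizes and repetition counts are arbitrary assumptions, and `compare` is a hypothetical helper; it only prints timings and makes no claim about what they will show. It avoids f-strings so it also runs on 3.3.

```python
import sys
import timeit
from itertools import chain

def compare(n_lists=1000, number=20, repeat=5):
    """Time sum(lists, []) against chain.from_iterable on the same input."""
    lists = [[i] for i in range(n_lists)]
    # Sanity check: both approaches must build the same list.
    assert sum(lists, []) == list(chain.from_iterable(lists))
    t_sum = min(timeit.repeat(lambda: sum(lists, []),
                              number=number, repeat=repeat))
    t_chain = min(timeit.repeat(lambda: list(chain.from_iterable(lists)),
                                number=number, repeat=repeat))
    return t_sum, t_chain

if __name__ == "__main__":
    print(sys.version)
    for n in (10, 100, 1000):
        t_sum, t_chain = compare(n)
        print("n=%d: sum %.4fs  chain %.4fs" % (n, t_sum, t_chain))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.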

¹ Doesn't work for anything other than mutable, addable objects

