[Python-ideas] Fast sum() for non-numbers - why so much worries?

Andrew Barnert abarnert at yahoo.com
Fri Jul 12 09:43:34 CEST 2013


From: Sergey <sergemp at mail.ru>

Sent: Thursday, July 11, 2013 10:16 PM


> On Jul 10, 2013 Andrew Barnert wrote:
> 
>>  But you haven't found a workable solution for all of the other
>>  types people have brought up--or even for a single one of them. So
>>  that's a serious misrepresentation.
> 
> Strings, tuples (2 options), lists (3 options) and your cons-lists
> (2 options). What other types have I missed?


Again, you have not actually made cons lists work; you've made different types work (deques, C++-style doubly-linked lists, etc.), and argued that those are "better types" or something.

And then there's the whole class of immutable collection types besides tuple. You've suggested that we can patch sum for each one, and that you could theoretically rewrite every immutable collection type in the world to be fast-summable, but neither of those is even remotely feasible as a solution.

Those aren't the only cases that have been brought up, but it hardly matters. If you can't answer any of the cases anyone brings up, why should anyone bother giving you new ones?

>>>  It's just instead of discussing what is the best way to fix a slowness,
>>>  I'm spending most time trying to convince people that slowness should
>>>  be fixed.
>>>  — sum is slow for lists, let's fix that!
>>>  — you shouldn't use sum...
>>>  — why can't I use sum?
>>>  — because it's slow
>> 
>>  But that's not why you shouldn't use sum.
> 
> No, That's the MAIN reason!

No, it really isn't. I and others have given multiple reasons. Let me try to put them all together in one place, roughly in order of importance:

1. If sum is the right way to concatenate sequences, it's ambiguous and confusing for types that are both addable and iterable, like numpy.array and other vectors. (The fact that you proposed a solution that would cause sum to flatten arrays is proof that this confusion is not just possible, but likely.) A short sketch after this list makes the ambiguity concrete.


2. The obvious way to concatenate sequences should work on iterators and other iterables; sum can't.

3. The word "sum" doesn't mean concatenating a bunch of sequences together. This isn't as much of a problem for +, because + isn't a word—people read it as "add", "plus", "and", "then", etc. in different contexts, but they read "sum" as "sum".

4. Iterator interfaces are better in a variety of ways—that's why 3.x map, zip, etc. return iterators rather than lists.

5. The right way to concatenate sequences should have consistent performance characteristics; it's impossible for sum to be O(N) for all sequence types.

So, that's 4 reasons that sum cannot be the right obvious way to concatenate sequences that are more important than performance. And even reason 5 is more about consistency than performance: if people learn that sum is fast, it had better be fast for their types; if it's not going to be fast for their types, we'd better not promise that it will be.
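
To make reason 1 concrete, here's a minimal sketch of the ambiguity (assuming numpy is installed; the numbers are just for illustration):

    import numpy as np

    arrays = [np.array([1, 2]), np.array([3, 4])]
    print(sum(arrays))  # prints [4 6]: elementwise addition, because + means "add" for arrays
    # If sum were instead "the way to concatenate sequences", you'd expect [1 2 3 4],
    # i.e. flattening -- which is exactly the ambiguity described in reason 1.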

And again, we already have an alternative that has none of those problems. Chain is not confusing for vectors, it works on all iterables, it clearly says what it means and means what it says, it provides an iterator interface, and it's O(N) for all iterables. (I'm ignoring the chain-vs.-chain.from_iterable issue, which is being hashed out in another thread, because it's not relevant here.)
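
For reference, the chain spelling looks roughly like this (a minimal sketch using the stdlib as it already exists):

    from itertools import chain

    seqs = [[1, 2], (3, 4), "ab"]            # a mix of different sequence types
    flat = list(chain.from_iterable(seqs))   # [1, 2, 3, 4, 'a', 'b'], O(N) overall
    # chain.from_iterable also accepts a generator of iterables, and the result is
    # itself a lazy iterator until you wrap it in list() or similar.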

> There's one more, about "+" (and sum)
> being confusing for concatenation, and I can understand that, however
> it's very weak as long "+" is used for joining sequences. But the
> main reason that most people, including you, are pushing is that
> sum() is slow. The whole our "you can't make it fast for" thread
> is the evidence.

No, the problem is that _you_ keep insisting that the only problem with sum is that it's not fast for all types, and you ignore everyone who offers other reasons, so people engage you on the one thing you'll talk about.

>>  Besides, being fast for list and tuple but slow for other
>>  collection types would be an attractive nuisance.
> 
> You're pushing "being slow" as a key argument. 

No, I'm pushing the _inconsistency_ as a key argument, as I explained above, and not even as the most important one.

Also, the very first word in that sentence is "Besides": you ignored the real point before the aside, and you had to go back at least 5 messages to find one you could misrepresent in that way.

> If due to some
> technical details sum() was fast from the very beginning then nobody
> would say that you shouldn't use it.

Nobody? How about the person who created the sum function in the first place? You've already been given the quote. The only reason you can misuse sum for sequences at all is that he couldn't figure out a good way to prevent people from making that mistake.

> Same as nobody is saying that
> you should use reduce() instead of sum() for numbers because sum()
> works only for __add__ while reduce works for every operator possible.

Actually, those are very nearly the opposite. The reason to use sum instead of reduce for adding numbers is that sum is a specific, fit-for-purpose function that does exactly what you want to do and says exactly what it's doing, while reduce is an overly-general function that makes it unclear what you're trying to do. And that exact same reason means you should use chain instead of sum for concatenating sequences.
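
A quick sketch of that parallel, in case it helps (nothing new here, just the stdlib):

    from functools import reduce
    from itertools import chain
    import operator

    nums = [1, 2, 3]
    sum(nums)                        # 6 -- specific tool, says exactly what it does
    reduce(operator.add, nums, 0)    # 6 -- overly general, obscures the intent

    seqs = [[1], [2, 3]]
    list(chain.from_iterable(seqs))  # [1, 2, 3] -- specific tool for concatenation
    sum(seqs, [])                    # [1, 2, 3] -- works, but reads as arithmetic and is O(N**2)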

> min() is good for list of tuples, right? max() is also good for list
> of tuples? Even any() is good for list of tuples (weird, but good).
> Then why sum() is bad for list of tuples? Because it's SLOW!

No. Finding the smallest tuple in a list is an obviously sensible idea that should obviously be spelled min. Flattening a bunch of tuples is an obviously sensible idea that should obviously be spelled something like "chain" or "flatten" or "concatenate", not "sum".
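
Or, spelled out as a sketch:

    from itertools import chain

    pairs = [(2, 'b'), (1, 'a')]
    min(pairs)                          # (1, 'a') -- "smallest tuple" is obviously min()
    list(chain.from_iterable(pairs))    # [2, 'b', 1, 'a'] -- "flatten these" is obviously chain, not sum()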

>>  Your only response to that has been to claim that it can be fast
>>  for every possible collection type, but it can't. You haven't
>>  gotten it to work. And there are good reasons to believe it's not
>>  just hard, but impossible.
> 
> Sorry, but can you please quote what response are you referencing?

Picking one of your emails at random:

> Despite we CAN make it faster. For all cases, or for most
> of them,

Here's another:

>> So, if you're suggesting that sum can be fast for anything
>> reasonable, you're just wrong.

> I suggested two ways how to do that. First, one can use the approach
> above, i.e. use mutable type internally. Second, for internal cpython
> types we have a simpler option to implement optimization directly
> in sum(). And there may be many others, specific to the types in
> question.

And so on. About half your messages have some form or other of the assertion that sum can be fast for all sequence types. Half of them also have you claiming that you already solved some type (by replacing it with another type), and the other half have you denying you ever claimed such a thing.

It's like talking to http://en.wikipedia.org/wiki/Nathan_Thurm. "I never said that. Who told you I said that? It's so funny you would think that."

> PS: maybe for 10 years people got used to the slow sum() and related
> arguments so much that they don't want to lose them? ;)


More Nathan Thurm: "I'm not being defensive. You're the one who's being defensive. Do you not want children to have toys? Did you not have toys growing up? It's so funny to me that anyone would want to deprive children of toys."

