2009/12/5 Adam Olsen <rhamph@gmail.com>:
On Sat, Dec 5, 2009 at 10:23, Vitor Bosshard <algorias@gmail.com> wrote:
And in that case the special string handling could also be dropped?
sum(["a","b"], "start") Traceback (most recent call last): File "<pyshell#0>", line 1, in <module> sum(["a","b"], "start") TypeError: sum() can't sum strings [use ''.join(seq) instead]
This behaviour is quite bothersome. Sum can handle arbitrary objects in theory (as long as they define the correct special methods, etc.), but it gratuitously raises an exception on strings. This behaviour is also inconsistent with the following:
sum(["a","b"]) Traceback (most recent call last): File "<pyshell#1>", line 1, in <module> sum(["a","b"]) TypeError: unsupported operand type(s) for +: 'int' and 'str'
Where sum actually tries to add "a" to the default value of 0.
sum is defined by repeatedly adding each number in a sequence. As each number is usually constant, and the size of total grows logarithmically, this is O(n log n) (but due to implementation coarseness it usually isn't distinguished from O(n)).
Concatenation however grows the total's size very quickly. You instead get a performance of O(n**2). Same result, wrong algorithm.
It would be possible to special case strings, but why? The programmer should know what algorithm they're using and what complexity class it has, so they can pick the right one (''.join(seq) in this case). IOW, handling arbitrary objects is an illusion.
I think you misunderstood my point. Sorry if I wasn't clear enough in my original message. I understand the performance characteristics of repeated concatenation vs str.join. I just wonder why the language goes out of its way to catch this particular occurrence of bad code, given there are plenty of ways to misuse sum or any other builtin for that matter. A newbie is more likely to get n**2 performance by using a for loop than sum: final = "" for s in strings: final += s Should python refuse to compile the above snippet? The answer is an emphatic "no".