[Python-ideas] Why does `sum` use a default for the `start` parameter?
Adam Olsen
rhamph at gmail.com
Sat Dec 5 19:48:52 CET 2009
On Sat, Dec 5, 2009 at 10:23, Vitor Bosshard <algorias at gmail.com> wrote:
> And in that case the special string handling could also be dropped?
>
>>>> sum(["a","b"], "start")
> Traceback (most recent call last):
> File "<pyshell#0>", line 1, in <module>
> sum(["a","b"], "start")
> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>
>
> This behaviour is quite bothersome. Sum can handle arbitrary objects
> in theory (as long as they define the correct special methods, etc.),
> but it gratuitously raises an exception on strings. This behaviour is
> also inconsistent with the following:
>
>>>> sum(["a","b"])
> Traceback (most recent call last):
> File "<pyshell#1>", line 1, in <module>
> sum(["a","b"])
> TypeError: unsupported operand type(s) for +: 'int' and 'str'
>
>
> Where sum actually tries to add "a" to the default value of 0.
sum is defined by repeatedly adding each number in a sequence. As
each number is usually of constant size, and the size of the total
grows only logarithmically, this is O(n log n) (although, due to the
coarseness of the implementation, it usually isn't distinguished from O(n)).
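To make the "size of total grows logarithmically" point concrete, here is a small sketch. It uses a hypothetical helper, total_bit_lengths, that mimics summing n constant-size numbers (all ones, an assumption chosen for simplicity) and records how many bits the running total needs after each addition:

```python
# Sketch (hypothetical helper, not part of sum()): add n constant-size
# numbers one at a time, recording how many bits the running total needs.
def total_bit_lengths(n):
    total = 0
    sizes = []
    for _ in range(n):
        total += 1                      # each addend is constant-size
        sizes.append(total.bit_length())
    return sizes

sizes = total_bit_lengths(1000)
print(sizes[-1])    # 10: the total only needs about log2(n) bits
print(sum(sizes))   # rough work estimate if each add costs O(bits)
```

If each addition costs time proportional to the total's size in bits, the overall work is roughly n * log2(n). In practice, small totals fit in a single machine word, which is why this is rarely distinguishable from O(n).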
Concatenation, however, grows the total's size very quickly: the
total gets longer with every item and has to be copied each time, so
you instead get O(n**2) performance. Same result, wrong algorithm.
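A rough cost model shows where the O(n**2) comes from. This sketch assumes that building `total + s` copies every character of both operands into a new string, and it ignores any in-place optimizations an interpreter might apply. It also assumes ''.join sizes the result once and copies each character a single time. Both helper names are hypothetical:

```python
# Sketch: count characters copied when summing strings by repeated
# concatenation (what sum() would do without its string check)
# versus ''.join. Cost model is an assumption, not measured behaviour.
def concat_copy_cost(pieces):
    total_len = 0
    copied = 0
    for s in pieces:
        total_len += len(s)
        copied += total_len     # the new total is built from scratch
    return copied

def join_copy_cost(pieces):
    # join: measure the pieces first, then copy each character once
    return sum(len(s) for s in pieces)

pieces = ["a"] * 1000
print(concat_copy_cost(pieces))  # 500500
print(join_copy_cost(pieces))    # 1000
```

Doubling the number of pieces roughly quadruples the first count but only doubles the second.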
It would be possible to special-case strings, but why? The programmer
should know what algorithm they're using and what complexity class it
has, so they can pick the right one (''.join(seq) in this case). IOW,
handling arbitrary objects is an illusion.
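In practice, picking the right algorithm looks like this. sum() rejects a str start value, as in the quoted traceback, and the same concatenation is done in linear time with ''.join. Prepending the original "start" value is just one way of reproducing what the rejected call was asking for:

```python
# Sketch: sum() refuses to concatenate strings; ''.join is the linear-time tool.
words = ["a", "b"]

try:
    sum(words, "start")
except TypeError as exc:
    print(exc)  # the message points the caller at ''.join(seq)

# Same result as the rejected call would have produced, built in one pass.
result = "start" + "".join(words)
print(result)  # startab
```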
For another example of why the programmer needs to understand the
algorithmic complexity of the operations they're using, and why the
language should value performance consistency and not just correct
output, see ABC's use of rational numbers:
http://python-history.blogspot.com/2009/03/problem-with-integer-division.html
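The following is a small illustration, not taken from the linked post, of how exact rationals can make the cost of a simple-looking operation drift. It uses Python's fractions.Fraction, and the harmonic series is an arbitrary choice made only for demonstration. The hypothetical helper harmonic sums 1/k exactly, and the reduced denominator keeps growing, so each later `+=` works on larger integers:

```python
from fractions import Fraction

# Sketch: exact rational arithmetic stays correct, but the numerator and
# denominator can grow quickly, so the same-looking += gets more expensive.
def harmonic(n):
    total = Fraction(0)
    for k in range(1, n + 1):
        total += Fraction(1, k)
    return total

for n in (10, 100, 1000):
    h = harmonic(n)
    print(n, len(str(h.denominator)))  # digits in the reduced denominator
```

A float version of the same loop would run at the same speed on every step. That is the performance consistency a rational default gives up.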
--
Adam Olsen, aka Rhamphoryncus