sum and strings

Steven D'Aprano steve at REMOVEME.cybersource.com.au
Mon Aug 21 05:36:21 EDT 2006


On Fri, 18 Aug 2006 19:08:37 -0700, Paul Rubin wrote:

> If the args are strings, the above is a quadratic time algorithm and
> it's better to throw an error than create such a trap for an unwary user.

That's a nonsense argument. There is no shortage of slow algorithms, and
some of them are built into Python. When did O(n**2) become an error
condition? And overhead matters: if I'm only doing a few concatenations,
it is significantly faster to add the strings using + than to use join()
(at least in Python 2.3 -- YMMV):

>>> s = timeit.Timer("''.join(L)", "L=['a'*50, 'b'*50, 'c'*50]")
>>> s.timeit()
1.3470098972320557
>>> t = timeit.Timer("a+b+c", "a,b,c = 'a'*50, 'b'*50, 'c'*50")
>>> t.timeit()
1.0698421001434326

There's a word for optimizations that actually result in code running 25%
slower.

I applaud that Python's language developers care about efficiency. I
applaud that the documentation warns people about traps and potential
areas of inefficiency. But I think it is a shame that sum() does a
special type-check to avoid something which is only sometimes slow. It
doesn't protect against O(n**2) performance; it merely protects against
just one of an infinite number of possible "traps for the unwary".

I would have thought it would be better for sum() to raise a warning, not
an exception. If we take seriously the argument that sum implies
addition, and that string concatenation isn't really addition, then sum()
should also refuse to operate on lists and tuples and any other
non-numeric class. Either would be better than sum() "protecting" the user
from himself, except when it doesn't.



-- 
Steven D'Aprano 




More information about the Python-list mailing list