sum for sequences?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Thu Mar 25 17:11:50 EDT 2010


On Thu, 25 Mar 2010 14:02:05 +0100, Alf P. Steinbach wrote:

> * Neil Cerutti:
>> On 2010-03-25, Steven D'Aprano <steven at REMOVE.THIS.cybersource.com.au>
>> wrote:
>>>> You might not want to be so glib.  The sum doc sure doesn't sound
>>>> like it should work on lists.
>>>>
>>>>     Returns the sum of a sequence of numbers (NOT strings) plus the
>>>>     value of parameter 'start' (which defaults to 0).
>>> What part of that suggested to you that sum might not be polymorphic?
>>> Sure, it says numbers (which should be changed, in my opinion), but it
>>> doesn't specify what sort of numbers -- ints, floats, or custom types
>>> that have an __add__ method.
>> 
>> WTF.
> 
> I think Steven's argument is that it would be pointless for 'sum' to
> distinguish between user-defined numerical types and other types that
> happen to support '+'.

Before Python 2.6, which introduced a numeric tower, Python *couldn't* 
reliably distinguish between numeric types and other types that 
overloaded +. Since Python discourages type-checking in favour of 
duck-typing and try...except, this is seen as a good thing.
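
For illustration only, here is a minimal sketch of the distinction the 
numeric tower makes possible, assuming Python 2.6 or later. The class 
Tag and the helper is_number are hypothetical names made up for this 
example; nothing here is what sum itself does.

```python
import numbers
from fractions import Fraction


class Tag(object):
    """Hypothetical non-numeric type that happens to overload +."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return Tag(self.text + other.text)


def is_number(obj):
    # True only for types registered in the numeric tower
    # (int, float, complex, Fraction, Decimal, ...).
    return isinstance(obj, numbers.Number)


# Supporting + is not enough to count as a number.
checks = [
    is_number(1),
    is_number(2.5),
    is_number(Fraction(1, 3)),
    is_number(Tag("a")),
    is_number([1, 2]),
]
```

Tag and list both support +, yet neither is a numbers.Number, so an 
isinstance check can now tell them apart from ints, floats and Fractions.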

My argument is that sum isn't hard-coded to only work on the built-in 
ints or floats, but supports any object that you can use the + 
operator on. The *sole* exceptions are str and unicode (not even 
UserString), and even there it is very simple to overcome the restriction:

>>> sum(['a', 'b'], '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>> class S:
...     def __add__(self, other):
...             return other
...
>>> sum(['a', 'b'], S())
'ab'
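
As a further sketch of the same point, sum will happily add up instances 
of a user-defined type, provided a suitable start value is passed in 
place of the default 0. The class Vec below is a hypothetical example 
type, not anything from the standard library.

```python
class Vec(object):
    """Hypothetical 2-D vector that supports + via __add__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)


# The start value must itself support +; the default 0 would not work,
# because 0 + Vec(...) is not defined for this class.
total = sum([Vec(1, 2), Vec(3, 4)], Vec(0, 0))

# For strings, the approach the error message recommends:
joined = ''.join(['a', 'b'])
```

Here total is a Vec with x == 4 and y == 6, and joined is 'ab'.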


[...]
> However, given that it isn't restricted to numbers, the restriction wrt.
> strings is a bit perplexing in the context of modern CPython. But for
> Python implementations that don't offer the '+=' optimization it might
> help to avoid gross inefficiencies, namely quadratic time string
> concatenation.

I agree -- the Python philosophy is to allow the user to shoot themselves 
in the foot if they wish to. You're responsible for the Big Oh behaviour 
of your code, not the compiler.


[...]
> However, if that hypothesis about the rationale is correct, then 'sum'
> should also be restricted to not handle tuples or lists, so forth, but
> at least the CPython implementation does.

The reasoning is that naive users are far, far more likely to try summing 
a large list of strings than to try summing a large list of lists, and 
therefore in practical terms the consequences of allowing sum on lists are 
slight enough and rare enough to not be worth the check.

I suspect that this is just an after-the-fact rationalisation, and that 
the real reason is that those responsible for the hand-holding in sum 
merely forgot, or didn't know, that repeated addition of lists and tuples 
is also O(N**2). But I've never cared enough to dig through the archives 
to find out.
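
To make the O(N**2) claim concrete, here is a rough sketch that counts 
how many elements get copied when sum concatenates N one-item lists. 
CountingList, copies_made_by_sum and flatten are hypothetical helpers 
written for this illustration, and the tally models the copying done by 
list concatenation rather than measuring actual run time.

```python
import itertools


class CountingList(list):
    """Hypothetical list subclass that tallies elements copied by +."""

    copied = 0  # class-wide tally, reset before each experiment

    def __add__(self, other):
        # Concatenation builds a new list holding every element of both
        # operands, so that is how many elements are copied.
        CountingList.copied += len(self) + len(other)
        return CountingList(list.__add__(self, other))


def copies_made_by_sum(n):
    """Count elements copied when sum() joins n one-item lists."""
    CountingList.copied = 0
    chunks = [CountingList([i]) for i in range(n)]
    sum(chunks, CountingList())
    return CountingList.copied


def flatten(chunks):
    """Linear-time alternative: each element is visited once."""
    return list(itertools.chain.from_iterable(chunks))
```

Under this model the k-th addition copies k elements, so N lists cost 
N*(N+1)/2 copies in total: 55 for N = 10, but 5050 for N = 100. 
itertools.chain.from_iterable avoids the repeated copying.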



-- 
Steven


