sum for sequences?

Mon Mar 29 23:34:17 EDT 2010

On Mar 29, 8:01 pm, Steven D'Aprano
<ste... at REMOVE.THIS.cybersource.com.au> wrote:
> You don't define symmetry. You don't even give a sensible example of
> symmetry. Consequently I reject your argument that because sum is the
> obvious way to sum a lot of integers, "symmetry" implies that it should
> be the obvious way to concatenate a lot of lists.
>

You are not rejecting my argument; you are rejecting an improper
paraphrase of my argument.

My argument was that repeated use of "+" is spelled "sum" for
integers, so it's natural to expect the same name for repeated use of
"+" on lists.  Python already allows for this symmetry, just SLOWLY.

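To make "allowed, just slowly" concrete, here is a minimal sketch. The built-in sum() does concatenate lists when given a list as the start value, but it builds a new intermediate list at every step. A loop with += extends one accumulator instead. The helper name concat_inplace is made up for illustration and is not part of the standard library.

```python
def concat_inplace(lists):
    """Concatenate lists by extending a single accumulator in place."""
    total = []
    for item in lists:
        total += item  # list += extends total; no new list per step
    return total


chunks = [[1, 2], [3], [4, 5]]

# The built-in sum accepts lists if start is a list, but each addition
# creates a fresh intermediate list (repeated use of "+").
via_sum = sum(chunks, [])

# Same result, built by extending one list.
via_iadd = concat_inplace(chunks)
```

Both spellings give the same list; the difference is only in how much copying happens along the way.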
>
> You are correct that building intermediate lists isn't *compulsory*,
> there are alternatives, but the alternatives themselves have costs.
> Complexity itself is a cost. sum currently has nice simple semantics,
> which means you can reason about it: sum(sequence, start) is the same as
>
> total = start
> for item in sequence:
>     total = total + item
> return total
>

I could just as reasonably expect these semantics:

 total = start
 for item in sequence:
   total += item
 return total

Python does not contradict my expectations here:

 >>> start = []
 >>> x = sum([], start)
 >>> x.append(1)
 >>> start
 [1]

> You don't have to care what the items in sequence are, you don't have to
> make assumptions about what methods sequence and start have (beyond
> supporting iteration and addition).

The only additional assumption I'm making is that Python can take
advantage of in-place addition, which is easy to introspect.
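As a rough sketch of what that could look like (sum_with_iadd is a hypothetical name, not the built-in and not a proposal for its actual implementation), the loop below uses augmented assignment, and hasattr shows one way to introspect whether a type supplies in-place addition. Note the consequence: with these semantics a mutable start value is modified in place.

```python
def sum_with_iadd(sequence, start=0):
    """Hypothetical sum variant that accumulates with += instead of +."""
    total = start
    for item in sequence:
        total += item  # uses __iadd__ when the type defines it
    return total


# Introspection: lists define in-place addition, ints do not.
list_has_iadd = hasattr(list, "__iadd__")
int_has_iadd = hasattr(int, "__iadd__")

# Numbers behave exactly as with the built-in sum.
number_total = sum_with_iadd([1, 2, 3])

# Lists are extended in place, so the result is the start object itself.
start = []
result = sum_with_iadd([[1, 2], [3]], start)
```

The in-place mutation of start is the trade-off here; a caller who wants to keep start untouched would have to pass a copy.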

> Adding special cases to sum means it
> becomes more complex and harder to reason about. If you pass some other
> sequence type in the middle of a bunch of lists, what will happen? Will
> sum suddenly break, or perhaps continue to work but inefficiently?

This is mostly a red herring, as I would tend to use sum() on
sequences of homogeneous types.

Python already gives me the power to shoot myself in the foot for
strings.

 >>> lst = [1, 2]
 >>> lst += "foo"
 >>> lst
 [1, 2, 'f', 'o', 'o']

 >>> lst = [1,2]
 >>> lst.extend('foo')
 >>> lst
 [1, 2, 'f', 'o', 'o']

I'd prefer to get an exception for cases where += would do the same.

>>> start = []
>>> bogus_example = [[1, 2], None, [3]]
>>> for item in bogus_example: start += item
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable

> You still need to ask these questions with existing sum, but it is
> comparatively easy to answer them: you only need to consider how the
> alternative behaves when added to a list. You don't have to think about
> the technicalities of the sum algorithm itself -- sometimes it calls +,
> sometimes extend, sometimes +=, sometimes something else

I would expect sum() to support the same contract as +=, which already
works for numerics (so no backward incompatibility), and which already
works for lists.  For custom-designed classes, I would rely on the
promise that augmented assignment falls back to normal methods.
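A small sketch of that fallback, using a made-up Money class that defines __add__ but deliberately no __iadd__: the += in the loop still works, because Python falls back to __add__ and rebinds the name.

```python
class Money:
    """Hypothetical class with normal addition only (no __iadd__)."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)


total = Money(0)
for item in [Money(150), Money(250)]:
    total += item  # no __iadd__ defined, so this calls Money.__add__
```

After the loop, total is a new Money object holding the combined amount; none of the inputs were modified.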

> ... which of the
> various different optimized branches will I fall into this time? Who
> knows? sum already has two branches. In my opinion, three branches is one
> too many.

As long as it falls into the branch that works, I'm happy. :)

>
> "Aggregating" lists? Not summing them? I think you've just undercut your
> argument that sum is the "obvious" way of concatenating lists.
>
> In natural language, we don't talk about "summing" lists, we talk about
> joining, concatenating or aggregating them. You have just done it
> yourself, and made my point for me.

Nor do you use "chain" or "extend."

> And this very thread started because
> somebody wanted to know what the equivalent to sum for sequences is.
>
> If sum was the obvious way to concatenate sequences, this thread wouldn't
> even exist.

This thread is entitled "sum for sequences."  I think you just made my
point.