[Python-ideas] Fast sum() for non-numbers

Joshua Landau joshua.landau.ws at gmail.com
Thu Jul 4 00:02:00 CEST 2013


On 3 July 2013 18:58, Steven D'Aprano <steve at pearwood.info> wrote:
> On 04/07/13 02:58, Joshua Landau wrote:
>
>> What's wrong with sum, if sum is fast?
>
> sum simply cannot *always* be fast. E.g. summing tuples will still be slow
> even with your suggestion.
>
> Using sum() on anything other than numbers is conceptually dubious; yes,
> sum() is intended for addition, but it's intended for *numeric* addition.

Sum: A quantity obtained by addition or aggregation.
[http://en.wiktionary.org/wiki/sum]

I don't see why it should be constrained.

> It's unfortunate that Python uses + for concatenation,

Is it? I'm very happy with it myself.

> we're stuck with it
> until Python 4000, but if you're using sum() to add lists, tuples, or other
> non-numbers, you're living in a state of sin.

Why? There are a lot of things it makes sense to take the sum of - a
lot of which have constant-ish performance on addition.

> (A venal sin rather than a
> mortal sin.) It's allowed, but we shouldn't encourage it, and treating sum()
> as if it were the One Obvious Way to concatenate data is, in my strong
> opinion, the wrong thing to do.

No, no. A fast sum() is not the one obvious way to concatenate data,
much as sum() is not the one obvious way to add data. It just means
that if conceptually you're looking for the sum, you'd be able to do
it without shooting yourself in the foot.

I'd also point out there are a *lot* of times when iadd is faster than
add, not just contancation.

> In my opinion, the One Obvious Way to concatenate a lot of arbitrary data is
> list.extend, not sum.

Is it? Maybe for lists. Often itertools.chain is better. It really
depends on your data. I tend to use itertools.chain a lot more than
list.extend, because I sum iterators more than I sum lists. Maybe I'm
just weird.

> I can't gather any enthusiasm for optimizing sum to speed up concatenation.
> I'm at best indifferent towards the specific proposal to speed up sum of
> lists; I'm hostile to any attempt to encourage people to treat sum() as the
> preferred way to concatenate large amounts of data, because that will surely
> lead them into bad habits and will end with them trying to sum() a lot of
> tuples or linked lists or something and getting O(n**2) performance.

Maybe the difference is that I look at this as an optimisation to
every use of sum where iadd is faster than add, not just list
contancation.

If you're summing tuples, either you know what to do (sum into a list
and then convert back) or you're going to do it wrong *anyway*. I
can't remember the last time I had to sum tuples.

Also, if you're summing linked lists, why would it be O(n**2)
performance? Surely it's still O(n), given the new implementation?


More information about the Python-ideas mailing list