[Python-ideas] Fast sum() for non-numbers

Tue Jul 2 20:12:09 CEST 2013

Hello, python-ideas. Trying to cover everything in one shot, so
the message is long, sorry.

sum() is a great function. It is the "obvious way" to add things.
Unfortunately sometimes it's slower than it could be.

The problem is that code:
  sum([[1,2,3]]*1000000, [])
takes forever to complete. Let's fix that!

This problem was there since the sum() introduction, but it is almost
unknown among python-beginners.

When people look at sum(seq,start=0) signature they most probably
expect it to be like this:
  def sum(seq, start = 0):
    for item in seq:
      start += item
    return start

But it is not, because in cases like:
  empty = []
  result = sum(list_of_lists, empty)
such implementation would modify content of "empty". This use-case is
known, it is checked in test_builtin.py, and explicitly mentioned in
comments inside sum() code.

So actually sum() looks like this:
  def sum(seq, start = 0):
    for item in seq:
      start = start + item
    return start
it creates a copy of the partial result on every "start + item".

What I suggest is instead of making a copy for every item make just one
copy and reuse it as many times as needed. For example:
  def sum(seq, start = 0):
    start = start + seq[0]
    for item in seq[1:]:
      start += item
    return start

Patch implementing this idea attached to issue18305. Patch is simple, it
just rearranges existing code. It should work for python 2.7, 3.3 and
hg-tip. Except sum() becoming faster there should be no behavior change.
Can this implementation break anything?

Advantages:
* Performance boost is like 200000% [1], nothing else becomes slower

Disadvantages (inspired by Terry J. Reedy):
* Other pythons (pypy, jython, older cpython versions) do not have
  such optimisation, people will move code that depends on the internal
  optimization to pythons that do not have it. And they will complain
  about their program 'freezing'.

* It discourages people from carefully thinking about whether they
  actually need a concrete list or merely the iterator for a virtual
  list.

Alternatives to sum() for this use-case:
* list comprehension is 270% slower than patched sum [2]
* itertools.chain is 50%-100% slower than patched sum [3]

Main questions:
* Whether to accept the patch
* If yes, whether it should go to the new version or to a bugfix release

Alternatives to this patch:
* Reject the idea as a whole. Intentionally keep the sum() slow.
* Accept the idea, but reject this particular implementation, instead use
  something different, e.g. implement special case optimization or use
  different approach (I thought about start=copy.copy(start))

In my opinion (looking though python changelog) performance patches were
accepted for bugfix releases many times before. So as long as this patch
does not break anything, it may go even to python 2.7.

I think bad performance is a bug that should be fixed. Rejecting
performance fixes just because older versions don't have it would mean
no performance improvements to anything, ever.

Sum is a great basic function. It keeps the simple code simple. This
patch puts it on par with more advanced features like itertools, and
allows people to use sum for simple things and itertools for complex
memory-efficient algorithms.

Questions that may arise if the patch is accepted:
* sum() was rejecting strings because of this bug. If the bug gets fixed
  should another patch allow sum() to accept strings?
* maybe in some distant future drop the second parameter (or make it
  None by default) and allow calling sum for everything, making sum()
  "the one obvious way" to sum up things?

It would be nice if sum "just worked" for everything (e.g. sum() of
empty sequence would return None, i.e. if there's nothing to sum then
nothing is returned). But I think it needs more work for that, because
even with this patch sum() is still ~20 times slower than "".join() [4]

That's all. Any suggestions welcome.

-- 

[1] Python 2.7.5
Before patch:
  $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])"
  10 loops, best of 3: 885 msec per loop
After patch:
  $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])"
  1000 loops, best of 3: 524 usec per loop

[2] Python 2.7.5 with patch:
  $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "[i for l in x for i in l]"
  1000 loops, best of 3: 1.94 msec per loop

[3] Python 2.7.5 with patch:
  $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" --setup="from itertools import chain" "list(chain.from_iterable(x))"
  1000 loops, best of 3: 821 usec per loop

  $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" --setup="from itertools import chain" "list(chain(*x))"
  1000 loops, best of 3: 1.03 msec per loop

[4] Python 2.7.5 with patch and string check removed:
  $ ./python -mtimeit --setup="x=['a']*10000" "sum(x,'')"
  100 loops, best of 3: 3.98 msec per loop

  $ ./python -mtimeit --setup="x=['a']*10000" "''.join(x)"
  10000 loops, best of 3: 170 usec per loop