[Python-ideas] Fast sum() for non-numbers
Sergey
sergemp at mail.ru
Wed Jul 10 18:10:07 CEST 2013
On Jul 9, 2013 Steven D'Aprano wrote:
> The fact that sum(lists) has had quadratic performance since sum
> was first introduced in Python 2.3, and I've *never* seen anyone
> complain about it being slow, suggests very strongly that this is not
> a use-case that matters.
Never seen? Are you sure? ;)
> http://article.gmane.org/gmane.comp.python.general/658630
> From: Steven D'Aprano @ 2010-03-29
> In practical terms, does anyone actually ever use sum on more than a
> handful of lists? I don't believe this is more than a hypothetical
> problem.
This is definitely not the first time people have asked to fix this
O(N**2) "surprise". Maybe if the problem comes up year after year it
is not so hypothetical?
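Just to make the slowdown concrete, here is a small timing sketch anyone can run (the helper name `concat_with_sum` is mine, not from any patch):

```python
import timeit

# A quick sketch of the quadratic behaviour: summing N one-element
# lists copies the whole accumulated list on every `+`, so doubling
# N roughly quadruples the time.
def concat_with_sum(n):
    return sum([[i] for i in range(n)], [])

for n in (500, 1000, 2000):
    t = timeit.timeit(lambda: concat_with_sum(n), number=20)
    print(n, round(t, 4))
```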
sum() was the first answer suggested for many list-of-lists questions
[1], and sometimes it wasn't even obvious to people why it might be
slow [2]. IMHO, many of those who actually spot the slowness will
just think "heh, python is so slow", and won't blame sum() for the bug.
Why do you think the sum() code has an explicit comment about using
sum on a list of lists? Why does test_builtin.py check this case?
Isn't it because this is the common case?
Or do you mean that nobody suggested a patch before? Well, think
about it like that:
1. How many people in the world use python?
2. How many of them need to add lists?
3. How many of them are careful enough to notice that sum is slow?
4. How many of them are experienced enough to blame sum for that?
5. How many of those are smart enough to understand how it can be
fixed, not just worked around by using another function?
6. How many of those are skilled enough to dig into python code and
able to fix the bug there?
7. How many of those have enough free time to come here and start
asking to accept the patch?
Not too many, right? And among those someone should be the first.
Well, I am. :) How many do you need?
> No no no. The objection is that complicating the implementation of
> a function in order to optimize a use-case that doesn't come up in
> real-world use is actually harmful. Maintaining sum will be harder,
> for the sake of some benefit that very possibly nobody will actually
> receive.
Are you sure it complicates the implementation?
Here's how the sum function looks NOW, without my patches
(well, it looks different in C, but the idea is the same):
    def sum(seq, start=0):
        it = iter(seq)
        if isinstance(start, str):
            raise TypeError(
                "sum() can't sum strings [use ''.join(seq) instead]")
        if isinstance(start, bytes):
            raise TypeError(
                "sum() can't sum bytes [use b''.join(seq) instead]")
        # SPECIAL CASES
        if isinstance(start, int):
            # pseudocode for the C conversion: "overflow" is set when
            # the value does not fit into a machine integer
            i_result = int(start, overflow)
            if not overflow:
                try:
                    start = None
                    while start is None:
                        item = next(it)
                        if isinstance(item, int):
                            b = int(item, overflow)
                            x = i_result + b
                            if not overflow and ((x^i_result) >= 0 or (x^b) >= 0):
                                i_result = x
                                continue
                        start = i_result
                        start = start + item
                except StopIteration:
                    return i_result
        if isinstance(start, float):
            f_result = float(start)
            try:
                start = None
                while start is None:
                    item = next(it)
                    if isinstance(item, float):
                        f_result += float(item)
                        continue
                    if isinstance(item, int):
                        value = int(item, overflow)
                        if not overflow:
                            f_result += float(value)
                            continue
                    start = f_result
                    start = start + item
            except StopIteration:
                return f_result
        # END OF SPECIAL CASES
        try:
            while True:
                item = next(it)
                start = start + item
        except StopIteration:
            return start
So simple and obvious. :)
My original patch changes the part:

    while True:
        item = next(it)
        start = start + item

to:

    item = next(it)
    start = start + item    # first addition copies start
    while True:
        item = next(it)
        start += item       # later additions may be done in place

(In Python that's effectively one extra line of logic; in C it's 6
lines.)
This does not really complicate the implementation.
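The reason the patch keeps `+` for the first addition and `+=` only for the rest is that `list.__iadd__` mutates in place, while `+` builds a fresh object, so the caller's start argument is never modified. A minimal sketch of the difference:

```python
# For lists, `+=` is in-place (it is list.extend under the hood),
# while `+` allocates a brand-new list every time.
acc = [1, 2]
alias = acc
acc += [3]           # in-place: the alias sees the change
assert alias == [1, 2, 3]

acc = acc + [4]      # new object: the alias is left behind
assert alias == [1, 2, 3]
assert acc == [1, 2, 3, 4]
```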
An alternative to that patch is one more special case for "known slow
types", i.e. lists and tuples. That was my second patch. In Python it
would look like:
    if isinstance(start, list) or isinstance(start, tuple):
        optimize_for = type(start)
        l_result = list(start)
        try:
            while True:
                item = next(it)
                if not isinstance(item, optimize_for):
                    start = optimize_for(l_result)
                    start = start + item
                    break
                l_result.extend(item)
        except StopIteration:
            return optimize_for(l_result)
Yes, that's not just one line, but does it really complicate the
existing code that much?
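To see what that special case does end to end, here is a standalone sketch of the same idea (the name `fast_sum` is mine, purely for illustration): accumulate everything into one mutable list, then convert back to the start type at the end, falling back to generic `+` the moment a foreign type shows up.

```python
def fast_sum(seq, start):
    # Standalone sketch of the "known slow types" special case:
    # O(total length) instead of O(N**2) for lists and tuples.
    optimize_for = type(start)
    l_result = list(start)
    it = iter(seq)
    for item in it:
        if not isinstance(item, optimize_for):
            # foreign type: convert back and continue generically
            result = optimize_for(l_result) + item
            for item in it:
                result = result + item
            return result
        l_result.extend(item)
    return optimize_for(l_result)

assert fast_sum([[1], [2], [3]], []) == [1, 2, 3]
assert fast_sum([(1,), (2,)], ()) == (1, 2)
```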
Theoretically this code could be extended to any iterable. So,
*theoretically* it could be:
    if start_is_iterable:
        optimize_for = type(start)
        ... same code here ...
but there's a technical problem in the line:

    start = optimize_for(l_result)
This line works for lists and tuples. It will even work for set and
frozenset, if needed. But the problem is that I cannot guarantee that
it will work for an arbitrary type. For example, it does not work for
strings (even worse, it works, but not as I would want it to).
In my patch I also showed how strings can be handled for that case.
Basically, this single line is the only thing keeping me from turning
my "special case" into a "general case" for all iterables in the
world. So if somebody knows how to solve this problem, any
suggestions are welcome.
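The string failure is easy to demonstrate: for list, tuple, and set the `type(start)(accumulated)` round-trip rebuilds the right value, but `str()` of a list gives you its repr, not a concatenation:

```python
# The troublesome round-trip: type(start)(accumulated_list).
parts = ["a", "b"]
assert list(parts) == ["a", "b"]      # list: round-trips fine
assert tuple(parts) == ("a", "b")     # tuple: fine
assert set(parts) == {"a", "b"}       # set: fine
assert str(parts) == "['a', 'b']"     # str: a repr, not "ab"!
assert "".join(parts) == "ab"         # str needs join instead
```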
> I don't care that sum() is O(N**2) on strings, linked lists,
> tuples, lists. I don't think we should care. Sergey thinks we should
> care, and is willing to change the semantics of sum AND include as
> many special cases as needed in order to "guarantee" that sum will be
> "always fast". I don't believe that guarantee can possibly hold, and
Just ONE special case is needed, the one for iterables. And yes,
the "guarantee" can hold, because it only affects built-in types,
and my patch covers all of them, even those that are not supported
by sum() yet. But it's also good for third-party types, because it
gives them an option to implement a fast __iadd__. They don't have
that option now.
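Here is a sketch of what that option looks like from a third-party type's point of view (the `Bag` class is a made-up example): with the patched loop using `+=`, a container can make repeated additions cheap by implementing `__iadd__` in place, while `__add__` keeps its copying semantics.

```python
class Bag:
    # Hypothetical third-party container with a fast __iadd__.
    def __init__(self, items=()):
        self.items = list(items)

    def __add__(self, other):
        return Bag(self.items + other.items)   # copies: O(n)

    def __iadd__(self, other):
        self.items.extend(other.items)         # in place: O(len(other))
        return self

a = Bag([1])
a += Bag([2])          # in-place path, no copy of the accumulator
assert a.items == [1, 2]
assert (a + Bag([3])).items == [1, 2, 3]
```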
> Flattening sequences is not sum. You have to consider ...
Yet people think [1] that sum() is useful for that. Every year
somebody comes and tries to use sum(), and often someone else says
"Hey, don't use sum()". Why? "BECAUSE IT'S SLOW!" All the talk
about "sum() is not designed for...", "it's just because + is used
for concatenation...", "sum() should not be used...", "you have to
consider..." is just excuses to explain why the bug is there. Maybe
it's time to stop searching for excuses and finally fix the bug?
Especially since it's so easy to fix.
--
[1] Some questions about lists flattening:
http://stackoverflow.com/questions/716477/join-list-of-lists-in-python
http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python
[2] "explain for a noob python learner please: Is O(n^2) good or bad
in this case?"
http://article.gmane.org/gmane.comp.python.general/441831
> A fast implementation would probably allocate the output list just
> once and then stream the values into place with a simple index.
That's what I hoped "sum" would do, but instead it barfs with a type
error.
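That "allocate once, stream into place" idea is straightforward to sketch for the list case (the `flatten` helper below is mine, just to illustrate the suggestion):

```python
def flatten(lists):
    # Allocate the output once, then copy each chunk into place.
    total = sum(len(l) for l in lists)
    out = [None] * total
    i = 0
    for l in lists:
        out[i:i + len(l)] = l
        i += len(l)
    return out

assert flatten([[1, 2], [3]]) == [1, 2, 3]
```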