[Python-ideas] Intermediate Summary: Fast sum() for non-numbers

Sergey sergemp at mail.ru
Mon Jul 15 02:24:26 CEST 2013


On Jul 14, 2013 Paul Moore wrote:

> sum is currently simple.

Sum currently looks like:

def sum(seq, start = 0):
  it = iter(seq)
  if isinstance(start, str):
    raise TypeError(
      "sum() can't sum strings [use ''.join(seq) instead]")
  if isinstance(start, bytes):
    raise TypeError(
      "sum() can't sum bytes [use b''.join(seq) instead]")
  if isinstance(start, bytearray):
    raise TypeError(
      "sum() can't sum bytearray [use b''.join(seq) instead]")
  # SPECIAL CASES
  if type(start) is int:
    i_result = int(start, overflow)
    if not overflow:
      try:
        start = None
        while start is None:
          item = next(it)
          if isinstance(item, int):
            b = int(item, overflow)
            x = i_result + b
            if not overflow and ((x^i_result) >= 0 or (x^b) >= 0)
              i_result = x
              continue
          start = i_result
          start = start + item
      except StopIteration:
        return i_result
  if type(start) is float:
    f_result = float(start)
    try:
      start = None
      while start is None:
        item = next(it)
        if isinstance(item, float):
          f_result += float(item)
          continue
        if isinstance(item, int):
          value = int(item, overflow)
          if not overflow:
            f_result += float(value);
            continue
        start = f_result
        start = start + item
    except StopIteration:
      return f_result
  # END OF SPECIAL CASES
  try:
    while True:
      item = next(it)
      result = result + item
  except StopIteration:
    return start

>     result = start
>     for elem in iterable:
>         result = result + elem
> 
> The performance complexity follows directly from that definition.

First, nothing says that sum is implemented like that. And not
everyone assume it is:
  Paul Rubin @ 2006-01-12
  > A fast implementation would probably allocate the output list just
  > once and then stream the values into place with a simple index.
  That's what I hoped "sum" would do, but instead it barfs with a type
  error.  So much for duck typing.

> It is no more surprising that numbers are O(N) and containers
> O(N*N) than it is that the loop I show above has that complexity.

Second, line "result = result + elem" does not mean that performance
should be O(N*N) for containers. FastTuple container is a simple O(N)
proof of that concept (see #1 and #2 in "intermediate summary").

> That is to say, it *is* surprising, if you don't know complexity
> calculations very well, but it's a fundamental and trivial result
> once you do.

I do know. And I do understand that containers MAY be O(N*N) but do
NOT have to, as my fasttuple example shows. But many beginners may
not know that, they may just reasonably expect that python is a good
language, and it's a language with dynamic typing, so its creators
are probably smart guys, and their functions work similarly fast
for all the types.

No, really, if you put aside all your deep python knowledge and take
a fresh look at line "result = result + item", what part of that line
makes you think that you MUST walk through the items of "result"?
I.e. nothing stops a "smart python interpreter" to spot result on
both sides and optimize it to perform operation in-place. That's how
a beginner could think about it.

Hey, even in this list of experienced people it wasn't obvious to
many that inplace modification (i.e. "+=") could be so much different
from "+" in commonly used code! Beginners may not even know about
__[i]add__, even less they would expect them to be much different.

> Changing the performance of sum() makes reasoning about its complexity
> *harder* because you can't refer back to that very basic equivalence above
> to inform your assumptions.

Yes, I understand your reasons, you're trying to say that now you
have a simple explanation of sum performance: "sum is O(N) for numbers
and O(N*N) for containers". But you're wrong in both statements.
First: sum is O(N) for many types, not just for numbers. E.g. it's
O(N) for timedeltas, and for numpy arrays (yes, it's O(N) for numpy
arrays, meaning that sum of 1000 arrays is 10 times slower than sum
of 100 arrays). It's also not O(N*N) for all containers, and my
fasttuple example is an evidence [1].

So your can't refer to that reasoning, because it's already wrong!

Hm... If my patch shows that wrong explanation is wrong, does it adds
points to the patch or takes them? ;)

> The suggestion of using += has merit because it keeps the equivalence
> simple, so that people can still reason about complexity in a
> straightforward manner. But there are backward compatibility concerns
> to review and assess.

Yeah, I liked it too, but Oscar's numpy example have killed my
optimism. So my current favourite is #2, I wish it was easy to do.
And from practical point of view #3 should cover most use cases I
could find and was easy to implement, so if "practicality beats
purity" then #3 is the winner.

> Performance is not the only consideration. But this is the underlying
> situation around the performance debates, as I see it. (And people saying
> "don't change anything" are quite possibly doing so because they view
> understandable performance guarantees as more important than the
> "attractive nuisance" of something that is often fast, but it's hard
> to be sure exactly when.)

Well, if they say that because of that guarantee, it means they're
wrong — they're trying to keep something they don't have.

I guess after 10 years of slow sum people got used to it and don't
believe there can be anything good any more. That's why I'm here, I'm
trying to find something good and bring the faith back to people. :)

E.g. they probably don't assume that faster sum could be a side-effect
of tuple being faster. I.e. would you reject the patch making tuple
much faster in many cases, just because one of those cases is sum()?

-- 
[1] http://bugs.python.org/file30917/fasttuple.py
  Adding 100000 of fasttuples
  took 0.23242688179 seconds
  Adding 100000 of built-in tuples
  took 25.2749021053 seconds


More information about the Python-ideas mailing list