[Python-ideas] Fast sum() for non-numbers

Sergey sergemp at mail.ru
Wed Jul 3 13:02:53 CEST 2013


On Jul 2, 2013 Andrew Barnert wrote:

I'm trying to make sum() more SIMPLE, not more complex.

Currently sum() "just works" only for numbers. So I was thinking how
to make it "just work" for everything.

Check this:
  sum([something])
  sum([something1, something2])
and this:
  sum([], something)
  sum([something2], something1)
First one looks obvious even for those who does not know python,
while to understand the second one, you have to actually read
the manuals. And even after that it may be unclear why second
parameter is mandatory for non-numbers.

"Readability counts."

But, anyway, that's not the point of THIS bugreport/patch.
This patch is to fix the slowness, that is easy to fix, without
introducing any changes.

> Does it actually speed up strings?

Unfortunately it does not, sum() is still much slower than join().
That's because join() was optimised for strings, it walks through
the list twice, first time to calculate total size of all the strings
and allocate the entire string in one shot, and second time to copy
all the strings into allocated memory.

> sum can't guess the unified type of all of the elements in the
> iterable. The best it could do is what reduce does in that case:
> start with the first element, and add from there. That's not always
> the magical DWIM you seem to be expecting.

Why not? I wouldn't expect anything else from sum().

Well, sum() is so common and useful, I was just thinking of how to
turn it into "the one obvious way", something that "just works",
something like:
  def justsum(seq):
    result = None
    try:
      seq = iter(seq)
      result = next(seq)
      result = result + next(seq)
      while True:
        result += next(seq)
    except StopIteration:
      return result

It would work for everything:
>>> justsum(['a', 'bb', 'ccc'])
'abbccc'
>>> justsum([1, 2, 3])
6
>>> justsum([[1,2], [3,4], [5,6]])
[1, 2, 3, 4, 5, 6]
>>> justsum([3, 0.1, 0.04])
3.14

It would still work even for weird cases. For example, what should be
the sum of one element? It should be the element itself, no matter
what element would it be, right? For example:
>>> justsum([list])
<type 'list'>
>>> g = (i*i for i in range(10))
>>> justsum([g])
<generator object <genexpr> at 0xa93280>

I know, these are unusual use-cases, but they work as expected.

> Most importantly, how could it possibly work for iterables that
> might be empty? ''.join(lines), or sum(lines, ''), will work when
> there are no lines; sum(lines) can't possibly know that you expected
> a string.

If there're no elements to sum it would return "nothing" i.e. None.
That looks rather obvious to me. However I'm not Dutch, so I can be
wrong. :)

> Meanwhile, if you're going to add optional "start from the first
> item" functionality, I think you'll also want to make the
> operator/function overridable. 
>
> And then you've just re-invented reduce, with a slightly different signature:
>     sum(iterable, start=None, function=operator.iadd):
>         return reduce(function, iterable, start)

Does not work even for:
  sum([1, 2, 3, 4])
not to mention:
  lists = [[1, 2], [3, 4]]
  sum(lists)
where reduce-based code fails, because it modifies original list.

> You could always special case it when start is a str (or when
> start is defaulted and the first value is a str). But then what
> happens if the second value is something that can be added to a str,
> but not actually a str? Or, for that matter, the start value?

Fallback to general code. This is how sum() works right now. First it
attempts to store result in integer. If that fails it tries float. If
that fails too, even if it fails in the middle of the list, sum()
falls back to general PyNumber_Add().

My patch only adjusts that general part so that it first tries
PyNumber_InPlaceAdd(), and only if that fails uses PyNumber_Add().
Well, PyNumber_InPlaceAdd() does that for me.

-- 


More information about the Python-ideas mailing list