[Python-Dev] fixing sum's performance bug

Sat Oct 25 08:57:05 EDT 2003

On Saturday 25 October 2003 11:32 am, Alex Martelli wrote:
   ...
> Indeed, that's a spot where even 'sum' can be a performance trap;
> consider the following z.py:
>
> lol = [ [x] for x in range(1000) ]
>
> def flat1(lol=lol):
>     return sum(lol, [])
>
> def flat2(lol=lol):
>     result = []
>     for sub in lol: result.extend(sub)
>     return result
>
> and the measurements:
>
> [alex at lancelot pop]$ timeit.py -c -s'import z' 'z.flat1()'
> 100 loops, best of 3: 8.5e+03 usec per loop
>
> [alex at lancelot pop]$ timeit.py -c -s'import z' 'z.flat2()'
> 1000 loops, best of 3: 940 usec per loop
>
> sum looks cooler, but it can be an order of magnitude slower
> than the humble loop of result.extend calls.  We could fix this
> specific performance trap by specialcasing in sum those cases
> where the result has a += method -- hmmm... would a patch for
> this performance bug be accepted for 2.3.* ...?  (I understand and
> approve that we're keen on avoiding adding functionality in any
> 2.3.*, but fixed-functionality performance enhancements should
> be just as ok as fixes to functionality bugs, right?)

Ah well -- it's the most trivial fix one can possibly think of, just
changing PyNumber_Add to PyNumber_InPlaceAdd -- so the
semantics are _guaranteed_ to be equal in all _sane_ cases, i.e.
excepting only weird user-coded types that have an __iadd__
with a weirdly different semantic than __add__ -- and DOES make
sum's CPU time drop to 490 usec in the above (making it roughly
twice as fast as the loop, as it generally tends to be in typical cases
of summing lots of numbers).  So I wend ahead and committed the
tiny change on both the 2.4 and 2.3 maintenance branches (easy
enough to revert if the "insane" cases must keep working in the
same [not sane:-)] way in 2.3.*)...

Alex