[Python-Dev] fixing sum's performance bug

Oct. 25, 2003


      On Saturday 25 October 2003 11:32 am, Alex Martelli wrote:
   ...
...
Indeed, that's a spot where even 'sum' can be a performance trap;
consider the following z.py:
lol = [ [x] for x in range(1000) ]
def flat1(lol=lol):
    return sum(lol, [])
def flat2(lol=lol):
    result = []
    for sub in lol: result.extend(sub)
    return result
and the measurements:
[alex@lancelot pop]$ timeit.py -c -s'import z' 'z.flat1()'
100 loops, best of 3: 8.5e+03 usec per loop
[alex@lancelot pop]$ timeit.py -c -s'import z' 'z.flat2()'
1000 loops, best of 3: 940 usec per loop
sum looks cooler, but it can be an order of magnitude slower
than the humble loop of result.extend calls.  We could fix this
specific performance trap by specialcasing in sum those cases
where the result has a += method -- hmmm... would a patch for
this performance bug be accepted for 2.3.* ...?  (I understand and
approve that we're keen on avoiding adding functionality in any
2.3.*, but fixed-functionality performance enhancements should
be just as ok as fixes to functionality bugs, right?)
Ah well -- it's the most trivial fix one can possibly think of, just
changing PyNumber_Add to PyNumber_InPlaceAdd -- so the
semantics are _guaranteed_ to be equal in all _sane_ cases, i.e.
excepting only weird user-coded types that have an __iadd__
with a weirdly different semantic than __add__ -- and DOES make
sum's CPU time drop to 490 usec in the above (making it roughly
twice as fast as the loop, as it generally tends to be in typical cases
of summing lots of numbers).  So I wend ahead and committed the
tiny change on both the 2.4 and 2.3 maintenance branches (easy
enough to revert if the "insane" cases must keep working in the
same [not sane:-)] way in 2.3.*)...


Alex