
On Saturday 25 October 2003 11:32 am, Alex Martelli wrote: ...
Indeed, that's a spot where even 'sum' can be a performance trap; consider the following z.py:
lol = [ [x] for x in range(1000) ]
def flat1(lol=lol): return sum(lol, [])
def flat2(lol=lol): result = [] for sub in lol: result.extend(sub) return result
and the measurements:
[alex@lancelot pop]$ timeit.py -c -s'import z' 'z.flat1()' 100 loops, best of 3: 8.5e+03 usec per loop
[alex@lancelot pop]$ timeit.py -c -s'import z' 'z.flat2()' 1000 loops, best of 3: 940 usec per loop
sum looks cooler, but it can be an order of magnitude slower than the humble loop of result.extend calls. We could fix this specific performance trap by specialcasing in sum those cases where the result has a += method -- hmmm... would a patch for this performance bug be accepted for 2.3.* ...? (I understand and approve that we're keen on avoiding adding functionality in any 2.3.*, but fixed-functionality performance enhancements should be just as ok as fixes to functionality bugs, right?)
Ah well -- it's the most trivial fix one can possibly think of, just changing PyNumber_Add to PyNumber_InPlaceAdd -- so the semantics are _guaranteed_ to be equal in all _sane_ cases, i.e. excepting only weird user-coded types that have an __iadd__ with a weirdly different semantic than __add__ -- and DOES make sum's CPU time drop to 490 usec in the above (making it roughly twice as fast as the loop, as it generally tends to be in typical cases of summing lots of numbers). So I wend ahead and committed the tiny change on both the 2.4 and 2.3 maintenance branches (easy enough to revert if the "insane" cases must keep working in the same [not sane:-)] way in 2.3.*)... Alex