
On Jan 13, 2004, at 12:14 PM, Guido van Rossum wrote:
[Guido]
Just in case nobody had thought of it before, couldn't the realloc() call be avoided when roundupsize(oldsize) == roundupsize(newsize)???
[Tim]
Followup: it's not quite that easy, because (at least) PyList_New(int size) can create a list whose allocated size hasn't been rounded up.
If I "fix" that, then it's possible to get away with just one roundupsize() call on most list.insert() calls (including list.append()), while avoiding the realloc() call too. Patch attached.
I timed it like so:
def punchit(): from time import clock as now a = [] push = a.append start = now() for i in xrange(1000000): push(i) finish = now() return finish - start
for i in range(10): print punchit()
and got these elapsed times (this is Windows, so this is elapsed wall-clock time):
before after ------------- -------------- 1.05227710823 1.02203188119 1.05532442716 0.569660068053 1.05461539751 0.568627533147 1.0564449622 0.569336562799 1.05964146231 0.573247959235 1.05679528655 0.568503494862 1.05569402772 0.569745553898 1.05383177727 0.569499991619 1.05528000805 0.569163914916 1.05636618113 0.570356526258
So pretty consistent here, apart from the glitch on the first "after" run, and about an 80% speedup.
Yawn <wink>.
If someone wants to run with this, be my guest. There may be other holes (c.f. PyList_New above), and it could sure use a comment about why it's correct (if it in fact is <0.9 wink>).
Awesome! We need timing results on some other platforms (Linux, FreeBSD, MacOSX).
Volunteers?
Mac OS X 10.3.2, gcc (GCC) 3.3 20030304 (Apple Computer, Inc. build 1495), dual g5 2ghz w/ 1gb ram, ./configure --disable-framework before after 0.38 0.29 0.36 0.26 0.36 0.26 0.36 0.27 0.36 0.26 0.36 0.26 0.36 0.26 0.36 0.26 0.36 0.27 0.36 0.26 Mac OS X 10.3.2, gcc (GCC) 3.3 20030304 (Apple Computer, Inc. build 1495), dvi titanium powerbook g4 1ghz w/ 1gb ram, ./configure --disable-framework before after 0.78 0.57 0.74 0.53 0.73 0.53 0.74 0.52 0.73 0.52 0.76 0.53 0.73 0.53 0.75 0.53 0.75 0.53 0.75 0.52 So it seems that the 2ghz g5 is about twice as fast as the 1ghz g4 for Python (when running TIm's test), and that either way you save about 25% with Tim's patch. -bob