[Python-Dev] big performance hit in the past few days

Skip Montanaro skip@pobox.com
Wed, 3 Apr 2002 18:33:53 -0600


After Guido checked in the bool() stuff I cvs up'd and rebuilt.  A few days
earlier I had spent some time trying to quantify the effect of changes to
SMALL_REQUEST_THRESHOLD on pymalloc performance.  The "benchmark" consists
of using the compiler package to compile Lib/*.py, followed by three runs of
the pystone main program with LOOPS set to 100000 (10x the usual value).
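
The compile half of such a benchmark might look roughly like the following
in a current Python.  This is only a sketch under assumptions: the compiler
package is not available in Python 3, so the built-in compile() stands in
for it here, and time_compile is a hypothetical helper name, not part of the
setup described above.  The pystone half is left out.

```python
import glob
import os
import time


def time_compile(lib_dir):
    """Compile every *.py file directly inside lib_dir.

    Returns (number_of_files_compiled, elapsed_seconds).
    Assumption: built-in compile() is used in place of the old
    compiler package, so absolute timings are not comparable.
    """
    paths = sorted(glob.glob(os.path.join(lib_dir, "*.py")))
    compiled = 0
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as f:
            source = f.read()
        try:
            compile(source, path, "exec")
            compiled += 1
        except (SyntaxError, ValueError):
            # Skip files the running interpreter cannot parse.
            pass
    return compiled, time.perf_counter() - start
```

Pointing time_compile at a Lib directory gives a file count and a wall-clock
figure that can be compared between two builds of the interpreter.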

On March 31, I got the following output with and without pymalloc enabled
and a SMALL_REQUEST_THRESHOLD of 256:

    w/ pymalloc

    Pystone(1.1) time for 100000 passes = 16.81
    This machine benchmarks at 5948.84 pystones/second
    Pystone(1.1) time for 100000 passes = 16.82
    This machine benchmarks at 5945.3 pystones/second
    Pystone(1.1) time for 100000 passes = 16.83
    This machine benchmarks at 5941.77 pystones/second
    243.84user 0.23system 4:12.73elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (402major+4432minor)pagefaults 0swaps

    w/o pymalloc

    Pystone(1.1) time for 100000 passes = 17.66
    This machine benchmarks at 5662.51 pystones/second
    Pystone(1.1) time for 100000 passes = 17.67
    This machine benchmarks at 5659.31 pystones/second
    Pystone(1.1) time for 100000 passes = 17.66
    This machine benchmarks at 5662.51 pystones/second
    277.88user 0.21system 4:48.10elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (400major+3943minor)pagefaults 0swaps

Running the same benchmark just now I got:

    w/ pymalloc

    Pystone(1.1) time for 100000 passes = 25.1
    This machine benchmarks at 3984.06 pystones/second
    Pystone(1.1) time for 100000 passes = 24.99
    This machine benchmarks at 4001.6 pystones/second
    Pystone(1.1) time for 100000 passes = 24.74
    This machine benchmarks at 4042.04 pystones/second
    352.33user 0.97system 6:51.40elapsed 85%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (499major+4181minor)pagefaults 0swaps

    w/o pymalloc

    Pystone(1.1) time for 100000 passes = 25.01
    This machine benchmarks at 3998.4 pystones/second
    Pystone(1.1) time for 100000 passes = 25.09
    This machine benchmarks at 3985.65 pystones/second
    Pystone(1.1) time for 100000 passes = 25.18
    This machine benchmarks at 3971.41 pystones/second
    374.38user 0.26system 6:37.71elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (401major+3950minor)pagefaults 0swaps

All files were compiled using gcc 3.0.4 with OPT set to -O3.

The fact that the tests slowed down dramatically both with and without
pymalloc enabled suggests that recent changes to obmalloc are not to blame.
(On March 31, I was using obmalloc.c 2.24.  Today I'm using 2.27.)
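
To put rough numbers on the slowdown, the pystone rates quoted above can be
averaged per configuration.  This is just a sketch: the dictionary keys and
the slowdown helper are names made up for illustration, and the figures are
copied from the four runs shown earlier.

```python
from statistics import mean

# pystones/second, copied from the benchmark output above
march31 = {
    "pymalloc": [5948.84, 5945.3, 5941.77],
    "no_pymalloc": [5662.51, 5659.31, 5662.51],
}
today = {
    "pymalloc": [3984.06, 4001.6, 4042.04],
    "no_pymalloc": [3998.4, 3985.65, 3971.41],
}


def slowdown(before, after):
    """Fractional drop in mean throughput from before to after."""
    return 1.0 - mean(after) / mean(before)


drops = {key: slowdown(march31[key], today[key]) for key in march31}
for key, drop in drops.items():
    print(f"{key}: {drop:.1%} fewer pystones/second")
```

Both configurations lose roughly a third of their throughput (about 33% with
pymalloc and about 30% without), which fits the reading that the allocator
is not what changed.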

Any thoughts on the possible cause?  It's tough to casually suggest a
particular culprit because the bool() stuff touched a lot of files, so I
can't simply point to a handful of files that changed in the past few days.
I count 66 .[ch] files new or updated since mid-afternoon April 1.

Skip