[Python-Dev] Re: Are we collecting benchmark results across machines

Fri Jan 2 19:31:07 EST 2004

Guido van Rossum <guido at python.org> writes:

> Hm...  My IBM T40 with 1.4 GHz P(M) reports 15.608.  I bet the caches
> are more similar, and affect performance more than CPU speed...

You have a 32K L1 and a 1024K (!) L2.  What a great machine!

As an example of the other end of the spectrum, I'm running current
but low-end hardware: a 2.2 GHz Celeron, 400 MHz FSB, 256MB DDR SDRAM.
Verified as working as expected with the Intel Processor Speed Test.

My L1 is 12K trace, 8K data.
My L2 is only 128K, and when there's a miss there's no L3 to fall back on.
Cost for processor and mobo, new, $120.

I find this setup pretty snappy for what I do on it: development and
home server.  It's definitely not my game machine :-)

Python 2.2.1 (#1, Oct 17 2003, 16:36:36) 
[GCC 2.95.3 20010125 (prerelease, propolice)] on openbsd3
[best of 3]:
Pystone(1.1) time for 10000 passes = 0.93
This machine benchmarks at 10752.7 pystones/second

Python 2.3.3 (#15, Jan  2 2004, 14:39:36) 
[best of 3]:
Pystone(1.1) time for 50000 passes = 3.46
This machine benchmarks at 14450.9 pystones/second

Python 2.4a0 (#40, Jan  1 2004, 22:22:45) [current cvs]
[best of 3]:
Pystone(1.1) time for 50000 passes = 2.91
This machine benchmarks at 17182.1 pystones/second
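Pystone's rate is the number of passes divided by the elapsed time, so the three results can be rechecked and compared directly from the printed figures. A small sketch using the numbers quoted above (the helper name `pystones_per_second` is hypothetical, not part of pystone):

```python
# Recompute pystones/second from the printed pass counts and times above,
# then compare the 2.3.3 and 2.4a0 rates.
def pystones_per_second(passes, seconds):
    """Pystone reports passes / elapsed seconds (hypothetical helper)."""
    return passes / seconds


runs = {
    "2.2.1": (10000, 0.93),
    "2.3.3": (50000, 3.46),
    "2.4a0": (50000, 2.91),
}
rates = {ver: pystones_per_second(*r) for ver, r in runs.items()}
for ver, rate in rates.items():
    print(f"{ver}: {rate:.1f} pystones/second")

# Relative change in rate from 2.3.3 to current CVS.
gain = rates["2.4a0"] / rates["2.3.3"] - 1
print(f"2.3.3 -> 2.4a0: {gain:+.1%}")
```

The rates come back as 10752.7, 14450.9 and 17182.1, matching the output above, and the 2.3.3 to 2.4a0 step is about +18.9% in pystones.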

(but see the p.s. below)

Now the parrotbench, version 1.04.  [I ran extra passes first to get
the .pyo files generated.]

First, python 2.3.3:
best 3: 31.1/31.8/32.3

Next, python 2.4a0, current cvs:
best 3: 31.8/31.9/32.1

Since I noticed quite different ratios between the individual tests
compared to what was posted by Seo Sanghyeon on the pypy list, here are
my numbers (2.4a0):

hydra /home/kbk/PYTHON/python/nondist/sandbox/parrotbench$ make times
for i in 0 1 2 3 4 5 6; do  echo b$i.py;  time /home/kbk/PYSRC/python b$i.py >@out$i;  cmp @out$i out$i;  done
b0.py
    5.48s real     5.30s user     0.05s system
b1.py
    1.36s real     1.22s user     0.10s system
b2.py
    0.44s real     0.42s user     0.04s system
b3.py
    2.01s real     1.94s user     0.04s system
b4.py
    1.69s real     1.63s user     0.05s system
b5.py
    4.80s real     4.73s user     0.02s system
b6.py
    1.84s real     1.56s user     0.26s system
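The `make times` target above is just a shell loop: run each `b$i.py`, time it, and `cmp` its output against the reference `out$i`. A rough Python equivalent, as a sketch only: the helper name `time_parrotbench` is hypothetical, it assumes the parrotbench layout of `b0.py` through `b6.py` with matching `out0` through `out6` files, and it takes user/system time from `os.times()`, whose child-process fields are only meaningful on Unix.

```python
import filecmp
import os
import subprocess
import sys
import time
from pathlib import Path


def time_parrotbench(bench_dir, python=sys.executable, count=7):
    """Hypothetical helper mirroring the `make times` loop: run b{i}.py,
    write its stdout to @out{i}, time it, and compare against out{i}."""
    bench_dir = Path(bench_dir)
    results = []
    for i in range(count):
        script = bench_dir / f"b{i}.py"
        actual = bench_dir / f"@out{i}"
        expected = bench_dir / f"out{i}"
        before = os.times()
        start = time.perf_counter()
        with open(actual, "w") as out:
            # Unlike the shell loop, check=True stops on a failing script.
            subprocess.run([python, script.name], cwd=bench_dir,
                           stdout=out, check=True)
        real = time.perf_counter() - start
        after = os.times()
        # Child CPU time accumulated while this script ran (Unix only).
        user = after.children_user - before.children_user
        system = after.children_system - before.children_system
        same = filecmp.cmp(actual, expected, shallow=False)
        print(f"{script.name}\n    {real:.2f}s real     "
              f"{user:.2f}s user     {system:.2f}s system")
        if not same:
            print(f"    output differs from {expected.name}")
        results.append((script.name, real, user, system, same))
    return results
```

Like `time`, this reports wall-clock ("real") time separately from CPU time, so a large gap between the two points at I/O or other load rather than the interpreter itself.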

I notice that some of these tests are a little faster on 2.3.3 while
others are faster on 2.4, resulting in the overall time being about
the same on both releases.

N.B. compiling Python w/o the stack protector doesn't make a
noticeable difference ;-)

There may be some other problem with this box that I haven't yet
discovered, but right now I'm blaming the tiny cache for performance
being 2-3x lower than expected from the clock rate, compared
to what others are getting.

-- 
KBK

p.s. I saw quite a large outlier on 2.4 pystone when I first tried
it.  I didn't believe it, but was able to scroll back and clip it:

Python 2.4a0 (#40, Jan  1 2004, 22:22:45) 
[GCC 2.95.3 20010125 (prerelease, propolice)] on openbsd3
Type "help", "copyright", "credits" or "license" for more information.
>>> from test.pystone import main
>>> main(); main(); main()
Pystone(1.1) time for 50000 passes = 4.22
This machine benchmarks at 11848.3 pystones/second
Pystone(1.1) time for 50000 passes = 4.21
This machine benchmarks at 11876.5 pystones/second
Pystone(1.1) time for 50000 passes = 4.21
This machine benchmarks at 11876.5 pystones/second

This is 30% lower than the rate quoted above.  I haven't been able
to duplicate it.  Maybe the OS or X was doing something which
tied up the cache.  This is a fairly lightly loaded machine 
running X, Ion, and emacs.

I've also seen 20% variations in the 2.2.1 pystone benchmark.

It seems to me that this benchmark is pretty cache-sensitive and
should be run on an unloaded system, preferably w/o X, with the
results averaged over many random trials if comparisons are
desired, especially if the cache is small.

I don't see the same variation in the parrotbench.  It's just
consistently low for this box.