[Python-Dev] Re: Are we collecting benchmark results across machines

Alex Martelli aleaxit at yahoo.com
Fri Jan 2 05:46:10 EST 2004


On Thursday 01 January 2004 07:22 am, Anthony Baxter wrote:
   ...
> (Hm: should the default number of passes for pystone be raised? One and
> a half seconds seems rather a short time for a benchmark...)

An old rule of thumb, when benchmarking something on many machines that
exhibit a wide range of performance, is to try to organize the benchmarks in
terms, not of completing a set number of repetitions, but rather of running as
many repetitions as feasible within a (more or less) set time.  Since, at the 
end, you'll be reporting in terms of "<whatever>s per second", it should make
no real difference BUT it makes it more practical to run the "same" benchmark
on machines (or more generally implementations) spanning orders of magnitude
in terms of the performance they exhibit.  (If the check for "how long have I
been running so far" is very costly, you'll of course do it only "once every N
repetitions" rather than at every single pass.)
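A minimal sketch of the idea, with illustrative names (run_benchmark and
workload are not from pystone itself): run as many passes as fit within a
wall-clock budget, and consult the clock only once every N passes so the
timing check itself stays cheap.

```python
import time

CHECK_EVERY = 1000  # passes between clock checks, to keep check overhead low

def run_benchmark(workload, budget_seconds=10.0):
    """Run `workload` repeatedly for roughly `budget_seconds`, then
    report the rate in passes per second (hypothetical helper)."""
    passes = 0
    start = time.perf_counter()
    while True:
        # Run a batch of repetitions without touching the clock.
        for _ in range(CHECK_EVERY):
            workload()
        passes += CHECK_EVERY
        elapsed = time.perf_counter() - start
        if elapsed >= budget_seconds:
            break
    return passes / elapsed  # "<whatever>s per second"

if __name__ == "__main__":
    rate = run_benchmark(lambda: sum(range(100)), budget_seconds=0.5)
    print(f"{rate:.1f} passes/sec")
```

Reporting a rate rather than a total is what lets the "same" benchmark span
machines differing by orders of magnitude: fast machines simply complete more
passes within the budget.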

Wanting to run pystone on pypy, but not wanting to alter it TOO much from
the CPython original so as to keep the results easily comparable, I just made
the number of loops an optional command line parameter of the pystone
script (as checked in within the pypy project).  Since the laptops we had
around at the Amsterdam pypy sprint ran about 0.8 to 3 pystones/sec with
pypy (a few thousand times slower than with CPython 2.3.*), running all of
the 50,000 iterations was just not practical at all.  Anyway, the ability of
optionally passing in the number of iterations on the command line would 
also help with your opposite problem of too-fast machines -- if 50k loops
just aren't enough for a reasonably-long run, you could use more.
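The sort of small change described above might look like the following sketch
(names here are illustrative, not the actual pypy checkin): accept an optional
loop count on the command line, falling back to pystone's traditional default.

```python
import sys

DEFAULT_LOOPS = 50000  # pystone's traditional default iteration count

def parse_loops(argv):
    """Return the loop count from an optional command-line argument,
    or DEFAULT_LOOPS if none is given (hypothetical helper)."""
    if len(argv) > 1:
        try:
            return int(argv[1])
        except ValueError:
            sys.exit(f"usage: {argv[0]} [number_of_loops]")
    return DEFAULT_LOOPS

if __name__ == "__main__":
    loops = parse_loops(sys.argv)
    print(f"running {loops} passes")
```

With this, a very slow interpreter can be given a few hundred loops while a
fast machine gets several hundred thousand, and the per-second figures stay
comparable.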


Alex



