[Python-Dev] Python Benchmarks
Tim Peters
tim.peters at gmail.com
Sat Jun 3 01:44:07 CEST 2006
[MAL]
>>> Using the minimum looks like the way to go for calibration.
[Terry Reedy]
>> Or possibly the median.
[Andrew Dalke]
> Why? I can't think of why that's more useful than the minimum time.
A lot of things get mixed up here ;-) The _mean_ is actually useful
if you're using a poor-resolution timer with a fast test. For
example, suppose a test takes 1/10th the time of the span between
counter ticks. Then, "on average", in 9 runs out of 10 the reported
elapsed time is 0 ticks, and in 1 run out of 10 the reported time is 1
tick. 0 and 1 are both wrong, but the mean (1/10) is correct.
So there _can_ be sense to that. Then people vaguely recall that the
median is more robust than the mean, and all sense goes out the window
;-)
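Not from the original message, but a quick simulation (all names below are my own) makes the arithmetic concrete: with a timer that only reports whole ticks, the mean of many readings recovers the true 0.1-tick cost, while the median (and the minimum) collapse to 0.

import random
import statistics

# Hypothetical illustration: a test that truly costs 0.1 tick, measured with
# a timer that only reports whole ticks.  Each measurement starts at a random
# phase within a tick, so the reading comes out as either 0 or 1 ticks.
TRUE_COST = 0.1  # in ticks

def quantized_reading(true_cost=TRUE_COST):
    start_phase = random.random()        # where within a tick the run begins
    return int(start_phase + true_cost)  # 0 about 90% of the time, 1 about 10%

readings = [quantized_reading() for _ in range(100_000)]
print("mean:  ", statistics.mean(readings))    # ~0.1 -- recovers the true cost
print("median:", statistics.median(readings))  # 0    -- tells you nothing
print("min:   ", min(readings))                # 0    -- also useless with a coarse timer

The minimum is just as useless as the median here, which is exactly why the advice below pairs it with a high-resolution timer.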
My answer is to use the timer with the best resolution the machine
has. Using the mean is a way to worm around timer quantization
artifacts, but it's easier and clearer to use a timer with resolution
so fine that quantization doesn't make a lick of real difference.
Forcing a test to run for a long time is another way to make timer
quantization irrelevant, but then you're also vastly increasing
chances for other processes to disturb what you're testing.
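A minimal sketch of that advice with today's standard library (not code from the post): timeit picks the finest wall-clock timer available, each run is kept short, and min() over several repeats implements the "take the minimum" idea.

import timeit

# Sketch: many short runs, report the best (least disturbed) one.
runs = timeit.repeat(
    stmt="sorted(data)",
    setup="import random; data = [random.random() for _ in range(1000)]",
    number=100,   # loops per run; short runs give other processes less chance to intrude
    repeat=7,     # independent runs
)
per_loop = min(runs) / 100  # fastest run, divided by loops per run
print(f"best of 7 runs: {per_loop * 1e6:.2f} microseconds per loop")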
I liked benchmarking on Crays in the good old days. No time-sharing,
no virtual memory, and the OS believed to its core that its primary
purpose was to set the base address once at the start of a job so the
Fortran code could scream. Test times were reproducible to the
nanosecond with no effort. Running on a modern box for a few
microseconds at a time is a way to approximate that, provided you
measure the minimum time with a high-resolution timer :-)