[Python-Dev] Python Benchmarks

Tim Peters tim.peters at gmail.com
Sat Jun 3 01:44:07 CEST 2006

>>> Using the minimum looks like the way to go for calibration.

[Terry Reedy]
>> Or possibly the median.

[Andrew Dalke]
> Why?  I can't think of why that's more useful than the minimum time.

A lot of things get mixed up here ;-)  The _mean_ is actually useful
if you're using a poor-resolution timer with a fast test.  For
example, suppose a test takes 1/10th the time of the span between
counter ticks.  Then, "on average", in 9 runs out of 10 the reported
elapsed time is 0 ticks, and in 1 run out of 10 the reported time is 1
tick.  0 and 1 are both wrong, but the mean (1/10) is correct.

So there _can_ be sense to that.  Then people vaguely recall that the
median is more robust than the mean, and all sense goes out the window

My answer is to use the timer with the best resolution the machine
has.  Using the mean is a way to worm around timer quantization
artifacts, but it's easier and clearer to use a timer with resolution
so fine that quantization doesn't make a lick of real difference.
Forcing a test to run for a long time is another way to make timer
quantization irrelevant, but then you're also vastly increasing
chances for other processes to disturb what you're testing.

I liked benchmarking on Crays in the good old days.  No time-sharing,
no virtual memory, and the OS believed to its core that its primary
purpose was to set the base address once at the start of a job so the
Fortran code could scream.  Test times were reproducible to the
nanosecond with no effort.  Running on a modern box for a few
microseconds at a time is a way to approximate that, provided you
measure the minimum time with a high-resolution timer :-)

More information about the Python-Dev mailing list