[Python-Dev] Stop using timeit, use perf.timeit!
steve at pearwood.info
Fri Jun 10 09:20:51 EDT 2016
On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote:
> Last weeks, I made researchs on how to get stable and reliable
> benchmarks, especially for the corner case of microbenchmarks. The
> first result is a serie of article, here are the first three:
Thank you for this! I am very interested in benchmarking.
I strongly question your statement in the third:
But how can we compare performances if results are random?
Take the minimum?
No! You must never (ever again) use the minimum for
benchmarking! Compute the average and some statistics like
the standard deviation:
While I'm happy to see a real-world use for the statistics module, I
disagree with your logic.
The problem is that random noise can only ever slow the code down, it
cannot speed it up. To put it another way, the random errors in the
timings are always positive.
Suppose you micro-benchmark some code snippet and get a series of
timings. We can model the measured times as:
measured time t = T + ε
where T is the unknown "true" timing we wish to estimate, and ε is some
variable error due to noise in the system. But ε is always positive,
never negative, and we always measure something larger than T.
Let's suppose we somehow (magically) know what the epsilons are:
measurements = [T + 0.01, T + 0.02, T + 0.04, T + 0.01]
The average is (4*T + 0.08)/4 = T + 0.02
But the minimum is T + 0.01, which is a better estimate than the
average. Taking the average means that *worse* epsilons will effect your
estimate, while the minimum means that only the smallest epsilon effects
Taking the average is appropriate is if the error terms can be positive
or negative, e.g. if they are *measurement error* rather than noise:
measurements = [T + 0.01, T - 0.02, T + 0.04, T - 0.01]
The average is (4*T + 0.02)/4 = T + 0.005
The minimum is T - 0.02, which is worse than the average.
Unless you have good reason to think that the timing variation is mostly
caused by some error which can be both positive and negative, the
minimum is the right statistic to use, not the average. But ask
yourself: what sort of error, noise or external influence will cause the
code snippet to run FASTER than the fastest the CPU can execute it?
More information about the Python-Dev