On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote:
> Hi,
>
> These last weeks, I did some research on how to get stable and
> reliable benchmarks, especially for the corner case of
> microbenchmarks. The first result is a series of articles; here are
> the first three:
Thank you for this! I am very interested in benchmarking.
> https://haypo.github.io/journey-to-stable-benchmark-system.html
> https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
> https://haypo.github.io/journey-to-stable-benchmark-average.html
I strongly question your statement in the third:

[quote]
But how can we compare performances if results are random? Take the
minimum? No! You must never (ever again) use the minimum for
benchmarking! Compute the average and some statistics like the
standard deviation:
[end quote]

While I'm happy to see a real-world use for the statistics module, I
disagree with your logic.

The problem is that random noise can only ever slow the code down, it
cannot speed it up. To put it another way, the random errors in the
timings are always positive.

Suppose you micro-benchmark some code snippet and get a series of
timings. We can model the measured times as:

    measured time t = T + ε

where T is the unknown "true" timing we wish to estimate, and ε is
some variable error due to noise in the system. But ε is always
positive, never negative, so we always measure something larger
than T.

Let's suppose we somehow (magically) know what the epsilons are:

    measurements = [T + 0.01, T + 0.02, T + 0.04, T + 0.01]

The average is (4*T + 0.08)/4 = T + 0.02, but the minimum is T + 0.01,
which is a better estimate than the average. Taking the average means
that *worse* epsilons affect your estimate, while taking the minimum
means that only the smallest epsilon affects your estimate.

Taking the average is appropriate if the error terms can be positive
or negative, e.g. if they are *measurement error* rather than noise:

    measurements = [T + 0.01, T - 0.02, T + 0.04, T - 0.01]

The average is (4*T + 0.02)/4 = T + 0.005, while the minimum is
T - 0.02, which is worse than the average.

Unless you have good reason to think that the timing variation is
mostly caused by some error which can be both positive and negative,
the minimum is the right statistic to use, not the average. But ask
yourself: what sort of error, noise or external influence will cause
the code snippet to run FASTER than the fastest the CPU can execute
it? (A quick simulation of both cases is sketched in the postscript
below.)

-- 
Steve
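P.S. Here is a minimal simulation sketch of the two error models. The
exponential and normal distributions, the 0.1 s "true" time and the
5 ms noise scale are purely illustrative assumptions, not measurements
of anything real:

    import random
    import statistics

    # Hypothetical "true" time of the snippet, in seconds (illustrative only).
    T = 0.100

    def timings(n, noise):
        # Each measurement is the true time plus a noise term.
        return [T + noise() for _ in range(n)]

    random.seed(1)

    # Case 1: one-sided noise -- the system can only slow the snippet down
    # (exponential distribution with a 5 ms mean, chosen for illustration).
    one_sided = timings(1000, lambda: random.expovariate(1 / 0.005))

    # Case 2: symmetric measurement error, equally likely to be + or -
    # (normal distribution with a 5 ms standard deviation, again illustrative).
    symmetric = timings(1000, lambda: random.gauss(0, 0.005))

    for name, data in [("one-sided noise", one_sided),
                       ("symmetric error", symmetric)]:
        mean, mn = statistics.mean(data), min(data)
        print("%s: mean = %.4f (off by %+.4f), min = %.4f (off by %+.4f)"
              % (name, mean, mean - T, mn, mn - T))

With one-sided noise the minimum lands almost exactly on T while the
mean is inflated by the average size of the noise; with symmetric
error it is the other way around, which is exactly the distinction
made above.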