Re: [Speed] Median +- MAD or Mean +- std dev?
On Tue, 7 Mar 2017 01:03:23 +0100 Victor Stinner victor.stinner@gmail.com wrote:
Another example on the same computer. It's interesting:
- MAD and std dev are half of those in result 1
- the benchmark is more stable
- the median is very close to result 1
- the mean changed much more than the median
Benchmark result 1:
Median +- MAD: 276 ns +- 10 ns
Mean +- std dev: 371 ns +- 196 ns
Benchmark result 2:
Median +- MAD: 278 ns +- 5 ns
Mean +- std dev: 303 ns +- 103 ns
If the goal is to get reproducible results, Median +- MAD seems better.
Getting reproducible results is only half of the goal. Getting meaningful (i.e. informative) results is the other half.
The mean approximates the expected performance over multiple runs (note "expected" is a rigorously defined term in statistics here: see https://en.wikipedia.org/wiki/Expected_value). The median doesn't tell you anything about the expected value (*). So the mean is more informative for the task at hand.
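A quick sketch of this point (my own illustration, not from the mail): for a skewed runtime distribution, the sample mean converges to the expected value, while the median can sit far away from it. The distribution below (90% fast runs, 10% slow runs) is hypothetical.

```python
import random
import statistics

# Hypothetical skewed runtime distribution (illustrative only):
# 90% of runs take 1 s, 10% hit a slow path and take 100 s.
# Expected value: 0.9 * 1 + 0.1 * 100 = 10.9 s.
random.seed(42)
samples = [1 if random.random() < 0.9 else 100 for _ in range(100_000)]

mean = statistics.mean(samples)      # converges to the expected value, ~10.9
median = statistics.median(samples)  # stays at 1, blind to the slow path
```

The mean tells you what a run costs on average; the median here only tells you what the majority of runs cost.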
Additionally, while the mean and std dev are generally quite well understood, the properties of the median absolute deviation are far less widely known.
So my vote goes to mean +/- std dev.
(*) Quick example: let's say your runtimes in seconds are [1, 1, 1, 1, 1, 1, 10, 10, 10, 10]. Evidently, there are four outliers (over 10 measurements) that indicate a huge performance regression occurring at random points. However, the median here is 1 and the median absolute deviation (the median of absolute deviations from the median, i.e. the median of [0, 0, 0, 0, 0, 0, 9, 9, 9, 9]) is 0: the information about possible performance regressions is entirely lost, and the numbers (median +/- MAD) make it look like the benchmark reliably takes 1 s to run.
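The arithmetic in the footnote can be checked with Python's statistics module (the use of that module is my choice, not from the mail):

```python
import statistics

runtimes = [1, 1, 1, 1, 1, 1, 10, 10, 10, 10]

median = statistics.median(runtimes)                         # 1
mad = statistics.median(abs(x - median) for x in runtimes)   # 0
mean = statistics.mean(runtimes)                             # 4.6
stdev = statistics.stdev(runtimes)                           # ~4.65
```

Median +/- MAD reports "1 +/- 0" and hides the regressions entirely, while mean +/- std dev reports "4.6 +/- 4.65" and makes the instability obvious.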
Regards
Antoine.
participants (4):
- Antoine Pitrou
- Nick Coghlan
- Serhiy Storchaka
- Victor Stinner