Re: [Speed] Median +- MAD or Mean +- std dev?
On Tue, 7 Mar 2017 01:03:23 +0100 Victor Stinner victor.stinner@gmail.com wrote:
Another example on the same computer. It's interesting:
- MAD and std dev are half of those in result 1
- the benchmark is more stable
- the median is very close to result 1
- the mean changed much more than the median
Benchmark result 1:
Median +- MAD: 276 ns +- 10 ns
Mean +- std dev: 371 ns +- 196 ns
Benchmark result 2:
Median +- MAD: 278 ns +- 5 ns
Mean +- std dev: 303 ns +- 103 ns
If the goal is to get reproducible results, Median +- MAD seems better.
Getting reproducible results is only half of the goal. Getting meaningful (i.e. informative) results is the other half.
The mean approximates the expected performance over multiple runs (note "expected" is a rigorously defined term in statistics here: see https://en.wikipedia.org/wiki/Expected_value). The median doesn't tell you anything about the expected value (*). So the mean is more informative for the task at hand.
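A quick sketch of this point (my own illustration, not from the mail): for a skewed runtime distribution, the sample mean converges to the expected value, while the median can sit far away from it. The distribution below (90% fast runs, 10% slow runs) is hypothetical.

```python
import random
import statistics

# Hypothetical skewed runtime distribution (illustrative only):
# 90% of runs take 1 s, 10% hit a slow path and take 100 s.
# Expected value: 0.9 * 1 + 0.1 * 100 = 10.9 s.
random.seed(42)
samples = [1 if random.random() < 0.9 else 100 for _ in range(100_000)]

mean = statistics.mean(samples)      # converges to the expected value, ~10.9
median = statistics.median(samples)  # stays at 1, blind to the slow path
```

The mean tells you what a run costs on average; the median here only tells you what the majority of runs cost.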
Additionally, while the mean and std dev are generally quite well understood, the properties of the median absolute deviation are far less widely known.
So my vote goes to mean +/- std dev.
(*) Quick example: let's say your runtimes in seconds are [1, 1, 1, 1, 1, 1, 10, 10, 10, 10]. Evidently, there are four outliers (over 10 measurements) that indicate a huge performance regression occurring at random points. However, the median here is 1 and the median absolute deviation (the median of absolute deviations from the median, i.e. the median of [0, 0, 0, 0, 0, 0, 9, 9, 9, 9]) is 0: the information about possible performance regressions is entirely lost, and the numbers (median +/- MAD) make it look like the benchmark reliably takes 1 s to run.
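The arithmetic in the footnote can be checked with Python's statistics module (the use of that module is my choice, not from the mail):

```python
import statistics

runtimes = [1, 1, 1, 1, 1, 1, 10, 10, 10, 10]

median = statistics.median(runtimes)                         # 1
mad = statistics.median(abs(x - median) for x in runtimes)   # 0
mean = statistics.mean(runtimes)                             # 4.6
stdev = statistics.stdev(runtimes)                           # ~4.65
```

Median +/- MAD reports "1 +/- 0" and hides the regressions entirely, while mean +/- std dev reports "4.6 +/- 4.65" and makes the instability obvious.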
Regards
Antoine.
participants (4):
- Antoine Pitrou
- Nick Coghlan
- Serhiy Storchaka
- Victor Stinner