[Speed] Median +- MAD or Mean +- std dev?

Wed Mar 15 12:32:49 EDT 2017

2017-03-13 21:38 GMT+01:00 Antoine Pitrou <solipsis at pitrou.net>:
>> If the goal is to get reproducible results, Median +- MAD seems better.
>
> Getting reproducible results is only half of the goal. Getting
> meaningful (i.e. informative) results is the other half.

If the system is tuned for benchmarks (run "python3 -m perf system
tune"), you get almost no outliers on CPU-bound functions. In this
case, the mean is close to the median and the stdev is close to the MAD.

The problem is when people don't tune their system to run benchmarks,
which is likely the most common case. In this case, the distribution
is never normal :-) It's always positively skewed: it has a long tail
of larger values on the right.
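
To illustrate the difference between the two cases, here is a minimal
sketch using only the standard "statistics" module. The timings are
made up for the example (they are not real benchmark results), and
mad() is a small helper written here, not a perf function.

```python
import statistics

def mad(values):
    """Median absolute deviation: median of |x - median(values)|."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)

# Made-up timings in milliseconds, not real benchmark data.
clean = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9]
# Same timings plus two slow runs, as on an untuned system.
noisy = clean + [14.0, 19.0]

for name, data in (("clean", clean), ("noisy", noisy)):
    print(f"{name}: "
          f"mean={statistics.mean(data):.2f} +- {statistics.stdev(data):.2f}, "
          f"median={statistics.median(data):.2f} +- {mad(data):.2f}")
```

With these values, the two slow runs move the mean and the stdev a
lot, whereas the median and the MAD barely change.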

Reproducibility is a very concrete and practical issue for me.

> Additionally, while mean and std dev are generally quite well
> understood, the properties of the median absolute deviation are
> generally little known.

A friend suggested that I display sigma = 1.48 * MAD, instead of
displaying the MAD directly, to get a value close to what the standard
deviation would be without outliers. I don't know if it makes sense :-)
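
A quick way to check the idea is the sketch below. It assumes
normally distributed samples, for which the scale factor is about
1.4826 (1.48 is the rounded value). The samples are generated with a
fixed seed, the outlier values are arbitrary, and mad() is a helper
written for this example.

```python
import random
import statistics

def mad(values):
    """Median absolute deviation: median of |x - median(values)|."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)

# Scale factor that makes the MAD comparable to the standard deviation
# when the data is normally distributed.
MAD_TO_SIGMA = 1.4826

rng = random.Random(12345)
# Synthetic samples: normal distribution, mean 100, standard deviation 5.
samples = [rng.gauss(100.0, 5.0) for _ in range(10_000)]
# The same samples plus 1% of arbitrary large outliers.
with_outliers = samples + [500.0] * 100

for name, data in (("normal", samples), ("with outliers", with_outliers)):
    stdev = statistics.stdev(data)
    sigma = MAD_TO_SIGMA * mad(data)
    print(f"{name}: stdev={stdev:.2f}, 1.4826*MAD={sigma:.2f}")
```

On the normal samples, both values should be close to 5. Once the
outliers are added, the stdev grows a lot while the scaled MAD stays
close to 5.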

Victor