[Speed] Median +- MAD or Mean +- std dev?

Mon Mar 6 18:37:03 EST 2017

Hi,

Serhiy Storchaka opened a bug report against my perf module: perf displays
Median +- std dev, whereas the median should be paired with the median
absolute deviation (MAD) instead:
https://github.com/haypo/perf/issues/20

I just modified perf to display Median +- MAD, but I'm not sure that
it's better than Mean +- std dev.

The question is important when a benchmark is unstable (has a lot of
outliers). There is a good example below with "Median +- MAD: 276 ns +-
10 ns" and "Mean +- std dev: 371 ns +- 196 ns".

The goal of perf is to get reproducible benchmark results. So the
question is: what should be displayed (median or mean?) to get the most
reproducible output?

Median +- MAD "hides" outliers. In my experience, outliers are not
"reproducible", but are caused by "noise" from the system and other
applications.
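
To make the difference concrete, here is a minimal sketch using only the
Python standard library. The sample values are hypothetical (they are not
taken from bench.json.gz), the helper name median_abs_dev is made up for
this example, and the MAD is computed unscaled; whether perf applies a
scaling factor is not covered here.

```python
import statistics

def median_abs_dev(samples):
    """Unscaled median absolute deviation: median of |x - median|."""
    med = statistics.median(samples)
    return statistics.median([abs(x - med) for x in samples])

# Hypothetical timings in ns: eight "normal" samples and two outliers.
samples = [270, 272, 275, 276, 276, 278, 280, 285, 790, 800]

median = statistics.median(samples)
mad = median_abs_dev(samples)
mean = statistics.mean(samples)
stdev = statistics.stdev(samples)

print("Median +- MAD: %.0f ns +- %.0f ns" % (median, mad))
print("Mean +- std dev: %.0f ns +- %.0f ns" % (mean, stdev))
```

The two outliers barely move the median and the MAD, but they pull the
mean up and inflate the standard deviation.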

I feel that Median +- MAD is what I want, but I would feel more
comfortable if someone could confirm this from their own experience :-)

-----------------
haypo@selma$ PYTHONPATH=~/prog/GIT/perf ./python -m perf show --hist --stats bench.json.gz

234 ns:   3 #
264 ns: 114 ##################################################
293 ns:   9 ####
322 ns:   2 #
351 ns:   0 |
381 ns:   0 |
410 ns:   0 |
439 ns:   1 |
469 ns:   0 |
498 ns:   1 |
527 ns:   1 |
557 ns:   0 |
586 ns:   1 |
615 ns:   1 |
644 ns:   1 |
674 ns:   2 #
703 ns:   1 |
732 ns:   1 |
762 ns:   2 #
791 ns:  15 #######
820 ns:   5 ##

Total duration: 1 min 14.5 sec
Start date: 2017-03-06 23:30:49
End date: 2017-03-06 23:33:11
Raw sample minimum: 137 ms
Raw sample maximum: 444 ms

Number of runs: 42
Total number of samples: 160
Number of samples per run: 4
Number of warmups per run: 2
Loop iterations per sample: 2^19 (128 outer-loops x 4096 inner-loops)

Minimum: 262 ns (-5%)
Median +- MAD: 276 ns +- 10 ns
Mean +- std dev: 371 ns +- 196 ns
Maximum: 847 ns (+207%)

ERROR: the benchmark is very unstable, the standard deviation is very
high (stdev/mean: 53%)!
Try to rerun the benchmark with more runs, samples and/or loops

Median +- MAD: 276 ns +- 10 ns
-----------------
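
The percentages in the output above can be rechecked from the rounded
values it prints. This is only a sketch of the arithmetic: that the
minimum and maximum percentages are relative to the median is an
assumption inferred from the numbers, not something stated in the output.

```python
# Rounded values copied from the perf output above (all in ns).
median_ns, mean_ns, stdev_ns = 276, 371, 196
minimum_ns, maximum_ns = 262, 847

# Ratio reported in the "very unstable" error message.
cv = stdev_ns / mean_ns

# Assumption: min/max percentages are relative to the median.
rel_min = (minimum_ns - median_ns) / median_ns
rel_max = (maximum_ns - median_ns) / median_ns

print(f"stdev/mean: {cv:.0%}")
print(f"Minimum: {rel_min:+.0%}, Maximum: {rel_max:+.0%}")
```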

See attached bench.json.gz for full data.

Victor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bench.json.gz
Type: application/x-gzip
Size: 6108 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/speed/attachments/20170307/22e8b400/attachment.bin>