Re: [Speed] New CPython benchmark suite based on perf
On Mon, 4 Jul 2016 22:51:11 +0200 Victor Stinner victor.stinner@gmail.com wrote:
2016-07-04 19:49 GMT+02:00 Antoine Pitrou solipsis@pitrou.net:
Median +- Std dev: 256 ms +- 3 ms -> 262 ms +- 4 ms: 1.03x slower
That doesn't sound like a terrific idea. Why do you think the median gives a more interesting figure here?
When the distribution is symmetric, mean and median are the same. In my experience with Python benchmarks, the curve is usually skewed: the right tail is much longer.
When the system noise is high, the skewness is much larger. In that case, the median looks "more correct".
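To illustrate the skew described here, a small sketch with invented timings (the numbers are made up, not from a real benchmark): the median stays near the bulk of the samples, while the mean is pulled toward the long right tail.

```python
import statistics

# Invented timings (seconds): a right-skewed distribution where a few
# noisy runs inflate the right tail.
samples = [0.250, 0.251, 0.252, 0.253, 0.255, 0.258, 0.262, 0.270, 0.310, 0.420]

# The mean is dragged upward by the two slow outliers;
# the median stays close to the typical run time.
print(f"mean:   {statistics.mean(samples):.3f} s")
print(f"median: {statistics.median(samples):.3f} s")
```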
It "looks" more correct?
Let's say your Python implementation has a flaw: it is almost always fast, but every 10 runs, it becomes 3x slower. Taking the mean will reflect the occasional slowness. Taking the median will completely hide it.
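This scenario can be checked directly (the 0.100 s base time is invented for illustration): the mean moves when every 10th run is 3x slower, while the median does not move at all.

```python
import statistics

base = 0.100  # seconds for a "fast" run (invented value)
# 9 fast runs, then one 3x-slower run, repeated 5 times: 50 samples.
samples = ([base] * 9 + [base * 3]) * 5

print(f"mean:   {statistics.mean(samples):.3f}")    # 0.120 -> slowdown shows up
print(f"median: {statistics.median(samples):.3f}")  # 0.100 -> slowdown is invisible
```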
Then of course, since you have several processes and several runs per process, you could try something more convoluted, such as mean-of-medians or mean-of-mins or...
However, if you're concerned by system noise, there may be other ways to avoid it. For example, measure both CPU time and wall time, and if CPU time < 0.9 * wall time (for example), ignore the number and take another measurement.
(this assumes all benchmarks are CPU-bound - which they should be here
- and single-threaded - which they *probably* are, except in a hypothetical parallelizing Python implementation ;-)))
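That filter could be sketched as follows. The function name, the 0.9 threshold and the retry policy are illustrative, not part of perf: the idea is just that a large gap between CPU time and wall time suggests the process was descheduled, so the measurement is discarded.

```python
import time

def timed_call(func, threshold=0.9, max_tries=10):
    """Measure func()'s wall time, retrying when CPU time falls below
    threshold * wall time (a sign of system noise / descheduling).
    Name, threshold and retry policy are illustrative, not the perf API."""
    for _ in range(max_tries):
        wall0 = time.perf_counter()
        cpu0 = time.process_time()
        func()
        wall = time.perf_counter() - wall0
        cpu = time.process_time() - cpu0
        if cpu >= threshold * wall:
            return wall
    # Still too noisy after max_tries: return the last measurement anyway.
    return wall
```

This only makes sense for CPU-bound, single-threaded code, as noted above: an I/O-bound benchmark would always fail the check.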
Regards
Antoine.
2016-07-05 10:08 GMT+02:00 Antoine Pitrou solipsis@pitrou.net:
When the system noise is high, the skewness is much larger. In this case, median looks "more correct".
It "looks" more correct?
My main worry is to get reproducible "stable" benchmark results. I started to work on perf because most results of the CPython benchmark suite just looked like pure noise. It became very hard for me to decide whether it was my fault, or whether my change really made Python slower or faster. I'm not talking about specific benchmarks which are obviously much faster or much slower, but about all small changes between -5% and +5%.
It looks like median helps to reduce the effect of outliers.
Let's say your Python implementation has a flaw: it is almost always fast, but every 10 runs, it becomes 3x slower. Taking the mean will reflect the occasional slowness. Taking the median will completely hide it.
I'm not sure that the median will completely hide such behaviour. Moreover, I modified the benchmark suite to always display the standard deviation just after the median. The standard deviation should help to detect a large variation.
In practice, you almost never get all samples with the same value. There is always a statistical distribution, usually a Gaussian curve. The question is what is the best way to "summarize" a curve with two numbers. I add a constraint: I also want to reduce the system noise.
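As a sanity check on the "every 10th run is 3x slower" scenario from earlier in the thread (times invented): even though the median alone hides the slow runs, the stdev/median ratio is far above a ~10% stability threshold, so the instability would still be flagged.

```python
import statistics

# The "flawed implementation" samples: every 10th run is 3x slower.
samples = ([0.100] * 9 + [0.300]) * 5

median = statistics.median(samples)
stdev = statistics.stdev(samples)
# The ratio is around 60%, far above a ~10% stability threshold,
# so the hidden slow runs still show up as instability.
print(f"median +- stdev: {median:.3f} s +- {stdev:.3f} s")
print(f"stdev / median:  {stdev / median:.0%}")
```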
Then of course, since you have several processes and several runs per process, you could try something more convoluted, such as mean-of-medians or mean-of-mins or...
I don't know these functions. I also prefer to consider each sample individually and only apply a function to the whole series of samples.
However, if you're concerned by system noise, there may be other ways to avoid it. For example, measure both CPU time and wall time, and if CPU time < 0.9 * wall time (for example), ignore the number and take another measurement.
(this assumes all benchmarks are CPU-bound - which they should be here
- and single-threaded - which they *probably* are, except in a hypothetical parallelizing Python implementation ;-)))
CPU isolation helps a lot to reduce the system noise, but it requires "complex" system tuning. I don't expect that users will use it, especially users of timeit.
I don't think that CPU time is generic enough to put it in the perf module. I would prefer to not restrict myself to CPU-bound benchmarks.
But the perf module already warns users when it detects that the benchmark looks too unstable. See the example at the end of: http://perf.readthedocs.io/en/latest/perf.html#runs-samples-warmups-outter-a...
Or try: "python3 -m perf.timeit --loops=10 pass".
Currently, I'm checking the shortest raw sample (must be >= 1 ms) and the standard deviation / median ratio (must be < 10%).
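Those two checks could be sketched like this. The thresholds come from the text above; the function and its signature are illustrative, not the perf API:

```python
import statistics

def is_stable(raw_samples, loops, min_raw=1e-3, max_rel_stdev=0.10):
    """Sketch of the two stability checks described above.
    raw_samples: wall-clock times (seconds) of whole runs,
    each run executing `loops` iterations of the benchmark.
    Function name and signature are illustrative, not the perf API."""
    # Check 1: shortest raw sample must be >= 1 ms,
    # otherwise timer resolution dominates the measurement.
    if min(raw_samples) < min_raw:
        return False
    # Check 2: stdev / median of the per-loop times must be < 10%.
    per_loop = [s / loops for s in raw_samples]
    rel = statistics.stdev(per_loop) / statistics.median(per_loop)
    return rel < max_rel_stdev
```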
Someone suggested that I compare the minimum and the maximum to the median. You can already see that using perf stats:
$ python3 -m perf show --stats perf/tests/telco.json
Number of samples: 250 (50 runs x 5 samples; 1 warmup)
Standard deviation / median: 1%
Shortest raw sample: 264 ms (10 loops)
Minimum: 26.4 ms (-1.8%)
Median +- std dev: 26.9 ms +- 0.2 ms
Maximum: 27.3 ms (+1.7%)
=> the minimum and maximum are -1.8% and +1.7% away from the median
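Those percentages are just (value - median) / median. Recomputing from the rounded figures in the output gives slightly different numbers (-1.9% / +1.5%), presumably because perf computes them from the unrounded samples:

```python
# Recompute min/max deviation from the median, using the rounded
# figures shown in the stats output above.
median = 26.9   # ms
minimum = 26.4  # ms
maximum = 27.3  # ms

for label, value in (("Minimum", minimum), ("Maximum", maximum)):
    pct = (value - median) / median * 100
    print(f"{label}: {value} ms ({pct:+.1f}%)")
# Minimum: 26.4 ms (-1.9%)
# Maximum: 27.3 ms (+1.5%)
```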
When you get outliers, the maximum can be 20% above the median, or much more.
Victor
participants (2):
- Antoine Pitrou
- Victor Stinner