Re: [Speed] New CPython benchmark suite based on perf
On Mon, 4 Jul 2016 16:17:23 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
> Changes:
>
> - use statistics.median() rather than mean() to compute the "average" of samples. Example:
>
>     Median +- Std dev: 256 ms +- 3 ms -> 262 ms +- 4 ms: 1.03x slower
That doesn't sound like a terrific idea. Why do you think the median gives a more interesting figure here?
(please note that median() doesn't compute an "average" at all...)
> - replace compat.py with external six dependency
I would suggest vendoring six, to avoid adding dependencies.
> - use JSON format to exchange timings between benchmarks and runner.py
That's a very nice improvement.
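For illustration, the exchange can be as simple as one JSON document per benchmark process (the field names below are invented for this sketch, not perf's actual schema):

    import json

    # Benchmark side: serialize the raw timings.
    payload = json.dumps({
        "benchmark": "telco",              # hypothetical field names
        "loops": 10,
        "samples": [0.256, 0.258, 0.262],  # seconds, one per run
    })

    # runner.py side: parse the text read from the subprocess's stdout.
    result = json.loads(payload)
    print(result["benchmark"], len(result["samples"]), "samples")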
> TODO:
>
> - Right now the calibration is done twice: in the reference python and in the changed python. It should be done only once, in the reference python
I think doing calibration in each interpreter is the right thing to do, because the two interpreters may have very different performance characteristics (say one is 10x faster than the other).
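For instance, calibration typically means doubling the number of loops until a single raw sample lasts long enough to dominate timer resolution. A rough sketch of the idea (not perf's actual algorithm, and min_time is an assumed threshold):

    import time

    def calibrate(func, min_time=0.1):
        # Double the loop count until one sample takes at least
        # min_time seconds, amortizing timer resolution and call
        # overhead over many iterations.
        loops = 1
        while True:
            t0 = time.perf_counter()
            for _ in range(loops):
                func()
            if time.perf_counter() - t0 >= min_time:
                return loops
            loops *= 2

An interpreter that runs func() 10x faster settles on roughly 10x more loops, so reusing the reference interpreter's loop count would give too-short samples on the faster one.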
Regards
Antoine.
2016-07-04 19:49 GMT+02:00 Antoine Pitrou <solipsis@pitrou.net>:
>> Median +- Std dev: 256 ms +- 3 ms -> 262 ms +- 4 ms: 1.03x slower
>
> That doesn't sound like a terrific idea. Why do you think the median gives a more interesting figure here?
When the distribution is uniform, the mean and the median are the same. In my experience with Python benchmarks, the curve is usually skewed: the right tail is much longer.
When the system noise is high, the skewness is much larger. In this case, the median looks "more correct": IMO it helps to reduce the impact of the system noise. See the graphs and the discussion for details: https://github.com/haypo/perf/issues/1
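A toy example with the statistics module (the numbers are made up) shows the effect of a single noisy run:

    import statistics

    # Right-skewed samples: one run was hit by system noise.
    samples = [0.250, 0.252, 0.253, 0.255, 0.256, 0.258, 0.260, 0.320]

    print(statistics.mean(samples))    # 0.263  -- pulled up by the outlier
    print(statistics.median(samples))  # 0.2555 -- close to a typical run

The single 0.320 sample shifts the mean by about 3% but barely moves the median.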
>> - replace compat.py with external six dependency
>
> I would suggest vendoring six, to avoid adding dependencies.
Ah, that's a different topic. I'm more in favor of dropping vendored copies of libraries and getting them from PyPI using a virtualenv instead. It should make the benchmark repository smaller and allow upgrading dependencies more easily.
What do you think?
>> TODO:
>>
>> - Right now the calibration is done twice: in the reference python and in the changed python. It should be done only once, in the reference python
>
> I think doing calibration in each interpreter is the right thing to do, because the two interpreters may have very different performance characteristics (say one is 10x faster than the other).
Ah yes, maybe. It's true that the telco benchmark is *much* faster on Python 3.
Anyway, the result is normalized per loop iteration: raw sample / loops. By the way, perf has an "inner-loops" parameter for micro-benchmarks which duplicates the measured statement N times inside the loop body, to reduce the relative overhead of the loop itself.
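A rough sketch of that normalization (illustrative only, not perf's actual code):

    import time

    def sample(func, loops, inner_loops=1):
        # One raw sample, divided by the total number of executions.
        # With inner_loops > 1, func's body is expected to contain the
        # measured statement duplicated inner_loops times, so the
        # overhead of the range() loop is spread over more work.
        t0 = time.perf_counter()
        for _ in range(loops):
            func()
        raw = time.perf_counter() - t0
        return raw / (loops * inner_loops)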
Victor