[Speed] New CPython benchmark suite based on perf

Victor Stinner victor.stinner at gmail.com
Mon Jul 4 10:17:23 EDT 2016


Hi,

I modified the CPython benchmark suite to use my perf module:
https://hg.python.org/sandbox/benchmarks_perf


Changes:

* use statistics.median() rather than mean() to compute the "average"
of the samples; the median is less affected by outliers (short example
after this list). Example:

   Median +- Std dev: 256 ms +- 3 ms -> 262 ms +- 4 ms: 1.03x slower

* replace compat.py with external six dependency
* replace util.py with perf
* replace explicit warmups with perf automatic warmup
* add name metadata
* for benchmarks taking parameters, save the parameters in metadata
* avoid nested loops, prefer a single level of loop: perf is
responsible for calling the sample function enough times to collect
enough samples
* store the django and mako versions in metadata
* use JSON format to exchange timings between benchmarks and runner.py
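
The median/mean change in practice (plain stdlib, nothing specific to
perf): a single slow sample, e.g. when the OS schedules another process
during one run, pulls the mean up but barely moves the median.

    import statistics

    # four "normal" samples plus one outlier, in seconds
    samples = [0.250, 0.252, 0.251, 0.253, 0.320]

    print(statistics.mean(samples))    # ~0.265: pulled up by the outlier
    print(statistics.median(samples))  # 0.252: barely affected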


perf adds more features:

* run each benchmark in multiple processes (25 by default, 50 in rigorous mode)
* calibrate each benchmark to compute the number of loops needed so
that one sample takes between 100 ms and 1 second
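
The calibration idea, roughly (simplified sketch, not the exact
implementation; here the number of loops is simply doubled until one
sample is long enough):

    import time

    def calibrate(sample_func, min_time=0.1):
        # Increase the number of loops until one sample (one timing of
        # the inner loop) takes at least min_time seconds (100 ms here).
        loops = 1
        while True:
            t0 = time.perf_counter()
            for _ in range(loops):
                sample_func()
            dt = time.perf_counter() - t0
            if dt >= min_time:
                return loops
            loops *= 2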


TODO:

* Right now the calibration is done twice: in the reference python and
in the changed python. It should only be done once, in the reference
python
* runner.py should write results to a JSON file. Currently, data is
not written to disk: a pipe is used with the child processes (see the
sketch after this list)
* Drop external dependencies and create a virtual environment per python
* Port more Python 2-only benchmarks to Python 3
* Add more benchmarks from the PyPy, Pyston and Pyjion benchmark
suites: unify the benchmark suites again :-)
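
To make the pipe/JSON point more concrete, the exchange between
runner.py and a benchmark is currently something like this (simplified
sketch; the script name and JSON fields are only illustrative, not the
real format):

    import json
    import subprocess

    # runner.py starts the benchmark in a child process and reads the
    # timings back as JSON over a pipe (here: the child's stdout)
    proc = subprocess.run(["python3", "bm_example.py"],
                          stdout=subprocess.PIPE, check=True)
    result = json.loads(proc.stdout.decode())
    print(result["name"], len(result["samples"]), "samples")

The TODO above is to also write this JSON to a file on disk, so results
can be reloaded and compared later without re-running the benchmarks.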


perf has builtin tools to analyze the distribution of samples:

* each benchmark gets a --hist option to display a histogram in text
mode (example invocations after this list)
* each benchmark gets a --stats option to display statistics: number
of samples, shortest raw sample, min, max, etc.
* the "python3 -m perf" CLI has many commands to analyze a benchmark:
http://perf.readthedocs.io/en/latest/cli.html
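
For example (the benchmark script name is only an example):

    $ python3 bm_mako.py --hist    # text histogram of the samples
    $ python3 bm_mako.py --stats   # number of samples, min, max, ...
    $ python3 -m perf --help       # list the available analysis commands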


Right now, the perf JSON format can only store a single benchmark. I
will extend the format to store a list of benchmarks, so it will be
possible to store all results of a python version in a single file.

By the way, I also want to change the runner.py CLI to run the
benchmarks on a single python version and then use a second command to
compare two result files, rather than always running each benchmark
twice (reference python, changed python). The PyPy runner also works
like that, if I recall correctly.

Victor

