On Fri, Jun 10, 2016 at 1:13 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,

In recent weeks, I did research on how to get stable and reliable
benchmarks, especially for the corner case of microbenchmarks. The
first result is a series of articles, here are the first three:

https://haypo.github.io/journey-to-stable-benchmark-system.html
https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
https://haypo.github.io/journey-to-stable-benchmark-average.html

The second result is a new perf module which includes all the "tricks"
discovered in my research: compute the average and standard deviation,
spawn multiple worker child processes, automatically calibrate the
number of outer-loop iterations, automatically pin worker processes
to isolated CPUs, and more.
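
To give an idea of the calibration trick, here is a simplified sketch of
the general technique (not perf's actual code; the 0.1 second threshold is
an arbitrary value chosen for the example):
---
import time

def calibrate_loops(func, min_time=0.1):
    """Double the number of outer-loop iterations until one timing
    run takes at least min_time seconds."""
    loops = 1
    while True:
        start = time.perf_counter()
        for _ in range(loops):
            func()
        elapsed = time.perf_counter() - start
        if elapsed >= min_time:
            return loops
        loops *= 2
---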

The perf module can store benchmark results as JSON so they can be
analyzed in depth later. It helps to configure a benchmark correctly and
to check manually whether it is reliable or not.

The perf documentation also explains how to get stable and reliable
benchmarks (e.g. how to tune Linux to isolate CPUs).
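
For example, once some CPUs have been isolated, a process can pin itself
to them. perf does this automatically; the following is just a minimal
sketch of the underlying call (Linux, Python 3.3+; the CPU numbers {2, 3}
are an example matching the run shown below):
---
import os

# Pin the current process to the isolated CPUs.
os.sched_setaffinity(0, {2, 3})
print("running on CPUs:", sorted(os.sched_getaffinity(0)))
---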

perf has 3 built-in CLI commands:

* python -m perf: show and compare JSON results
* python -m perf.timeit: a new, better and more reliable implementation of timeit
* python -m perf.metadata: display collected metadata

Python 3 is recommended to get time.perf_counter(), the new accurate
statistics module, automatic CPU pinning (I will implement it on
Python 2 later), etc. But Python 2.7 is also supported; fallbacks are
implemented when needed.
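
As an illustration of the kind of fallback involved (a sketch only, not
perf's actual code):
---
try:
    from time import perf_counter          # Python 3.3+
except ImportError:
    from time import clock as perf_counter # Python 2 fallback, less accurate

try:
    from statistics import mean, stdev     # Python 3.4+
except ImportError:
    def mean(data):
        return sum(data) / float(len(data))

    def stdev(data):
        m = mean(data)
        return (sum((x - m) ** 2 for x in data) / (len(data) - 1)) ** 0.5
---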

Example with the patched telco benchmark (a benchmark for the decimal
module) on a Linux system with two isolated CPUs.

First run the benchmark:
---
$ python3 telco.py --json-file=telco.json
.........................
Average: 26.7 ms +- 0.2 ms
---


Then show the JSON content to see all details:
---
$ python3 -m perf -v show telco.json
Metadata:
- aslr: enabled
- cpu_affinity: 2, 3
- cpu_count: 4
- cpu_model_name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
- hostname: smithers
- loops: 10
- platform: Linux-4.4.9-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
- python_executable: /usr/bin/python3
- python_implementation: cpython
- python_version: 3.4.3

Run 1/25: warmup (1): 26.9 ms; samples (3): 26.8 ms, 26.8 ms, 26.7 ms
Run 2/25: warmup (1): 26.8 ms; samples (3): 26.7 ms, 26.7 ms, 26.7 ms
Run 3/25: warmup (1): 26.9 ms; samples (3): 26.8 ms, 26.9 ms, 26.8 ms
(...)
Run 25/25: warmup (1): 26.8 ms; samples (3): 26.7 ms, 26.7 ms, 26.7 ms

Average: 26.7 ms +- 0.2 ms (25 runs x 3 samples; 1 warmup)
---
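
To make the final "Average" line concrete: the warmup sample of each run
is discarded and the mean and standard deviation are computed over the
remaining 25 x 3 = 75 samples. A minimal sketch of that aggregation (not
perf's actual code; the values below are only the three runs shown above,
so the result will not exactly match the full 25-run average):
---
import statistics

# Each run: 1 warmup sample (discarded) followed by 3 kept samples, in ms.
runs = [
    [26.9, 26.8, 26.8, 26.7],   # run 1
    [26.8, 26.7, 26.7, 26.7],   # run 2
    [26.9, 26.8, 26.9, 26.8],   # run 3
]
samples = [s for run in runs for s in run[1:]]  # drop each run's warmup
print("Average: %.1f ms +- %.1f ms"
      % (statistics.mean(samples), statistics.stdev(samples)))
---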

Note: benchmarks can be analyzed with Python 2.

I'm posting this to python-dev because providing timeit results is
commonly requested in reviews of optimization patches.

The next step is to patch the CPython benchmark suite to use the perf
module. I have already forked the repository and started patching some
benchmarks.

If you are interested in Python performance in general, please join us
on the speed mailing list!
https://mail.python.org/mailman/listinfo/speed

Victor

This is very interesting and also somewhat related to psutil. I wonder... would increasing process priority help isolate benchmarks even more? By this I mean "os.nice(-20)".
Extra: perhaps even I/O priority: https://pythonhosted.org/psutil/#psutil.Process.ionice ?
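
Something along these lines (just a sketch; raising priority with os.nice
normally requires root, and the real-time I/O class is Linux-only):
---
import os
import psutil

# Raise scheduling priority: os.nice() takes an increment, so a negative
# value lowers the niceness (usually needs root privileges).
os.nice(-20)

# Raise I/O priority via psutil (real-time class also needs root).
psutil.Process().ionice(psutil.IOPRIO_CLASS_RT, value=0)
---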

