2016-05-18 10:45 GMT+02:00 Armin Rigo <firstname.lastname@example.org>:
> On 17 May 2016 at 23:11, Victor Stinner <email@example.com> wrote:
>> with PYTHONHASHSEED=1 to test the same hash function. A more generic solution is to use multiple processes to test multiple hash seeds to get a more uniform distribution.
> What you say in the rest of the mail just shows that this "generic solution" should be applied not only to PYTHONHASHSEED, but also to other variables that seem to introduce deterministic noise.
... or ensure that these other parameters are not changed when testing two versions of the code ;-) perf.py already starts the process with an empty environment and sets PYTHONHASHSEED: the environment is fixed (constant).
I noticed the performance difference caused by the environment because I failed to reproduce a benchmark (I got different numbers) when I ran it again manually.
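
To make the idea concrete, here is a rough sketch (not perf.py's actual code; "bench.py" stands for any script that prints its timing on stdout): run the workload in child processes with an empty environment, either pinning PYTHONHASHSEED or looping over several seeds.

    import subprocess
    import sys

    def bench_worker(python, script, seed):
        # The child only sees the variables we put here, so the
        # environment is constant and the hash seed is controlled.
        env = {"PYTHONHASHSEED": str(seed)}
        out = subprocess.check_output([python, script], env=env)
        return float(out)  # the script prints its timing on stdout

    # Fixed seed: every worker tests the same hash function ...
    timing = bench_worker(sys.executable, "bench.py", 1)
    # ... or sample several seeds for a more uniform distribution.
    timings = [bench_worker(sys.executable, "bench.py", seed)
               for seed in range(10)]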
> You've just found three more: the locale, the size of the command line, and the working directory. I guess the mere size of the environment also plays a role. So I guess, ideally, you'd run a large number of times with random values in all these parameters. (In practice it might be enough to run a smaller fixed number of times with known values in the parameters.)
Right, I have to think about that and try to find a way to randomize these "parameters" (or a way to make them constant): directories, the name of the binary, etc.
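
Something like this, very roughly (BENCH_PAD is a dummy variable name I just made up): run each process in a fresh random directory and pad the environment to a random size, so a dependency on these parameters shows up as noise instead of a constant bias.

    import os
    import random
    import string
    import subprocess
    import tempfile

    def run_randomized(python, script):
        # Fresh temporary working directory: its path (and the
        # path length) differs on every run.
        cwd = tempfile.mkdtemp(prefix="bench-")
        # Dummy variable of random size, so the total size of the
        # environment block also changes on every run.
        pad = "".join(random.choice(string.ascii_letters)
                      for _ in range(random.randrange(4096)))
        env = {"PYTHONHASHSEED": "random", "BENCH_PAD": pad}
        out = subprocess.check_output([python, os.path.abspath(script)],
                                      env=env, cwd=cwd)
        return float(out)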
As I wrote, the environment is easy to control. The working directory and the command line are more complex: it's convenient to be able to pass paths to two different Python binaries compiled in two different directories.
FYI I'm using a "reference python" compiled in one directory, and my "patched python" compiled in a different directory. Both are built with the same compiler options (-O0 for debugging, -O3 for quick benchmarks, -O3 with PGO and LTO for reliable benchmarks).
Another option for microbenchmarks would be to *ignore* (hide) differences smaller than +/- 10%, since this kind of benchmark depends too much on external parameters. I did that in my custom microbenchmark runner; it helps to ignore noise and focus on major speedups (or slowdowns!).
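
For example, the core of it looks like this (simplified; the numbers in the call below are made up):

    def compare(name, ref_time, patched_time, threshold=0.10):
        # Relative change; negative means the patched python is faster.
        delta = (patched_time - ref_time) / ref_time
        if abs(delta) < threshold:
            # Below +/- 10%, the difference is likely noise coming
            # from external parameters: hide it.
            print("%s: not significant" % name)
        else:
            print("%s: %+.0f%%" % (name, delta * 100.0))

    compare("call_simple", ref_time=0.224, patched_time=0.127)
    # call_simple: -43%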