2016-05-17 23:11 GMT+02:00 Victor Stinner email@example.com:
(*) System load => CPU isolation, disable ASLR, set CPU affinity on IRQs, etc. work around this issue -- http://haypo-notes.readthedocs.io/microbenchmark.html
(*) Locale, size of the command line and/or the current working directory => WTF?! (...) => My bet is that the locale, current working directory, command line, etc. impact how the heap memory is allocated, and this specific benchmark depends on the locality of memory allocated on the heap... (...)
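For the CPU affinity point above, here is a minimal sketch of what I mean by pinning (Linux-only, using os.sched_setaffinity; the helper name is mine, and this only pins the benchmark process itself -- IRQ affinity is configured separately in /proc):

```python
# Sketch: restrict the current process to one fixed CPU core, so the
# kernel does not migrate the benchmark between cores mid-run.
import os

def pin_to_cpu(cpu):
    """Pin the current process (pid 0 = ourselves) to a single CPU core."""
    os.sched_setaffinity(0, {cpu})

if __name__ == "__main__":
    pin_to_cpu(0)
    # After pinning, the allowed CPU set contains only CPU 0.
    print(sorted(os.sched_getaffinity(0)))
```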
I tried to find a tool to "randomize" memory allocations, but I couldn't find one that is both popular and simple. I found the following tool, but it seems overkill and unrealistic to me: https://emeryberger.com/research/stabilizer/
This tool randomizes everything and "re-randomizes" the code at runtime, every 500 ms. IMHO it's not realistic because PGO+LTO use a specific link order to group "hot code", to keep hot functions close together.
It seems like (enabling) ASLR "hides" the effects of the command line, current working directory, environment variables, etc. Using ASLR + statistics (computing the mean + standard deviation, using multiple processes to get a better distribution) fixes my issue.
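Concretely, by "ASLR + statistics" I mean something like the following sketch: run the timing in several *fresh* processes (each one gets its own randomized address space layout) and aggregate with mean + standard deviation. The payload here is just a placeholder workload, not a real benchmark:

```python
# Sketch: sample one timing per fresh child process, so each sample
# sees a different ASLR layout, then aggregate with mean + stdev.
import statistics
import subprocess
import sys

# Placeholder workload; a real benchmark would go here.
PAYLOAD = "import timeit; print(timeit.timeit('sum(range(1000))', number=1000))"

def sample_processes(n=5):
    """Return one timing per fresh process (one address space layout each)."""
    timings = []
    for _ in range(n):
        out = subprocess.check_output([sys.executable, "-c", PAYLOAD])
        timings.append(float(out))
    return timings

if __name__ == "__main__":
    timings = sample_processes()
    print("mean: %.6f, stdev: %.6f"
          % (statistics.mean(timings), statistics.stdev(timings)))
```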
Slowly, I'm understanding better why taking the minimum and disabling legitimate sources of randomness is wrong -- I mean, slowly I'm becoming able to explain why :-) Disabling ASLR and focusing on the minimum timing just looks wrong.
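To illustrate with toy numbers (made up, not real measurements): the minimum only reports the single luckiest memory layout, while mean + stdev describe the whole distribution a user actually gets across layouts:

```python
import statistics

# Made-up timings (seconds) from 8 runs, each run with a different
# randomized address space layout. Illustration only.
timings = [1.02, 1.05, 1.21, 1.08, 1.30, 1.04, 1.17, 1.11]

# The minimum cherry-picks the one luckiest layout...
print("min  :", min(timings))
# ...while mean +- stdev describes the full distribution of layouts.
print("mean :", statistics.mean(timings))
print("stdev:", statistics.stdev(timings))
```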
I'm surprised because disabling ASLR is a common practice in benchmarking. For example, on this mailing list, 2 months ago, Alecsandru Patrascu from Intel suggested disabling ASLR: https://mail.python.org/pipermail/speed/2016-February/000289.html
(and also to disable Turbo and Hyper-Threading and to use a fixed CPU frequency, which is good advice ;-))
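For reference, the ASLR setting in question is exposed on Linux via /proc/sys/kernel/randomize_va_space (0 = disabled, 1 = partial, 2 = full); a small sketch to inspect it (the sysctl path is standard on Linux, the helper name is mine; writing to it requires root):

```python
# Sketch: read the Linux ASLR knob. Returns None on non-Linux systems.
def read_aslr_setting(path="/proc/sys/kernel/randomize_va_space"):
    """Return the current ASLR level (0, 1 or 2), or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read())
    except OSError:
        return None

if __name__ == "__main__":
    print("ASLR level:", read_aslr_setting())
```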
By the way, I'm interested in knowing how the server running speed.python.org is tuned: CPU tuning, OS tuning, etc. For example, Zachary Ware wrote that perf.py was not run with --rigorous when he launched the website.
I will probably write a blog post to explain my issues with benchmarks. Later, I will propose more concrete changes to perf.py and write documentation explaining how perf.py should be used (giving advice on how to get reliable results).