[Speed] Disable hash randomization to get reliable benchmarks

Victor Stinner victor.stinner at gmail.com
Tue Apr 26 05:46:49 EDT 2016


Hi,

2016-04-26 10:56 GMT+02:00 Armin Rigo <arigo at tunes.org>:
> Hi,
>
> On 25 April 2016 at 08:25, Maciej Fijalkowski <fijall at gmail.com> wrote:
>> The problem with disabled ASLR is that you change the measurement from
>> a statistical distribution, to one draw from a statistical
>> distribution repeatedly. There is no going around doing multiple runs
>> and doing an average on that.
>
> You should mention that it is usually enough to do the following:
> instead of running once with PYTHONHASHSEED=0, run five or ten times
> with PYTHONHASHSEED in range(5 or 10).  In this way, you get all
> benefits: not-too-long benchmarking, no randomness, but still some
> statistically relevant sampling.

I guess that the number of runs required to get a nice distribution
depends on the size of the largest dictionaries in the benchmark, i.e.
the dictionaries that matter for performance.

The best option would be to handle this transparently in perf.py:
either disable all sources of randomness, or run multiple processes to
get a uniform distribution, rather than only having one sample for one
specific config. Maybe it could be an option: by default, run multiple
processes, but have an option to run only one process using
PYTHONHASHSEED=0.
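A minimal sketch of what such a multi-process runner could look like
(the inline benchmark command is a placeholder, not an actual perf.py
benchmark; a real runner would launch the benchmark script itself):

```python
import os
import statistics
import subprocess
import sys

# Placeholder benchmark: time a trivial workload and print the duration.
# A real runner would invoke the actual benchmark script instead.
BENCH_CMD = [sys.executable, "-c",
             "import time; t = time.perf_counter(); sum(range(10**6)); "
             "print(time.perf_counter() - t)"]

def run_with_seeds(n_seeds=5):
    """Run the benchmark in one child process per hash seed."""
    timings = []
    for seed in range(n_seeds):
        # Each child gets a fixed but different PYTHONHASHSEED, so the
        # sampling covers several hash functions deterministically.
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        out = subprocess.run(BENCH_CMD, env=env, capture_output=True,
                             text=True, check=True).stdout
        timings.append(float(out))
    return timings

timings = run_with_seeds()
print("min: %.6f  mean: %.6f  stdev: %.6f"
      % (min(timings), statistics.mean(timings), statistics.pstdev(timings)))
```

This keeps the total benchmarking time bounded (n_seeds runs) while
still sampling more than one hash seed, as Armin suggests above.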

By the way, timeit has a very similar issue. I'm quite sure that most
Python developers run "python -m timeit ..." at least 3 times and take
the minimum. "python -m timeit" could maybe be modified to also spawn
child processes to get a better distribution, and to display the
minimum, the average and the standard deviation (not only the
minimum)?
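The reporting part can already be approximated today with
timeit.repeat plus the statistics module (note this stays in a single
process, so it does not address the hash-seed sampling; only the
min/mean/stdev display):

```python
import statistics
import timeit

# timeit.repeat runs the statement `repeat` times (each run loops
# `number` times) and returns one total time per run, similar to
# invoking "python -m timeit" several times by hand.
runs = timeit.repeat("sum(range(1000))", repeat=5, number=1000)
per_loop = [t / 1000 for t in runs]

print("min:   %.3g s" % min(per_loop))
print("mean:  %.3g s" % statistics.mean(per_loop))
print("stdev: %.3g s" % statistics.stdev(per_loop))
```

The minimum filters out interference from other processes, while the
standard deviation reveals how noisy the measurement actually was.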

Well, the question is also whether it's a good thing to have such a
tiny microbenchmark as bm_call_simple in the Python benchmark suite. I
spent 2 or 3 days analyzing CPython running bm_call_simple with the
Linux perf tool, callgrind and cachegrind. I'm still unable to
understand the link between my changes to the C code and the result.
IMHO this specific benchmark depends on very low-level things like the
CPU L1 cache.  Maybe bm_call_simple helps in some very specific use
cases, like trying to make Python function calls faster. But in other
cases, it can be a source of noise, confusion and frustration...

Victor
