Re: [Speed] Disable hash randomization to get reliable benchmarks
On Tue, 26 Apr 2016 18:28:32 +0200 Maciej Fijalkowski <fijall@gmail.com> wrote:
> taking the minimum is a terrible idea anyway, none of the statistical discussion makes sense if you do that
The minimum is a reasonable metric for quick throwaway benchmarks as timeit is designed for, as it has a better hope of alleviating the impact of system load (as such throwaway benchmarks are often run on the developer's workstation).
For a persistent benchmarks suite, where we can afford longer benchmark runtimes and are able to keep system noise to a minimum, we might prefer another metric.
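For illustration, this is the pattern the timeit documentation itself suggests for quick measurements (the statement and loop counts below are arbitrary, just a stand-in workload):

    import timeit

    NUMBER = 10000  # loops per repeat; arbitrary workload size
    # timeit.repeat() returns one total time per repeat; taking the min
    # keeps the run least disturbed by other processes on the machine.
    timings = timeit.repeat("sum(range(1000))", repeat=5, number=NUMBER)
    print("best of 5: %.2f usec per loop" % (min(timings) / NUMBER * 1e6))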
Regards
Antoine.
On Tue, Apr 26, 2016 at 6:36 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
> On Tue, 26 Apr 2016 18:28:32 +0200 Maciej Fijalkowski <fijall@gmail.com> wrote:
>> taking the minimum is a terrible idea anyway, none of the statistical discussion makes sense if you do that
>
> The minimum is a reasonable metric for quick throwaway benchmarks as timeit is designed for, as it has a better hope of alleviating the impact of system load (as such throwaway benchmarks are often run on the developer's workstation).
>
> For a persistent benchmarks suite, where we can afford longer benchmark runtimes and are able to keep system noise to a minimum, we might prefer another metric.
>
> Regards
> Antoine.
No, it's not, Antoine. The minimum is not better than one random measurement.
We had this discussion before, but you guys are happily dismissing all the papers written on the subject. It *does* get rid of random system noise, but it *also* gets rid of all the effects related to gc/malloc/caches and the infinite details that do not behave in the same predictable fashion.
2016-04-26 18:36 GMT+02:00 Antoine Pitrou <solipsis@pitrou.net>:
> The minimum is a reasonable metric for quick throwaway benchmarks as timeit is designed for, as it has a better hope of alleviating the impact of system load (as such throwaway benchmarks are often run on the developer's workstation).
IMHO we must at least display the standard deviation. Maybe we can do better and provide 4 numbers:
- Average
- Standard deviation
- Minimum
- Maximum
The maximum helps to detect rare events like Maciej said (something in the OS, GC collection, etc.).
For example, we can use this format:
Average: 293.5 ms +/- 143.2 ms (min: 213.9 ms, max: 629.7 ms)
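For illustration, a minimal sketch producing such a line with the statistics module (the per-run timings below are made up for the example; only the min and max are taken from the output above):

    import statistics

    # Illustrative per-run timings in seconds (made up for the example)
    timings = [0.2139, 0.2253, 0.2297, 0.2241, 0.6297]
    avg = statistics.mean(timings)
    dev = statistics.stdev(timings)
    print("Average: %.1f ms +/- %.1f ms (min: %.1f ms, max: %.1f ms)"
          % (avg * 1e3, dev * 1e3, min(timings) * 1e3, max(timings) * 1e3))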
That line comes from the same microbenchmark as before, bm_call_simple.py, run on my laptop. As you can see, the deviation is large: 143 ms / 293 ms is 49%, so the benchmark is unstable. Maybe we should say explicitly that the result is not significant? Example:
Average: 293.5 ms +/- 143.2 ms (min: 213.9 ms, max: 629.7 ms) -- not significant

The benchmark is unstable; maybe the system is heavily loaded?
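One possible rule for such a flag, sketched here with an arbitrary 10% threshold on the relative standard deviation (the threshold is an assumption, not something measured):

    def is_significant(mean, stdev, max_rel_dev=0.10):
        # Hypothetical rule: the result counts as significant only when
        # the standard deviation stays below 10% of the average.
        return stdev <= max_rel_dev * mean

    is_significant(0.2935, 0.1432)  # -> False: 49% deviation, unstable
    is_significant(0.2195, 0.0016)  # -> True: 0.7% deviation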
By the way, "293.5 ms +/- 143.2 ms" is misleading. Maybe we should display it as "0.3 sec +/- 0.1 sec" so as not to show inaccurate digits?
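One way to do that, sketched below, is to round both numbers to the first significant digit of the standard deviation:

    import math

    def format_timing(mean, stdev):
        # Keep only the digits that the measurement error supports
        # (assumes stdev > 0).
        digits = -int(math.floor(math.log10(stdev)))
        return "%.*f sec +/- %.*f sec" % (digits, mean, digits, stdev)

    format_timing(0.2935, 0.1432)  # -> '0.3 sec +/- 0.1 sec'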
Another example, same laptop but using CPU isolation:
Average: 219.5 ms +/- 1.6 ms (min: 215.9 ms, max: 223.8 ms)
In this example, we can see that "+/- 1.6" is the standard deviation; it is unrelated to the minimum and maximum.
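For reference, a minimal sketch of what CPU isolation means here, assuming a Linux box booted with isolcpus=3 so that core 3 runs no other tasks (the CPU number is an assumption for illustration):

    import os

    # Pin the current process (pid 0) to the isolated core; Linux-only API.
    os.sched_setaffinity(0, {3})
    # ... run the benchmark loop here, free of scheduler noise ...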
Victor
participants (3):
- Antoine Pitrou
- Maciej Fijalkowski
- Victor Stinner