[New-bugs-announce] [issue26275] perf.py: bm_regex_v8 doesn't seem reliable even with --rigorous

STINNER Victor report at bugs.python.org
Wed Feb 3 05:38:49 EST 2016

New submission from STINNER Victor:

Hi, I'm working on some optimizations projects like FAT Python (PEP 509: issue #26058, PEP 510: issue #26098, and PEP 511: issue #26145) and faster memory allocators (issue #26249).

I have the *feeling* that the perf.py output is not reliable, even though a run takes more than 20 minutes :-/ And that is with -r (--rigorous), which Yury told me I must use :-)

Example with 5 runs of "python3 perf.py ../default/python ../default/python.orig -b regex_v8":

Report on Linux smithers 4.3.3-300.fc23.x86_64 #1 SMP Tue Jan 5 23:31:01 UTC 2016 x86_64 x86_64
Total CPU cores: 8

### regex_v8 ###
Min: 0.043237 -> 0.050196: 1.16x slower
Avg: 0.043714 -> 0.050574: 1.16x slower
Significant (t=-19.83)
Stddev: 0.00171 -> 0.00174: 1.0178x larger

### regex_v8 ###
Min: 0.042774 -> 0.051420: 1.20x slower
Avg: 0.043843 -> 0.051874: 1.18x slower
Significant (t=-14.46)
Stddev: 0.00351 -> 0.00176: 2.0009x smaller

### regex_v8 ###
Min: 0.042673 -> 0.048870: 1.15x slower
Avg: 0.043726 -> 0.050474: 1.15x slower
Significant (t=-8.74)
Stddev: 0.00283 -> 0.00467: 1.6513x larger

### regex_v8 ###
Min: 0.044029 -> 0.049445: 1.12x slower
Avg: 0.044564 -> 0.049971: 1.12x slower
Significant (t=-13.97)
Stddev: 0.00175 -> 0.00211: 1.2073x larger

### regex_v8 ###
Min: 0.042692 -> 0.049084: 1.15x slower
Avg: 0.044295 -> 0.050725: 1.15x slower
Significant (t=-7.00)
Stddev: 0.00421 -> 0.00494: 1.1745x larger

I only care about the "Min"; IMHO it's the most interesting information here.

The slowdown is between 12% and 20%; for me, that's a big difference.

It looks like some benchmarks have very short iterations compared to others. For example, one iteration of bm_json_v2 takes around 3 seconds, whereas one iteration of bm_regex_v8 takes less than 0.050 second (50 ms).

$ python3 performance/bm_json_v2.py -n 3 --timer perf_counter

$ python3 performance/bm_regex_v8.py -n 3 --timer perf_counter
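To illustrate why this matters (a standalone sketch, not perf.py code: the artificial workloads below merely stand in for the two benchmarks), timing noise does not average out as well over very short runs, so the relative spread of the samples tends to be larger:

```python
import statistics
import time

def sample(workload_loops, samples=20):
    """Collect raw wall-clock timings for an artificial workload;
    workload_loops controls how long one run takes."""
    times = []
    for _ in range(samples):
        t0 = time.perf_counter()
        for _ in range(workload_loops):
            sum(range(100))
        times.append(time.perf_counter() - t0)
    return times

short_runs = sample(10)     # very short iterations, like bm_regex_v8
long_runs = sample(1000)    # longer iterations, like bm_json_v2

# relative spread: stddev as a fraction of the minimum
print(statistics.stdev(short_runs) / min(short_runs))
print(statistics.stdev(long_runs) / min(long_runs))
```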

Do you think that bm_regex_v8 is reliable? I see that there is an "iteration scaling" mechanism to run the benchmarks with more iterations. Maybe we can start by increasing the "iteration scaling" for bm_regex_v8?

Instead of a fixed number of iterations, we should redesign the benchmarks to be time-based. For example, one iteration must take at least 100 ms and should not take more than 1 second (but may take longer to get more reliable results). The benchmark is then responsible for adjusting its internal parameters.
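A minimal sketch of that calibration idea (a hypothetical helper, not part of perf.py): double the inner loop count until a single sample reaches the minimum duration.

```python
import time

def calibrate(func, min_time=0.100):
    """Return a loop count such that `loops` calls of func() take at
    least min_time seconds of wall-clock time."""
    loops = 1
    while True:
        t0 = time.perf_counter()
        for _ in range(loops):
            func()
        if time.perf_counter() - t0 >= min_time:
            return loops
        loops *= 2   # exponential growth keeps calibration cheap

loops = calibrate(lambda: sum(range(1000)))
print(loops)
```

Capping one iteration at ~1 second then just means stopping the doubling before the sample gets too long.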

I used this design for my "benchmark.py" script, which is written to get "reliable" microbenchmarks:

The script is time-based and calibrates each benchmark. It also uses the *effective* resolution of the clock used by the benchmark to calibrate it.
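Measuring the effective resolution can be done by reading the clock back-to-back until it ticks (a sketch of the idea only; benchmark.py may implement it differently):

```python
import time

def effective_resolution(clock=time.perf_counter, samples=100):
    """Estimate the smallest observable difference between two
    successive readings of `clock`."""
    best = float('inf')
    for _ in range(samples):
        t1 = clock()
        t2 = clock()
        while t2 <= t1:          # spin until the clock advances
            t2 = clock()
        best = min(best, t2 - t1)
    return best

print(effective_resolution())
```

A sample shorter than a few hundred times this resolution is mostly measuring the clock, not the benchmark.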

I will maybe work on such a patch, but it would be good to first know your opinion on such a change.

I guess that we should use the base python to calibrate the benchmark and then pass the same parameters to the modified python.

components: Benchmarks
messages: 259469
nosy: brett.cannon, haypo, pitrou, yselivanov
priority: normal
severity: normal
status: open
title: perf.py: bm_regex_v8 doesn't seem reliable even with --rigorous
type: performance
versions: Python 3.6

Python tracker <report at bugs.python.org>
