
Hi, I'm working on speed.python.org, the CPython benchmarking website. I reworked the benchmark suite, which is now called "performance": http://pyperformance.readthedocs.io/ performance contains 54 benchmarks and works on Python 2.7 and 3.x. It creates a virtual environment with pinned versions of its requirements to "isolate" the benchmark from the system and get more reproducible results. I added a few benchmarks from the PyPy benchmark suite, but I didn't add all of them yet.

performance is now based on my perf module, a toolkit to run, analyze and compare benchmarks: http://perf.readthedocs.io/

I would like to know how to adapt perf and performance to handle the PyPy JIT compiler correctly: I would like to measure the performance once the code has been optimized by the JIT compiler, and ignore the warmup phase. I already made a few changes in perf and performance when a JIT is detected, but I'm not sure that I made them correctly. My final goal is to have PyPy benchmark results on speed.python.org, to easily compare CPython and PyPy (same benchmark runner, same physical server).

The perf module calibrates a benchmark based on time: it computes the number of outer loops needed to get a timing of at least 100 ms. Basically, a single value is computed as:

    t0 = perf.perf_counter()
    for _ in range(loops):
        func()
    value = perf.perf_counter() - t0

perf spawns a first process only to calibrate the benchmark: if a value is less than 100 ms, the number of loops is doubled, and the operation is repeated until the number of loops stops changing. On PyPy, perf now (in the master branch) spawns a second process which only computes warmup samples, to validate the calibration.

After the calibration, perf spawns worker processes sequentially: each worker computes warmup samples and then values. By default, each worker computes 1 warmup sample and 3 values on CPython, and 10 warmup samples and 10 values on PyPy. The configuration for PyPy is kind of arbitrary, whereas it was finely tuned for CPython. At the end, perf ignores all warmup samples and only computes the mean and standard deviation of the remaining values. For example, on CPython 21 processes are spawned: 1 calibration process + 20 workers, each computing 1 warmup + 3 values, so the mean is computed over 60 values.

perf stores all data in a JSON file: metadata (hostname, CPU speed, system load, etc.), number of loops, warmup samples, values, etc. It provides an API to access all data. perf also contains many tools to analyze the data: statistics (min, max, median/MAD, percentiles, ...), rendering a histogram, comparing results and checking whether a difference is significant, detecting unstable benchmarks, etc. The documentation explains how to run benchmarks, analyze results, get stable/reproducible results, and tune your system for benchmarking.

To tune your system for benchmarks, run the "sudo python3 -m perf system tune" command. It configures the CPU (disables Turbo Boost, sets a fixed frequency, ...), checks that the power cable is plugged in, sets the CPU affinity of IRQs, disables Linux perf events, etc. This reduces operating system jitter.

Victor
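The calibration described above can be sketched in a few lines. This is a simplified illustration, not perf's actual implementation; the 100 ms threshold and the doubling rule are taken from the description in the message:

```python
import time

def calibrate(func, min_time=0.1):
    """Double the number of outer loops until a single timing of the
    whole loop takes at least min_time (100 ms by default)."""
    loops = 1
    while True:
        t0 = time.perf_counter()
        for _ in range(loops):
            func()
        value = time.perf_counter() - t0
        if value >= min_time:
            return loops
        loops *= 2
```

Once calibrated, a benchmark value is the raw timing of `loops` calls; perf reports it divided by `loops` so values stay comparable across calibrations.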

Ok, let's be more concrete: I ran benchmarks with PyPy2 v5.7.1 on the speed-python server. See the attached pypy.json.gz file. "perf check pypy.json.gz" detected these 10 benchmarks as "unstable". Let's use pathlib as an example (Mean +- std dev: 28.2 ms +- 5.4 ms). Looking closer, I confirm that the performance is unstable: the distribution seems to be multi-modal, with 3 modes around 21.7 ms, 27.4 ms and 33.1 ms. It's very hard to summarize such a distribution with a single mean or even median value. I should now check whether pathlib becomes more stable if it runs longer.

perf 1.0 now displays results using mean and standard deviation. See the perf doc for the rationale: http://perf.readthedocs.io/en/latest/analyze.html#statistics

logging_silent
--------------
WARNING: the benchmark result may be unstable
* the shortest raw value is only 9.78 us

pathlib
-------
WARNING: the benchmark result may be unstable
* the standard deviation (5.42 ms) is 19% of the mean (28.2 ms)

regex_compile
-------------
WARNING: the benchmark result may be unstable
* the standard deviation (14.0 ms) is 12% of the mean (120 ms)

scimark_sparse_mat_mult
-----------------------
WARNING: the benchmark result may be unstable
* the standard deviation (19.9 us) is 11% of the mean (188 us)

spambayes
---------
WARNING: the benchmark result may be unstable
* the standard deviation (16.4 ms) is 19% of the mean (85.2 ms)
* the maximum (133 ms) is 56% greater than the mean (85.2 ms)

sqlalchemy_imperative
---------------------
WARNING: the benchmark result may be unstable
* the standard deviation (51.5 ms) is 39% of the mean (134 ms)
* the minimum (42.7 ms) is 68% smaller than the mean (134 ms)
* the maximum (267 ms) is 100% greater than the mean (134 ms)

sympy_integrate
---------------
WARNING: the benchmark result may be unstable
* the standard deviation (21.6 ms) is 14% of the mean (150 ms)

sympy_sum
---------
WARNING: the benchmark result may be unstable
* the standard deviation (19.5 ms) is 13% of the mean (151 ms)

sympy_str
---------
WARNING: the benchmark result may be unstable
* the standard deviation (23.4 ms) is 13% of the mean (174 ms)

xml_etree_process
-----------------
WARNING: the benchmark result may be unstable
* the standard deviation (7.64 ms) is 12% of the mean (62.9 ms)

haypo@selma$ python3 -m perf stats pypy.json.gz -b pathlib -q
Total duration: 32.8 sec
Start date: 2017-04-05 21:16:25
End date: 2017-04-05 21:17:08
Raw value minimum: 169 ms
Raw value maximum: 284 ms

Number of runs: 9
Total number of values: 60
Number of values per run: 10
Number of warmups per run: 10
Loop iterations per value: 8

Minimum: 21.2 ms
Median +- MAD: 29.9 ms +- 4.0 ms
Mean +- std dev: 28.2 ms +- 5.4 ms
Maximum: 35.4 ms

  0th percentile: 21.2 ms (-25% of the mean) -- minimum
  5th percentile: 21.4 ms (-24% of the mean)
 25th percentile: 22.2 ms (-21% of the mean)
 50th percentile: 29.9 ms (+6% of the mean) -- median
 75th percentile: 33.3 ms (+18% of the mean)
 95th percentile: 34.1 ms (+21% of the mean)
100th percentile: 35.4 ms (+26% of the mean) -- maximum

haypo@selma$ python3 -m perf hist pypy.json.gz -b pathlib -q
21.1 ms:  7 ###########################
21.7 ms: 12 ###############################################
22.3 ms:  2 ########
22.8 ms:  3 ############
23.4 ms:  0 |
24.0 ms:  0 |
24.5 ms:  0 |
25.1 ms:  0 |
25.7 ms:  0 |
26.3 ms:  1 ####
26.8 ms:  1 ####
27.4 ms:  4 ################
28.0 ms:  0 |
28.5 ms:  0 |
29.1 ms:  0 |
29.7 ms:  0 |
30.2 ms:  0 |
30.8 ms:  0 |
31.4 ms:  1 ####
32.0 ms:  4 ################
32.5 ms:  5 ####################
33.1 ms: 13 ###################################################
33.7 ms:  5 ####################
34.2 ms:  1 ####
34.8 ms:  0 |
35.4 ms:  1 ####

$ python3 -m perf dump pypy.json.gz -b pathlib -q
Run 4: values (10): 33.1 ms (+17%), 21.8 ms (-23%), 33.5 ms (+19%), 22.2 ms (-21%), 33.5 ms (+19%), 27.8 ms, 22.9 ms (-19%), 32.4 ms (+15%), 21.4 ms (-24%), 33.1 ms (+17%)
Run 5: values (10): 33.1 ms (+17%), 21.7 ms (-23%), 33.4 ms (+18%), 22.2 ms (-21%), 33.5 ms (+19%), 27.8 ms, 22.7 ms (-20%), 32.2 ms (+14%), 21.3 ms (-24%), 33.3 ms (+18%)
Run 6: values (10): 33.0 ms (+17%), 21.8 ms (-23%), 33.5 ms (+19%), 22.2 ms (-21%), 33.9 ms (+20%), 27.4 ms, 32.0 ms (+13%), 23.0 ms (-19%), 21.3 ms (-25%), 34.1 ms (+21%)
Run 7: values (10): 32.9 ms (+17%), 21.8 ms (-23%), 33.4 ms (+18%), 22.2 ms (-21%), 34.0 ms (+20%), 27.5 ms, 32.0 ms (+13%), 22.9 ms (-19%), 21.5 ms (-24%), 34.0 ms (+21%)
Run 8: values (10): 33.0 ms (+17%), 21.9 ms (-22%), 34.2 ms (+21%), 22.2 ms (-21%), 35.4 ms (+26%), 26.7 ms (-5%), 33.1 ms (+17%), 22.2 ms (-21%), 21.5 ms (-24%), 34.6 ms (+23%)
Run 9: values (10): 33.1 ms (+17%), 21.7 ms (-23%), 33.3 ms (+18%), 22.0 ms (-22%), 33.5 ms (+19%), 27.8 ms, 22.6 ms (-20%), 32.3 ms (+15%), 21.2 ms (-25%), 33.2 ms (+18%)

haypo@selma$ python3 -m perf dump pypy.json.gz -b pathlib
Run 1: calibrate
- 1 loop: 135 ms (raw: 135 ms)
Run 2: calibrate
- 1 loop: 136 ms (raw: 136 ms)
- 1 loop: 32.2 ms (raw: 32.2 ms)
- 2 loops: 48.2 ms (raw: 96.5 ms)
- 4 loops: 22.6 ms (raw: 90.4 ms)
- 8 loops: 26.8 ms (raw: 214 ms)
- 8 loops: 22.7 ms (raw: 181 ms)
- 8 loops: 29.4 ms (raw: 235 ms)
- 8 loops: 22.7 ms (raw: 182 ms)
- 8 loops: 34.0 ms (raw: 272 ms)
- 8 loops: 22.4 ms (raw: 179 ms)
- 8 loops: 21.7 ms (raw: 174 ms)
- 8 loops: 33.1 ms (raw: 265 ms)
- 8 loops: 21.9 ms (raw: 175 ms)
Run 3: calibrate
- 8 loops: 42.9 ms (raw: 343 ms)
- 8 loops: 26.9 ms (raw: 215 ms)
- 8 loops: 22.3 ms (raw: 179 ms)
- 8 loops: 29.3 ms (raw: 235 ms)
- 8 loops: 22.9 ms (raw: 183 ms)
- 8 loops: 31.2 ms (raw: 250 ms)
- 8 loops: 23.9 ms (raw: 191 ms)
- 8 loops: 21.8 ms (raw: 174 ms)
- 8 loops: 32.5 ms (raw: 260 ms)
- 8 loops: 21.8 ms (raw: 174 ms)
Run 4: warmups (10): 42.3 ms (+50%), 26.8 ms, 22.2 ms (-21%), 29.4 ms, 22.5 ms (-20%), 30.8 ms (+9%), 23.8 ms (-16%), 21.8 ms (-23%), 32.4 ms (+15%), 21.7 ms (-23%); values (10): 33.1 ms (+17%), 21.8 ms (-23%), 33.5 ms (+19%), 22.2 ms (-21%), 33.5 ms (+19%), 27.8 ms, 22.9 ms (-19%), 32.4 ms (+15%), 21.4 ms (-24%), 33.1 ms (+17%)
Run 5: warmups (10): 42.2 ms (+50%), 26.9 ms, 22.6 ms (-20%), 29.6 ms, 22.7 ms (-19%), 31.3 ms (+11%), 23.7 ms (-16%), 22.2 ms (-21%), 32.6 ms (+16%), 21.7 ms (-23%); values (10): 33.1 ms (+17%), 21.7 ms (-23%), 33.4 ms (+18%), 22.2 ms (-21%), 33.5 ms (+19%), 27.8 ms, 22.7 ms (-20%), 32.2 ms (+14%), 21.3 ms (-24%), 33.3 ms (+18%)
Run 6: warmups (10): 42.3 ms (+50%), 26.8 ms (-5%), 22.2 ms (-21%), 29.2 ms, 22.3 ms (-21%), 30.6 ms (+8%), 23.9 ms (-15%), 21.6 ms (-24%), 32.5 ms (+15%), 21.5 ms (-24%); values (10): 33.0 ms (+17%), 21.8 ms (-23%), 33.5 ms (+19%), 22.2 ms (-21%), 33.9 ms (+20%), 27.4 ms, 32.0 ms (+13%), 23.0 ms (-19%), 21.3 ms (-25%), 34.1 ms (+21%)
Run 7: warmups (10): 42.5 ms (+51%), 26.8 ms, 22.2 ms (-21%), 29.3 ms, 22.7 ms (-19%), 31.1 ms (+10%), 23.8 ms (-16%), 21.7 ms (-23%), 32.3 ms (+15%), 21.6 ms (-23%); values (10): 32.9 ms (+17%), 21.8 ms (-23%), 33.4 ms (+18%), 22.2 ms (-21%), 34.0 ms (+20%), 27.5 ms, 32.0 ms (+13%), 22.9 ms (-19%), 21.5 ms (-24%), 34.0 ms (+21%)
Run 8: warmups (10): 43.4 ms (+54%), 26.9 ms, 22.5 ms (-20%), 29.8 ms (+5%), 22.9 ms (-19%), 33.4 ms (+19%), 23.2 ms (-18%), 22.0 ms (-22%), 32.2 ms (+14%), 21.8 ms (-23%); values (10): 33.0 ms (+17%), 21.9 ms (-22%), 34.2 ms (+21%), 22.2 ms (-21%), 35.4 ms (+26%), 26.7 ms (-5%), 33.1 ms (+17%), 22.2 ms (-21%), 21.5 ms (-24%), 34.6 ms (+23%)
Run 9: warmups (10): 42.2 ms (+50%), 27.3 ms, 22.6 ms (-20%), 29.4 ms, 22.3 ms (-21%), 30.8 ms (+9%), 23.7 ms (-16%), 21.4 ms (-24%), 32.9 ms (+16%), 21.8 ms (-23%); values (10): 33.1 ms (+17%), 21.7 ms (-23%), 33.3 ms (+18%), 22.0 ms (-22%), 33.5 ms (+19%), 27.8 ms, 22.6 ms (-20%), 32.3 ms (+15%), 21.2 ms (-25%), 33.2 ms (+18%)

WARNING: the benchmark result may be unstable
* the standard deviation (5.42 ms) is 19% of the mean (28.2 ms)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

Victor
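To see why a single mean summarizes such a multi-modal distribution poorly, the dispersion can be recomputed from a handful of the values above using only the standard library (the numbers are copied from Run 4 of the pathlib dump; nothing here uses the perf API):

```python
import statistics

# Ten pathlib values (in ms) from Run 4 above, spread over the three
# modes near 21.7 ms, 27.4 ms and 33.1 ms.
values = [33.1, 21.8, 33.5, 22.2, 33.5, 27.8, 22.9, 32.4, 21.4, 33.1]

mean = statistics.mean(values)
stdev = statistics.stdev(values)

# The standard deviation comes out around 19% of the mean, matching the
# "unstable" warning: almost no individual value is close to the mean.
print("Mean +- std dev: %.1f ms +- %.1f ms" % (mean, stdev))
```

This is exactly the situation where perf's check command warns: the mean (about 28 ms) falls in a gap between the modes, so it describes almost none of the actual measurements.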

Hi Victor,

On 6 April 2017 at 11:43, Victor Stinner <victor.stinner@gmail.com> wrote:
Ok, let's be more concrete: I ran benchmarks with PyPy2 v5.7.1 on the speed-python server. See attached pypy.json.gz file.
Note that unless your mails contain precise questions, you're unlikely to get many answers, because we don't know what you need feedback on. In this case, I have a strong hint about that:
These timings are too short to be precisely relevant: a major garbage collection occurs more rarely than that. So my guess is something like: you get 33 ms when a major GC occurs and 21 ms when it does not. Moreover, our GC is incremental, so a single major GC spreads its work over some period of time, possibly partly in one run and partly in the next one. (There are also minor GCs, but these should occur every few milliseconds.)

A bientôt,

Armin.
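One way to test this hypothesis would be to force a full collection before each timed value, so that pending major-GC work does not land inside the measured window. A rough diagnostic sketch (gc.collect() triggers a complete major collection on PyPy as well as CPython; note this hides a real cost, so it is useful for diagnosis, not for reported results):

```python
import gc
import time

def timed_value(func, loops, collect_first=False):
    """Measure one benchmark value; optionally run a full major GC
    first so its cost falls outside the timed window."""
    if collect_first:
        gc.collect()
    t0 = time.perf_counter()
    for _ in range(loops):
        func()
    return time.perf_counter() - t0

# Comparing the spread of values with and without collect_first would
# show how much of the multi-modal pattern is driven by the GC.
```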

participants (2):
- Armin Rigo
- Victor Stinner