New benchmark results on speed.python.org
Hi,
Good news: I regenerated all CPython benchmark results using the latest versions of perf and performance, and the results look much more reliable. Sadly, I didn't keep a screenshot of the old benchmarks, so you will have to trust me: I cannot show you the old unstable timeline.
--
I regenerated all benchmark results of speed.python.org using performance 0.3.2. I now have an (almost) fully automated script to run benchmarks (compile Python, run benchmarks, etc.) on a list of Python revisions and/or branches. Only the last step, uploading the JSON, is still manual, but automating this part should be easy ;-)
https://github.com/python/performance/tree/master/scripts
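As a rough illustration of the steps described above (check out a revision, compile, run the benchmarks), here is a minimal sketch. It is not the script from the repository: the function names, the use of git for the checkout, the "--with-lto" configure option and the paths are assumptions made for the example, and the upload step is left out because it is still manual.

```python
import os
import subprocess
from typing import List


def build_plan(revision: str, src_dir: str, output_json: str) -> List[List[str]]:
    """Return the commands to benchmark one CPython revision.

    Hypothetical helper: nothing is executed here, the commands are only listed.
    """
    python_bin = os.path.join(os.path.abspath(src_dir), "python")
    return [
        # Assumption: the source tree is a git checkout.
        ["git", "checkout", revision],
        # LTO only, no PGO (assumed configure option).
        ["./configure", "--with-lto"],
        ["make"],
        # Run the benchmark suite with the freshly built interpreter.
        ["python3", "-m", "performance", "run",
         "--python", python_bin, "-o", output_json],
    ]


def run_plan(plan: List[List[str]], src_dir: str, dry_run: bool = True) -> None:
    """Print each command, and execute it in src_dir unless dry_run is set."""
    for cmd in plan:
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=src_dir, check=True)


if __name__ == "__main__":
    # Hypothetical revision name and paths, dry run only.
    plan = build_plan("master", "/tmp/cpython", "master.json")
    run_plan(plan, "/tmp/cpython", dry_run=True)
```

Keeping the command list separate from its execution makes it easy to review what would run for each revision before spending hours on a full benchmark run.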
Python is compiled with LTO, but not PGO. Compiling with PGO fails with an internal GCC bug: speed-python runs Ubuntu 14.04, and the GCC bug seems to be known (and fixed upstream...).
Because of various bugs (including a Linux kernel bug with NOHZ_FULL+intel_pstate ;-)), I didn't have time to analyze the impact of compilation options (-O2, -O3, LTO, PGO, etc.) on the stability of benchmark results.
I isolated all CPUs of NUMA node 1 (the CPU has two NUMA nodes) by adding the following parameters to the Linux kernel command line of the speed-python server:
isolcpus=1,3,5,7,9,11,13,15,17,19,21,23 rcu_nocbs=1,3,5,7,9,11,13,15,17,19,21,23
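The parameter string above can be generated from a list of CPU numbers rather than typed by hand. This is only a small sketch: the helper name is made up for the example, and it assumes that the CPUs of NUMA node 1 are the odd-numbered ones, as in the line above.

```python
def kernel_isolation_params(cpus):
    """Build the isolcpus/rcu_nocbs kernel parameters for the given CPUs."""
    cpu_list = ",".join(str(cpu) for cpu in sorted(cpus))
    return f"isolcpus={cpu_list} rcu_nocbs={cpu_list}"


# Odd-numbered CPUs 1..23 (assumed to be the CPUs of NUMA node 1).
odd_cpus = range(1, 24, 2)
print(kernel_isolation_params(odd_cpus))
```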
Before running the benchmarks, I used the "python3 -m perf system tune" command (of the development version of perf) to tune the server. Results of the tuning:
$ sudo python3 -m perf system
System state

ASLR: Full randomization
Linux scheduler: Isolated CPUs (12/24): 1,3,5,7,9,11,13,15,17,19,21,23
Linux scheduler: RCU disabled on CPUs (12/24): 1,3,5,7,9,11,13,15,17,19,21,23
CPU Frequency: 0,2,4,6,8,10,12,14,16,18,20,22=min=1600 MHz, max=3333 MHz; 1,3,5,7,9,11,13,15,17,19,21,23=min=max=3333 MHz
Turbo Boost (MSR): CPU 0,2,4,6,8,10,12,14,16,18,20,22: enabled, CPU 1,3,5,7,9,11,13,15,17,19,21,23: disabled
IRQ affinity: irqbalance service: inactive
IRQ affinity: Default IRQ affinity: CPU 0,2,4,6,8,10,12,14,16,18,20,22
IRQ affinity: IRQ affinity: 0,2=0-23, 1,3-15,17,20,22-23,67-82=0,2,4,6,8,10,12,14,16,18,20,22
I don't know the hardware of the speed-python server well yet. The CPU is an "Intel(R) Xeon(R) CPU X5680 @ 3.33GHz".
I only disabled Turbo Boost on the CPUs used to run benchmarks. Maybe I should disable Turbo Boost on all CPUs? On my computers using intel_pstate, Turbo Boost is disabled globally (for all CPUs) using an option of the intel_pstate driver.
I didn't tune the CPU scaling governor yet: all CPUs use "ondemand".
Maybe I should use a fixed CPU frequency on all CPUs and use the "userspace" scaling governor?
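To check which scaling governor each CPU currently uses, the cpufreq files in sysfs can be read directly. The sketch below is only an illustration: the function name is made up, it assumes the usual Linux layout under /sys/devices/system/cpu, and it only reads the setting (changing it requires root).

```python
from pathlib import Path


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return a {cpu_number: governor_name} dict read from sysfs."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in Path(sysfs_root).glob(pattern):
        # path is .../cpuN/cpufreq/scaling_governor: extract N from "cpuN".
        cpu = int(path.parent.parent.name[3:])
        governors[cpu] = path.read_text().strip()
    return governors


if __name__ == "__main__":
    for cpu, governor in sorted(read_governors().items()):
        print(f"CPU {cpu}: {governor}")
```

The sysfs root is a parameter so that the function can also be tried against a copy of the directory tree instead of the live system.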
Results seem more stable, but they are not perfect yet (see below). See the Timeline tab with "Display all in grid" selected:
https://speed.python.org/timeline/#/?exe=4&ben=grid&env=1&revs=50&equid=off&quarts=on&extr=on
There are still some hiccups:
(*) call_method: temporary peak of 29 ms for October 19, whereas all other revisions are around 17 ms:
https://speed.python.org/timeline/#/?exe=4&ben=call_method&env=1&revs=50&equid=off&quarts=on&extr=on
(*) python_startup increased from 21 ms to 27.5 ms between Sept 9 and Sept 15... The problem is that this one is not a temporary hiccup, but looks like a real performance regression: there are 4 points at 21 ms (Sept 4-Sept 9) and 7 points at 27.5 ms (Sept 15-Nov 3). But I have not been able to reproduce the slowdown on my laptop yet.
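The difference between the two cases above (a single peak versus a lasting change of level) can be illustrated with a few lines of Python. This is only a sketch with made-up helper names and an arbitrary 20% threshold. The python_startup series uses the point counts and values quoted above, while the call_method series is an approximation ("around 17 ms" with one peak at 29 ms), not the real data.

```python
from statistics import median


def deviating_points(values, threshold=0.2):
    """Return indexes of values further than `threshold` from the median."""
    mid = median(values)
    return [i for i, v in enumerate(values) if abs(v - mid) / mid > threshold]


def relative_change(before, after):
    """Relative change of the median between two groups of timings."""
    base = median(before)
    return (median(after) - base) / base


# Approximated series: one peak among stable timings (values in ms).
call_method_ms = [17.0, 17.0, 17.0, 29.0, 17.0, 17.0]
# 4 points at 21 ms followed by 7 points at 27.5 ms.
python_startup_ms = [21.0] * 4 + [27.5] * 7

print(deviating_points(call_method_ms))     # [3]
print(deviating_points(python_startup_ms))  # [0, 1, 2, 3]
change = relative_change(python_startup_ms[:4], python_startup_ms[4:])
print(f"{change:.1%}")                      # 31.0%
```

A single deviating index points to a temporary hiccup, whereas a run of consecutive indexes at one end of the series points to a change of level, here a slowdown of about 31%.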
Victor