[Speed] New benchmark results on speed.python.org

Victor Stinner victor.stinner at gmail.com
Fri Nov 4 08:12:42 EDT 2016


Good news: I regenerated all CPython benchmark results using the
latest versions of perf and performance, and the results look much more
reliable. Sadly, I didn't keep a screenshot of the old benchmarks, so you
will have to trust me: I cannot show you the old, unstable timeline.


I regenerated all benchmark results of speed.python.org using
performance 0.3.2. I now have an (almost) fully automated script to
run benchmarks (compile Python, run the benchmarks, etc.) for a list of
Python revisions and/or branches. Only the last step, uploading the JSON,
is still manual, but automating it would be trivial ;-)


Python is compiled with LTO, but not with PGO: the PGO build fails
because of an internal GCC bug. speed-python runs Ubuntu 14.04, and this
GCC bug seems to be known (and fixed upstream...).
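The build-and-benchmark loop described above can be sketched in Python. This is a hypothetical illustration, not the actual script: the helper names (build_steps, run_all), the install prefix, the -j4 build parallelism and the exact configure and performance command-line flags are assumptions to check against "./configure --help" and "python3 -m performance run --help". It enables LTO with --with-lto and leaves out --enable-optimizations (PGO), mirroring the build described above; the manual JSON upload is not covered.

```python
import subprocess

# Hypothetical sketch: the real script is not published in this post, so the
# steps, paths and flags below are assumptions for illustration only.
def build_steps(revision, src_dir, prefix, output_json):
    """Return (cwd, argv) pairs that build one CPython revision with LTO
    (but without PGO) and run the performance suite against it."""
    python = f"{prefix}/bin/python3"
    return [
        (src_dir, ["git", "checkout", revision]),
        # --with-lto enables link-time optimization; PGO would need
        # --enable-optimizations, which is left out on purpose.
        (src_dir, ["./configure", "--with-lto", f"--prefix={prefix}"]),
        (src_dir, ["make", "-j4"]),
        (src_dir, ["make", "install"]),
        # Run the benchmarks with the freshly installed interpreter and
        # write the results to a JSON file (uploading it stays manual).
        (None, ["python3", "-m", "performance", "run",
                f"--python={python}", "-o", output_json]),
    ]


def run_all(revisions, src_dir, dry_run=True):
    """Build and benchmark each revision in turn; only print the commands
    when dry_run is true."""
    for rev in revisions:
        prefix = f"/opt/cpython-{rev}"  # hypothetical install location
        for cwd, argv in build_steps(rev, src_dir, prefix, f"{rev}.json"):
            if dry_run:
                print(f"[{cwd or '.'}] $ {' '.join(argv)}")
            else:
                subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_all(["3.6", "master"], "cpython")
```

With dry_run=True the sketch only prints the commands; with dry_run=False it executes them and, because of check=True, stops at the first failing step instead of benchmarking a broken build.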

Because of various bugs (including a bug in the Linux kernel with
NOHZ_FULL+intel_pstate ;-)), I didn't have time to analyze the impact of
compilation options (-O2, -O3, LTO, PGO, etc.) on the stability of
benchmark results.

I isolated all CPUs of NUMA node 1 (the machine has two NUMA nodes) by
adding the following parameters to the Linux kernel command line of
the speed-python server:


Before running the benchmarks, I used the "python3 -m perf system
tune" command (from the development version of perf) to tune the server.
Resulting system state:
$ sudo python3 -m perf system
System state

ASLR: Full randomization
Linux scheduler: Isolated CPUs (12/24): 1,3,5,7,9,11,13,15,17,19,21,23
Linux scheduler: RCU disabled on CPUs (12/24): 1,3,5,7,9,11,13,15,17,19,21,23
CPU Frequency: 0,2,4,6,8,10,12,14,16,18,20,22=min=1600 MHz, max=3333 MHz; 1,3,5,7,9,11,13,15,17,19,21,23=min=max=3333 MHz
Turbo Boost (MSR): CPU 0,2,4,6,8,10,12,14,16,18,20,22: enabled, CPU 1,3,5,7,9,11,13,15,17,19,21,23: disabled
IRQ affinity: irqbalance service: inactive
IRQ affinity: Default IRQ affinity: CPU 0,2,4,6,8,10,12,14,16,18,20,22
IRQ affinity: IRQ affinity: 0,2=0-23,
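To double-check that the isolated CPUs listed above really are the CPUs of NUMA node 1, one can compare two sysfs files. A minimal sketch, assuming the kernel exposes /sys/devices/system/node/node1/cpulist and /sys/devices/system/cpu/isolated in the usual "1,3,5-7" list format; the function names are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "1,3,5-7" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means "no CPUs"
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def missing_isolated(node_cpulist, isolated_list):
    """CPUs of the NUMA node that are not isolated (empty list means all are)."""
    node_cpus = set(parse_cpu_list(node_cpulist))
    isolated = set(parse_cpu_list(isolated_list))
    return sorted(node_cpus - isolated)


def check_node(node=1, sysfs="/sys/devices/system"):
    """Read the (assumed) sysfs files and report non-isolated CPUs of a node."""
    node_cpus = Path(f"{sysfs}/node/node{node}/cpulist").read_text()
    isolated = Path(f"{sysfs}/cpu/isolated").read_text()
    return missing_isolated(node_cpus, isolated)
```

On speed-python, check_node(1) should return an empty list if every CPU of node 1 appears in the kernel's isolated list.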

I don't know the hardware of the speed-python server well yet. The CPU is an
"Intel(R) Xeon(R) CPU X5680  @ 3.33GHz":

* I only disabled Turbo Boost on the CPUs used to run benchmarks.
Maybe I should disable Turbo Boost on all CPUs? On my computers using
intel_pstate, Turbo Boost is disabled globally (for all CPUs) using an
option of the intel_pstate driver.

* I haven't tuned the CPU scaling governor yet: all CPUs use the "ondemand" governor.

* Maybe I should set a fixed CPU frequency on all CPUs and use the
"userspace" scaling governor?
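Before changing the governor, a read-only check of what each CPU currently uses can help. A minimal sketch, assuming the standard cpufreq sysfs layout (/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor); the helper names are hypothetical.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU number to its current cpufreq scaling governor."""
    governors = {}
    for path in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        cpu = int(path.parent.parent.name[len("cpu"):])  # "cpu12" -> 12
        governors[cpu] = path.read_text().strip()
    return governors


def group_by_governor(governors):
    """Group CPU numbers by governor, e.g. {"ondemand": [0, 1, ...]}."""
    groups = {}
    for cpu, governor in sorted(governors.items()):
        groups.setdefault(governor, []).append(cpu)
    return groups


def cpus_not_using(governors, wanted):
    """CPUs whose governor differs from `wanted` (e.g. "userspace")."""
    return sorted(cpu for cpu, governor in governors.items() if governor != wanted)
```

On the server, group_by_governor(read_governors()) would be expected to list every CPU under "ondemand" at this point. Actually pinning a frequency with the userspace governor means writing to scaling_governor and scaling_setspeed as root, which this read-only sketch deliberately does not do.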

Results seem more stable, but they are still not perfect (see below).
To see all benchmarks at once, open the Timeline view and check "Display all in grid".


There are still some hiccups:

(*) call_method: temporary peak of 29 ms on October 19, whereas all
other revisions are around 17 ms.


(*) python_startup increased from 21 ms to 27.5 ms between Sept 9 and
Sept 15. The problem is that this one is not a temporary hiccup but
looks like a real performance regression: there are 4 points at 21 ms
(Sept 4 to Sept 9) and 7 points at 27.5 ms (Sept 15 to Nov 3). However,
I have not yet been able to reproduce the slowdown on my laptop.
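To tell a one-off hiccup like call_method apart from a lasting shift like python_startup, a crude heuristic can help: split the series at the point that best fits two flat segments (least squares), accept the split only if the medians differ by more than a tolerance, then flag points far from their own segment's median as spikes. This is a hypothetical sketch, not what perf or speed.python.org actually do; the 20% tolerance and the 3-point minimum segment length are arbitrary assumptions.

```python
from statistics import mean, median


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def find_step(timings, tolerance=0.2, min_points=3):
    """Index where the series most plausibly shifts level (least-squares
    split into two segments), or None if the shift is below `tolerance`."""
    best = None
    for i in range(min_points, len(timings) - min_points + 1):
        cost = _sse(timings[:i]) + _sse(timings[i:])
        if best is None or cost < best[1]:
            best = (i, cost)
    if best is None:
        return None  # series too short to split
    i = best[0]
    before, after = median(timings[:i]), median(timings[i:])
    return i if abs(after - before) > tolerance * before else None


def classify(timings, tolerance=0.2):
    """Return (step_index, spike_indices): spikes are points more than
    `tolerance` away from the median of their own segment."""
    step = find_step(timings, tolerance)
    segments = [timings] if step is None else [timings[:step], timings[step:]]
    spikes, offset = [], 0
    for segment in segments:
        mid = median(segment)
        spikes += [offset + j for j, t in enumerate(segment)
                   if abs(t - mid) > tolerance * mid]
        offset += len(segment)
    return step, spikes
```

On illustrative series shaped like the two cases above (not real measurements), classify([17] * 5 + [29] + [17] * 4) returns (None, [5]), a single spike with no level shift, while classify([21] * 4 + [27.5] * 7) returns (4, []), a lasting shift after the fourth point.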


