[Speed] New benchmark results on speed.python.org

Fri Nov 4 08:12:42 EDT 2016

Hi,

Good news: I regenerated all benchmark results of CPython using the
latest versions of perf and performance, and the results look much more
reliable. Sadly, I didn't keep a screenshot of the old benchmarks, so you
will have to trust me: I cannot show you the old, unstable timeline.

--

I regenerated all benchmark results of speed.python.org using
performance 0.3.2. I now have an (almost) fully automated script to
run benchmarks (compile Python, run the benchmarks, etc.) on a list of
Python revisions and/or branches. Only the last step, uploading the JSON
files, is still manual, but automating it would be easy ;-)

   https://github.com/python/performance/tree/master/scripts
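The scripts in that directory are the real tool. As a rough illustration of the loop they automate, here is a minimal Python sketch that only builds and prints the commands for each revision unless dry_run=False. The revision list, the Mercurial checkout (CPython was still hosted in Mercurial at the time), the "make -j8" job count and the function names are assumptions, not taken from those scripts; the "performance run" options --python and -o are the ones documented by performance. Older branches whose configure script lacks --with-lto would need another way to enable LTO.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical list of branches/revisions to benchmark (illustrative only).
REVISIONS = ["2.7", "3.5", "3.6", "default"]


def commands_for(revision, src_dir, json_dir):
    """Build the commands for one revision as argument lists (not run here)."""
    python = Path(src_dir) / "python"              # freshly built interpreter
    output = Path(json_dir) / f"{revision}.json"   # one JSON file per revision
    return [
        ["hg", "update", "--clean", "--rev", revision],  # check out the revision
        ["./configure", "--with-lto"],                   # LTO, but no PGO
        ["make", "-j8"],                                 # compile Python
        ["python3", "-m", "performance", "run",          # run the benchmark suite
         f"--python={python}", "-o", str(output)],
    ]


def bench_all(src_dir, json_dir, dry_run=True):
    """Print (and optionally run) the commands for every revision, in order."""
    for revision in REVISIONS:
        for cmd in commands_for(revision, src_dir, json_dir):
            print(shlex.join(cmd))
            if not dry_run:
                subprocess.run(cmd, cwd=src_dir, check=True)
    # Uploading the JSON files to speed.python.org remains a manual step.


if __name__ == "__main__":
    bench_all("/path/to/cpython", "/path/to/results")
```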

Python is compiled with LTO, but not with PGO: the PGO build fails
because of an internal GCC bug. The speed-python server runs Ubuntu 14.04,
and the GCC bug seems to be known (and fixed upstream...).
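To check how a given interpreter was actually built, the configure arguments recorded by sysconfig can be inspected. A minimal sketch, assuming a POSIX build (CONFIG_ARGS is None on Windows) and the CPython 3.6 configure switches: --with-lto for LTO and --enable-optimizations for the PGO build. The helper name build_options is hypothetical.

```python
import shlex
import sysconfig


def build_options(config_args):
    """Tell whether LTO and PGO were requested on the ./configure command line."""
    # CONFIG_ARGS is a shell-quoted string such as "'--with-lto' '--prefix=/usr'"
    args = shlex.split(config_args) if config_args else []
    return {
        # --with-lto enables link time optimization; later CPython versions
        # also accept a value (e.g. --with-lto=thin), and "no" means off
        "lto": any(a == "--with-lto"
                   or (a.startswith("--with-lto=") and a != "--with-lto=no")
                   for a in args),
        # --enable-optimizations is the configure switch for the PGO build
        "pgo": "--enable-optimizations" in args,
    }


if __name__ == "__main__":
    # Report the options of the interpreter running this script.
    print(build_options(sysconfig.get_config_var("CONFIG_ARGS")))
```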

Because of various bugs (including a Linux kernel bug involving
NOHZ_FULL and intel_pstate ;-)), I didn't have time to analyze the impact
of compilation options (-O2, -O3, LTO, PGO, etc.) on the stability of
benchmark results.

I isolated all CPUs of NUMA node 1 (the server has two NUMA nodes): I
added the following parameters to the Linux kernel command line of
the speed-python server:

   isolcpus=1,3,5,7,9,11,13,15,17,19,21,23
   rcu_nocbs=1,3,5,7,9,11,13,15,17,19,21,23
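Both parameters take a Linux CPU list. The Python sketch below expands such lists, reads them back from /proc/cmdline, and compares isolcpus with /sys/devices/system/node/node1/cpulist, the kernel's list of CPUs on NUMA node 1. It assumes plain comma/range lists: the flags and strides accepted by newer kernels are not handled, and the function names are hypothetical.

```python
def parse_cpu_list(text):
    """Expand a Linux CPU list such as "1,3,5" or "1,3-15,17" into a sorted list."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cmdline_cpus(cmdline, param):
    """Return the CPUs given to a kernel parameter such as isolcpus= or rcu_nocbs=."""
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name == param:
            return parse_cpu_list(value)
    return []


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
        with open("/sys/devices/system/node/node1/cpulist") as f:
            node1 = parse_cpu_list(f.read())
        isolated = cmdline_cpus(cmdline, "isolcpus")
        print("isolcpus:", isolated)
        print("rcu_nocbs:", cmdline_cpus(cmdline, "rcu_nocbs"))
        print("isolcpus matches NUMA node 1:", isolated == node1)
    except (OSError, ValueError) as exc:
        # Not Linux, a single NUMA node, or a CPU list format not handled here
        print("cannot check the kernel setup:", exc)
```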

Before running the benchmarks, I used the "python3 -m perf system
tune" command (from the development version of perf) to tune the server.
Results of the tuning:
-------------------------
$ sudo python3 -m perf system
System state
============

ASLR: Full randomization
Linux scheduler: Isolated CPUs (12/24): 1,3,5,7,9,11,13,15,17,19,21,23
Linux scheduler: RCU disabled on CPUs (12/24): 1,3,5,7,9,11,13,15,17,19,21,23
CPU Frequency: 0,2,4,6,8,10,12,14,16,18,20,22=min=1600 MHz, max=3333 MHz; 1,3,5,7,9,11,13,15,17,19,21,23=min=max=3333 MHz
Turbo Boost (MSR): CPU 0,2,4,6,8,10,12,14,16,18,20,22: enabled, CPU 1,3,5,7,9,11,13,15,17,19,21,23: disabled
IRQ affinity: irqbalance service: inactive
IRQ affinity: Default IRQ affinity: CPU 0,2,4,6,8,10,12,14,16,18,20,22
IRQ affinity: IRQ affinity: 0,2=0-23, 1,3-15,17,20,22-23,67-82=0,2,4,6,8,10,12,14,16,18,20,22
-------------------------
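Two of the values in this report can be read directly from procfs. As a hedged illustration, not how perf itself is implemented, this Python sketch decodes /proc/sys/kernel/randomize_va_space, where 2 is the kernel's full randomization mode, and /proc/irq/default_smp_affinity, a hexadecimal CPU mask: the even CPUs 0-22 correspond to the mask 0x555555. The labels for modes 0 and 1 and the helper names are illustrative.

```python
ASLR_MODES = {
    "0": "No randomization",
    "1": "Conservative randomization",
    "2": "Full randomization",
}


def mask_to_cpus(mask):
    """Convert a hex CPU mask such as "555555" into a list of CPU numbers."""
    # Masks for many CPUs are printed in comma-separated 32-bit groups.
    value = int(mask.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def read_text(path):
    """Return the stripped content of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None  # not Linux, or the file is missing


if __name__ == "__main__":
    aslr = read_text("/proc/sys/kernel/randomize_va_space")
    if aslr is not None:
        print("ASLR:", ASLR_MODES.get(aslr, aslr))
    mask = read_text("/proc/irq/default_smp_affinity")
    if mask is not None:
        print("Default IRQ affinity: CPU", ",".join(map(str, mask_to_cpus(mask))))
```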

I don't know the hardware of the speed-python server well yet. The CPU is an
"Intel(R) Xeon(R) CPU X5680  @ 3.33GHz":

* I only disabled Turbo Boost on the CPUs used to run benchmarks.
Maybe I should disable Turbo Boost on all CPUs? On my computers using
intel_pstate, Turbo Boost is disabled globally (for all CPUs) using the
no_turbo option of the intel_pstate driver.

* I didn't tune the CPU scaling governor yet: all CPUs use "ondemand".

* Maybe I should use a fixed CPU frequency on all CPUs and use the
"userspace" scaling governor?
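The knobs discussed in this list can be read and set from Python. A minimal sketch, assuming the acpi-cpufreq driver (intel_pstate in its default active mode only offers the performance and powersave policies), root privileges and the msr kernel module: bit 38 of the IA32_MISC_ENABLE MSR (0x1a0) is the Turbo Boost disable bit, and the userspace governor takes a fixed frequency in kHz through scaling_setspeed. The function names are hypothetical, and root is a parameter only so the code can be tried on a fake sysfs tree.

```python
import os
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")
MSR_IA32_MISC_ENABLE = 0x1A0  # IA32_MISC_ENABLE register
TURBO_DISABLE_BIT = 38        # bit set => Turbo Boost disabled on that CPU


def turbo_disabled(misc_enable):
    """Interpret a raw 64-bit IA32_MISC_ENABLE value."""
    return bool((misc_enable >> TURBO_DISABLE_BIT) & 1)


def read_msr(cpu, reg=MSR_IA32_MISC_ENABLE):
    """Read one 64-bit MSR of a CPU (needs root and the msr kernel module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects the MSR; each read returns 8 bytes.
        return int.from_bytes(os.pread(fd, 8, reg), "little")
    finally:
        os.close(fd)


def governors(cpus, root=CPU_SYSFS):
    """Map each CPU to its current cpufreq scaling governor."""
    return {cpu: (root / f"cpu{cpu}" / "cpufreq" / "scaling_governor").read_text().strip()
            for cpu in cpus}


def set_fixed_frequency(cpu, khz, root=CPU_SYSFS):
    """Pin a CPU to a fixed frequency with the "userspace" governor (needs root)."""
    cpufreq = root / f"cpu{cpu}" / "cpufreq"
    (cpufreq / "scaling_governor").write_text("userspace\n")
    (cpufreq / "scaling_setspeed").write_text(f"{khz}\n")  # value in kHz
```

For example, turbo_disabled(read_msr(1)) corresponds to the "Turbo Boost (MSR)" line of the report, and set_fixed_frequency(cpu, 3333000) would request 3333 MHz; the value should be one listed in the CPU's scaling_available_frequencies file.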

Results seem more stable, but they are not perfect yet (see below).
See the [Timeline] tab with the "Display all in grid" option checked:

https://speed.python.org/timeline/#/?exe=4&ben=grid&env=1&revs=50&equid=off&quarts=on&extr=on

There are still some hiccups:

(*) call_method: temporary peak of 29 ms on October 19, whereas all
other revisions are around 17 ms:

https://speed.python.org/timeline/#/?exe=4&ben=call_method&env=1&revs=50&equid=off&quarts=on&extr=on

(*) python_startup increased from 21 ms to 27.5 ms between Sept 9 and
Sept 15... The problem is that this one is not a temporary hiccup but
seems to be a real performance regression: there are 4 points at 21 ms
(Sept 4-Sept 9) and 7 points at 27.5 ms (Sept 15-Nov 3). But I have not
yet been able to reproduce the slowdown on my laptop.

https://speed.python.org/timeline/#/?exe=4&ben=python_startup&env=1&revs=50&equid=off&quarts=on&extr=on
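The difference between the two cases above, an isolated peak versus a lasting shift, can be checked mechanically. Below is a minimal Python sketch, not what speed.python.org or codespeed actually does: a point counts as a spike when it is far from both neighbours while they agree with each other, and a step is the single least-squares change point found once spikes are removed. The 20% threshold, the function names and the call_method series are assumptions; only the 4-then-7 shape of the python_startup series comes from the timeline described above.

```python
from statistics import fmean


def find_spikes(values, rel=0.2):
    """Indices of isolated outliers: far from both neighbours, which agree."""
    spikes = []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        far = abs(cur - prev) > rel * prev and abs(cur - nxt) > rel * nxt
        if far and abs(prev - nxt) <= rel * prev:
            spikes.append(i)
    return spikes


def _sse(xs):
    """Sum of squared deviations from the mean."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs)


def find_step(values, rel=0.2, min_size=2):
    """Index of the first point after a lasting level shift, or None.

    Spikes are dropped first, then the single split that minimises the
    squared error of the two segments is chosen (least-squares change point).
    """
    spikes = set(find_spikes(values, rel))
    kept = [(i, v) for i, v in enumerate(values) if i not in spikes]
    xs = [v for _, v in kept]
    best_cost, best_k = None, None
    for k in range(min_size, len(xs) - min_size + 1):
        cost = _sse(xs[:k]) + _sse(xs[k:])
        if best_cost is None or cost < best_cost:
            best_cost, best_k = cost, k
    if best_k is None:
        return None
    before, after = fmean(xs[:best_k]), fmean(xs[best_k:])
    if abs(after - before) > rel * before:
        return kept[best_k][0]  # map back to an index in `values`
    return None


if __name__ == "__main__":
    call_method = [17, 17, 17, 29, 17, 17, 17]  # synthetic series, in ms
    python_startup = [21] * 4 + [27.5] * 7      # 4 points, then 7 points
    print(find_spikes(call_method), find_step(call_method))        # [3] None
    print(find_spikes(python_startup), find_step(python_startup))  # [] 4
```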

Victor