Code layout matters a lot, and you can get lucky or unlucky with it.  I wasn't able to make it to this talk, but the slides look quite interesting:
https://llvmdevelopersmeetingbay2016.sched.org/event/8YzY/causes-of-performance-instability-due-to-code-placement-in-x86

I'm not sure how much we mere mortals can debug this sort of thing, but I know the Intel folks have at one point expressed interest in making sure that Python runs quickly on their processors, so they might be willing to give advice (the deck even says "if all else fails, ask Intel").

On Fri, Nov 4, 2016 at 3:35 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,

I noticed a temporary spike in the run time of the call_method benchmark:

https://speed.python.org/timeline/#/?exe=4&ben=call_method&env=1&revs=50&equid=off&quarts=on&extr=on

The difference is major: 17 ms => 29 ms, 70% slower!

I expected a temporary issue on the server used to run benchmarks,
but... I reproduced the result on the server.

Recently, the performance of call_method() changed in CPython default
from 17 ms to 28 ms (well, the exact value is variable: 25 ms, 28 ms,
29 ms, ...) and then back to 17 ms:

(1) ce85a1f129e3: 17 ms => 83877018ef97 (Oct 18): 25 ms

https://hg.python.org/cpython/rev/83877018ef97

(2) 3e073e7b4460: 28 ms => 204a43c452cc (Oct 22): 17 ms

https://hg.python.org/cpython/rev/204a43c452cc
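
For reference, the relative slowdowns behind these timings can be checked
with a few lines of Python. This is only a sketch: the timings are the ones
quoted in this message, and the helper name slowdown_percent() is made up
for illustration.

```python
def slowdown_percent(before_ms, after_ms):
    # Relative increase in run time, as a percentage of the original timing.
    return (after_ms - before_ms) / before_ms * 100.0


# Timings quoted in this message: 17 ms baseline; 25, 28 and 29 ms when slow.
BASELINE_MS = 17

for slow_ms in (25, 28, 29):
    percent = slowdown_percent(BASELINE_MS, slow_ms)
    print("%d ms => %d ms: %.1f%% slower" % (BASELINE_MS, slow_ms, percent))
```

The slow runs come out somewhere between roughly 47% and 71% slower than the
baseline, depending on which of the observed values is used.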

None of these revisions modify code used in the call_method()
benchmark, so I guess that it's yet another compiler joke.


On my laptop and my desktop PC, I'm unable to reproduce the issue: the
performance is the same (I tested ce85a1f129e3, 83877018ef97,
204a43c452cc). These PCs use Fedora 24 and GCC 6.2.1. CPUs:

* laptop: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
* desktop: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz


The speed-python server runs Ubuntu 14.04, GCC 4.8.4-2ubuntu1~14.04. CPU:
"Intel(R) Xeon(R) CPU X5680  @ 3.33GHz".


The call_method() benchmark is a microbenchmark which seems to depend
a lot on very low-level details like the CPU L1 cache. Maybe the impact
of the compiler is more important on speed-python, which has an older
CPU, than on my more recent hardware. Maybe GCC 6.2 produces more
efficient machine code than GCC 4.8.
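
To give an idea of what such a microbenchmark looks like, here is a minimal
sketch using the standard timeit module. It is not the actual call_method
benchmark code: the class Foo and the helpers call_methods() and measure()
are hypothetical stand-ins. It repeats the measurement several times so that
the run-to-run spread is visible next to the minimum.

```python
import statistics
import timeit


class Foo:
    # Hypothetical stand-in for the class used by the real benchmark.
    def method(self, a, b):
        return None


def call_methods(loops):
    # Call the same bound method many times, so that the cost of the call
    # itself dominates the measurement.
    obj = Foo()
    for _ in range(loops):
        obj.method(1, 2)


def measure(repeat=5, loops=100000):
    # Return one wall-clock timing (in seconds) per repetition.
    return timeit.repeat(lambda: call_methods(loops), number=1, repeat=repeat)


if __name__ == "__main__":
    timings = measure()
    print("min:    %.2f ms" % (min(timings) * 1e3))
    print("median: %.2f ms" % (statistics.median(timings) * 1e3))
    print("max:    %.2f ms" % (max(timings) * 1e3))
```

A loop this tight spends nearly all of its time in a handful of interpreter
code paths, which is presumably why it is so sensitive to where the compiler
happens to place that code.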


I expect that PGO would "fix" the call_method() performance issue, but
PGO compilation fails on Ubuntu 14.04 with a compiler error :-p A
solution would be to upgrade the OS of this server.

Victor
_______________________________________________
Speed mailing list
Speed@python.org
https://mail.python.org/mailman/listinfo/speed