On 2019-02-26, Raymond Hettinger wrote:
That said, I'm only observing the effect when building with the Mac default Clang (Apple LLVM version 10.0.0, clang-1000.11.45.5). When building with GCC 8.3.0, there is no change in performance.
My guess is that the code in _PyEval_EvalFrameDefault() changed enough that Clang started emitting slightly different machine code. If the conditional jumps come out a bit differently, that alone can make a significant difference to performance.

Are you compiling with --enable-optimizations (i.e. PGO)? In my experience, that is needed to get meaningful results. Victor also mentions it on his "how to get stable benchmarks" page. Building with PGO is really (really) slow, so I suspect you are not doing it while bisecting. You can speed it up greatly by using a simpler command for PROFILE_TASK in Makefile.pre.in, e.g.:

    PROFILE_TASK=$(srcdir)/my_benchmark.py

Now that you have narrowed it down to a single commit, it would be worth repeating the comparison with PGO builds (assuming Clang supports PGO).
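For reference, a minimal sketch of what such a faster PGO build could look like from the shell. Here my_benchmark.py is a placeholder script (not part of the CPython tree), and overriding PROFILE_TASK on the make command line is offered as one alternative to editing Makefile.pre.in directly:

    # configure with PGO enabled, then build; "make" runs the
    # profile-guided build when --enable-optimizations is set
    ./configure --enable-optimizations
    # PROFILE_TASK is the workload the freshly built interpreter runs to
    # collect profile data; pointing it at a small script is much faster
    # than the default test-suite run
    make PROFILE_TASK='$(srcdir)/my_benchmark.py'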
That said, it seems to be compiler-specific and only affects the Mac builds, so maybe we can decide that we don't care.
I think the key question is whether the ceval loop got a bit slower because of logic changes, or whether Clang just happened to generate slightly worse code because of source-level details. A PGO build could help answer that. I suppose comparing the machine code would produce too large a diff to be useful.

Could you try hoisting the eval_breaker expression, as suggested by Antoine?

https://discuss.python.org/t/profiling-cpython-with-perf/940/2

If the slowdown affects most opcodes, then the DISPATCH change looks like the only plausible cause. Maybe I missed something, though.

Also, there may be some value in marking key branches as likely/unlikely if that helps Clang generate better machine code. Then, even if you compile without PGO (as many people do), you would still get the better machine code.

Regards,

  Neil
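As a concrete illustration of those last two suggestions (hoisting the eval_breaker lookup out of the loop and hinting the branch), a minimal sketch might look like the following. The unlikely() macro, the atomic flag, and the loop body are placeholders invented for this example; they are not the actual _PyEval_EvalFrameDefault() code, and whether such hints actually help Clang would need to be measured:

    /* Sketch only -- not the real ceval code. */
    #include <stdatomic.h>

    /* Hypothetical branch-prediction hint built on __builtin_expect,
       which both GCC and Clang support. */
    #if defined(__GNUC__) || defined(__clang__)
    #  define unlikely(x) __builtin_expect(!!(x), 0)
    #else
    #  define unlikely(x) (x)
    #endif

    static atomic_int eval_breaker;   /* stand-in for the interpreter flag */

    static void
    eval_loop(void)
    {
        /* Hoist the address lookup out of the loop, per Antoine's
           suggestion, and hint that the check almost never fires. */
        atomic_int *breaker = &eval_breaker;

        for (;;) {
            if (unlikely(atomic_load_explicit(breaker,
                                              memory_order_relaxed))) {
                /* handle signals, pending calls, GIL drop requests, ... */
            }
            /* ... fetch and dispatch the next opcode ... */
            return;   /* keep this sketch from looping forever */
        }
    }

    int
    main(void)
    {
        eval_loop();
        return 0;
    }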