[Python-Dev] Possible performance regression

Neil Schemenauer nas-python at python.ca
Tue Feb 26 17:28:14 EST 2019

On 2019-02-26, Raymond Hettinger wrote:
> That said, I'm only observing the effect when building with the
> Mac default Clang (Apple LLVM version 10.0.0, clang-1000.11.45.5).
> When building with GCC 8.3.0, there is no change in performance.

My guess is that the code in _PyEval_EvalFrameDefault() changed
enough that Clang started emitting slightly different machine code.
If the conditional jumps come out differently, I understand that
alone can have a significant effect on performance.

Are you compiling with --enable-optimizations (i.e. PGO)?  In my
experience, that is needed to get meaningful results.  Victor also
mentions it on his "how-to-get-stable-benchmarks" page.  Building
with PGO is really (really) slow, so I suspect you are not doing it
when bisecting.  You can speed it up greatly by using a simpler
command for PROFILE_TASK in Makefile.pre.in.  E.g.
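A sketch of what that might look like: the default PROFILE_TASK runs a
large chunk of the test suite during the profiling step, so overriding
it with a few quick tests cuts the PGO build time dramatically.  The
test names below are illustrative; any fast subset is fine when all
you need is a rough-but-comparable build for bisecting.

```shell
# Override PROFILE_TASK on the make command line instead of editing
# Makefile.pre.in (the test selection here is illustrative, not a
# recommendation from the thread):
./configure --enable-optimizations
make PROFILE_TASK='-m test.regrtest test_builtin test_dict test_long'
```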


Now that you have narrowed it down to a single commit, it would be
worth doing the comparison with PGO builds (assuming Clang supports
PGO).
> That said, it seems to be compiler specific and only affects the
> Mac builds, so maybe we can decide that we don't care.

I think the key question is whether the ceval loop got a bit slower
due to logic changes or whether Clang just happened to generate
slightly worse code due to source-code details.  A PGO build could
help answer that.  I suppose comparing the machine code directly
would produce too large a diff.
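One way to keep that diff tractable (a sketch, assuming you have the
"before" and "after" build trees side by side) is to disassemble only
the one function that changed rather than the whole object file:

```shell
# Hypothetical paths; adjust to wherever your two builds live.
# Extract just _PyEval_EvalFrameDefault's disassembly from each
# build's ceval.o, then diff those.
objdump -d before/Python/ceval.o > before.asm
objdump -d after/Python/ceval.o  > after.asm
diff before.asm after.asm | wc -l   # rough size of the code change
```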

Could you try hoisting the eval_breaker expression, as was suggested
earlier in the thread?


If the slowdown affects most opcodes, the DISPATCH change looks like
the only plausible cause.  Maybe I missed something, though.

Also, maybe there would be some value in marking key branches as
likely/unlikely if it helps Clang generate better machine code.
Then, even if you compile without PGO (as many people do), you still
get the better machine code.
