[Python-Dev] Possible performance regression
vstinner at redhat.com
Tue Feb 26 18:17:33 EST 2019
PGO compilation is very slow. I tried very hard to avoid it.
I started to annotate the C code with various GCC attributes like
"inline", "always_inline", "hot", etc. I also experimented with the
likely/unlikely Linux macros, which use __builtin_expect(). In the
end... my efforts were worthless. I still had a *major* issue (a benchmark
*suddenly* 68% slower! WTF?) with code locality and I decided to give
up. You can still find some macros like _Py_HOT_FUNCTION and
_Py_NO_INLINE in Python ;-) (_Py_NO_INLINE is used to reduce stack
memory usage, that's a different story.)
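For readers unfamiliar with these annotations, here is a minimal sketch of what such hints look like in C. The SKETCH_* macro names and both functions are hypothetical and exist only for this illustration; they are not CPython's definitions. The sketch assumes GCC or Clang and falls back to no-ops elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro names for this sketch only; CPython's real macros
   (_Py_HOT_FUNCTION, _Py_NO_INLINE) are defined in its own headers. */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_LIKELY(x)    __builtin_expect(!!(x), 1)
#  define SKETCH_UNLIKELY(x)  __builtin_expect(!!(x), 0)
#  define SKETCH_HOT          __attribute__((hot))
#  define SKETCH_NO_INLINE    __attribute__((noinline))
#else
#  define SKETCH_LIKELY(x)    (x)
#  define SKETCH_UNLIKELY(x)  (x)
#  define SKETCH_HOT
#  define SKETCH_NO_INLINE
#endif

/* Rare path, kept out of line so it does not bloat the hot loop. */
static SKETCH_NO_INLINE long
clamp_negative(long value)
{
    (void)value;
    return 0;
}

/* Sum an array, counting negative entries as zero.
   The "hot" attribute asks the compiler to optimize this function
   more aggressively and to group it with other hot code. */
SKETCH_HOT long
sum_non_negative(const long *values, size_t n)
{
    assert(values != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        long v = values[i];
        /* Hint: negative values are expected to be rare. */
        if (SKETCH_UNLIKELY(v < 0)) {
            v = clamp_negative(v);
        }
        total += v;
    }
    return total;
}
```

These are only hints: the compiler is free to lay out the code differently, and hints like these do not by themselves control where functions end up relative to each other in the binary, which is the code locality problem described above.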
My sad story with code placement:
tl;dr: Use PGO.
Since that time, I removed call_method from pyperformance to fix the
root issue: don't waste your time on micro-benchmarks ;-) ... But I
kept these micro-benchmarks in a different project:
For some specific needs (making a decision about a specific optimization),
micro-benchmarks are still sometimes useful ;-)
On Tue, Feb 26, 2019 at 23:31, Neil Schemenauer <nas-python at python.ca> wrote:
> On 2019-02-26, Raymond Hettinger wrote:
> > That said, I'm only observing the effect when building with the
> > Mac default Clang (Apple LLVM version 10.0.0 (clang-1000.11.45.5)).
> > When building with GCC 8.3.0, there is no change in performance.
> My guess is that the code in _PyEval_EvalFrameDefault() got changed
> enough that Clang started emitting a bit different machine code. If
> the conditional jumps are a bit different, I understand that could
> have a significant effect on performance.
> Are you compiling with --enable-optimizations (i.e. PGO)? In my
> experience, that is needed to get meaningful results. Victor also
> mentions that on his "how-to-get-stable-benchmarks" page. Building
> with PGO is really (really) slow so I suspect you are not doing it
> when bisecting. You can speed it up greatly by using a simpler
> command for PROFILE_TASK in Makefile.pre.in. E.g.
> Now that you have narrowed it down to a single commit, it would be
> worth doing the comparison with PGO builds (assuming Clang supports
> > That said, it seems to be compiler specific and only affects the
> > Mac builds, so maybe we can decide that we don't care.
> I think the key question is if the ceval loop got a bit slower due
> to logic changes or if Clang just happened to generate a bit worse
> code due to source code details. A PGO build could help answer
> that. I suppose trying to compare machine code is going to produce
> too large of a diff.
> Could you try hoisting the eval_breaker expression, as suggested by
> If you think a slowdown affects most opcodes, I think the DISPATCH
> change looks like the only cause. Maybe I missed something though.
> Also, maybe there would be some value in marking key branches as
> likely/unlikely if it helps Clang generate better machine code.
> Then, even if you compile without PGO (as many people do), you still
> get the better machine code.
Night gathers, and now my watch begins. It shall not end until my death.