[Python-Dev] Branch Prediction And The Performance Of Interpreters - Don't Trust Folklore

Mon Aug 10 20:57:22 CEST 2015

On Mon, Aug 10, 2015 at 4:44 PM, Larry Hastings <larry at hastings.org> wrote:
>
>
> This just went by this morning on reddit's /r/programming.  It's a paper
> that analyzed Python--among a handful of other languages--to answer the
> question "are branch predictors still that bad at the big switch statement
> approach to interpreters?"  Their conclusion: no.
>
> Our simulations [...] show that, as long as the payload in the bytecode
> remains limited and do not feature significant amount of extra indirect
> branches, then the misprediction rate on the interpreter can be even become
> insignificant (less than 0.5 MPKI).
>
> (MPKI = missed predictions per thousand instructions)
>
> Their best results were on simulated hardware with state-of-the-art
> prediction algorithms ("TAGE" and "ITTAGE"), but they also demonstrate that
> branch predictors in real hardware are getting better quickly.  When running
> the Unladen Swallow test suite on Python 3.3.2, compiled with
> USE_COMPUTED_GOTOS turned off, Intel's Nehalem experienced an average of
> 12.8 MPKI--but Sandy Bridge drops that to 3.5 MPKI, and Haswell reduces it
> further to a mere *1.4* MPKI.  (AFAICT they didn't compare against Python
> 3.3.2 using computed gotos, either in terms of MPKI or in overall
> performance.)
>
> The paper is here:
>
> https://hal.inria.fr/hal-01100647/document
>
>
> I suppose I wouldn't propose removing the labels-as-values opcode dispatch
> code yet.  But perhaps that day is in sight!
>
>
> /arry
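To put the quoted MPKI figures in perspective, here is a small C sketch. It converts raw counter values into MPKI, and it converts an MPKI value back into the average number of instructions between mispredictions. The function names are hypothetical and chosen for illustration. The counter values in the example below are made up and do not come from the paper.

```c
#include <assert.h>

/* MPKI = branch mispredictions per thousand (kilo) retired instructions. */
static double mpki(unsigned long long mispredictions,
                   unsigned long long instructions)
{
    assert(instructions > 0);
    return (double)mispredictions * 1000.0 / (double)instructions;
}

/* Inverse view: the average number of instructions executed between
   two consecutive mispredictions for a given MPKI value. */
static double instructions_per_mispredict(double mpki_value)
{
    assert(mpki_value > 0.0);
    return 1000.0 / mpki_value;
}
```

For example, a hypothetical 12,800 mispredictions over 1,000,000 instructions comes to 12.8 MPKI. Read the other way, the averages quoted above mean roughly one misprediction every 78 instructions on Nehalem (12.8 MPKI), every 286 on Sandy Bridge (3.5 MPKI), and every 714 on Haswell (1.4 MPKI).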

Hi Larry

Please also note that, as far as I can tell, this mostly applies to x86.
Branch prediction on ARM is significantly dumber these days, and as
long as Python performance matters on such platforms, tricks like
computed-goto dispatch do make the situation better. We found this out
while comparing CPython and PyPy: the gap between PyPy and CPython was
bigger on ARM and smaller on x86, even though the ARM assembler we
produce is less well optimized.

Cheers,
fijal
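For readers unfamiliar with the two dispatch styles discussed in this thread, here is a minimal C sketch of a toy stack-machine interpreter. The opcode set and the function names are hypothetical and are not taken from CPython's ceval.c. `run_switch` uses the "big switch" approach. `run_goto` uses the labels-as-values extension supported by GCC and Clang, which is the kind of dispatch that CPython's USE_COMPUTED_GOTOS option switches on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcode set (not CPython's real bytecode). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* "Big switch" dispatch: compilers typically lower this to a jump table,
   so every opcode goes back through one shared indirect jump at the top. */
static long run_switch(const int *code)
{
    long stack[16];
    size_t sp = 0;
    const int *pc = code;
    for (;;) {
        switch (*pc++) {
        case OP_PUSH: stack[sp++] = *pc++; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        default:      assert(0 && "unknown opcode"); return -1;
        }
    }
}

#if defined(__GNUC__)
/* Labels-as-values ("computed goto") dispatch, a GCC/Clang extension.
   Each handler ends with its own indirect jump, which gives the branch
   predictor a separate branch site per opcode. The table order must
   match the enum above. */
static long run_goto(const int *code)
{
    static void *const labels[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[16];
    size_t sp = 0;
    const int *pc = code;
#define DISPATCH() goto *labels[*pc++]
    DISPATCH();
do_push: stack[sp++] = *pc++; DISPATCH();
do_add:  sp--; stack[sp - 1] += stack[sp]; DISPATCH();
do_mul:  sp--; stack[sp - 1] *= stack[sp]; DISPATCH();
do_halt: return stack[sp - 1];
#undef DISPATCH
}
#endif
```

Both functions evaluate the program `PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT` to 20. The usual argument for the second form is that per-opcode indirect jumps give the predictor more context. The paper Larry cites argues that modern predictors such as ITTAGE largely close that gap on their own. fijal's point is that the ARM cores of the time may still benefit from the second form.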