On Thu, Oct 23, 2008 at 8:13 AM, Antoine Pitrou <span dir="ltr">&lt;<a href="mailto:solipsis@pitrou.net" target="_blank">solipsis@pitrou.net</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


Is this kind of optimization that useful on modern CPUs? It helps remove a<br>

memory access to the switch/case lookup table, which should shave off the 3 CPU<br>

cycles of latency of a modern L1 data cache, but it won&#39;t remove the branch<br>

misprediction penalty of the indirect jump itself, which is more in the order of<br>

10-20 CPU cycles depending on pipeline depth.</blockquote><div><br>I searched around for information on how threaded code interacts with branch prediction, and here&#39;s what I found.&nbsp; The short answer is that threaded code significantly improves branch prediction.<br>


<br>Any bytecode interpreter has some kind of dispatch mechanism that jumps to the next opcode handler.&nbsp; With a while(1) {switch() {} } format, there&#39;s one dispatch location in the machine code.&nbsp; Processor branch prediction has a horrible time trying to predict where that dispatch location is going to jump to.&nbsp; Here&#39;s some rough psuedo-assembly:<br>


<br>main_loop:<br>&nbsp;&nbsp;&nbsp; compute next_handler<br>&nbsp;&nbsp;&nbsp; jmp next_handler&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ; abysmal branch prediction<br><br>handler1:<br>&nbsp;&nbsp;&nbsp; ; do stuff<br>&nbsp;&nbsp;&nbsp; jmp main_loop<br><br>handler2:<br>&nbsp;&nbsp;&nbsp; ; do stuff<br>&nbsp;&nbsp;&nbsp; jmp main_loop<br>


<br>With threaded code, every handler ends with its own dispatcher, so the processor can make fine-grained predictions.&nbsp; Since each opcode has its own indirect branch instruction, the processor can track them separately and make better predictions (e.g., it can figure out that opcode X is often followed by opcode Y).<br>


<br>compute next_handler<br>

jmp next_handler&nbsp; ; executed only once<br><br>handler1:<br>&nbsp; &nbsp; ; do stuff<br>&nbsp;&nbsp;&nbsp; compute next_handler<br>

&nbsp; &nbsp; jmp next_handler<br><br>handler2:<br>&nbsp; &nbsp; ; do stuff<br>&nbsp;&nbsp;&nbsp; compute next_handler<br>

&nbsp; &nbsp; jmp next_handler <br></div></div><blockquote style="margin: 1.5em 0pt;">--<br>

Daniel Stutzbach, Ph.D.<br>

President, <a href="http://stutzbachenterprises.com">Stutzbach Enterprises, LLC</a>

</blockquote>