<p dir="ltr">What is the value of HAS_ARG going to be now?</p>

<p dir="ltr">--<br>

Ryan<br>

[ERROR]: Your autotools build scripts are 200 lines longer than your program. Something's wrong.<br>

<a href="http://kirbyfan64.github.io/">http://kirbyfan64.github.io/</a></p>

<div class="gmail_quote">On Apr 13, 2016 11:26 AM, "Victor Stinner" &lt;<a href="mailto:victor.stinner@gmail.com">victor.stinner@gmail.com</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

<br>

In the middle of recent discussions about Python performance, changing<br>

the Python bytecode came up. Serhiy proposed reusing MicroPython's<br>

short bytecode to reduce both the disk space and the memory<br>

footprint.<br>

<br>

Demur Rumed proposes a different change: a regular bytecode made of<br>

16-bit units, in which every instruction has exactly one 8-bit argument.<br>

The argument is zero if the instruction doesn't take one:<br>

<br>

   <a href="http://bugs.python.org/issue26647" rel="noreferrer" target="_blank">http://bugs.python.org/issue26647</a><br>

<br>

According to benchmarks, it looks faster:<br>

<br>

  <a href="http://bugs.python.org/issue26647#msg263339" rel="noreferrer" target="_blank">http://bugs.python.org/issue26647#msg263339</a><br>

<br>

IMHO it's a nice enhancement: it makes the code simpler. The most<br>

interesting change is made in Python/ceval.c:<br>

<br>

-        if (HAS_ARG(opcode))<br>

-            oparg = NEXTARG();<br>

+        oparg = NEXTARG();<br>

<br>

This code is the very hot loop that evaluates Python bytecode. I expect<br>

that removing a conditional branch here can reduce CPU branch<br>

mispredictions.<br>
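
<br>

To make the unconditional fetch concrete, here is a minimal sketch of a wordcode-style loop. It is not the code from the patch: the opcode numbers, the run_wordcode name, and the tiny stack are all made up for illustration. The only point is that every instruction is one opcode byte plus one argument byte, so the fetch needs no HAS_ARG-style test.<br>

```c
#include <stddef.h>

/* Hypothetical opcodes for illustration; these are not CPython's. */
enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2 };

/* Runs a tiny wordcode program: each instruction is one opcode byte
   followed by one argument byte (0 when the opcode ignores it).
   Returns the top of the stack at OP_HALT, or -1 on error. */
static int run_wordcode(const unsigned char *code)
{
    int stack[8];
    size_t sp = 0;
    const unsigned char *next_instr = code;

    for (;;) {
        int opcode = *next_instr++;
        int oparg = *next_instr++;   /* fetched unconditionally */

        switch (opcode) {
        case OP_PUSH:
            if (sp == 8)
                return -1;           /* stack overflow */
            stack[sp++] = oparg;
            break;
        case OP_ADD:
            if (sp < 2)
                return -1;           /* stack underflow */
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            return sp ? stack[sp - 1] : 0;
        default:
            return -1;               /* unknown opcode */
        }
    }
}
```

With a program such as {OP_PUSH, 2, OP_PUSH, 3, OP_ADD, 0, OP_HALT, 0}, OP_ADD and OP_HALT still carry an argument byte, and the loop simply ignores it.<br>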

<br>

I reviewed the first versions of the change, and IMHO it's almost ready<br>

to be merged. But I would prefer to have a review from at least a second<br>

core reviewer.<br>

<br>

Can someone please review the change?<br>

<br>

--<br>

<br>

A side effect of wordcode is that an argument in 0..255 now uses 2<br>

bytes per instruction instead of 3, so it also reduces the size of<br>

bytecode in the most common case.<br>

<br>

A larger, 16-bit argument (0..65,535) now uses 4 bytes instead<br>

of 3. Arguments are supported up to 32-bit: 24-bit uses 3 units (6<br>

bytes), 32-bit uses 4 units (8 bytes). MAKE_FUNCTION uses a 16-bit<br>

argument for keyword defaults and a 24-bit argument for annotations.<br>

Other common instructions known to use large arguments are jumps in<br>

bytecode longer than 256 bytes.<br>
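
<br>

The sizes above follow from one rule: each 16-bit unit carries 8 bits of argument. A small sketch of that arithmetic follows. The wordcode_size helper is hypothetical, not something from the patch or from CPython.<br>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of CPython: number of bytes one
   instruction occupies under the proposed wordcode, where every
   16-bit unit holds 8 bits of the argument. */
static size_t wordcode_size(uint32_t oparg)
{
    size_t units = 1;          /* opcode plus the low 8 argument bits */

    while (oparg > 0xFF) {     /* each further 8 bits costs one more unit */
        oparg >>= 8;
        units++;
    }
    return units * 2;          /* one unit is 2 bytes */
}
```

For example, wordcode_size(255) is 2, wordcode_size(256) is 4, and wordcode_size(65536) is 6, matching the 8-bit, 16-bit and 24-bit cases listed above.<br>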

<br>

--<br>

<br>

Right now, ceval.c still fetches opcode and then oparg with two 8-bit<br>

instructions. Later, we can discuss whether it would be possible to ensure<br>

that the bytecode is always 16-bit aligned in memory, so that the<br>

two bytes can be fetched using a uint16_t* pointer.<br>

<br>

Maybe we can overallocate 1 byte in codeobject.c and manually align<br>

the memory block if needed. Or maybe ceval.c should copy the code if<br>

it's not aligned?<br>

<br>

Raymond Hettinger proposes something like that, but it looks like<br>

there are concerns about non-aligned memory accesses:<br>

<br>

   <a href="http://bugs.python.org/issue25823" rel="noreferrer" target="_blank">http://bugs.python.org/issue25823</a><br>

<br>

The cost of non-aligned memory accesses depends on the CPU<br>

architecture, but they can raise a SIGBUS on some architectures (MIPS and<br>

SPARC?).<br>
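
<br>

For reference, one way to read a 16-bit unit without relying on alignment is to go through memcpy, which is defined for any address. The sketch below is only an illustration under that assumption; is_aligned16, load_unit, unit_opcode and unit_oparg are hypothetical names, not CPython API. Whether the compiler turns the memcpy into a single load depends on the target.<br>

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; not CPython API. */

/* Nonzero if p is suitably aligned to be read through a uint16_t*. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % sizeof(uint16_t)) == 0;
}

/* Loads one 16-bit unit from any address. memcpy has no alignment
   requirement, unlike dereferencing (const uint16_t *)p. */
static uint16_t load_unit(const unsigned char *p)
{
    uint16_t unit;
    memcpy(&unit, p, sizeof unit);
    return unit;
}

/* Split a unit into its bytes in memory order (opcode first, then
   oparg), so the result does not depend on the host byte order. */
static unsigned unit_opcode(uint16_t unit)
{
    return ((const unsigned char *)&unit)[0];
}

static unsigned unit_oparg(uint16_t unit)
{
    return ((const unsigned char *)&unit)[1];
}
```

A caller could use is_aligned16 to pick a fast path and fall back to load_unit otherwise, or always call load_unit and let the compiler choose the instructions.<br>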

<br>

Victor<br>


</blockquote></div>