2016-04-13 18:24 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:
Demur Rumed proposes a different change: a regular bytecode made of
16-bit units. Every instruction always has one 8-bit argument, which
is zero if the instruction doesn't take an argument:


According to benchmarks, it looks faster:


IMHO it's a nice enhancement: it makes the code simpler. The most
interesting change is made in Python/ceval.c:

-        if (HAS_ARG(opcode))
-            oparg = NEXTARG();
+        oparg = NEXTARG();

This code is the very hot loop evaluating Python bytecode. I expect
that removing a conditional branch here can reduce the CPU branch

Correct. The old bytecode format wasn't very predictable for the CPU.
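
To make the point about the removed branch concrete, here is a minimal
sketch of a decode loop over fixed 2-byte instructions. It is only a toy:
the TOY_* opcodes and toy_run() are hypothetical names invented for this
example and have nothing to do with the real opcodes or with ceval.c.
The thing to look at is that oparg is read unconditionally on every
iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes, not CPython's real ones. */
enum { TOY_NOP = 0, TOY_PUSH = 1, TOY_ADD = 2, TOY_HALT = 3 };

/* Run a toy program in which every instruction is exactly two bytes:
   one opcode byte followed by one argument byte (0 when unused).
   Returns the value on top of the stack at TOY_HALT, or -1 on error. */
int toy_run(const uint8_t *code, size_t len)
{
    int stack[16];
    int sp = 0;

    assert(code != NULL);
    for (size_t pc = 0; pc + 1 < len; pc += 2) {
        uint8_t opcode = code[pc];
        uint8_t oparg = code[pc + 1];   /* fetched unconditionally */

        switch (opcode) {
        case TOY_NOP:                   /* argument byte is ignored */
            break;
        case TOY_PUSH:
            if (sp == 16)
                return -1;              /* stack overflow */
            stack[sp++] = oparg;
            break;
        case TOY_ADD:
            if (sp < 2)
                return -1;              /* stack underflow */
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case TOY_HALT:
            return sp > 0 ? stack[sp - 1] : -1;
        default:
            return -1;                  /* unknown opcode */
        }
    }
    return -1;                          /* ran off the end without TOY_HALT */
}
```

With the old variable-length format, the loop would first have to test
whether the opcode takes an argument before deciding how far to advance
pc; here the stride is always 2.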

Right now, ceval.c still fetches opcode and then oparg with two 8-bit
instructions. Later, we can discuss whether it would be possible to
ensure that the bytecode is always aligned to 16 bits in memory, so
that the two bytes can be fetched using a uint16_t* pointer.

Maybe we can overallocate 1 byte in codeobject.c and manually align
the memory block if needed. Or maybe ceval.c should copy the code if
it's not aligned?
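
As a rough illustration of the "overallocate 1 byte and align manually"
idea, here is a sketch. The aligned_code struct and the helper names are
hypothetical, made up for this example; this is not what codeobject.c
does. Note that malloc() normally returns memory that is already
suitably aligned, so the adjustment is usually zero; the trick matters
when the bytes live inside some other block at an arbitrary offset.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a bytecode buffer; not a CPython structure. */
typedef struct {
    uint8_t *raw;    /* pointer returned by malloc(), to pass to free() */
    uint8_t *code;   /* 2-byte-aligned start of the bytecode */
    size_t len;      /* number of bytecode bytes */
} aligned_code;

/* Return nonzero if p sits on a 2-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 1u) == 0;
}

/* Round p up to the next 2-byte boundary (advances by at most 1 byte). */
uint8_t *align16(uint8_t *p)
{
    return p + ((uintptr_t)p & 1u);
}

/* Copy len bytes of bytecode into a buffer overallocated by one byte,
   skipping the first byte if the block starts at an odd address.
   Returns 0 on success, -1 if the allocation fails. */
int aligned_code_init(aligned_code *ac, const uint8_t *src, size_t len)
{
    ac->raw = malloc(len + 1);
    if (ac->raw == NULL)
        return -1;
    ac->code = align16(ac->raw);
    assert(is_aligned16(ac->code));
    memcpy(ac->code, src, len);
    ac->len = len;
    return 0;
}

/* Release the buffer using the original malloc() pointer. */
void aligned_code_free(aligned_code *ac)
{
    free(ac->raw);
    ac->raw = ac->code = NULL;
    ac->len = 0;
}
```

The other option mentioned above (copy the code only when it is not
aligned) would amount to calling is_aligned16() first and falling back
to aligned_code_init() when it returns zero.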

Raymond Hettinger proposes something like that, but it looks like
there are concerns about non-aligned memory accesses:


The cost of non-aligned memory accesses depends on the CPU
architecture, but it can raise a SIGBUS on some arch (MIPS and


It should not be a problem, since every PyObject is allocated with PyAlloc (I don't remember if that's the correct name), which AFAIK guarantees a base alignment of 8 bytes.

So it's safe to use an unsigned int to hold or reference one word at a time.

The only problem with such an approach is related to the processor endianness, but it can be solved with proper macros (like I did with WPython).
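
For illustration, macros of this kind could look roughly like the sketch
below. The names WORD_OPCODE, WORD_OPARG and load_word are hypothetical
and are not the ones used in WPython or CPython. The sketch assumes the
GCC/Clang predefined macros __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ and
falls back to little-endian when they are missing. In memory, byte 0 of
each unit is the opcode and byte 1 is the argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macros: split a 16-bit code unit loaded from memory,
   where byte 0 is the opcode and byte 1 is the argument. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#  define WORD_OPCODE(w) ((uint8_t)((w) >> 8))
#  define WORD_OPARG(w)  ((uint8_t)((w) & 0xff))
#else  /* assumed little-endian */
#  define WORD_OPCODE(w) ((uint8_t)((w) & 0xff))
#  define WORD_OPARG(w)  ((uint8_t)((w) >> 8))
#endif

/* Load the index-th 16-bit code unit in native byte order.
   memcpy keeps the sketch well defined even on an unaligned buffer;
   compilers typically reduce it to a single 16-bit load. */
uint16_t load_word(const uint8_t *code, size_t index)
{
    uint16_t w;

    assert(code != NULL);
    memcpy(&w, code + 2 * index, sizeof w);
    return w;
}
```

With this, the eval loop would do one load_word() per instruction and
then use the two macros, instead of two separate byte reads.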