What is the value of HAS_ARG going to be now?


On Apr 13, 2016 11:26 AM, "Victor Stinner" <victor.stinner@gmail.com> wrote:

In the middle of recent discussions about Python performance, changing
the Python bytecode came up. Serhiy proposed reusing MicroPython's
short bytecode to reduce disk space and memory footprint.

Demur Rumed proposes a different change: a regular bytecode using
16-bit units. Every instruction always has one 8-bit argument, which
is zero if the instruction doesn't take an argument.
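To make the layout concrete, here is a minimal C sketch of reading a wordcode buffer, assuming only what the paragraph above states: each instruction is one 2-byte unit, opcode byte first, argument byte second. The struct and function names are hypothetical, not CPython's. Offsets are always even, and the number of code units follows from the length alone, without inspecting any opcode.

```c
#include <assert.h>
#include <stddef.h>

/* One decoded unit. In wordcode every instruction occupies exactly one
   16-bit unit: an opcode byte followed by an 8-bit argument byte. */
struct instr {
    unsigned char opcode;
    unsigned char oparg;   /* 0 when the opcode takes no argument */
};

/* Decode the unit starting at byte offset `off` (hypothetical helper). */
static struct instr decode_at(const unsigned char *code, size_t off)
{
    assert(off % 2 == 0);  /* units are 2 bytes, so offsets are even */
    struct instr in = { code[off], code[off + 1] };
    return in;
}

/* Number of 16-bit code units in a buffer of `len` bytes. */
static size_t unit_count(size_t len)
{
    assert(len % 2 == 0);
    return len / 2;        /* no opcode needs to be inspected */
}
```

For example, with the bytes {100, 1, 9, 0}, decode_at(code, 2) yields opcode 9 with argument 0, and unit_count(4) is 2.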


According to benchmarks, it looks faster.


IMHO it's a nice enhancement: it makes the code simpler. The most
interesting change is made in Python/ceval.c:

-        if (HAS_ARG(opcode))
-            oparg = NEXTARG();
+        oparg = NEXTARG();
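The diff above removes a data-dependent branch from instruction fetch. The following C sketch contrasts the two fetch styles side by side. The SKETCH_ macros are simplified, hypothetical stand-ins for the ceval.c macros; the threshold of 90 mirrors the HAVE_ARGUMENT value of the pre-wordcode format, and the old format is assumed to store a 16-bit little-endian argument after the opcode only when the opcode takes one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ceval.c macros, for illustration only. */
#define SKETCH_HAVE_ARGUMENT 90
#define SKETCH_HAS_ARG(op) ((op) >= SKETCH_HAVE_ARGUMENT)

/* Old format: 1 opcode byte, plus a 16-bit little-endian argument only
   when the opcode takes one. Returns the offset of the next instruction. */
static size_t fetch_old(const unsigned char *code, size_t off,
                        int *opcode, int *oparg)
{
    *opcode = code[off++];
    *oparg = 0;
    if (SKETCH_HAS_ARG(*opcode)) {   /* branch depends on the opcode */
        *oparg = code[off] | (code[off + 1] << 8);
        off += 2;
    }
    return off;
}

/* Wordcode: always exactly 2 bytes, no branch on the opcode. */
static size_t fetch_wordcode(const unsigned char *code, size_t off,
                             int *opcode, int *oparg)
{
    assert(off % 2 == 0);
    *opcode = code[off];
    *oparg = code[off + 1];
    return off + 2;
}
```

Note how the old decoder's next offset (1 or 3) depends on the opcode, while the wordcode decoder always advances by 2.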

This code is the very hot loop that evaluates Python bytecode. I
expect that removing a conditional branch here can reduce CPU branch
mispredictions.

I reviewed the first versions of the change, and IMHO it's almost
ready to be merged. But I would prefer to have a review from at least
a second core reviewer.

Can someone please review the change?


A side effect of wordcode is that instructions with an argument in
0..255 now use 2 bytes instead of 3, so it also reduces the size of
bytecode in the most common case.
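The size arithmetic can be checked with a small C sketch. It assumes the old format used 1 byte for an opcode without argument, 3 bytes with a 16-bit argument, and an extra 3-byte EXTENDED_ARG prefix for larger arguments; in wordcode it counts one 2-byte unit per byte of argument. The function names are hypothetical. It also makes visible a case the paragraph does not mention: an instruction without any argument grows from 1 byte to 2.

```c
#include <assert.h>
#include <stdint.h>

/* Old format: 1 opcode byte, +2 argument bytes if the opcode takes an
   argument, +3 more (an EXTENDED_ARG prefix) above 16 bits. */
static unsigned old_size(int has_arg, uint32_t oparg)
{
    if (!has_arg)
        return 1;
    return oparg <= 0xFFFF ? 3 : 6;
}

/* Wordcode: one 2-byte unit per byte of argument, 1 to 4 units. */
static unsigned wordcode_size(int has_arg, uint32_t oparg)
{
    assert(has_arg || oparg == 0);  /* no-argument ops store oparg 0 */
    unsigned units = 1;
    while (oparg > 0xFF) {          /* each extra byte needs one more unit */
        oparg >>= 8;
        units++;
    }
    return 2 * units;
}
```

For an argument of 42 this gives 3 bytes before and 2 bytes after; for 300 it gives 3 bytes before and 4 after.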

Larger arguments cost more: a 16-bit argument (256..65,535) now uses 4
bytes instead of 3. Arguments are supported up to 32 bits: a 24-bit
argument uses 3 units (6 bytes) and a 32-bit argument uses 4 units (8
bytes). MAKE_FUNCTION uses a 16-bit argument for keyword defaults and a
24-bit argument for annotations. Other common instructions known to use
large arguments are jumps in bytecode longer than 256 bytes.
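A minimal C sketch of how a multi-unit argument can be encoded and decoded, assuming (as in CPython's scheme) that the high bytes of the argument travel in EXTENDED_ARG prefix units, most significant byte first, and the real opcode carries the low byte. The value 144 for the prefix opcode and the function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode number for the prefix, for illustration only. */
#define SKETCH_EXTENDED_ARG 144

/* Encode `opcode` with a 32-bit `oparg` into `out` (room for 8 bytes).
   Returns the number of bytes written: 2, 4, 6 or 8. */
static size_t encode(unsigned char *out, unsigned char opcode, uint32_t oparg)
{
    assert(opcode != SKETCH_EXTENDED_ARG);
    int shift = 0;
    while (shift < 24 && (oparg >> (shift + 8)) != 0)
        shift += 8;                 /* shift of the highest non-zero byte */
    size_t n = 0;
    for (; shift > 0; shift -= 8) { /* prefixes, most significant first */
        out[n++] = SKETCH_EXTENDED_ARG;
        out[n++] = (unsigned char)(oparg >> shift);
    }
    out[n++] = opcode;              /* real instruction, low argument byte */
    out[n++] = (unsigned char)oparg;
    return n;
}

/* Decode one logical instruction at `off`, folding any EXTENDED_ARG
   prefixes into the argument. Stores the opcode and next offset. */
static uint32_t decode(const unsigned char *code, size_t off,
                       unsigned char *opcode, size_t *next)
{
    uint32_t oparg = 0;
    while (code[off] == SKETCH_EXTENDED_ARG) {
        oparg = (oparg << 8) | code[off + 1];
        off += 2;
    }
    oparg = (oparg << 8) | code[off + 1];
    *opcode = code[off];
    *next = off + 2;
    return oparg;
}
```

Encoding an argument of 300 produces two units, {144, 1, opcode, 44}, since 300 = 1 * 256 + 44; decoding reassembles 300.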


Right now, ceval.c still fetches the opcode and then the oparg with two
8-bit loads. Later, we can discuss whether it would be possible to
ensure that the bytecode is always aligned to 16 bits in memory, so
that both bytes can be fetched with a single load through a uint16_t*
pointer.
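A sketch of what a single 16-bit fetch could look like, under the assumption that the buffer is already 16-bit aligned (checked with an assert). It reads the unit with memcpy rather than dereferencing a cast uint16_t* pointer, which sidesteps C's strict-aliasing rules; compilers typically turn a fixed-size memcpy like this into one load. Because the opcode is the first byte in memory, which half of the 16-bit word holds it depends on the host byte order. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Fetch one wordcode unit with a single 16-bit read and split it. */
static void fetch_unit(const unsigned char *p, int *opcode, int *oparg)
{
    assert(((uintptr_t)p & 1) == 0);  /* the alignment the thread proposes */
    uint16_t word;
    memcpy(&word, p, sizeof word);
    if (host_is_little_endian()) {    /* byte 0 (opcode) is the low half */
        *opcode = word & 0xFF;
        *oparg = word >> 8;
    } else {                          /* byte 0 (opcode) is the high half */
        *opcode = word >> 8;
        *oparg = word & 0xFF;
    }
}
```

A real interpreter would more likely resolve the byte order at compile time; the runtime probe just keeps the sketch portable.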

Maybe we can overallocate 1 byte in codeobject.c and manually align
the memory block if needed. Or maybe ceval.c should copy the code if
it's not aligned?
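The overallocation idea can be sketched in C as follows: allocate one extra byte, then skip the first byte if the returned address is odd, keeping the original pointer for free(). The struct and function names are hypothetical and this is not codeobject.c's actual code. Note that plain malloc already returns memory aligned for any fundamental type, so with malloc alone the shift is 0; the adjustment matters when the bytes sit at an arbitrary offset inside a larger allocation. The sketch only shows the arithmetic and the copy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* `raw` is what malloc returned (needed for free);
   `code` is the 16-bit aligned copy of the bytecode inside it. */
struct aligned_code {
    unsigned char *raw;
    unsigned char *code;
};

/* Copy `len` bytes of bytecode into a buffer overallocated by 1 byte,
   moving the start forward by one byte if the address is odd.
   Returns 0 on success, -1 on allocation failure. */
static int aligned_code_init(struct aligned_code *ac,
                             const unsigned char *src, size_t len)
{
    ac->raw = malloc(len + 1);
    if (ac->raw == NULL)
        return -1;
    ac->code = ac->raw + ((uintptr_t)ac->raw & 1);  /* +0 or +1 */
    memcpy(ac->code, src, len);
    assert(((uintptr_t)ac->code & 1) == 0);
    return 0;
}

static void aligned_code_free(struct aligned_code *ac)
{
    free(ac->raw);
    ac->raw = ac->code = NULL;
}
```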

Raymond Hettinger proposes something like that, but there seem to be
concerns about non-aligned memory accesses.


The cost of non-aligned memory accesses depends on the CPU
architecture, but it can raise a SIGBUS on some architectures (MIPS and

Python-Dev mailing list