[Python-Dev] Wordcode: new regular bytecode using 16-bit units

Wed Apr 13 12:33:34 EDT 2016

Nice work. I think that for CPython, speed is much more important than
memory use for the code. Disk space is practically free for anything
smaller than a video. :-)

On Wed, Apr 13, 2016 at 9:24 AM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> Hi,
>
> During the recent discussions about Python performance, changing the
> Python bytecode came up. Serhiy proposed reusing MicroPython's short
> bytecode to reduce both disk space and memory footprint.
>
> Demur Rumed proposes a different change: a regular bytecode using
> 16-bit units. Every instruction always has one 8-bit argument, which
> is zero if the instruction doesn't take an argument:
>
>    http://bugs.python.org/issue26647
>
> According to benchmarks, it looks faster:
>
>   http://bugs.python.org/issue26647#msg263339
>
> IMHO it's a nice enhancement: it makes the code simpler. The most
> interesting change is made in Python/ceval.c:
>
> -        if (HAS_ARG(opcode))
> -            oparg = NEXTARG();
> +        oparg = NEXTARG();
>
> This code is in the very hot loop that evaluates Python bytecode. I
> expect that removing a conditional branch here can reduce CPU branch
> mispredictions.
>
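To make the difference concrete, here is a minimal sketch of the two fetch steps. It is not the code from the patch: the function names and SKETCH_HAVE_ARGUMENT are made up for illustration, and it assumes the old layout is one opcode byte followed by an optional 2-byte little-endian argument, and that wordcode stores the opcode byte first and the argument byte second.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for CPython's HAVE_ARGUMENT threshold
   (opcodes >= 90 take an argument in the old format). */
#define SKETCH_HAVE_ARGUMENT 90

/* Old layout: the argument is only present for some opcodes, so the
   decoder has to branch on the opcode. Returns the bytes consumed. */
size_t fetch_old(const uint8_t *code, int *opcode, int *oparg)
{
    *opcode = code[0];
    if (*opcode >= SKETCH_HAVE_ARGUMENT) {  /* the branch the patch removes */
        *oparg = code[1] | (code[2] << 8);
        return 3;
    }
    *oparg = 0;
    return 1;
}

/* Wordcode layout: every instruction is one 16-bit unit, so the
   argument is read unconditionally. Returns the bytes consumed. */
size_t fetch_wordcode(const uint8_t *code, int *opcode, int *oparg)
{
    *opcode = code[0];
    *oparg = code[1];
    return 2;
}
```

With wordcode the instruction pointer always advances by the same amount, which is what lets the conditional go away.
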
> I reviewed the first versions of the change, and IMHO it's almost
> ready to be merged. But I would prefer to have a review from at least
> a second core reviewer.
>
> Can someone please review the change?
>
> --
>
> A side effect of wordcode is that instructions with an argument in
> 0..255 now use 2 bytes instead of 3, so it also reduces the size of
> the bytecode in the most common case.
>
> Larger arguments cost more: a 16-bit argument (0..65,535) now uses 4
> bytes instead of 3. Arguments are supported up to 32 bits: a 24-bit
> argument uses 3 units (6 bytes), a 32-bit argument uses 4 units (8
> bytes). MAKE_FUNCTION uses a 16-bit argument for keyword defaults and
> a 24-bit argument for annotations. The other common instructions known
> to use large arguments are jumps in bytecode longer than 256 bytes.
>
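The size arithmetic above can be sketched as follows. The helper names are hypothetical, and the sketch only counts units; how the extra units are actually encoded in the patch is not modeled here.

```c
#include <stdint.h>

/* Number of 16-bit units needed for one instruction whose argument is
   oparg: one unit per byte of the argument, with a minimum of one. */
unsigned wordcode_units(uint32_t oparg)
{
    unsigned units = 1;
    while (oparg > 0xFF) {
        oparg >>= 8;
        units++;
    }
    return units;
}

/* Size in bytes of the same instruction. */
unsigned wordcode_bytes(uint32_t oparg)
{
    return 2 * wordcode_units(oparg);
}
```

For example, wordcode_bytes(255) is 2, wordcode_bytes(256) is 4, and wordcode_bytes(0xFFFFFFFF) is 8, matching the figures quoted above.
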
> --
>
> Right now, ceval.c still fetches opcode and then oparg with two 8-bit
> instructions. Later, we can discuss whether it would be possible to
> ensure that the bytecode is always 16-bit aligned in memory, so that
> the two bytes can be fetched using a uint16_t* pointer.
>
> Maybe we can overallocate 1 byte in codeobject.c and manually align
> the memory block if needed. Or maybe ceval.c should copy the code if
> it's not aligned?
>
> Raymond Hettinger proposes something like that, but it looks like
> there are concerns about non-aligned memory accesses:
>
>    http://bugs.python.org/issue25823
>
> The cost of non-aligned memory accesses depends on the CPU
> architecture, but they can raise a SIGBUS on some architectures (MIPS
> and SPARC?).
>
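As a rough illustration of the alignment question, the sketch below shows one way to check whether a code pointer is 16-bit aligned, and a fallback read that is safe either way. Both helper names are hypothetical and nothing here is taken from ceval.c or codeobject.c; it also assumes that uint16_t needs 2-byte alignment, which holds on common platforms.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if p could be read through a uint16_t* without an unaligned
   access. Assumption: the alignment of uint16_t equals its size. */
int is_aligned_16(const void *p)
{
    return ((uintptr_t)p % sizeof(uint16_t)) == 0;
}

/* Read one 16-bit code unit from any address. memcpy is well defined
   for unaligned sources, and compilers typically turn it into a single
   load where the target allows that. The value is in host byte order. */
uint16_t read_unit(const unsigned char *p)
{
    uint16_t unit;
    memcpy(&unit, p, sizeof unit);
    return unit;
}
```

A fetch loop could use the direct uint16_t* read when is_aligned_16() holds and fall back to read_unit() (or to a copy of the code) otherwise.
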
> Victor

-- 
--Guido van Rossum (python.org/~guido)