[New-bugs-announce] [issue26300] "unpacked" bytecode

Andrew Barnert report at bugs.python.org
Fri Feb 5 17:17:44 EST 2016


New submission from Andrew Barnert:

Currently, the compiler starts with a list of arrays of instructions, packs them into bytecode where each instruction takes 1, 3, or 6 bytes, fixes up all the jumps, and only then calls PyCode_Optimize on the result. This makes the peephole optimizer much more complicated: any change that adds, removes, or resizes an instruction shifts byte offsets, so the optimizer has to redo the jump fixups itself. Assuming PEP 511 is accepted, it will also make plug-in bytecode optimizers much more complicated, and probably wasteful, since each of them will repeat the same work to redo the fixups.
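To make the packed format concrete, here is a minimal sketch of how one instruction is laid out: 1 byte for an opcode without an argument, 3 bytes (opcode plus a 16-bit little-endian argument) otherwise, and 6 bytes when an EXTENDED_ARG prefix supplies the high 16 bits. `encode_packed` is a hypothetical helper, not a CPython function, and the opcode numbers are the CPython 3.5 values as I understand them; treat them as assumptions.

```python
from typing import Optional

# Opcode numbers as in CPython 3.5's Lib/opcode.py (assumed here, not read
# from the running interpreter, whose bytecode format may differ).
HAVE_ARGUMENT = 90   # opcodes >= this carry a 16-bit argument
EXTENDED_ARG = 144   # prefix holding the high 16 bits of a larger argument


def encode_packed(opcode: int, arg: Optional[int] = None) -> bytes:
    """Encode one instruction in the variable-width 1/3/6-byte format."""
    if opcode < HAVE_ARGUMENT:
        if arg is not None:
            raise ValueError("this opcode takes no argument")
        return bytes([opcode])                              # 1 byte
    if arg is None or not 0 <= arg < 2**31:                 # limit quoted in the issue
        raise ValueError("argument missing or out of range")
    body = bytes([opcode, arg & 0xFF, (arg >> 8) & 0xFF])   # 3 bytes, little-endian arg
    if arg > 0xFFFF:
        hi = arg >> 16
        return bytes([EXTENDED_ARG, hi & 0xFF, (hi >> 8) & 0xFF]) + body  # 6 bytes
    return body


print(list(encode_packed(100, 0x12345)))  # LOAD_CONST (100) with a 17-bit argument
# [144, 1, 0, 100, 69, 35]
```

Because an instruction's size depends on its argument, removing one instruction or pushing an argument past 0xFFFF shifts the byte offset of everything after it, which is why the jumps have to be fixed up again after such a change.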

The simplest alternative (suggested by Serhiy on python-ideas) is to expose an "unpacked" bytecode to the optimizer, in which each instruction takes a fixed 4 bytes. For PyCode_Optimize, the code parameter, the return value, and the lnotab_obj in-out parameter would all use the unpacked form, and similarly for PEP 511. This format is much easier to process. After the optimizer returns, the compiler packs the opcodes into the usual 1/3/6-byte format, removing NOPs, retargeting jumps, and adjusting the lnotab as it goes. (The compiler already has code for nearly all of this except the NOP removal; it just runs that code before the optimizer instead of after.)
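The packing step can be sketched in pure Python under a few stated assumptions: each unpacked instruction is an opcode byte followed by a 24-bit little-endian argument, jump arguments in the unpacked form are instruction indices (the proposal does not pin this down), and only absolute jumps are handled; relative jumps and the lnotab adjustment are left out. `encode_unpacked` and `pack` are hypothetical names, and the opcode numbers are assumed CPython 3.5 values.

```python
import struct
from typing import List, Tuple

# Assumed CPython 3.5 opcode numbers; everything here is an illustrative sketch.
NOP = 9
HAVE_ARGUMENT = 90
EXTENDED_ARG = 144
# Absolute jumps only (JUMP_IF_FALSE_OR_POP .. POP_JUMP_IF_TRUE); relative
# jumps and lnotab adjustment are left out to keep the sketch short.
ABSOLUTE_JUMPS = {111, 112, 113, 114, 115}
MAX_UNPACKED_ARG = 2**23  # argument limit quoted in the issue


def encode_unpacked(instrs: List[Tuple[int, int]]) -> bytes:
    """Fixed 4 bytes per instruction: opcode byte + 24-bit little-endian arg.

    Assumption: jump arguments are instruction indices, not byte offsets.
    """
    out = bytearray()
    for op, arg in instrs:
        if not 0 <= arg < MAX_UNPACKED_ARG:
            raise ValueError("argument does not fit the unpacked form")
        out += struct.pack("<I", op | (arg << 8))
    return bytes(out)


def _size(op: int, arg: int) -> int:
    """Size of one instruction in the packed 1/3/6-byte format."""
    if op < HAVE_ARGUMENT:
        return 1
    return 6 if arg > 0xFFFF else 3


def pack(code: bytes) -> bytes:
    """Drop NOPs, retarget absolute jumps, and emit the 1/3/6-byte format."""
    instrs = [(w & 0xFF, w >> 8) for (w,) in struct.iter_unpack("<I", code)]
    # new_index[i] is where instruction i ends up after NOP removal; a jump
    # to a NOP lands on the next surviving instruction (or the end).
    kept, new_index = [], []
    for op, arg in instrs:
        new_index.append(len(kept))
        if op != NOP:
            kept.append((op, arg))
    new_index.append(len(kept))
    kept = [(op, new_index[arg]) if op in ABSOLUTE_JUMPS else (op, arg)
            for op, arg in kept]

    # A jump's size depends on its target offset (EXTENDED_ARG above 0xFFFF),
    # which depends on earlier sizes; sizes only grow, so iterate to a fixpoint.
    sizes = [_size(op, 0 if op in ABSOLUTE_JUMPS else arg) for op, arg in kept]
    while True:
        offsets = [0]
        for s in sizes:
            offsets.append(offsets[-1] + s)
        args = [offsets[arg] if op in ABSOLUTE_JUMPS else arg for op, arg in kept]
        new_sizes = [_size(op, a) for (op, _), a in zip(kept, args)]
        if new_sizes == sizes:
            break
        sizes = new_sizes

    out = bytearray()
    for (op, _), a in zip(kept, args):
        if op < HAVE_ARGUMENT:
            out.append(op)
            continue
        if a > 0xFFFF:
            out += bytes([EXTENDED_ARG, (a >> 16) & 0xFF, (a >> 24) & 0xFF])
        out += bytes([op, a & 0xFF, (a >> 8) & 0xFF])
    return bytes(out)


program = [
    (100, 0),  # 0: LOAD_CONST 0
    (114, 4),  # 1: POP_JUMP_IF_FALSE -> instruction 4 (a NOP)
    (100, 1),  # 2: LOAD_CONST 1
    (83, 0),   # 3: RETURN_VALUE
    (9, 0),    # 4: NOP, removed by pack()
    (100, 2),  # 5: LOAD_CONST 2
    (83, 0),   # 6: RETURN_VALUE
]
print(list(pack(encode_unpacked(program))))
# [100, 0, 0, 114, 10, 0, 100, 1, 0, 83, 100, 2, 0, 83]
```

The optimizer only ever sees fixed-width units, so it can replace an instruction with a NOP in place without touching any offsets; all the offset arithmetic happens once, in `pack`.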

Negatives:

 * In the unpacked form, arguments can only go up to 2**23 instead of 2**31. I don't think that's a problem (has anyone ever created a code object with 4 million instructions?).

 * A bit more work for the compiler; we'd need to test to make sure there's no measurable performance impact.

We could also expose this functionality through C API functions PyCode_Pack/Unpack and Python functions dis.pack_code/unpack_code (and teach the dis module to parse unpacked code), which would simplify import hooks, post-processing decorators, etc. as well. This would remove some, but not all, of the need for libraries like byteplay. I think this may be worth doing, but I won't be sure until I see how complicated it is.
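As a rough illustration of the unpack direction, here is a pure-Python sketch. `unpack_code` is a hypothetical stand-in (the issue only proposes the name dis.unpack_code; it is not an existing function), it returns (opcode, arg) pairs rather than raw 4-byte units, and it rewrites absolute jump targets as instruction indices; all of these are assumptions, as are the CPython 3.5 opcode numbers.

```python
from typing import Dict, List, Tuple

# Assumed CPython 3.5 opcode numbers.
HAVE_ARGUMENT = 90
EXTENDED_ARG = 144
ABSOLUTE_JUMPS = {111, 112, 113, 114, 115}  # relative jumps omitted


def unpack_code(code: bytes) -> List[Tuple[int, int]]:
    """Hypothetical: packed 1/3/6-byte bytecode -> one (opcode, arg) per instruction.

    EXTENDED_ARG prefixes are folded into the following argument, and
    absolute jump targets become instruction indices instead of byte offsets.
    """
    instrs: List[Tuple[int, int]] = []
    starts: Dict[int, int] = {}  # byte offset where an instruction begins -> index
    i = start = ext = 0
    while i < len(code):
        op = code[i]
        if op < HAVE_ARGUMENT:
            arg, i = 0, i + 1
        else:
            arg, i = code[i + 1] | (code[i + 2] << 8), i + 3
        if op == EXTENDED_ARG:
            ext = arg << 16  # high bits for the next instruction
            continue         # a jump to this prefix means the next instruction
        starts[start] = len(instrs)
        instrs.append((op, ext | arg))
        start, ext = i, 0
    starts[len(code)] = len(instrs)  # a jump may target the end of the code
    # A jump into the middle of an instruction raises KeyError here.
    return [(op, starts[arg]) if op in ABSOLUTE_JUMPS else (op, arg)
            for op, arg in instrs]


print(unpack_code(bytes([100, 0, 0, 114, 10, 0, 100, 1, 0, 83, 100, 2, 0, 83])))
# [(100, 0), (114, 4), (100, 1), (83, 0), (100, 2), (83, 0)]
```

A decorator or import hook could then edit this list freely and hand it back to a matching pack step, instead of doing its own offset bookkeeping the way byteplay-style tools do today.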

We could even allow code objects with unpacked bytecode to be executed, but I think that's unnecessary complexity. Nobody should want to do that intentionally, and if an optimizer lets such code escape by accident, a SystemError is fine.

MRAB implied an alternative: exposing a slightly higher-level, label-based format. That would be even nicer to work with, but it's also more complicated for the compiler and for the API, and I think jumps are already easy enough to handle with fixed-width instructions.
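Part of why fixed-width instructions keep jumps easy: a jump target is just an instruction index, so a label-based listing and the unpacked form differ by a single counting pass with no sizes or offsets involved. The following is a minimal sketch; `Label`, `resolve_labels`, and the listing format are hypothetical illustrations rather than a proposed API, and the opcode numbers are assumed CPython 3.5 values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Label:
    """Hypothetical jump target marker in a label-based listing."""
    name: str


Item = Union[Label, Tuple[int, Union[int, Label]]]


def resolve_labels(items: List[Item]) -> List[Tuple[int, int]]:
    """Turn a label-based listing into fixed-width (opcode, arg) pairs.

    With fixed-width instructions a jump target is just an instruction index,
    so one counting pass suffices; no offsets or sizes are involved.
    """
    index: Dict[Label, int] = {}
    count = 0
    for item in items:
        if isinstance(item, Label):
            index[item] = count  # the label names the next real instruction
        else:
            count += 1
    return [(op, index[arg] if isinstance(arg, Label) else arg)
            for op, arg in (it for it in items if not isinstance(it, Label))]


listing = [
    (100, 0), (114, Label("else")),  # LOAD_CONST 0; POP_JUMP_IF_FALSE else
    (100, 1), (83, 0),               # LOAD_CONST 1; RETURN_VALUE
    Label("else"),
    (100, 2), (83, 0),               # LOAD_CONST 2; RETURN_VALUE
]
print(resolve_labels(listing))
# [(100, 0), (114, 4), (100, 1), (83, 0), (100, 2), (83, 0)]
```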

----------
components: Interpreter Core
messages: 259693
nosy: abarnert, benjamin.peterson, georg.brandl, haypo, pitrou, serhiy.storchaka, yselivanov
priority: normal
severity: normal
status: open
title: "unpacked" bytecode
type: enhancement

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26300>
_______________________________________

