[Python-ideas] Exposing flat bytecode representation to optimizers

Fri Feb 5 21:40:59 EST 2016

On Feb 5, 2016, at 16:06, Victor Stinner <victor.stinner at gmail.com> wrote:
> 
> 2016-02-05 20:58 GMT+01:00 Andrew Barnert via Python-ideas
> <python-ideas at python.org>:
>>  [^3]: The compiler doesn't actually have exactly what the optimizers would want, but it's pretty close: it has a linked list of block objects, each of which has an array of instruction objects, with jump targets being pointers to blocks.
> 
> This thread was hijacked by discussion of the bytecode bytes format. I
> was confused by the discussion on extended arguments and size of
> bytecode instructions in bytes.
> 
> Hopefully, the annoying case of "extended arguments" does not matter
> here! Compiler instructions use 32-bit signed integers (let's say
> "integers without arbitrary limit :-D"), and EXTENDED_ARG instructions
> are only emitted later when the blocks of instructions are compiled to
> effective bytecode.

That's exactly how it works today--except that the optimizer is on the wrong side of the boundary; it gets the already-emitted EXTENDED_ARG instructions, and has to do its jump target and lnotab fixups on that form.
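
To make the fixup problem concrete, here is a rough sketch of what happens on the emitting side of that boundary. It is a toy model, not CPython's actual encoder: the two-byte (opcode, argument byte) layout, the opcode numbers, and the helper names emit and assemble are all assumptions made for illustration, and the real layout differs between Python versions.

```python
# Toy model only: each instruction is two bytes (opcode, 8-bit argument),
# and wider arguments are carried by EXTENDED_ARG prefix instructions.
EXTENDED_ARG = 0x90   # placeholder opcode number, chosen for this sketch
LOAD, JUMP = 1, 2     # hypothetical opcodes


def emit(opcode, arg):
    """Encode one (opcode, arg) pair, adding EXTENDED_ARG prefixes as needed."""
    if not 0 <= arg <= 0xFFFFFFFF:
        raise ValueError("argument out of range for this toy encoding")
    # Split the argument into bytes, least significant first.
    chunks = []
    while True:
        chunks.append(arg & 0xFF)
        arg >>= 8
        if not arg:
            break
    out = bytearray()
    # Higher bytes go out first, each in its own EXTENDED_ARG prefix.
    for high in reversed(chunks[1:]):
        out += bytes([EXTENDED_ARG, high])
    out += bytes([opcode, chunks[0]])
    return bytes(out)


def assemble(instructions):
    """Return the flat bytes plus the byte offset of each logical instruction."""
    code = bytearray()
    offsets = []
    for opcode, arg in instructions:
        offsets.append(len(code))
        code += emit(opcode, arg)
    return bytes(code), offsets


# Changing one argument from 5 to 300 moves every later instruction.
print(assemble([(LOAD, 5), (JUMP, 0)])[1])    # [0, 2]
print(assemble([(LOAD, 300), (JUMP, 0)])[1])  # [0, 4]
```

The second call shows why working on the emitted form is awkward: widening a single argument shifts the offset of everything after it, so any jump target or line-table entry that refers to those offsets has to be recomputed.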

So, either we want to move the optimizer across that boundary (meaning we have to expose some of the fragile compiler internals), or we have to come up with a public representation that's easier to work on. I think Serhiy's "unpacked bytecode" may be a good enough version of the latter--and it's dead easy to build.
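
As a sense of how little code an unpacking step needs, here is a minimal sketch that folds EXTENDED_ARG prefixes back into full-width arguments. The assumptions are loud ones: the input uses a toy two-byte (opcode, argument byte) layout rather than any particular CPython version's format, the opcode number is a placeholder, and a list of (opcode, arg) pairs is only a stand-in for whatever the unpacked format would really be; none of this is taken from Serhiy's proposal.

```python
# Toy model only: two bytes per instruction, EXTENDED_ARG prefixes carry
# the higher bytes of the argument that follows them.
EXTENDED_ARG = 0x90  # placeholder opcode number, chosen for this sketch


def unpack(code):
    """Fold EXTENDED_ARG prefixes so each instruction carries its whole argument."""
    if len(code) % 2:
        raise ValueError("toy bytecode must have an even length")
    result = []
    pending = 0  # argument bits accumulated from preceding prefixes
    for i in range(0, len(code), 2):
        opcode, byte = code[i], code[i + 1]
        arg = (pending << 8) | byte
        if opcode == EXTENDED_ARG:
            pending = arg          # keep accumulating
        else:
            result.append((opcode, arg))
            pending = 0
    return result


# EXTENDED_ARG 1 followed by opcode 1 with argument byte 0x2C is one
# logical instruction with argument 0x012C == 300.
packed = bytes([EXTENDED_ARG, 0x01, 1, 0x2C, 2, 0x00])
print(unpack(packed))  # [(1, 300), (2, 0)]
```

An optimizer working on the unpacked list never sees EXTENDED_ARG at all; jump targets would still need converting from byte offsets to instruction indexes, which this sketch leaves out.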