[Python-ideas] Exposing flat bytecode representation to optimizers

MRAB python at mrabarnett.plus.com
Fri Feb 5 16:10:11 EST 2016


On 2016-02-05 19:58:58, "Andrew Barnert via Python-ideas" 
<python-ideas at python.org> wrote:

>The biggest pain with dealing with the peephole optimizer is that it 
>happens after all the complicated flattening and fixup[^1] the compiler 
>does, which means you have to hack up the jump targets as you go along. 
>The future bytecode optimizers that PEP 511 enables will have the same 
>headache.
>
>But this isn't actually necessary. The optimizer could work on a flat 
>array of instructions[^2] instead of an array of bytes, with 
>relocatable jump targets instead of fixed byte offsets, and then the 
>compiler could do the fixup _after_ the optimization.[^3]
>
>It would break the optimizer APIs, but `PyCode_Optimize` isn't public, 
>and the API proposed by PEP 511 is public, but that PEP isn't even 
>finalized, much less accepted yet.
>
>
>I don't think we need to expose the intermediate representation any 
>farther along than the `PyCode_Optimize` step.[^4] Just moving the 
>optimizer one step earlier in the chain solves more than enough to be 
>worth it.
>
>
>   [^1]: If you think numbering the offsets and emitting the jump 
>targets is easy: Every time you fix up a jump, that may require adding 
>an `EXTENDED_ARG`, which means you have to redo any fixups that cross 
>the current instruction. The compiler does this by just looping 
>until it hasn't added any more `EXTENDED_ARG`s.
>
>   [^2]: In case anyone's thinking that wordcode would solve this 
>problem, it doesn't. The `EXTENDED_ARG` jump targets are a much bigger 
>hassle than the 1-or-3-byte ops, and wordcode not only can't eliminate 
>those, but makes `EXTENDED_ARG` more common.
>
>   [^3]: The compiler doesn't actually have exactly what the optimizers 
>would want, but it's pretty close: it has a linked list of block 
>objects, each of which has an array of instruction objects, with jump 
>targets being pointers to blocks. That's actually even better to work 
>with, but too complicated to expose to optimizers. Flattening it would 
>be trivial. Or, if that's too expensive, we could do something almost 
>as simple and much cheaper: convert it in-line to a deque-like linked 
>list of arrays, with jump targets being indices or pointers into that. 
>Or we could just expose the list of blocks as-is, as an opaque thing 
>with a mutable-deque-of-instructions API around it.
>
>   [^4]: Later stages--import hooks, optimizing decorators, etc.--have 
>the same pain as the peephole optimizer, but they also have code 
>objects that contain other code objects, and they can be written in 
>Python, and so on, so the same solution isn't feasible there. Of course 
>we could add a function to convert bytecode back to a list of 
>instructions, in some format that can be exposed to Python (in fact, 
>`dis` already does 90% of that), and then another one to convert that 
>back to bytecode and redo the fixup (which is basically the third-party 
>`byteplay` module). But that's almost certainly overkill. (If we wanted 
>that, I'd rather add `byteplay` to the stdlib, port it and `dis` to C, 
>and expose a C API for them. And then we could use that for everything, 
>including the peephole optimizer and PEP 511 optimizers. Which would 
>all be really cool, but I think it's more work than we want to do, and 
>I don't know if we'd actually want something like `byteplay` builtin 
>even if it were easy...)
>
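For concreteness, the flat representation described above could look 
something like this (a hypothetical sketch, not CPython's actual 
internals: the `Instr` type, opcode names, and the one-byte-arg size 
model are all invented for illustration). Jumps carry instruction 
indices until a final fixup pass which, as footnote 1 describes, loops 
until no newly added `EXTENDED_ARG` invalidates earlier offsets:

```python
from dataclasses import dataclass

@dataclass
class Instr:
    op: str
    arg: int = 0
    is_jump: bool = False   # if True, arg is the *index* of the target Instr

def encoded_size(arg):
    """One unit for the instruction, plus one per EXTENDED_ARG prefix."""
    size = 1
    while arg > 0xFF:       # each extra byte of arg needs an EXTENDED_ARG
        size += 1
        arg >>= 8
    return size

def fix_up(instrs):
    """Resolve index-based jump targets to byte offsets, looping until
    the sizes stabilize (a fixup may add an EXTENDED_ARG, which shifts
    every later offset and can in turn enlarge other jumps)."""
    sizes = [1] * len(instrs)            # optimistic first guess
    while True:
        offsets, pos = [], 0
        for s in sizes:
            offsets.append(pos)
            pos += s
        resolved = [offsets[i.arg] if i.is_jump else i.arg for i in instrs]
        new_sizes = [encoded_size(a) for a in resolved]
        if new_sizes == sizes:           # fixed point: nothing grew
            return list(zip((i.op for i in instrs), resolved))
        sizes = new_sizes
```

An optimizer working on the `Instr` list can insert or delete 
instructions freely; only this final pass ever has to think about byte 
offsets or `EXTENDED_ARG`.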
What I've done in similar situations is have jumps refer to labels, 
which are also instructions in the code.

There would then be a pass that copies the code, recording the position 
of each label but not copying the label itself, followed by a pass that 
changes each jump to refer to its target's position. (The jumps were 
fixed-size, though.)
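A minimal sketch of that scheme (opcode names are invented, and jumps 
are fixed-size as in the original): labels are pseudo-instructions, the 
first pass copies the code while recording each label's position and 
dropping it, and the second pass rewrites the jumps:

```python
JUMPS = {"JUMP", "JUMP_IF_FALSE"}       # hypothetical jump opcodes

def assemble(code):
    """code: a list of (op, arg) pairs; ("LABEL", name) marks a position,
    and a jump's arg names its target label."""
    positions, out = {}, []
    for op, arg in code:                 # pass 1: copy, recording labels
        if op == "LABEL":
            positions[arg] = len(out)    # label = position of next real instr
        else:
            out.append((op, arg))
    # pass 2: point each jump at its target's recorded position
    return [(op, positions[arg] if op in JUMPS else arg)
            for op, arg in out]
```

For example, `assemble([("LOAD", 0), ("JUMP_IF_FALSE", "end"), 
("LOAD", 1), ("LABEL", "end"), ("RETURN", 0)])` resolves the jump's 
argument to position 3, where the `RETURN` lands after the label is 
dropped.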


