[Python-ideas] Exposing flat bytecode representation to optimizers
MRAB
python at mrabarnett.plus.com
Fri Feb 5 16:10:11 EST 2016
On 2016-02-05 19:58:58, "Andrew Barnert via Python-ideas"
<python-ideas at python.org> wrote:
>The biggest pain with dealing with the peephole optimizer is that it
>happens after all the complicated flattening and fixup[^1] the compiler
>does, which means you have to hack up the jump targets as you go along.
>The future bytecode optimizers that PEP 511 enables will have the same
>headache.
>
>But this isn't actually necessary. The optimizer could work on a flat
>array of instructions[^2] instead of an array of bytes, with
>relocatable jump targets instead of fixed byte offsets, and then the
>compiler could do the fixup _after_ the optimization.[^3]
>
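
Concretely, a flat, relocatable representation could be as simple as
instructions whose jump arguments are references to other instructions
rather than byte offsets. A toy sketch (illustrative only, not the actual
compiler structures):

```python
from dataclasses import dataclass

@dataclass
class Instr:
    opcode: str
    arg: object = None        # for jumps: a reference to the target Instr

def drop_trivial_jumps(instrs):
    """Toy pass: delete a JUMP_FORWARD whose target is the very next
    instruction.  Because targets are Instr references, deleting an
    instruction needs no offset fixup here; the compiler renumbers
    everything afterwards."""
    targets = {id(i.arg) for i in instrs if isinstance(i.arg, Instr)}
    out = []
    for i, ins in enumerate(instrs):
        if (ins.opcode == "JUMP_FORWARD"
                and i + 1 < len(instrs)
                and ins.arg is instrs[i + 1]
                and id(ins) not in targets):   # nothing jumps *to* this jump
            continue
        out.append(ins)
    return out
```
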
>It would break the optimizer APIs, but `PyCode_Optimize` isn't public,
>and while the API proposed by PEP 511 would be public, that PEP isn't
>even finalized, much less accepted, yet.
>
>
>I don't think we need to expose the intermediate representation any
>farther along than the `PyCode_Optimize` step.[^4] Just moving the
>optimization one step earlier in the chain solves more than enough to
>be worth it.
>
>
> [^1]: If you think numbering the offsets and emitting the jump
>targets is easy: Every time you fix up a jump, that may require adding
>an `EXTENDED_ARG`, which means you have to redo any fixups that cross
>the current instruction. The compiler does this by just looping
>until it hasn't added any more `EXTENDED_ARG`s.
>
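
That loop is essentially a fixpoint: recompute the offsets, re-fix the
jumps, and go around again if any jump just grew. A toy version, with
grossly simplified sizes and absolute jumps only (nothing like the real
assembler in compile.c):

```python
def wide(arg):
    # Does this argument need an EXTENDED_ARG prefix?
    return arg is not None and arg > 0xFFFF

def fix_jump_offsets(instrs, jump_targets):
    """instrs: list of [opcode, arg] pairs, 3 bytes each (6 with one
    EXTENDED_ARG).  jump_targets maps the index of each jump
    instruction to the index of the instruction it jumps to."""
    while True:
        # Recompute byte offsets with the current argument widths.
        offsets, pos = [], 0
        for op, arg in instrs:
            offsets.append(pos)
            pos += 6 if wide(arg) else 3
        # Re-fix every jump; widening one shifts everything after it,
        # so if any jump grew we have to go around again.
        grew = False
        for i, target in jump_targets.items():
            old_arg, new_arg = instrs[i][1], offsets[target]
            if wide(new_arg) and not wide(old_arg):
                grew = True
            instrs[i][1] = new_arg
        if not grew:
            return offsets
```
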
> [^2]: In case anyone's thinking that wordcode would solve this
>problem, it doesn't. The `EXTENDED_ARG` jump targets are a much bigger
>hassle than the 1-or-3-byte ops, and wordcode not only can't eliminate
>those, but makes `EXTENDED_ARG` more common.
>
> [^3]: The compiler doesn't actually have exactly what the optimizers
>would want, but it's pretty close: it has a linked list of block
>objects, each of which has an array of instruction objects, with jump
>targets being pointers to blocks. That's actually even better to work
>with, but too complicated to expose to optimizers. Flattening it would
>be trivial. Or, if that's too expensive, we could do something almost
>as simple and much cheaper: convert it in-line to a deque-like linked
>list of arrays, with jump targets being indices or pointers into that.
>Or we could just expose the list of blocks as-is, as an opaque thing
>with a mutable-deque-of-instructions API around it.
>
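
The shape described there is roughly the following (hypothetical classes
for illustration; the real things are C structs in compile.c), and
flattening it really is just two short passes:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Instruction:
    opcode: str
    arg: object = None                 # for jumps: a Block

@dataclass
class Block:
    instructions: list = field(default_factory=list)
    next: Optional["Block"] = None     # next block in the linked list

def flatten(entry):
    """Emit one flat instruction list, rewriting jump targets from
    Block pointers to indices into that list."""
    # Pass 1: note where each block starts in the flat list.
    starts, flat = {}, []
    block = entry
    while block is not None:
        starts[id(block)] = len(flat)
        flat.extend(block.instructions)
        block = block.next
    # Pass 2: turn Block targets into flat-list indices.
    for ins in flat:
        if isinstance(ins.arg, Block):
            ins.arg = starts[id(ins.arg)]
    return flat
```
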
> [^4]: Later stages--import hooks, optimizing decorators, etc.--have
>the same pain as the peephole optimizer, but they also have code
>objects that contain other code objects, and they can be written in
>Python, and so on, so the same solution isn't feasible there. Of course
>we could add a function to convert bytecode back to a list of
>instructions, in some format that can be exposed to Python (in fact,
>`dis` already does 90% of that), and then another one to convert that
>back to bytecode and redo the fixup (which is basically the third-party
>`byteplay` module). But that's almost certainly overkill. (If we wanted
>that, I'd rather add `byteplay` to the stdlib, port it and `dis` to C,
>and expose a C API for them. And then we could use that for everything,
>including the peephole optimizer and PEP 511 optimizers. Which would
>all be really cool, but I think it's more work than we want to do, and
>I don't know if we'd actually want something like `byteplay` built in
>even if it were easy...)
>
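
(As a data point on that 90%: `dis.get_instructions()` already gives a
flat, symbolic, Python-level view of any code object's bytecode; the
missing part is the `byteplay` half, going from such a view back to
bytecode.)

```python
import dis

def f(x):
    return x + 1 if x else 0

# A flat, symbolic view of f's bytecode; no byte-offset arithmetic
# needed to read it.
for ins in dis.get_instructions(f):
    print(ins.offset, ins.opname, ins.arg, ins.argval)
```
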
What I've done in similar situations is to have jumps refer to labels,
which are themselves instructions in the code.
There would then be a pass that copies the code, recording the position
of each label but not copying it, followed by a pass that changes each
jump to refer to its target's position. (The jumps were fixed-size,
though.)
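
In code, that scheme looks something like this (hypothetical Label and
Jump classes, just to show the two passes):

```python
class Label:
    pass

class Jump:
    def __init__(self, target):
        self.target = target      # a Label before fixup, an int after

def assemble(code):
    positions, out = {}, []
    # Pass 1: copy everything except labels, remembering where each
    # label would have landed.
    for item in code:
        if isinstance(item, Label):
            positions[item] = len(out)
        else:
            out.append(item)
    # Pass 2: point each jump at its target's position.  (Fixed-size
    # jumps, so pass 1's positions stay valid.)
    for item in out:
        if isinstance(item, Jump):
            item.target = positions[item.target]
    return out
```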