[Python-ideas] Exposing flat bytecode representation to optimizers

Andrew Barnert abarnert at yahoo.com
Fri Feb 5 16:25:29 EST 2016


On Friday, February 5, 2016 1:10 PM, MRAB <python at mrabarnett.plus.com> wrote:

> On 2016-02-05 19:58:58, "Andrew Barnert via Python-ideas" 
> <python-ideas at python.org> wrote:
> 
>> The biggest pain with dealing with the peephole optimizer is that it 
>> happens after all the complicated flattening and fixup[^1] the compiler 
>> does, which means you have to hack up the jump targets as you go along. 
>> The future bytecode optimizers that PEP 511 enables will have the same 
>> headache.
>> 
>> But this isn't actually necessary. The optimizer could work on a flat 
>> array of instructions[^2] instead of an array of bytes, with 
>> relocatable jump targets instead of fixed byte offsets, and then the 
>> compiler could do the fixup _after_ the optimization.[^3]
> What I've done in similar situations is have jumps refer to labels, 
> which are also instructions in the code.
> 
> There would then be a pass to copy the code, recording the position of 
> each label, but not copying it, followed by a pass to change the jumps 
> to refer to their target's position. (The jumps were fixed-size, 
> though.)
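The two-pass label scheme described above can be sketched in a few lines. This is a toy instruction set, not CPython bytecode; the `Label` class, opcode names, and `assemble` function are all invented for illustration:

```python
# Toy sketch of label-based jump resolution: labels live in the
# instruction stream; pass 1 records each label's position without
# emitting it, pass 2 rewrites jump targets from labels to positions.

class Label:
    pass

def assemble(code):
    # Pass 1: record where each label would land; copy real instructions.
    positions = {}
    out = []
    for instr in code:
        if isinstance(instr, Label):
            positions[instr] = len(out)
        else:
            out.append(instr)
    # Pass 2: replace Label jump targets with the recorded positions.
    return [(op, positions[arg] if isinstance(arg, Label) else arg)
            for op, arg in out]

loop = Label()
program = [
    ("LOAD", 0),
    loop,                    # marks the jump target; emits nothing
    ("INCR", None),
    ("JUMP_IF_LT", loop),    # target is a label, not a fixed offset
    ("RETURN", None),
]
print(assemble(program))
# -> [('LOAD', 0), ('INCR', None), ('JUMP_IF_LT', 1), ('RETURN', None)]
```

An optimizer working on this form can insert or delete instructions freely; targets only become concrete positions at the final assembly step.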


Yes, that's basically what byteplay does, and effectively what the compiler is doing by pointing at blocks instead of instructions (because Python has no way to jump to anything that isn't the start or end of a block). But that still doesn't make the fixups trivial, unless either (a) you never insert, delete, or move code so the jumps never change size, or (b) you're willing to punt and revert optimizing any time you push an arg into EXTENDED_ARG territory (which is reasonable for experimentation, but not for something in the core, or that you intend to share and deploy widely).
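The EXTENDED_ARG problem is worth spelling out, since it is what makes the fixups non-trivial even with labels. The sketch below is a simplified model, not CPython's actual encoder: assume a fixed 2-byte instruction plus 2 bytes per EXTENDED_ARG prefix once the argument no longer fits in one byte:

```python
# Simplified model of why fixups cascade: once an argument exceeds
# 0xFF it needs an EXTENDED_ARG prefix, so the instruction grows,
# which shifts every later offset, which can grow *other* jumps.

def instr_size(arg):
    size = 2
    while arg > 0xFF:      # each EXTENDED_ARG prefix carries one extra byte
        size += 2
        arg >>= 8
    return size

print(instr_size(255))     # 2 -- fits in one byte
print(instr_size(256))     # 4 -- needs one EXTENDED_ARG prefix
print(instr_size(0x10000)) # 6 -- needs two
```

Inserting code before a jump whose target offset sits at 255 can push that offset to 256, growing the jump itself, shifting everything after it, and possibly growing further jumps in turn. That cascade is what option (b) above punts on.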

Making each optimizer do the same work seems kind of silly. Many of them will punt on uncommon (but not _that_ rare) cases--or, worse, will get those cases wrong and never test them. Plus, with PEP 511, it means if you have 6 optimizers plugged in, you're doing these extra passes 6 times instead of once.

We _could_ expose a higher-level label-based representation to the optimizers rather than making them all do it themselves. However, I think the unpack/pack that Serhiy suggested is simpler, and good enough. You _do_ still need fixups if you insert or move code (but not if you delete code, since his suggested pack does NOP removal), but those fixups will be so easy that you won't need labels unless you're doing some really complicated reordering.
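The unpack/pack round trip can be sketched with a toy encoding (this is not Serhiy's actual proposed API; the fixed 2-byte width, opcode names, and helper functions are all assumptions for illustration):

```python
# Toy sketch of unpack/pack: unpack converts jump args from byte
# offsets to instruction indices; the optimizer edits the flat list;
# pack drops NOPs and recomputes byte offsets.

WIDTH = 2  # pretend every instruction is exactly 2 bytes

def unpack(raw):
    # raw: list of (op, arg) pairs; jump args are byte offsets.
    return [(op, arg // WIDTH if op.startswith("JUMP") else arg)
            for op, arg in raw]

def pack(instrs):
    # Drop NOPs, recording where each old index lands; a jump into a
    # removed NOP redirects to the next surviving instruction.
    new_index = {}
    out = []
    for i, (op, arg) in enumerate(instrs):
        new_index[i] = len(out)
        if op != "NOP":
            out.append((op, arg))
    # Rewrite jump targets back to byte offsets in the packed code.
    return [(op, new_index[arg] * WIDTH if op.startswith("JUMP") else arg)
            for op, arg in out]

raw = [("LOAD", 0), ("NOP", None), ("JUMP_ABSOLUTE", 6), ("LOAD", 1)]
print(pack(unpack(raw)))
# -> [('LOAD', 0), ('JUMP_ABSOLUTE', 4), ('LOAD', 1)]
```

An optimizer deleting code just overwrites instructions with NOPs and lets pack clean up; only insertion or reordering requires it to touch targets at all.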
