[Python-ideas] Exposing flat bytecode representation to optimizers

Andrew Barnert abarnert at yahoo.com
Fri Feb 5 18:58:06 EST 2016


On Feb 5, 2016, at 15:25, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
> 
>> On 2016-02-05 6:14 PM, Andrew Barnert wrote:
>>> On Friday, February 5, 2016 2:26 PM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
>>> 
>>> I also had this idea (don't know if it's good or not):
>>> 
>>> 1. have 16 bits per opcode
>>> 2. first 8 bits encode the opcode number
>>> 3. next 7 bits encode the arg (most args don't need more than 7 bits
>>> anyways)
>>> 4. if the 16th bit is 1 then the next opcode is the EXTENDED_ARG
>> 
>> Why? The advantage of Serhiy's idea (or my original one) is that it makes it much easier to write bytecode processors (like the peephole optimizer) when everything is fixed width, because jump target and lnotab fixups become trivial. Your idea leaves jump target and lnotab fixups just as hard as they are today, so we get the cost of having to transform back and forth between two representations, without any benefit.
> 
> Yes, this was related to Serhiy's original proposal on how we can pack opcodes.  It's completely unrelated to the peephole optimizer. Sorry for the confusion.

OK, we're on the wrong thread, then, but no big deal.

>> If you were instead suggesting that as an alternative to the version of wordcode I proposed and prototyped in the other thread, to be actually used by the interpreter, then there _is_ a small benefit to your version: arguments in range [2**16, 2**22) need 4 bytes instead of 6. But I think the cost of having arguments in range [2**7, 2**8) take 4 bytes instead of 2, and the extra complexity and CPU cost in the fetch and peek at the core of the eval loop, and being a bigger change from today, make it a lot less attractive.
> 
> I'm sorry, I can't parse what you said there...

OK, let's compare my idea, from the thread you meant to reply to, and your idea. 

I use 16-bit opcodes, with 8 bits for the op and 8 for the argument. I use the existing EXTENDED_ARG mechanism to handle values that don't fit in 8 bits (each 16-bit EXTENDED_ARGS gets you another 8 bits).

You use 16-bit opcodes, with only 8 bits for the op, 7 for the argument, and 1 for a continuation bit. By using the continuation bit instead of existing EXTENDED_ARGs, you can fit 22 bits worth of argument in 4 bytes, while I can only fit 16 bits worth.

So, your idea uses fewer bytes than mine (4 instead of 6) for any argument in range [65536, 4194303]. But it uses more bytes than mine (4 instead of 2) for any argument in range [128, 255]. Although [128, 255] is a smaller range than [65536, 4194303], it's much more common in arguments, so I think your idea will save less space than mine overall.
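To make the comparison concrete, here's a rough size-per-argument sketch in Python. The two functions just encode the rules described above; the >= 2**22 case for the continuation-bit scheme is my assumption, since only the quoted ranges are spelled out:

```python
def wordcode_size(arg):
    """Bytes needed under the 8-bit-arg wordcode scheme:
    one 2-byte unit, plus a 2-byte EXTENDED_ARG unit per extra 8 bits."""
    units = max(1, (arg.bit_length() + 7) // 8)
    return 2 * units

def contbit_size(arg):
    """Bytes needed under the 7-bit-arg + continuation-bit scheme,
    taking the figures above at face value: 7 bits in 2 bytes,
    22 bits in 4 bytes."""
    if arg < 1 << 7:
        return 2
    if arg < 1 << 22:
        return 4
    return 6  # assumption: larger args aren't spelled out in the thread

# args in [128, 255]: wordcode wins (2 bytes vs 4)
print(wordcode_size(200), contbit_size(200))      # 2 4
# args in [65536, 4194303]: continuation bit wins (4 bytes vs 6)
print(wordcode_size(70000), contbit_size(70000))  # 6 4
```

Which scheme wins overall then depends on how often real code's arguments land in each range, which is the empirical question above.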

Your idea is also more complicated conceptually, would require more changes to the interpreter (and to third-party code like byteplay or decompyle), and will probably be slower in the eval loop, which also all make it less attractive.

As for the idea Serhiy is working on, I suspect it will save more space than either yours or mine, but mine might have performance and/or simplicity benefits that make it a better choice. (Hopefully we'll both get far enough that we can prototype it and see.) He's stealing extra bits from the opcode byte to allow skipping the argument entirely in many cases (e.g., LOAD_CONST_0 would be a no-argument opcode that does the same thing as LOAD_CONST with argument 0). So many common opcodes will be only a single byte.
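A toy sketch of that kind of specialization pass (the opcode names and the tuple-based instruction format here are hypothetical, just to show the shape of the transform):

```python
# Hypothetical: fold small LOAD_CONST arguments into argumentless opcodes,
# so the common cases take a single opcode with no argument bits at all.
SPECIALIZED = {("LOAD_CONST", i): "LOAD_CONST_%d" % i for i in range(4)}

def specialize(instructions):
    """instructions: list of (opname, arg) pairs; arg may be None."""
    out = []
    for op, arg in instructions:
        if (op, arg) in SPECIALIZED:
            out.append((SPECIALIZED[op, arg], None))  # no argument needed
        else:
            out.append((op, arg))  # left as-is
    return out

print(specialize([("LOAD_CONST", 0), ("LOAD_CONST", 5)]))
# [('LOAD_CONST_0', None), ('LOAD_CONST', 5)]
```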
