[Python-ideas] More compact bytecode

Andrew Barnert abarnert at yahoo.com
Sun Feb 7 20:07:43 EST 2016


On Feb 7, 2016, at 04:53, Antoine Pitrou <solipsis at pitrou.net> wrote:
> 
> On Sat, 6 Feb 2016 16:21:11 -0800
> Andrew Barnert via Python-ideas
> <python-ideas at python.org> wrote:
>> 
>> To be honest, unlike everyone else on this thread, I'm actually more interested in the simplicity gains than the performance gains.
> 
> It is a laudable goal, but what is proposed here wouldn't simplify much
> if anything, since some opcodes need more than 8 bits of arguments
> (typically the opcodes that have two logical 8-bit arguments packed in
> the 16-bit word). So you'll still get variable-sized opcode and need
> an adequate decoding machinery for them.

With "wordcode" (at least as I've implemented it), the EXTENDED_ARG code doesn't get any simpler (or more complicated), but the main loop can just read 2 bytes per instruction (and then add in any extended arg value) instead of reading either 1 or 3, which is definitely simpler.

(As implemented in my patch, it's not actually much simpler, because I use all the same macros and just redefine them; that's because my initial goal was to change as little as possible, which turns out to be very little. If it seems promising, we can make a slightly larger change that simplifies things more, and possibly also improves performance, but I don't want to do that until I've proven it's worth trying.)
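As a rough illustration (in Python rather than the C of the actual eval loop), the fixed two-bytes-per-instruction fetch might look like the sketch below. The name iter_wordcode is made up for the example; only opcode.EXTENDED_ARG is real:

```python
from opcode import EXTENDED_ARG

def iter_wordcode(code: bytes):
    """Yield (opcode, arg) pairs from 2-byte "wordcode", folding any
    EXTENDED_ARG prefixes into the following instruction's arg.

    Every instruction is exactly 2 bytes, so the fetch is a fixed-size
    read; EXTENDED_ARG just accumulates high bytes of the next arg.
    """
    ext = 0
    for i in range(0, len(code), 2):
        op, arg = code[i], code[i + 1]
        if op == EXTENDED_ARG:
            ext = (ext | arg) << 8  # accumulate a high byte of the arg
            continue
        yield op, ext | arg
        ext = 0
```

Compare this with classic bytecode, where the loop has to read 1 byte, test whether the opcode takes an argument, and then conditionally read 2 more.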

> So perhaps bytecode should be processed by the compile chain in the
> form of:
> 
> typedef struct {
>    int opcode;
>    int operand;
> } instruction_t;
> 
> instruction_t *bytecode;
> 
> And then get packed at the end, when creating the code object.

Yes, there's another thread on that, with a few different variations. This is actually independent of the wordcode change, so I'm doing it on a separate branch.

The variation I'm currently working on (based on suggestions by Serhiy) just uses a 32-bit "unpacked bytecode" format (1 byte opcode, 3 bytes arg, no EXTENDED_ARG). The compiler flattens its blocks of instruction objects into unpacked bytecode, gives that to the optimizer (currently that's the peephole optimizer, but this is also where Victor's PEP 511 bytecode-transformer hooks fit in), and then packs the bytecode, removing NOPs and fixing up jump targets and the lnotab. This makes the peephole optimizer a lot simpler.

I've also exposed unpack and pack-and-fixup functions through the C API and the dis module, which means bytecode-hacking decorators, import hooks, etc. can work on unpacked bytecode, and usually won't need a third-party module like byteplay to make it readable.
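To make the unpacked format concrete, here's a hypothetical Python sketch of the two ends of that pipeline. The names unpack_word and pack_to_wordcode are invented for illustration, and this ignores the NOP removal and jump/lnotab fixups described above; only opcode.EXTENDED_ARG is the real stdlib constant:

```python
from opcode import EXTENDED_ARG

# Unpacked form: each instruction is one 32-bit word, low byte = opcode,
# high 3 bytes = arg, so no EXTENDED_ARG entries and no variable sizing.

def unpack_word(word: int) -> tuple[int, int]:
    """Split one 32-bit instruction word into (opcode, 24-bit arg)."""
    return word & 0xFF, word >> 8

def pack_to_wordcode(instructions) -> bytes:
    """Pack (opcode, arg) pairs into 2-byte wordcode, re-inserting
    EXTENDED_ARG prefixes for any arg that doesn't fit in one byte."""
    out = bytearray()
    for op, arg in instructions:
        for shift in (16, 8):
            if arg >> shift:  # some byte at or above this position is set
                out += bytes([EXTENDED_ARG, (arg >> shift) & 0xFF])
        out += bytes([op, arg & 0xFF])
    return bytes(out)
```

A transformer working on the unpacked form can insert, delete, or rewrite (opcode, arg) pairs freely, and only the final pack step has to worry about EXTENDED_ARG at all.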

The big question is whether it's acceptable to limit args to 24 bits. I don't think jumps > 4 million are an issue, but > 255 annotations might be? I'll test it with as much code as possible to see; if not, unpacking to 64 bits (whether with a struct or just an int64) is the obvious answer.
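For concreteness, a 3-byte arg field caps values at 2**24 - 1 = 16,777,215, and the fallback decision could be a simple scan over the instructions. This is a hypothetical sketch, not anything in the patch:

```python
MAX_UNPACKED_ARG = (1 << 24) - 1  # largest value a 3-byte arg can hold

def fits_unpacked_format(instructions) -> bool:
    """True if every (opcode, arg) pair fits the 24-bit arg field;
    if not, an implementation would fall back to a wider (e.g. 64-bit)
    unpacked encoding."""
    return all(0 <= arg <= MAX_UNPACKED_ARG for _, arg in instructions)
```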

Also, it may turn out that bytecode processing is still too painful even with the "unpacked" format. In which case we'll need something different--labels, a tree of blocks similar to what the compiler already has, or something else. But I think this will be sufficient.

>> (Plus, adding up all these little gains could soon get us to the point where 3.7 finally beats 2.7 in every benchmark, instead of just most of them, which would kill off an annoying source of FUD.)
> 
> I think that FUD is very tired now and very few people pay attention to it.

Well, it spawned a thread on -ideas with dozens of replies just a week or two ago...

> (also, the breadth of new features and improvements in the 3.x line
> massively offsets small degradations on micro-benchmarks)

You don't have to convince me; as far as I'm concerned, the barrier was crossed with 3.2.
