[Python-ideas] Standard (portable) bytecode "assembly" format

Fri Feb 26 14:21:59 EST 2016

On Feb 26, 2016, at 03:36, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
> On 26 February 2016 at 15:27, Andrew Barnert via Python-ideas
> <python-ideas at python.org> wrote:
>> Of course we already have such a format today: dis.Bytecode.
> 
> I wouldn't feel beholden to hewing too closely to the existing dis API

I don't feel too beholden to it. In fact, I only came back to it by accident. 

A few weeks ago, I proposed extending dis into something like byteplay and giving it a C API so the compiler could use it too. I decided that was a mistake, went back to looking for the simplest portable format that both the compiler and Python code could use, and came up with the iterable of tuples. Then I looked at what it would take to make dis support that format, and it turns out to take very little.

So, rather than building a new Python convenience module for portable assembly (which I don't think we want to do), or not having one and just making people deal with the tuples with no convenience features (which is also viable, but I think less desirable), let's just use dis.

> For manipulation though, the redundancy is a problem - you need to
> either declare some fields authoritative and implicitly derive the
> others,

The portable assembly format doesn't use Instructions, just tuples of 2 or more elements where the first three are opcode, argval, and line.

A dis.Instruction object (if we reorder its attributes and make the fact that it's a namedtuple type public) fits that, which means you can use it for convenience when it's convenient (probably leaving the other fields None), but you can also just ignore it and use a plain tuple, and often that's the simplest thing. Or, of course, you can use a library like byteplay that uses dis where it's helpful but provides whatever API it wants.
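
As a rough sketch of how little is involved, here is one way to turn what dis already produces into (opcode, argval, line) tuples. The helper name to_portable is made up for illustration and is not part of dis, and the tuple layout is only the one proposed in this thread, not an existing API:

```python
import dis


def to_portable(func):
    """Yield (opcode, argval, line) tuples for func's bytecode.

    Hypothetical helper: the tuple layout follows the proposal in this
    thread, not any existing dis API.
    """
    line = None
    for instr in dis.get_instructions(func):
        if hasattr(instr, "line_number"):
            # Newer Pythons expose the line of every instruction directly.
            line = instr.line_number
        elif instr.starts_line is not None:
            # Older Pythons only mark the first instruction of each line,
            # so carry the last seen line number forward.
            line = instr.starts_line
        yield (instr.opcode, instr.argval, line)


if __name__ == "__main__":
    for item in to_portable(lambda x: x + 1):
        print(item)
```

Since dis.Instruction is already a namedtuple subclass, each Instruction is itself a tuple; the helper only reorders the fields into the proposed layout.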

> I do think it's reasonable to seek to define a standard
> MutableBytecode format specifically to make manipulation easier

I'm not sure it is. After failing to convince Victor that byteplay really can do what he wants, I now think that it's more reasonable to define a standard MutableBytecode format only to make it easier to build third-party libraries that each make manipulation easier in different ways.

> , but I
> don't think it makes sense to couple that to PEP 511's definition of
> bytecode processing.

Remember the starting motivation. What's the biggest stumbling block to switching to wordcode? It breaks code that manipulates bytecode directly. Adding a portable, future-proof format doesn't do any good if, at the same time, we also add something that encourages new code that ignores that format and instead manipulates bytecode directly.
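
To make the breakage concrete, here is a minimal sketch of two raw decoders, one for the pre-3.6 layout (a 1-byte opcode, followed by a 2-byte little-endian argument when the opcode takes one) and one for 3.6-style wordcode (always opcode byte plus argument byte). The byte strings are hand-built rather than taken from a real code object, the opcode numbers are the CPython 3.5/3.6 values and are used purely as illustration, and EXTENDED_ARG is ignored:

```python
HAVE_ARGUMENT = 90  # opcodes >= this take an argument (CPython 3.5/3.6 value)


def walk_pre_wordcode(code):
    """Decode the pre-3.6 layout: 1-byte opcode, optional 2-byte LE argument."""
    i, out = 0, []
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            out.append((op, code[i + 1] | (code[i + 2] << 8)))
            i += 3
        else:
            out.append((op, None))
            i += 1
    return out


def walk_wordcode(code):
    """Decode the 3.6 wordcode layout: every instruction is exactly 2 bytes."""
    out = []
    for i in range(0, len(code), 2):
        op, arg = code[i], code[i + 1]
        out.append((op, arg if op >= HAVE_ARGUMENT else None))
    return out


# The same two instructions (LOAD_CONST 1; RETURN_VALUE) in each layout.
old_style = bytes([100, 1, 0, 83])
new_style = bytes([100, 1, 83, 0])

if __name__ == "__main__":
    print(walk_pre_wordcode(old_style))
    print(walk_wordcode(new_style))
    # The old walker silently misreads wordcode instead of failing loudly.
    print(walk_pre_wordcode(new_style))
```

Code written against either raw layout has to be rewritten when the layout changes, whereas code written against the tuples never sees the layout at all.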

> The reason I feel that way is that I consider it *entirely acceptable*
> for the first generation of bytecode post-processors to be based on
> the disassemble-manipulate-reassemble model that folks already use for
> bytecode manipulating function decorators, and for doing that
> conveniently to be dependent on 3rd party libraries, at least for the
> time being.

My goal is to update byteplay to use the portable iterable-of-tuples format and, on 3.6+, to rely on the built-in assembler. Plenty of things are simple enough to write directly against the iterable-of-tuples format, but byteplay still has the advantage of (a) working back to Python 2.6, and (b) working with all of the existing code I've written for it over the last half-decade, so I will continue to use it. And I'm sure many other people will use it, or other third-party libraries.
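
For a sense of how simple a transformation on the tuple format can be, here is a sketch of a pass that swaps one constant for another. The function name replace_const is hypothetical, the input is a hand-written list of tuples, and the step that would hand the result back to an assembler is left out because that assembler is only proposed here:

```python
import dis

LOAD_CONST = dis.opmap["LOAD_CONST"]
RETURN_VALUE = dis.opmap["RETURN_VALUE"]


def replace_const(instructions, old, new):
    """Yield the same instructions with LOAD_CONST `old` replaced by `new`.

    Works on any iterable of tuples whose first two elements are
    (opcode, argval); any further elements (such as line) pass through.
    """
    for opcode, argval, *rest in instructions:
        if opcode == LOAD_CONST and argval == old:
            argval = new
        yield (opcode, argval, *rest)


# Hand-written input in the proposed (opcode, argval, line) layout.
program = [
    (LOAD_CONST, 1, 1),
    (RETURN_VALUE, None, 1),
]

if __name__ == "__main__":
    print(list(replace_const(program, 1, 42)))
```

The pass never touches offsets, argument encoding, or the constants table, which is the point of working on argval rather than on raw bytecode.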

But if most of those libraries (and most code people write without third-party libraries) rely on the portable format, they'll keep working throughout the 3.7 development cycle, rather than making users wait 3-24 months for a fix before they can upgrade to 3.7. It will also allow us to make more radical changes in 3.7, and allow MicroPython to support most of those libraries despite a slightly different internal format. And so on.

> If we later settle on a standard mutable bytecode format, we may also
> decide to introduce bytecode pre-processors that accept and produce
> the pre-assembly form of the bytecode, but that's something to be done
> as a possible compile time reduction measure *after* folks have
> practical experience with the easier to define post-processing
> approach, rather than before.

If compile-time performance were the issue here, I'd agree. But it's not an issue--or, if it is, it's a distant fourth place behind resilience, portability, and simplicity. And to get resilience and portability, we have to use a resilient and portable format from the start, not bolt one on later as an option.