Standard (portable) bytecode "assembly" format

On python-dev (http://article.gmane.org/gmane.comp.python.devel/156543), Demur Rumed asked what it would take to get the "wordcode" patch into Python. Obviously, we need to finish it, benchmark it, etc., but on top of that, as Guido pointed out:
Greg Ewing replied:
I think he's right. Of course we already have such a format today: dis.Bytecode. But it doesn't quite solve the problem, for three reasons:

* Not accessible from C.
* Not mutable, and no assembler.
* A few things (mainly jump arguments) are still in terms of bytecode bytes.

But fix that, and we have a format that will be unchanged with wordcode, and that can work out of the box in MicroPython (which has a not-quite-CPython bytecode format), and so on. I think if we do that for 3.6, then it's plausible to consider wordcode for 3.7.

And, fix it well enough, and it also solves the problem I brought up a few weeks ago (http://article.gmane.org/gmane.comp.python.ideas/38431): if PEP 511 is going to provide a builtin API for registering bytecode processors, we should make it feasible to write them.

I have a somewhat complete proposal (at http://stupidpythonideas.blogspot.com/2016/02/a-standard-assembly-format-for...), but until I actually implement it, most people should only care about this summary:

* Iterable of (opcode, argval [, line [, ...]]) tuples. The argval is the actual global name, constant value, etc., not the encoded index, etc. For jumps, the argval is just the target instruction itself. The existing dis.Bytecode (with a few minor changes) already fits this type--but so does, say, a list of 3-tuples, which we can much more easily build in C (a minimal sketch follows after this list).
* The assemble function from compile.c doesn't need that much work to convert it into a PyCode_Assemble/dis.assemble that takes such an iterable (plus optional name, filename, and first_line) and generates a code object. The compiler can then use the same function as pure Python code. And PyCode_Assemble is the only new C API function needed.
* We already have a disassembler for this format in the stdlib since 3.4. It does need a few minor changes, and there are a few simple extensions that I think are worth adding (like making Bytecode a MutableSequence), but that's it.
* Assuming the assembler drops NOPs, we can use NOPs as pseudo-instructions for when you want byteplay-like Label and SetLineNo. The disassembler can optionally even generate them. So, we don't need explicit pseudo-instructions.
* Any higher-level representations, like a graph of blocks with edges for the jumps between them, are easy enough to build on top of the dis representation (and to flatten back into that representation), so we don't need anything more complicated in the stdlib.
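[Editor's sketch of the proposed format, to make the summary concrete. dis.opmap is the real stdlib table; dis.assemble is hypothetical, per the proposal above, and does not exist today.]

    import dis

    LOAD_CONST = dis.opmap['LOAD_CONST']
    RETURN_VALUE = dis.opmap['RETURN_VALUE']

    # An iterable of (opcode, argval, line) tuples. argval is the real
    # constant value, not an index into co_consts; the assembler would
    # build the consts tuple itself.
    instructions = [
        (LOAD_CONST, 42, 1),
        (RETURN_VALUE, None, 1),
    ]

    # Hypothetical, per the proposal -- dis.assemble() does not exist today:
    # code = dis.assemble(instructions, name='answer', filename='<asm>', first_line=1)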

2016-02-26 6:27 GMT+01:00 Andrew Barnert via Python-ideas <python-ideas@python.org>:
Of course we already have such a format today: dis.Bytecode. But it doesn't quite solve the problem, for three reasons:
* Not accessible from C.
I don't think that it's a real issue. The current trend is more to rewrite pieces of CPython in Python. importlib is a great example of that. importlib is also an interesting case because it is a Python module for importing modules, but we need importlib to import importlib. Hum. Brett Cannon solved this issue by compiling the Python code to a frozen module. It means that we can do something similar if we want to rewrite the peephole optimizer in Python.
* Not mutable, and no assembler.
I looked at the Bytecode & Instruction objects of dis. They look nice for "reading" bytecode, but not for modifying bytecode. dis.Instruction is not mutable, and information is duplicated. For example, the operator is stored as a name (LOAD_CONST) and a code (100). The argument is stored as an int (1), a value ("hello") and a representation ('"hello"'). It has no methods, just attributes like is_jump_target. dis.Instruction doesn't seem extensible to add new features. Adding more items to such a namedtuple doesn't seem like a nice API to me.
* A few things (mainly jump arguments) are still in terms of bytecode bytes.
Hum, this is a problem. dis is already in the stdlib; you cannot modify its API in a backward-incompatible way. IMHO it's safer and simpler to add something new (maybe in a new module), not modify the existing Bytecode & Instruction classes.

To modify bytecode, you need a different structure. For example, jump targets must be abstracted with labels (or something else). In my bytecode module, I provide 3 different ways to expose bytecode:

* ConcreteBytecode: instructions close to the raw bytes structure, arguments must be integers
* Bytecode: list of abstract instructions using labels (sketched below)
* BytecodeBlocks: uses blocks; a block is a list of instructions with a label, and jumps point to blocks

An instruction is an object which contains a line number and has methods like is_jump(). Abstract instructions can be modified (lineno, name, op, arg), and they have no size. Concrete instructions have a size, and their attributes cannot be modified.

Concrete bytecode & instructions are closer to what we already have in the dis module. I'm not sure that it's useful; maybe I should keep it private. It's an intermediate format to disassemble and assemble code objects. The argument of Instr('LOAD_CONST') is directly the constant value, so Bytecode has no "consts" attribute. The argument of ConcreteInstr('LOAD_CONST') is an integer: the index into the consts list of the ConcreteBytecode.

BytecodeBlocks is a "flat" control flow graph (CFG). It is required by the peephole optimizer so it doesn't modify two instructions which are part of two code paths (two blocks). As a side effect, with blocks it's trivial to detect dead code. As you wrote, it's also possible to reorder blocks to try to avoid jumps. Note: the current peephole optimizer misses a lot of optimizations on jumps :-/ Python 3.5 is a little bit better.
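[Editor's rough sketch of the label-based Bytecode format described above, using the bytecode library from this thread. The names follow the 0.1 API as described here (Bytecode, Instr, Label) and may differ in later releases.]

    from bytecode import Bytecode, Instr, Label

    label_else = Label()
    code = Bytecode()
    code.extend([
        Instr("LOAD_NAME", 'x'),
        Instr("POP_JUMP_IF_FALSE", label_else),  # the jump argument is a Label, not a byte offset
        Instr("LOAD_CONST", 'yes'),
        Instr("RETURN_VALUE"),
        label_else,                              # labels sit inline in the instruction list
        Instr("LOAD_CONST", 'no'),
        Instr("RETURN_VALUE"),
    ])
    exec(code.to_code(), {'x': True})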
And, fix it well enough, and it also solves the problem I brought up a few weeks ago (http://article.gmane.org/gmane.comp.python.ideas/38431): if PEP 511 is going to provide a builtin API for registering bytecode processors, we should make it feasible to write them.
Again, you don't need to add anything to the stdlib to write a bytecode optimizer. The byteplay, codetransformer, bytecode, etc. projects are already available. By the way, I wrote PEP 511 for AST optimizers, not for bytecode optimizers. Since we can modify the AST, bytecode is less interesting. Writing an optimizer on bytecode depends too much on the implementation. It may break if we add new bytecode instructions or modify the format of instructions (as you said). It's a deliberate choice to leave optimizers out of the stdlib. I expect that it will take months or years to stabilize the API of an optimizer, test it with various kinds of applications, etc.
* Iterable of (opcode, argval [, line [, ...]]) tuples. The argval is the actual global name, constant value, etc., not the encoded index, etc. For jumps, the argval is just the target instruction itself. The existing dis.Bytecode (with a few minor changes) already fits this type--but so does, say, a list of 3-tuples, which we can much more easily build in C.
You should not use the "be usable in C" constraint, or I expect a bad API :-/ When you modify bytecode, you need many functions which are relevant to put in instruction objects. See the existing codetransformer & bytecode projects for examples of methods. Note: my bytecode.Instr has a simple constructor; it takes 2 or 3 parameters: Instr(lineno, name, arg=UNSET). I don't think that it's hard to write a helper function in C to emit Instr objects if you *really* want to write C code.
And PyCode_Assemble is the only new C API function needed.
I don't understand why you care so much about having a C API. What do you want to do? The only need for CPython is to have the most simple peephole optimizer, basically one that only optimizes jumps. An AST optimizer can do everything else. I would like to experiment with such a peephole optimizer implemented in pure Python. I'm not sure that writing it in pure Python will kill performance. The cost of import matters, but only in a few use cases. In general, applications run longer than 1 second, and so the cost of import is negligible. Moreover, .py files are only compiled once to .pyc. If the .pyc files are precompiled, the speed of the optimizer doesn't matter :-)
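[Purely as a toy illustration -- not Victor's optimizer and not the stdlib peephole pass -- a pure-Python constant-folding pass over the thread's proposed (opcode, argval, line) tuples might look like this. It assumes BINARY_ADD, which exists in the bytecode of this era (pre-3.11).]

    import dis

    LOAD_CONST = dis.opmap['LOAD_CONST']
    BINARY_ADD = dis.opmap['BINARY_ADD']

    def fold_adds(instructions):
        """Fold LOAD_CONST a; LOAD_CONST b; BINARY_ADD into LOAD_CONST a+b.

        Works on (opcode, argval, line) tuples. A real pass would also
        have to refuse to fold across jump targets.
        """
        window = []
        for instr in instructions:
            window.append(instr)
            if len(window) == 3:
                a, b, op = window
                if a[0] == b[0] == LOAD_CONST and op[0] == BINARY_ADD:
                    window = [(LOAD_CONST, a[1] + b[1], a[2])]
                else:
                    yield window.pop(0)
        yield from window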
* Assuming the assembler drops NOPs, we can use NOPs as pseudo-instructions for when you want byteplay-like Label and SetLineNo. The disassembler can optionally even generate them. So, we don't need explicit pseudo-instructions.
For pattern matching, inline Label or SetLineno instructions are annoying. For example, if you use the pattern "LOAD_CONST <value>; UNARY_NOT", then "SetLineno 3; LOAD_CONST <value>; SetLineno 3; UNARY_NOT" will not match. You *can* modify the algorithm to match patterns, but putting line numbers in instructions avoids this issue. Using multiple blocks rather than a single list of instructions avoids the need for inline labels. In my bytecode project, I tried to support both APIs: inline labels in Bytecode, labels on blocks in BytecodeBlocks. I may add support for SetLineno later. I'm still working on the API.
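[Editor's sketch of the workaround being discussed: a matcher that skips pseudo-instructions so the SetLineno example above still matches. Instr and SetLineno here are minimal stand-ins defined for the example, not classes from any particular library.]

    from collections import namedtuple

    Instr = namedtuple('Instr', 'name arg')

    class SetLineno:
        def __init__(self, lineno):
            self.lineno = lineno

    def match(instructions, pattern):
        # Drop pseudo-instructions before comparing opcode names.
        names = [i.name for i in instructions if not isinstance(i, SetLineno)]
        return names[:len(pattern)] == list(pattern)

    seq = [SetLineno(3), Instr('LOAD_CONST', False),
           SetLineno(3), Instr('UNARY_NOT', None)]
    assert match(seq, ['LOAD_CONST', 'UNARY_NOT'])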
* Any higher-level representations, like a graph of blocks with edges for the jumps between them, are easy enough to build on top of the dis representation (and to flatten back into that representation), so we don't need anything more complicated in the stdlib.
Yeah, you should start with something simple but extensible: an API generic enough to be usable as a low-level API by the existing byteplay, codetransformer and bytecode projects, and then build a higher-level API on top of that. Or maybe I'm right and it's a bad idea :-) codetransformer is more than just an API to modify bytecode. It has an API to match instructions using patterns. Such stuff should be kept in codetransformer. Victor

I just released bytecode 0.1, to discuss a "stable" (released) API :-) https://pypi.python.org/pypi/bytecode

The Instr constructor is now: Instr(name, arg=UNSET, *, lineno=None). I added a SetLineno pseudo-instruction. If an Instr is created with no line number, the line number is inherited from previous instructions, from SetLineno, or from the first line number of the bytecode object (default: 1).

Mandatory "Hello World" example:

    from bytecode import Instr, Bytecode

    bytecode = Bytecode()
    bytecode.extend([Instr("LOAD_NAME", 'print'),
                     Instr("LOAD_CONST", 'Hello World!'),
                     Instr("CALL_FUNCTION", 1),
                     Instr("POP_TOP"),
                     Instr("LOAD_CONST", None),
                     Instr("RETURN_VALUE")])
    code = bytecode.to_code()
    exec(code)

Victor

2016-02-26 17:51 GMT+01:00 Brett Cannon <brett@python.org>:
My API is still a work in progress :-) Maybe the API can be changed to:

    LOAD_CONST('Hello World!')
    POP_TOP()

But it means that your code will probably start with "from bytecode import *" or "from bytecode import LOAD_CONST, POP_TOP". There are something like 155 opcodes, so I would prefer not to have to write the exhaustive list of imports. Another option is something like:

    Instr.LOAD_CONST('Hello World!')
    Instr.POP_TOP()

or

    whatever.LOAD_CONST('Hello World!')
    whatever.POP_TOP()

I don't know what is best. codetransformer uses instructions.LOAD_CONST("Hello World!"), and instructions.LOAD_FAST is a type (it's used for pattern matching). Victor

On 02/26/2016 09:19 AM, Victor Stinner wrote:
There is a reason that `from module import *` is still available, and this is one of them. You could also put all the opcodes (and just the opcodes) into their own module to limit the `import *` reach:

    from bytecode.opcodes import *

-- ~Ethan~

Can we change the subject for this? Bikeshedding one of multiple different higher-level APIs that could be used for different kinds of bytecode processing is off-topic from having a simple portable format for representing bytecode, except tangentially in that I hope (and am pretty sure) that all such higher-level APIs can be built on top of the portable format.

On Fri, 26 Feb 2016 at 02:27 Victor Stinner <victor.stinner@gmail.com> wrote:
So one thing to point out (that I know Raymond Hettinger would ;) ) is that Python scripts passed by file path on the command line are not written out to a .pyc file, and so at least the initial entry point will still have to pay for any optimizer overhead no matter what. And if you write your entire app in a single file for ease of shipment, then you will pay the penalty 100% of the time, and not just for some small __main__ module. Now if the overhead balances out against the optimization benefit for everyone else, then it's worth the cost, but the question is what exactly that cost works out to be.

On Feb 26, 2016, at 02:27, Victor Stinner <victor.stinner@gmail.com> wrote:
Sure, we could either (a) have duplicate code in C and Python that do virtually the same assembly and fixup work, (b) rewrite the peephole optimizer and part of the compiler in Python and freeze both them and the dis module (or whatever), or (c) use a format that's accessible from both C and Python and change as little as possible to get what we want. I think the last one is clearly the best solution, but not because the other two are impossible.
Which is exactly why I suggested the very alternative that you're replying to: tuples of (opcode, argval [, line [, ...]]) are trivial to build. Instruction (with a minor, backward-compatible change) is compatible with that, but you don't need to use Instruction. Similarly, an iterable of such tuples is trivial to build; Bytecode is compatible with that, but you don't need to use Bytecode. Here's an example of what a bytecode processor could look like:

    for opcode, argval, *rest in instructions:
        if opcode == dis.LOAD_GLOBAL:
            yield (dis.LOAD_CONST, eval(argval, globals()), *rest)
        else:
            yield (opcode, argval, *rest)

If you want to use the dis structures instead, you don't have to, but you can:

    bc = dis.Bytecode(instructions)
    for i, instr in enumerate(bc):
        if instr.opcode == dis.LOAD_GLOBAL:
            bc[i] = instr.replace(opcode=dis.LOAD_CONST,
                                  argval=eval(instr.argval, globals()))
    return bc

And notice that, even if you _do_ want to use those structures, the problems you're imagining don't arise. There are more complicated examples on the linked blog post.
And, as I said, you only have to supply opcode, argval, and sometimes line. The other attributes are there for reading existing bytecode, but aren't needed for emitting it. This is the same model that's used successfully in the tokenize module. (Of course that module has some other API nightmares, but _this_ part of it is very nice.) Tokens are a namedtuple with 5 attributes, but you can substitute a plain (type, string) 2-tuple in place of a full Token.
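[The tokenize behavior referred to here is easy to check directly; this editor's snippet round-trips real tokens through bare 2-tuples:]

    import io
    import tokenize

    source = "x = 1\n"
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)

    # Full TokenInfo namedtuples and bare (type, string) pairs are
    # interchangeable as far as untokenize() is concerned:
    pairs = [(tok.type, tok.string) for tok in tokens]
    print(tokenize.untokenize(pairs))  # equivalent source; spacing may differ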
dis.Instruction doesn't seem extensible to add new features.
Why not? I added hasjrel to see how easy it is: there's one obvious way to do it, which took a few seconds, and it works exactly as I'd want it to. What kind of new features do you think would be difficult to add?
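[Andrew doesn't show his code, but one plausible few-second version of the hasjrel experiment is a derived property on a namedtuple subclass; dis.hasjrel and dis.opmap really are the stdlib tables.]

    import dis

    # dis.Instruction is a namedtuple subclass, so a derived field can be
    # added as a property without touching the stored fields.
    class Instruction(dis.Instruction):
        @property
        def hasjrel(self):
            return self.opcode in dis.hasjrel

    # Field order as of the 3.4-era dis module:
    # (opname, opcode, arg, argval, argrepr, offset, starts_line, is_jump_target)
    instr = Instruction('JUMP_FORWARD', dis.opmap['JUMP_FORWARD'],
                        2, 2, 'to 8', 4, None, False)
    assert instr.hasjrel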
Again, already covered, and covered in more detail in the blog post.
No it isn't. What we have in the dis module does _not_ have size; it's a flat sequence of instructions. If you've missed that, you probably need to go back and reread the proposal, because it doesn't really make sense if you think this is what it's suggesting.
Here we get to the core of the proposal. As I show in the linked blog post, it takes a handful of lines to go back and forth between the proposed format and a block-graph format. It's just as easy to go back and forth between having pseudo-instructions and not having them. Or any other format you come up with. That's not true for raw bytecode--going back and forth requires writing a complicated disassembler and even more complicated assembler.

But, even more important, the proposed format is the same between CPython 3.6 and MicroPython 3.6, and it stays the same even if CPython 3.7 switches to wordcode. And any code you've written that builds a block graph out of the proposed format still works. That's what makes the proposed format a portable, resilient format. And I believe it's the simplest possible portable, resilient format.

It's not the ideal format to use for every possible kind of bytecode manipulation. That isn't the goal. The fact that it happens to be good enough for a lot of kinds of bytecode manipulation is a nice side benefit, but it's not the point. The fact that it integrates nicely with dis is also very nice, but it's not the point. So, "let's build yet another third-party assembler and disassembler with a different API" is not a competing solution to this proposal; it's part of the problem I'm trying to solve.
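[Editor's sketch of the "handful of lines" round trip -- not the blog post's exact code: split a flat dis-style instruction sequence into basic blocks at jump targets and after jumps, then flatten back by concatenation. A fuller version would also end blocks after returns and raises.]

    import dis

    def blockify(instructions):
        blocks, block = [], []
        for instr in instructions:
            if instr.is_jump_target and block:   # a jump target starts a new block
                blocks.append(block)
                block = []
            block.append(instr)
            if instr.opcode in dis.hasjabs or instr.opcode in dis.hasjrel:
                blocks.append(block)             # a jump ends the current block
                block = []
        if block:
            blocks.append(block)
        return blocks

    def flatten(blocks):
        # Linearizing back to the portable flat format is just concatenation.
        return [instr for block in blocks for instr in block]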
By the way, I wrote PEP 511 for AST optimizers, not for bytecode optimizers.
As I've said before: you included bytecode optimizers in PEP 511, you made the API more complicated so you could allow them, you provided a rationale for why we need to allow them, and you gave an example of one. If the PEP is wrong, you don't have to convince anyone; it's your PEP, go change it. Anyway, from here you go off onto a long tangent arguing that my proposed format is not the ideal once-and-for-all-best format to use for every possible kind of bytecode manipulation. I already granted that above, and I'll grant it again and snip all the arguments.
As explained near the top, I want to share code between the assemble function in the compiler and the assemble function used in Python code. Ideally, I'd like to do this without having to expose any new types or utility functions or anything else to C. And, as it turns out, that's doable. I can write a PyCode_Assemble function that's used by the compiler and by Python code without having to add a single other new thing to the C API.
I don't understand the last sentence. Are you contradicting the rest of the paragraph, and suggesting that a simple but extensible API that can be used by byteplay, etc. and new projects is a bad thing? If so, why? Do you think it would be better to bless one of those projects, and keep all the others as hard to write as they are today?

2016-02-26 19:15 GMT+01:00 Andrew Barnert <abarnert@yahoo.com>:
Currently, Python/compile.c uses specific C structures:

* struct instr: opcode, oparg, ...; a jump target is a pointer to a basicblock
* struct basicblock: list of instructions, ...
* struct fblockinfo
* struct compiler_unit: list of constants, list of names, blocks, etc.
* struct compiler: filename, compiler_unit, ...
* ...

Your proposal looks more like a flat list of instructions; it doesn't fit well with the current code (blocks). The structures contain a lot of information which is specific to the compiler, and I'm not sure that it would make sense to put it in your generic API. Or maybe you can rebuild the current structures on top of your API. My opinion on that is that it's not worth modifying Python/compile.c; leave it unchanged.
A tuple cannot be modified. By mutable, I mean being able to replace an attribute without having to create a new instruction:

    instr.arg = new_arg

instead of

    bytecode[index] = instr.replace_arg(arg)

In the first version of my bytecode project, I hesitated between abstract instructions and concrete instructions. I wanted to put checks in, so I started with immutable instructions. But it's not really convenient; I would prefer mutable instructions. I left concrete instructions immutable, because their arguments depend on a bytecode object. For example, LOAD_CONST uses an index into a list of constants. And jump targets depend on the exact size of other instructions. Maybe concrete bytecode should be made mutable too. But it's not too hard to create a new concrete instruction to replace an existing one.
Here's an example of what a bytecode processor could look like:
for opcode, argval, *rest in instructions:
Hum, "*rest" doesn't look good to me. What is the exact size of an instruction? (how many fields) What if we want to add a new field later? Will it break existing code relying on the minimum/maximum number of fields of an instruction?
Yeah, this API looks better: a single object which contains all the information. It's more future-proof. (I'm just talking about the "for instr in bytecode:" loop :-))
Hum, I understand that an instruction is a namedtuple. So if I create an instruction only with the opcode (ex: 100), the name field is not set, right? Which fields are "mandatory"? Which fields are optional? In my bytecode API, you provide a name, and the opcode is computed from the name. If you modify the name, the opcode is updated; if you modify the opcode, the name is updated. There are checks on the lineno attribute (it must be an int >= 1, or None). ConcreteInstr has strict checks on the argument.
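[Editor's sketch of the name<->opcode syncing described above, built on the real stdlib tables dis.opmap (name -> opcode) and dis.opname (opcode -> name); this is not Victor's actual Instr class.]

    import dis

    class Instr:
        def __init__(self, name, arg=None):
            self.name = name              # sets the opcode via the property
            self.arg = arg

        @property
        def name(self):
            return self._name

        @name.setter
        def name(self, name):
            self._name = name
            self._op = dis.opmap[name]    # raises KeyError for unknown names

        @property
        def op(self):
            return self._op

        @op.setter
        def op(self, op):
            self._op = op
            self._name = dis.opname[op]

    i = Instr('LOAD_CONST')
    i.op = dis.opmap['LOAD_NAME']
    assert i.name == 'LOAD_NAME'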
dis.Instruction doesn't seem extensible to add new features.
Why not? I added hasjrel to see how easy it is: there's one obvious way to do it, which took a few seconds, and it works exactly as I'd want it to. What kind of new features do you think would be difficult to add?
In bytecode 0.1, I have the following methods on Instr:

* format(labels)
* __repr__()
* __eq__(): smart comparison. For LOAD_CONST, it understands that a -0.0 argument is different from +0.0, for example.
* is_jump()
* is_cond_jump()

ConcreteInstr has additional methods:

* assemble()
* ConcreteInstr.disassemble() (static method)
* get_jump_target(instr_offset)

About your hasjrel example: do you mean that you added a new field to the namedtuple? Does the constructor of the instruction have to fill this field manually? What if the field is not set?
The offset attribute doesn't seem relevant for an abstract instruction. If you remove an instruction before it, the offset becomes inconsistent. I chose not to store the offset inside instructions, but to recompute it each time I iterate over concrete instructions (offset += instr.size).
I saw your "def blockify(instructions):" function, but I don't understand how do you store labels. You use a "is_jump_target" attribute. If you remove the target of a jump (an instruction with is_jump_target=True), I guess that you have to mark the following instrution with is_jump_target=True, right? What if the block only contains one instruction? I identified a requirement when you manipulate jumps: being able to "resolve jumps". From a jump, you want to know the target instruction. With bytecode.BytecodeBlocks, you get the target block with: "target_block = bytecode[jump.label]; target_instr = target_block[0]" (with a complexity of O(1), bycode[label] gets an item of an list, it uses a mapping label => block index to get the index.) With bytecode.Bytecode (list of instructions), you have to iterate on all iterations to search for the label. I can maybe optimize that later to build an internal cache, updated when the list is modified. I'm not sure that a label is the most convenient abstraction for blocks. In CFG, a jump points directly to a subtree (the instruction argument is directly the block), there is no indirection like my label object. In bytecode, you can also convert bytecode between the 3 formats (concrete, bytecode, blocks), the 3 classes have 5 conversion methods: * from_code() * to_code() * to_concrete_bytecode() * to_bytecode() * to_bytecode_blocks()
So, "let's build yet another third-party assembler and disassembler with a different API" is not a competing solution to this proposal; it's part of the problem I'm trying to solve.
I wrote the bytecode project to try to implement your idea. It looks like we don't want the same API :-)
Sorry, I wanted to write "maybe I'm wrong and it's a bad idea". Victor

On Feb 26, 2016, at 14:05, Victor Stinner <victor.stinner@gmail.com> wrote:
Yes. But most of that information is only needed earlier in the process--building the blocks, linearizing them, making sure each one ends in a return, etc. It requires a bit of reorganization to cleanly separate out the final assembly/fixup step, but not that much. And, while that last step does use the current structures today, it doesn't actually need anything from them but a way to iterate instructions. (I learned this during an earlier experiment, where I shared all of the compiler structures directly with the peephole optimizer and then tried to limit the sharing as much as possible.)
My opinion on that is that it's not worth modifying Python/compile.c; leave it unchanged.
DRY. With a single assembler rather than two, anyone who wants to change anything about the internal bytecode format or how fixup works or anything else only has to do it once, rather than figuring out how to do the same thing in two completely different pieces of code that are intended to accomplish the same thing.
So what? Even your examples don't mutate the Instr object; they build a new one instead. It's not like "bc[i] = Instruction(LOAD_CONST, constval)" or "yield (LOAD_CONST, constval)" and so on are less readable/Pythonic/concise than "bc[i].opcode = LOAD_CONST; bc[i].argval = constval". If you really think this is important, changing the format to any iterable of iterables instead of iterable of tuples is trivial, so you can use lists instead of tuples, and make Instruction mutable, and so on. But I don't see what it buys you.

Also, one more time: I'm not trying to invent the all-singing, all-dancing best-possible interface for all kinds of bytecode manipulation; I'm trying to invent the simplest portable and resilient format that people can build other APIs on top of. The dis module happens to provide a somewhat useful such API for simple manipulations, which is nice, but it will never be the best one for all manipulations, and that's fine. So I don't want to change dis any more than necessary.

Your own API can diverge much more radically from dis if it wants. It just has to take the same iterable-of-tuples format as input and output; it can store things internally however it wants, which can include mutable instruction objects if you want.
In my bytecode API, you provide a name, the opcode is computed from the name.
To me, requiring something that duck-types like an int, and then providing an IntEnum of all the opcodes, seems like a much nicer API than requiring strings and providing str<->int maps. That's exactly what enums are for. But if you want to build a string API on top of the portable duck-types-as-int format instead, you can. All you have to do is emit the ints in the iterable you pass to assemble. More generally, I don't think debating all, or even any, of the design decisions of your in-progress library is at all relevant to this discussion. Unless you think you have an example of something you can do with raw bytecode that you can't do with the portable format, it doesn't affect this proposal at all.
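[The "IntEnum of all the opcodes" can be built in one line from the real stdlib table; this editor's snippet shows the duck-types-as-int behavior being described.]

    import dis
    from enum import IntEnum

    # Build the enum straight from the stdlib name -> opcode mapping.
    Opcode = IntEnum('Opcode', dis.opmap)

    assert Opcode.LOAD_CONST == dis.opmap['LOAD_CONST']
    assert isinstance(Opcode.LOAD_CONST, int)   # usable anywhere a raw opcode is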
I gave examples that show how to do various things with and without storing labels. You're looking at one of the examples without storing labels, and asking where the labels are stored in that example. For that example, I instead make sure I have a complete dis.Bytecode with its is_jump_target fields filled in, and use that, to show that labels aren't always necessary.

The next few questions are mostly irrelevant, so I'm skipping most of them. In general, you're asking how your library could do block-related things directly on the portable format. In many cases, what you want to do is actually easy, but it doesn't matter, because your library isn't going to do that; it's going to take the portable format as input and output, and do things on a graph of blocks in between, and the format you use for that graph of blocks is entirely up to you, as long as you can linearize it back to an iterable of tuples at the end.
No you don't. I gave examples that resolve jumps in O(1) time, both with and without labels. But, again, who cares? Your code won't be doing this.
I'm not sure that a label is the most convenient abstraction for blocks.
And it doesn't have to be. As long as it's sufficient for you to build whatever more convenient abstraction you think you need.
OK, then I think you're right, and it was a good idea. :) And that's exactly what I've attempted to do: come up with the simplest API that can support things like byteplay, codetransformer, and bytecode so they don't have to directly manipulate the byte strings anymore, giving us more freedom to change the internal CPython format without breaking every bytecode processor in the world.

On 26 February 2016 at 15:27, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Of course we already have such a format today: dis.Bytecode.
I wouldn't feel beholden to hewing too closely to the existing dis API - that was defined to solve a specific problem with making it easier to test CPython's code generation pipeline, while also providing an improved foundation for the dis text output. For those use cases, the redundancy in the API is a help, rather than a hindrance, since we can easily test and display all the values of interest. For manipulation though, the redundancy is a problem - you need to either declare some fields authoritative and implicitly derive the others, or else expect users to keep things in sync manually (which would be a pretty user hostile API).

I do think it's reasonable to seek to define a standard MutableBytecode format specifically to make manipulation easier, but I don't think it makes sense to couple that to PEP 511's definition of bytecode processing. The reason I feel that way is that I consider it *entirely acceptable* for the first generation of bytecode post-processors to be based on the disassemble-manipulate-reassemble model that folks already use for bytecode manipulating function decorators, and for doing that conveniently to be dependent on 3rd party libraries, at least for the time being.

If we later settle on a standard mutable bytecode format, we may also decide to introduce bytecode pre-processors that accept and produce the pre-assembly form of the bytecode, but that's something to be done as a possible compile time reduction measure *after* folks have practical experience with the easier to define post-processing approach, rather than before.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
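[For concreteness, the disassemble-manipulate-reassemble decorator model Nick mentions looks roughly like this with the third-party bytecode library discussed earlier in the thread. A sketch only: it assumes Bytecode behaves as the list it is described as, and NOP-stripping is an arbitrary toy transform.]

    from bytecode import Bytecode

    def strip_nops(func):
        bc = Bytecode.from_code(func.__code__)               # disassemble
        bc[:] = [instr for instr in bc
                 if getattr(instr, 'name', None) != 'NOP']   # manipulate
        func.__code__ = bc.to_code()                         # reassemble
        return func

Applied as @strip_nops, the function's code object is rebuilt once at decoration time, which is exactly the model described: no hook into the compiler, just a post-hoc rewrite.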

On Feb 26, 2016, at 03:36, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't feel too beholden to it. In fact, I only came back to it by accident. A few weeks ago, I proposed extending dis into something like byteplay and giving it a C API so the compiler could use it too. I decided that was a mistake, and went back to trying to come up with the simplest portable format that both the compiler and Python code could use, and came up with the iterable of tuples. And then I went and looked at what it would take to make dis support that format, and it turns out to take very little. So, rather than building a new Python convenience module for portable assembly (which I don't think we want to do), or not having one and just making people deal with the tuples with no convenience features (which is also viable, but I think less desirable), let's just use dis.
The portable assembly format doesn't use Instructions, just tuples of 2 or more elements where the first three are opcode, argval, and line. A dis.Instruction object (if we reorder its attributes and make the fact that it's a namedtuple type public) fits that, which means you can use it for convenience when it's convenient (probably leaving the other fields None), but you can also just ignore it and use a plain tuple, and often that's the simplest thing. Or, of course, you can use a library like byteplay that uses dis where it's helpful but provides whatever API it wants.
I do think it's reasonable to seek to define a standard MutableBytecode format specifically to make manipulation easier
I'm not sure it is. After failing to convince Victor that byteplay really can do what he wants, I now think that it's more reasonable to define a standard MutableBytecode format only to make it easier to build third-party libraries that each make manipulation easier in different ways.
Remember the starting motivation. What's the biggest stumbling block to switching to wordcode? It breaks code that manipulates bytecode directly. Adding a portable, future-proof format doesn't do any good if, at the same time, we also add something that encourages new code that ignores that format and instead manipulates bytecode directly.
My goal is to update byteplay to use the portable iterable-of-tuples format and lean on the built-in assembler on 3.6+. Even though plenty of things are actually simple enough to write with the iterable-of-tuples portable format, byteplay still has the advantage of (a) working back to Python 2.6, and (b) working with all of the existing code I've written for it over the last half-decade, so I will continue to use it. And I'm sure many other people will use it, or other third-party libraries. But, if most of those libraries (and most code people write without third-party libraries) rely on the portable format, then they'll continue to work throughout the 3.7 development cycle rather than making users wait 3-24 months before upgrading to 3.7. And it will allow us to make more radical changes in 3.7. And it'll allow MicroPython to support most of those libraries despite a slightly different internal format. And so on.
If compile-time performance were the issue here, I'd agree. But it's not an issue--or, if it is, it's a distant fourth place behind resilience, portability, and simplicity. And to get resilience and portability, we have to use a resilient and portable format from the start, not bolt one on later as an option.

2016-02-26 6:27 GMT+01:00 Andrew Barnert via Python-ideas <python-ideas@python.org>:
Of course we already have such a format today: dis.Bytecode. But it doesn't quite solve the problem, for three reasons:
* Not accessible from C.
I don't think that it's a real issue. The current trend is more to rewrite pieces of CPython in Python. importlib is a great example of that. importlib is also an interested case because it is a Python module to improts modules, but we need importlib to import importlib. Hum. Brett Canon solved this issue by compiling the Python code to a frozen module. It means that we can do something similar if we want to rewrite the peephole optimizer in Python.
* Not mutable, and no assembler.
I looked at Bytecode & Instruction objects of dis. They look nice to "read" bytecode, but not to modify bytecode. dis.Instruction is not mutable and informations are duplicated. For example, the operator is stored as name (LOAD_CONST) and code (100). Argument is stored as int (1), value ("hello") and representation ('"hello"'). It has no methods but attributes like is_jump_target. dis.Instruction doesn't seem extensible to add new features. Adding more items to such namedtuple doesn't seem like a nice API to me.
* A few things (mainly jump arguments) are still in terms of bytecode bytes.
Hum, this is a problem. The dis is already in the stdlib, you cannot modify its API in a backward incompatible way. IMHO it's safer and simpler to add something new (maybe in a new module), not modify the existing Bytecode & Instruction classes. To modify bytecode, you need a different structure. For example, jump targets must be abstracted with labels (or something else). In my bytecode module, I provide 3 different ways to expose bytecode: * ConcreteBytecode: instructions close to raw bytes structure, arguments must be integers * Bytecode: list of abstract instructions using labels * BytecodeBlocks: use blocks, a block is a list of instructions with a label, jumps point to blocks An instruction is an object which contains a line number, has methods like is_jump(). Abstract instructions can be modified (lineno, name, op, arg), they have no size. Concrete instructions have size, attributes cannot be modified. Concrete bytecode & instructions is closer to what we already have in the dis module. I'm not sure that it's useful, maybe I should keep it private. It's an intermediate format to disassemble and assemble code objects. Instr('LOAD_CONST') argument is directly the constant value, so Bytecode has no "consts" attribute. ConcreteInstr('LOAD_CONST') agument is an integer: the index to the consts list of the ConcreteBytecode. BytecodeBlocks is a "flat" control flow graph (CFG). It is required by the peephole optimizer to not modify two instructions which are part of two code paths (two blocks). Side-effect, with blocks, it's trivial to detect dead code. As you wrote, it's also possible to reorder blocks to try to avoid jumps. Note: the current peephole optimizer miss a lot of optimizations on jumps :-/ Python 3.5 is a little bit better.
And, fix it well enough, and it also solves the problem I brought up a few weeks ago (http://article.gmane.org/gmane.comp.python.ideas/38431): if PEP 511 is going to provide a builtin API for registering bytecode processors, we should make it feasible to write them.
Again, you don't need to add anything to the stdlib to write a bytecode optimizer. byteplay, codetransformer, bytecode, etc. projects are already available. By the way, I wrote PEP 511 for AST optimizers, not for bytecode optimizers. Since we can modify AST, bytecode is less interesting. Writing an optimizer on bytecode depends too much on the implementation. It may break if we add new bytecode instructions or modify the format of instructions (as you said). It's a deliberate choice to leave optimizers out of the stdlib. I expect that it will take months or years to stabilize the API of an optimizer, test it with various kinds of applications, etc.
* Iterable of (opcode, argval [, line [, ...]]) tuples. The argval is the actual global name, constant value, etc., not the encoded index, etc. For jumps, the argval is just the target instruction itself. The existing dis.Bytecode (with a few minor changes) already fits this type--but so does, say, a list of 3-tuples, which we can much more easily build in C.
You should not use "be usable in C" constraint, or I expect a bad API :-/ When you modify bytecode, you need many functions which are revelant to be put in instruction objects. See existing codetransformer & bytecode projects for examples of methods. Note: my bytecode.Instr has a simple constructor, it takes 2 or 3 parameters: Instr(lineno, name, arg=UNSET). I don't think that it's hard to write an helper function in C to emit Instr object if you *really* want to write C code.
And PyCode_Assemble is the only new C API function needed.
I don't understand why do you care so much of having a C API. What do you want to do? The only need for CPython is to have the most simple peephole optimizer, basically only optimize jumps. An AST optimizer can do everything else. I would like to experiment such peephole optimizer implemented in pure Python. I'm not sure that writing it in pure Python will kill performances. The cost of import matters, but only in few use cases. In general, applications run longer than 1 second and so the cost of import is negligible. Moreover, .py are only compiled once to .pyc. If .pyc are precompiled, the speed of the optimizer doesn't matter :-)
* Assuming the assembler drops NOPs, we can use NOPs as pseudo-instructions for when you want byteplay-like Label and SetLineNo. The disassembler can optionally even generate them. So, we don't need explicit pseudo-instructions.
For pattern matching, inline Label or SetLineno instrutions are annoying. For example, if you use the pattern "LOAD_CONST <value>; UNARY_NOT", "SetLineno 3; LOAD_CONST <value>; SetLineno 3; UNARY_NOT" will not match. You *can* modify the algorithm to match patterns, but putting line numbers in instructions avoid this issue. Using multiple blocks rather than a single list of instructions avoid the need of inline labels. In my bytecode project, I tried to support both API: inline labels in Bytecode, labels in blocks in BytecodeBlocks. I may add support for Setlineno later. I'm still working on the API.
* Any higher-level representations, like a graph of blocks with edges for the jumps between them, are easy enough to build on top of the dis representation (and to flatten back into that representation), so we don't need anything more complicated in the stdlib.
Yeah, you should start with something simple but extensible. An API generic enough to be usable as a low-level API by existing byteplay, codetransformer, bytecode projects, and then build an higher-level API on top of that. Or maybe I'm right and it's a bad idea :-) codetransformer is more than just an API to modify bytecode. It has an API to match instructions using patterns. Such stuff should be kept in codetransformer. Victor

I just released bytecode 0.1, to discuss a "stable" (released) API :-) https://pypi.python.org/pypi/bytecode Instr constructor is now: Instr(name, arg=UNSET, *, lineno=None). I added SetLineno pseudo-instruction. If Instr is created with no line number, the line number is inherited from previous instructions, from SetLineno, or from the first line number of the bytecode object (default: 1). Mandatory "Hello World" example: from bytecode import Instr, Bytecode bytecode = Bytecode() bytecode.extend([Instr("LOAD_NAME", 'print'), Instr("LOAD_CONST", 'Hello World!'), Instr("CALL_FUNCTION", 1), Instr("POP_TOP"), Instr("LOAD_CONST", None), Instr("RETURN_VALUE")]) code = bytecode.to_code() exec(code) Victor

2016-02-26 17:51 GMT+01:00 Brett Cannon <brett@python.org>:
My API is still a work-in-progress :-) Maybe the API can be changed to: LOAD_CONST('Hello World!') POP_TOP() But it means that your code will probably starts with "from bytecode import *" or "from bytecode import LOAD_CONST, POP_TOP". There are something like 155 opcodes, so I would prefer to not have to write the exhaustive list of imports. Another option is something like: Instr.LOAD_CONST('Hello World!') Instr.POP_TOP() or whatever.LOAD_CONST('Hello World!') whatever.POP_TOP() I don't know what is the best. codetransformers uses instructions.LOAD_CONST("Hello World!") and instructions.LOAD_FAST is a type (it used for pattern matching). Victor

On 02/26/2016 09:19 AM, Victor Stinner wrote:
There is a reason that `from module import *` is still available, and this is one of them. You could also put all the opcodes (and just the opcodes) into their own module to limit the `import *` reach: from bytecode.opcodes import * -- ~Ethan~

Can we change the subject for this? Bikeshedding one of multiple different higher-level APIs that could be used for different kinds of bytecode processing is off-topic from having a simple portable format for representing bytecode, except tangentially in that I hope (and am pretty sure) that all such higher-level APIs can be built on top of the portable format. Sent from my iPhone

On Fri, 26 Feb 2016 at 02:27 Victor Stinner <victor.stinner@gmail.com> wrote:
So one thing to point out (that I know Raymond Hettinger would ;) ), is that Python scripts passed by file path on the command-line are not written out to a .pyc file, and so at least the initial entry point will still have to pay for any optimizer overhead no matter what. And if you write your entire app in a single file for ease-of-shipment then you will pay the penalty 100% of the time and not just for some small __main__ module. Now if the balance between overhead vs. the optimization benefit for everyone else balances out then it's worth the cost, but the question is what exactly that cost works out to be.

On Feb 26, 2016, at 02:27, Victor Stinner <victor.stinner@gmail.com> wrote:
Sure, we could either (a) have duplicate code in C and Python that do virtually the same assembly and fixup work, (b) rewrite the peephole optimizer and part of the compiler in Python and freeze both them and the dis module (or whatever), or (c) use a format that's accessible from both C and Python and change as little as possible to get what we want. I think the last one is clearly the best solution, but it's not because the other two aren't impossible.
Which is exactly why I suggested the very alternative that you're replying to: tuples of (opcode, argval [, line [, ...]]) are trivial to build. Instruction (with a minor, backward-compatible change) is compatible with that, but you don't need to use Instruction. Similarly, an iterable of such tuples is trivial to build; Bytecode is compatible with that, but you don't need to use Bytecode. Here's an example of what a bytecode processor could look like: for opcode, argval, *rest in instructions: if opcode == dis.LOAD_GLOBAL: yield (dis.LOAD_CONST, eval(argval, globals(), *rest) else: yield (opcode, argval, *rest) If you want to use the dis structures instead, you don't have to, but you can: bc = dis.Bytecode(instructions) for i, instr in enumerate(bc): if instr.opcode == dis.LOAD_GLOBAL: bc[i] = instr.replace(opcode=dis.LOAD_CONST, eval(instr.argval, globals())) return bc And notice that, even if you _do_ want to use those structures, the problems you're imagining don't arise. There are more complicated examples on the linked blog post.
And, as I said, you only have to supply opcode, argval, and sometimes line. The other attributes are there for reading existing bytecode, but aren't needed for emitting it. This is the same model that's used successfully in the tokenize module. (Of course that module has some other API nightmares, but _this_ part of it is very nice.) Tokens are a namedtuple with 6 attributes, but you can construct them with just the first 2, 3, or 4, or you can just substitute a tuple of 2, 3, or 4 elements in place of a Token.
dis.Instruction doesn't seem extensible to add new features.
Why not? I added hasjrel to see how easy it is: there's one obvious way to do it, which took a few seconds, and it works exactly as I'd want it to. What kind of new features do you think would be difficult to add?
Again, already covered, and covered in more detail in the blog post.
No it isn't. What we have in the dis module does _not_ have size; it's a flat sequence of instructions. If you've missed that, you probably need to go back and reread the proposal, because it doesn't really make sense if you think this is what it's suggesting.
Here we get to the core of the proposal. As I show in the linked blog post, it takes a handful of lines to go back and forth between the proposed format and a block-graph format. It's just as easy to go back and forth between having pseudo-instructions and not having them. Or any other format you come up with. That's not true for raw bytecode--going back and forth requires writing a complicated disassembler and even more complicated assembler. But, even more important, the proposed format is the same between CPython 3.6 and MicroPython 3.6, and it stays the same even if CPython 3.7 switches to wordcode. And any code you've written that builds a block graph out of the proposed format still works. That's what makes the proposed format a portable, resilient format. And I believe it's the simplest possible portable, resilient format. It's not the ideal format to use for every possible kind of bytecode manipulation. That isn't the goal. The fact that it happens to be good enough for a lot of kinds of bytecode manipulation is a nice side benefit, but it's not the point. The fact that it integrates nicely with dis is also very nice, but it's not the point. So, "let's build yet another third-party assembler and disassembler with a different API" is not a competing solution to this proposal; it's part of the problem I'm trying to solve.
By the way, I wrote PEP 511 for AST optimizers, not for bytecode optimizers.
As I've said before: you included bytecode optimizers in PEP 511, you made the API more complicated so you could allow them, you provide a rationale for why we need to allow them, and you gave an example of one. If the PEP is wrong, you don't have to convince anyone; it's your PEP, go change it. Anyway, from here you go off onto a long tangent arguing that my proposed format is not the ideal once-and-for-all-best format to use for every possible kind of bytecode manipulation. I already granted that above, and I'll grant it again and snip all the arguments.
As explained near the top, I want to share code between the assemble function in the compiler and the assemble function used in Python code. Ideally, I'd like to do this without having to expose any new types or utility functions or anything else to C. And, as it turns out, that's doable. I can write a PyCode_Assemble function that's used by the compiler and by Python code without having to add a single other new thing to the C API.
I don't understand the last sentence. Are you contradicting the rest of the paragraph, and suggesting that a simple but extensible API that can be used by byteplay, etc. and new projects is a bad thing? If so, why? Do you think it would be better to bless one of those projects, and keep all the others as hard to write as they are today?

2016-02-26 19:15 GMT+01:00 Andrew Barnert <abarnert@yahoo.com>:
Currently, Python/compile.c uses specific C structures: * struct instr: opcode, oparg, ... a jump target is pointer to a basicblock * struct basicblock: list of instructions, ... * struct fblockinfo * struct compiler_unit: list of constants, list of names, blocks, etc. * struct compiler: filename, compiler_unit, ... * ... Your proposal looks more like a flat list of instructions, it doesn't fit well with the current code (blocks). The structures contain many information which are specific to the compiler, I'm not sure that it would make sense to put them in your generic API. Or maybe you can rebuild the current structures on top of your API. My opinion on that is that it's not worth to modify Python/compile.c and leave it unchanged.
A tuple cannot be modified. By mutable, I mean being able to replace an attribute without having to create a new instruction: instr.arg = new_arg instead of bytecode[index] = instr.replace_arg(arg) In the first version in my bytecode project, I hesitated between abstract instruction and concrete instruction. I wanted to put checks, so I started with immutable instructions. But it's not really convenient. I would prefer mutable instructions. I left concrete instructions immutable, because arguments depend on a bytecode object. For example, LOAD_CONST uses an index in a list of constants. And jump targets depend on the exact size of other instructions. Maybe concrete bytecode should be made mutable too. But it's not too hard to create a new concrete instruction to replace an existing one.
Here's an example of what a bytecode processor could look like:
for opcode, argval, *rest in instructions:
Hum, "*rest" doesn't look good to me. What is the exact size of an instruction? (how many fields) What if we want to add a new field later? Will it break existing code relying on the minimum/maximum number of fields of an instruction?
Yeah, this API looks better: a single object which contains all information. It's more future-proof. (I just talking about the for "instr in bytecode:" :-))
Hum, I understand that an instruction is a named tuple. So if I create an instruction only with the opcode (ex: 100), the name field is not set, right? Which fields are "mandatory"? Which fields are optional? In my bytecode API, you provide a name, the opcode is computed from the name. You can modify the name, opcode is updated. If you modify opcode, name is updated. There are checks on lineno attributes (must be an int >= 1, or None). ConcreteInstr has strict checks on the argument.
dis.Instruction doesn't seem extensible to add new features.
Why not? I added hasjrel to see how easy it is: there's one obvious way to do it, which took a few seconds, and it works exactly as I'd want it to. What kind of new features do you think would be difficult to add?
In bytecode 0.1, I have the following methods on Instr: * format(labels) * __repr__() * __eq__(): smart comparison. For LOAD_CONST, it understands that -0.0 argument is different than +0.0 for example. * is_jump() * is_cond_jump() ConcreteInstr has additional methods: * assemble() * ConcreteInstr.disassemble() (static method) * get_jump_target(instr_offset) About your hasjrel example: do you mean that you added a new field to the namedtuple? Does the constructor of the instruction have to fill this field manually? What if the field is not set?
The offset attribute doesn't seem revelant for an abstract instruciton. If you remove an instruction before, the offset becomes inconsistent. I chose to not store the offset inside instructions, but recompute it each time that I iterate on concrete instructions (offset += instr.size).
I saw your "def blockify(instructions):" function, but I don't understand how do you store labels. You use a "is_jump_target" attribute. If you remove the target of a jump (an instruction with is_jump_target=True), I guess that you have to mark the following instrution with is_jump_target=True, right? What if the block only contains one instruction? I identified a requirement when you manipulate jumps: being able to "resolve jumps". From a jump, you want to know the target instruction. With bytecode.BytecodeBlocks, you get the target block with: "target_block = bytecode[jump.label]; target_instr = target_block[0]" (with a complexity of O(1), bycode[label] gets an item of an list, it uses a mapping label => block index to get the index.) With bytecode.Bytecode (list of instructions), you have to iterate on all iterations to search for the label. I can maybe optimize that later to build an internal cache, updated when the list is modified. I'm not sure that a label is the most convenient abstraction for blocks. In CFG, a jump points directly to a subtree (the instruction argument is directly the block), there is no indirection like my label object. In bytecode, you can also convert bytecode between the 3 formats (concrete, bytecode, blocks), the 3 classes have 5 conversion methods: * from_code() * to_code() * to_concrete_bytecode() * to_bytecode() * to_bytecode_blocks()
So, "let's build yet another third-party assembler and disassembler with a different API" is not a competing solution to this proposal; it's part of the problem I'm trying to solve.
I wrote the bytecode project to try to implement you idea. It looks like we don't want the same API :-)
Sorry, I wanted to write "maybe I'm wrong and it's a bad idea". Victor

On Feb 26, 2016, at 14:05, Victor Stinner <victor.stinner@gmail.com> wrote:
Yes. But most of that information is only needed earlier in the process--building the blocks, linearizing them, making sure each one ends in a return, etc. It requires a bit of reorganization to cleanly separate out the final assembly/fixup step, but not that much. And, while that last step does use the current structures today, it doesn't actually need anything from them but a way to iterate instructions. (I learned this during an earlier experiment, where I shared all of the compiler structures directly with the peephole optimizer and then tried to limit the sharing as much as possible.)
My opinion on that is that it's not worth to modify Python/compile.c and leave it unchanged.
DRY. With a single assembler rather than two, anyone who wants to change anything about the internal bytecode format or how fixup works or anything else only has to do it once, rather than figuring out how to do the same thing in two completely different pieces of code that are intended to accomplish the same thing.
So what? Even your examples don't mutate the Instr object; they build a new one instead. It's not like "bc[i] = Instruction(LOAD_CONST, constval)" or "yield (LOAD_CONST, constval)" and so on are less readable/Pythonic/concise than "bc[i].opcode = LOAD_CONST; bc[i].argval = constval". If you really think this is important, changing the format to any iterable of iterables instead of iterable of tuples is trivial, so you can use lists instead of tuples, and make Instruction mutable, and so on. But I don't see what it buys you. Also, one more time: I'm not trying to invent the all-singing, all-dancing best-possible interface for all kinds of bytecode manipulation; I'm trying to invent the simplest portable and resilient format that people can build other APIs on top of. The dis module happens to provide a somewhat useful such API for simple manipulations, which is nice, but it will never be the best one for all manipulations, and that's fine. So I don't want to change dis any more than necessary. Your own API can diverge much more radically from dis if it wants. It just has to take the same iterable-of-tuples format as input and output; it can store things internally however it wants, which can include mutable instruction objects if you want.
In my bytecode API, you provide a name, the opcode is computed from the name.
To me, requiring something that duck-types like an int, and then providing an IntEnum of all the opcodes, seems like a much nicer API than requiring strings and providing str<->int maps. That's exactly what enums are for. But if you want to build a string API on top of the portable duck-types-as-int format instead, you can. All you have to do is emit the ints in the iterable you pass to assemble. More generally, I don't think debating all, or even any, of the design decisions of your in-progress library is at all relevant to this discussion. Unless you think you have an example of something you can do with raw bytecode that you can't do with the portable format, it doesn't affect this proposal at all.
I gave examples that show how to do various things with and without storing labels. You're looking at one of the examples without storing labels, and asking where the labels are stored in that example. For that example, I instead make sure I have a complete dis.Bytecode with its is_jump_target fields filled in, and use that, to show that labels aren't always necessary.

The next few questions are mostly irrelevant, so I'm skipping most of them.

In general, you're asking how your library could do block-related things directly on the portable format. In many cases, what you want to do is actually easy, but it doesn't matter, because your library isn't going to do that; it's going to take the portable format as input and output, and do things on a graph of blocks in between. The format you use for that graph of blocks is entirely up to you, as long as you can linearize it back to an iterable of tuples at the end.
No you don't. I gave examples that resolve jumps in O(1) time, both with and without labels. But, again, who cares? Your code won't be doing this.
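For the no-labels case: since a jump's argval under the portable format is the target instruction itself, one O(1) scheme is a single indexing pass (a sketch, not any particular library's API):

    def index_by_identity(instructions):
        # One pass: remember each instruction's position by identity.
        return {id(instr): pos for pos, instr in enumerate(instructions)}

    # Afterwards, resolving any jump is a single dict lookup:
    # target_pos = index[id(jump_argval)]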
I'm not sure that a label is the most convenient abstraction for blocks.
And it doesn't have to be. As long as it's sufficient for you to build whatever more convenient abstraction you think you need.
OK, then I think you're right, and it was a good idea. :) And that's exactly what I've attempted to do: come up with the simplest API that can support things like byteplay, codetransformer, and bytecode so they don't have to directly manipulate the byte strings anymore, giving us more freedom to change the internal CPython format without breaking every bytecode processor in the world.

On 26 February 2016 at 15:27, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Of course we already have such a format today: dis.Bytecode.
I wouldn't feel beholden to hewing too closely to the existing dis API - that was defined to solve a specific problem with making it easier to test CPython's code generation pipeline, while also providing an improved foundation for the dis text output. For those use cases, the redundancy in the API is a help, rather than a hindrance, since we can easily test and display all the values of interest.

For manipulation though, the redundancy is a problem - you need to either declare some fields authoritative and implicitly derive the others, or else expect users to keep things in sync manually (which would be a pretty user hostile API).

I do think it's reasonable to seek to define a standard MutableBytecode format specifically to make manipulation easier, but I don't think it makes sense to couple that to PEP 511's definition of bytecode processing. The reason I feel that way is that I consider it *entirely acceptable* for the first generation of bytecode post-processors to be based on the disassemble-manipulate-reassemble model that folks already use for bytecode manipulating function decorators (a sketch of that model follows below), and for doing that conveniently to be dependent on 3rd party libraries, at least for the time being.

If we later settle on a standard mutable bytecode format, we may also decide to introduce bytecode pre-processors that accept and produce the pre-assembly form of the bytecode, but that's something to be done as a possible compile time reduction measure *after* folks have practical experience with the easier to define post-processing approach, rather than before.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
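A rough illustration of that disassemble-manipulate-reassemble decorator model (a sketch only: assemble() here stands in for a hypothetical third-party assembler, since the stdlib provides no such function today):

    import dis
    import functools
    import types

    def processed(transform, assemble):
        # dis handles the disassemble step; a third-party 'assemble'
        # (hypothetical here) turns instructions back into a code object.
        def decorator(func):
            instructions = dis.get_instructions(func)
            new_code = assemble(transform(instructions), func.__code__)
            new_func = types.FunctionType(
                new_code, func.__globals__, func.__name__,
                func.__defaults__, func.__closure__)
            return functools.update_wrapper(new_func, func)
        return decorator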

On Feb 26, 2016, at 03:36, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't feel too beholden to it. In fact, I only came back to it by accident. A few weeks ago, I proposed extending dis into something like byteplay and giving it a C API so the compiler could use it too. I decided that was a mistake, and went back to trying to come up with the simplest portable format that both the compiler and Python code could use, and came up with the iterable of tuples. And then I went and looked at what it would take to make dis support that format, and it turns out to take very little. So, rather than building a new Python convenience module for portable assembly (which I don't think we want to do), or not having one and just making people deal with the tuples with no convenience features (which is also viable, but I think less desirable), let's just use dis.
The portable assembly format doesn't use Instructions, just tuples of 2 or more elements where the first three are opcode, argval, and line. A dis.Instruction object (if we reorder its attributes and make the fact that it's a namedtuple type public) fits that, which means you can use it for convenience when it's convenient (probably leaving the other fields None), but you can also just ignore it and use a plain tuple, and often that's the simplest thing. Or, of course, you can use a library like byteplay that uses dis where it's helpful but provides whatever API it wants.
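Concretely, a plain tuple is already a valid instruction in this format. Here's a complete (if trivial) function body as bare tuples (a sketch; the raw opcode ints come from the stdlib opcode module):

    import opcode

    LOAD_CONST = opcode.opmap['LOAD_CONST']
    RETURN_VALUE = opcode.opmap['RETURN_VALUE']

    # (opcode, argval [, line [, ...]]): trailing fields are optional.
    instructions = [
        (LOAD_CONST, 42, 1),   # argval is the actual constant, not a co_consts index
        (RETURN_VALUE, None),  # line omitted; any iterable of 2+ elements fits
    ]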
I do think it's reasonable to seek to define a standard MutableBytecode format specifically to make manipulation easier
I'm not sure it is. After failing to convince Victor that byteplay really can do what he wants, I now think that it's more reasonable to define a standard MutableBytecode format only to make it easier to build third-party libraries that each make manipulation easier in different ways.
Remember the starting motivation. What's the biggest stumbling block to switching to wordcode? It breaks code that manipulates bytecode directly. Adding a portable, future-proof format doesn't do any good if, at the same time, we also add something that encourages new code that ignores that format and instead manipulates bytecode directly.
My goal is to update byteplay to use the portable iterable-of-tuples format and lean on the built-in assembler in 3.6+. Even though plenty of things are simple enough to write directly against the portable format, byteplay still has the advantage of (a) working back to Python 2.6, and (b) working with all of the existing code I've written for it over the last half-decade, so I will continue to use it. And I'm sure many other people will use it, or other third-party libraries. But, if most of those libraries (and most code people write without third-party libraries) rely on the portable format, then they'll continue to work throughout the 3.7 development cycle rather than making users wait 3-24 months before upgrading to 3.7. And it will allow us to make more radical changes in 3.7. And it'll allow MicroPython to support most of those libraries despite a slightly different internal format. And so on.
If compile-time performance were the issue here, I'd agree. But it's not an issue--or, if it is, it's a distant fourth place behind resilience, portability, and simplicity. And to get resilience and portability, we have to use a resilient and portable format from the start, not bolt one on later as an option.
participants (5)

- Andrew Barnert
- Brett Cannon
- Ethan Furman
- Nick Coghlan
- Victor Stinner