On 04/01/2015 10:25 AM, Andrew Barnert wrote:
xchg = [(124, 0x65),  #LOAD_FAST to LOAD_NAME
        (125, 0x5a)]  #STORE_FAST to STORE_NAME
The first problem here is that any 124 or 125 in an operand to any opcode except 124 or 125 will be matched and converted (although you'll usually get an IndexError trying to treat the next two arbitrary bytes as an index...).
Yes, it will only work for very simple cases. It was just enough to get the initial examples working.
To solve this, you need to iterate opcode by opcode, not byte by byte. The dis module gives you the information to tell how many bytes to skip for each opcode's operands. (It also maps between opcode numbers and names, so you don't have to use magic numbers with comments.) Using it will solve this problem (and maybe others I didn't spot) and also make your code a lot simpler.
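A minimal sketch of iterating opcode by opcode with dis, assuming a modern CPython (dis.get_instructions exists since 3.4). The helper name fast_local_accesses is hypothetical. It matches on opname prefixes rather than magic numbers, because opcode numbers, and even the exact instruction names (e.g. the combined LOAD_FAST_LOAD_FAST forms added in 3.13), change between versions.

```python
import dis


def fast_local_accesses(code):
    """Return (offset, opname, names) for each fast-local load/store.

    Walks the code instruction by instruction, so operand bytes that
    happen to equal an opcode number are never mistaken for opcodes.
    """
    hits = []
    for instr in dis.get_instructions(code):
        # The prefix match also covers variants such as LOAD_FAST_CHECK
        # and combined forms such as STORE_FAST_LOAD_FAST.
        if instr.opname.startswith(("LOAD_FAST", "STORE_FAST")):
            # Combined instructions report a tuple of names as argval.
            argval = instr.argval
            names = argval if isinstance(argval, tuple) else (argval,)
            hits.append((instr.offset, instr.opname, names))
    return hits


def example(a):
    b = a + 1
    return b


for offset, opname, names in fast_local_accesses(example.__code__):
    print(offset, opname, names)
```

The exact opnames printed depend on the interpreter version, which is one more reason to work from dis output rather than hard-coded byte values.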
Unfortunately, dis is written to produce human-readable output for Python bytecode, not to edit it. It can still help, but it lacks a function to go back to a code object after editing the instruction list.
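Since Python 3.8 (well after this thread), code objects have a replace() method that returns a copy with selected fields changed, which is one way back from an edited representation to a runnable code object. A minimal sketch, assuming 3.8+: it swaps an entry in co_consts rather than rewriting co_code, because the raw bytecode layout (wordcode, inline caches, exception tables) differs between versions. replace_constant is a hypothetical helper.

```python
import types


def replace_constant(func, old, new):
    """Return a new function whose code has constant `old` swapped for `new`."""
    code = func.__code__
    # The type check avoids accidental matches such as True == 1.
    consts = tuple(new if type(c) is type(old) and c == old else c
                   for c in code.co_consts)
    new_code = code.replace(co_consts=consts)
    # Rebuild a function around the edited code object, reusing the
    # original globals, name, defaults and closure.
    return types.FunctionType(new_code, func.__globals__, func.__name__,
                              func.__defaults__, func.__closure__)


def greet():
    return "spam"


eggs_greet = replace_constant(greet, "spam", "eggs")
print(greet(), eggs_greet())
```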
Another problem is that this will only catch local variables. Anything you don't assign to in the function but do reference is liable to end up a global or cell, which will still be a global or cell when you try to run it later. I'm not sure exactly how you want to handle these (if you just convert them unconditionally, then it'll be a bit surprising if someone writes "global spam" and doesn't get a global...), but you have to do something, or half your motivating examples (like "lambda: x+1") won't work. (Or is that why you're doing the explicit namespace thing instead of using the actual scope later on, so that this won't break, because the explicit namespace is both your locals and your globals?)
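A quick way to see which names a compiled function treats as locals, globals/builtins, or closure variables is to inspect the code object's name tables. A minimal sketch (scope_report is a hypothetical helper); note that co_names also holds attribute names, so it over-approximates the set of globals.

```python
def scope_report(func):
    """Summarize how a function's code object classifies names."""
    code = func.__code__
    return {
        "locals": code.co_varnames,  # typically LOAD_FAST/STORE_FAST
        "cells": code.co_cellvars,   # locals captured by inner functions
        "free": code.co_freevars,    # names captured from an enclosing scope
        "names": code.co_names,      # globals, builtins and attribute names
    }


# Never called here, so the undefined global x is harmless.
global_lambda = lambda: x + 1


def make_closure():
    x = 1
    return lambda: x + 1


closure_lambda = make_closure()

print(scope_report(global_lambda))
print(scope_report(closure_lambda))
```

Converting only the fast locals would leave global_lambda looking up x as a global and closure_lambda looking it up in a cell.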
Why are you applying these to a dict? I thought the whole point was to be able to run it inside a scope and affect that scope's variables? If you just leave out the ns, won't that be closer to what you want? (And it also means you don't need the get_sig thing in the first place, and I'm not sure what that adds. Using a function signature plus a call expression as a fancy way of writing a dict display seems like just obfuscation. Maybe with default values, *args, binding partials and then calling get_sig on them, etc. is interesting for something, but I'm not sure what?)
This was just a step in that direction. It obviously needs more work.
There are a number of interesting aspects and directions this can go.
* Ability to decompose functions into separate signature and body parts.
Not useful (or hard) in itself, but being able to reuse those parts could be valuable.
* Use those parts together.
This is a basic test that should just work. Again, it's not really that useful in itself, but it's a nice-to-have equivalence and helps demonstrate how it works.
* Use one body with different signatures.
For example, you might have signatures with different default values, or signatures that interface to different data formats. Rather than writing a function that converts different data formats to fit a single signature, we can use different signatures to interface with the data more directly. This is one of the things macros do in other languages.
* Use code objects as blocks to implement continuation-like behaviours.
This is done by breaking an algorithm into composable parts, then applying them to data. It's not quite the same as continuations or generators, but it has some of the same benefits. If the blocks avoid parsing signatures and creating/destroying frames, it can be a fast way to transform data. Of course, it's very limited, because you need strict naming conventions for it to work, so it would be confined to a scope that follows those conventions. (This is doable now with compile and exec, but it's awkward in my opinion.)
* Use a body as a block in another function.
Yes, this requires getting the live namespace from the frame it's used in. f = sys._getframe(1) may work, but it's not that easy to do in this case. When exec is given a code object, it calls PyEval_EvalCodeEx in ceval.c directly with the local and global dictionaries. (That should answer some of your comments about why I use the dictionary.)
It may be possible to call PyEval_EvalFrameEx directly with a frame object (as generators do).
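The first three directions can be approximated at the source level today, without touching bytecode: take a signature from a stub function with inspect.signature(), bind the call arguments (filling in defaults) to get a namespace dict, and exec a separately compiled body in that dict. This is a sketch of that idea, not of the bytecode approach; run_body, sig_large and sig_small are hypothetical names, and the body is assumed to leave its answer in a variable called result.

```python
import inspect

# One body, compiled once; it reads x and y and leaves its answer in `result`.
BODY = compile("result = x * y", "<body>", "exec")


def sig_large(x, y=10):
    """Stub that only supplies a signature."""


def sig_small(x, y=2):
    """Same parameters, different default."""


def run_body(signature_stub, body, *args, **kwargs):
    """Bind the arguments against the stub's signature, then run the body."""
    bound = inspect.signature(signature_stub).bind(*args, **kwargs)
    bound.apply_defaults()
    namespace = dict(bound.arguments)  # parameter name -> value
    exec(body, namespace)  # the dict serves as both globals and locals
    return namespace["result"]


print(run_body(sig_large, BODY, 3))       # uses the default y=10
print(run_body(sig_small, BODY, 3))       # uses the default y=2
print(run_body(sig_small, BODY, 3, y=5))  # explicit y overrides the default
```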
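The compile-and-exec version of the block idea from the fourth item might look like this: each step is compiled once into a code object and then applied to a shared namespace, with the naming convention (data, total, mean, spread) acting as the interface between steps. This is a sketch under those assumptions; the step names are hypothetical, and exec still sets up a new frame for every block, so it does not deliver the speed benefit hoped for above.

```python
# Each block is compiled once and reused; blocks communicate only through
# agreed-upon variable names in the shared namespace.
STEPS = [
    compile("total = sum(data)", "<step:total>", "exec"),
    compile("mean = total / len(data)", "<step:mean>", "exec"),
    compile("spread = max(data) - min(data)", "<step:spread>", "exec"),
]


def run_steps(steps, namespace):
    """Apply each code block, in order, to the same namespace dict."""
    for step in steps:
        exec(step, namespace)
    return namespace


ns = run_steps(STEPS, {"data": [1, 2, 3, 4]})
print(ns["total"], ns["mean"], ns["spread"])
```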
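For the fifth item, a minimal sketch of running a block against the caller's namespace with sys._getframe(1), which is CPython-specific. run_in_caller is a hypothetical helper. Reading the caller's variables works across versions, but inside a function f_locals was a snapshot dict before Python 3.13, so rebinding a local from the block was not reliably visible to the caller (PEP 667 made f_locals a write-through proxy in 3.13). The example therefore only reads the caller's locals and mutates an object they refer to.

```python
import sys


def run_in_caller(code):
    """Execute a code object using the calling frame's globals and locals."""
    caller = sys._getframe(1)  # depth 0 would be this function's own frame
    exec(code, caller.f_globals, caller.f_locals)


BLOCK = compile("out['value'] = a + b", "<block>", "exec")


def caller_function():
    a, b = 3, 4
    out = {}
    run_in_caller(BLOCK)  # reads a and b, and mutates out in place
    return out["value"]


print(caller_function())
```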
Some or all of these may (or will) require the code object to be in a more easily relocatable form, but as you noted, that's not easy to do.
There are a lot of 'ifs' here, but I think it may be worth exploring.
I'm going to try to make the bytecode fixer function work better (using dis or parts of it), and then put it up on GitHub where this idea can be developed further.
(The other utilities I've found so far for editing bytecode haven't been ported to Python 3 yet.)
I don't think editing the bytecode is the ideal solution in the long run, but it will identify the parts that need addressing. Other solutions for those could then be explored, such as making the needed alterations in the AST rather than in the bytecode.
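One way the AST route could look, as a sketch assuming the function's source is available as a string: parse the def, lift its body statements into a module, and compile that. Names in module-level code compile to LOAD_NAME/STORE_NAME, so the block reads from and writes to whatever dict it is exec'd in, with no bytecode patching. body_as_block is a hypothetical helper, ast.Module needs the type_ignores argument on Python 3.8+, and a body containing return would be rejected at compile time, since return is not allowed outside a function.

```python
import ast
import dis

SOURCE = """
def add_one(x):
    y = x + 1
"""


def body_as_block(source):
    """Compile the body of the first function in `source` as module-level code."""
    func_def = ast.parse(source).body[0]
    module = ast.Module(body=func_def.body, type_ignores=[])
    return compile(module, "<body>", "exec")


block = body_as_block(SOURCE)
# Show which name-access instructions the compiled block uses.
print(sorted({i.opname for i in dis.get_instructions(block) if "NAME" in i.opname}))

ns = {"x": 41}
exec(block, ns)
print(ns["y"])
```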