On Wednesday, April 1, 2015 12:40 PM, Ron Adam <ron3200@gmail.com> wrote:
On 04/01/2015 10:25 AM, Andrew Barnert wrote:
    xchg = [(124, 0x65),    # LOAD_FAST to LOAD_NAME
            (125, 0x5a)]    # STORE_FAST to STORE_NAME

The first problem here is that any 124 or 125 in an operand to any opcode except 124 or 125 will be matched and converted (although you'll usually probably get an IndexError trying to treat the next two arbitrary bytes as an index...).
Yes, it will only work for very simple cases. It was just enough to get the initial examples working.
To solve this, you need to iterate opcode by opcode, not byte by byte. The dis module gives you the information to tell how many bytes to skip for each opcode's operands. (It also maps between opcode numbers and names, so you don't have to use magic numbers with comments.) Using it will solve this problem (and maybe others I didn't spot) and also make your code a lot simpler.
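A minimal sketch of that opcode-by-opcode walk, assuming a dis module new enough to have get_instructions (3.4+). The names walk_instructions and sample are made up for illustration; the exact opcodes printed depend on the interpreter version.

```python
import dis


def walk_instructions(func):
    """Yield (offset, opname, arg) for each instruction in func's bytecode."""
    for instr in dis.get_instructions(func):
        # dis has already worked out how wide each operand is, so every
        # item here is one whole instruction, never a stray operand byte.
        yield instr.offset, instr.opname, instr.arg


def sample(x):  # hypothetical function, only here to have something to walk
    x += 1
    return x


rows = list(walk_instructions(sample))
for offset, opname, arg in rows:
    print(offset, opname, arg)
```

Because the loop sees names like LOAD_FAST rather than raw bytes, an operand that happens to equal 124 or 125 can't be mistaken for an opcode.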
Unfortunately dis is written to give human output for python bytecode, not to edit bytecode. But it can help. It needs a function to go back to a code object after editing the instruction list.
No; even without doing all the work for you, dis still provides more than enough information to be useful. For example:

    import dis
    import struct

    xchg = {'LOAD_FAST': 'LOAD_NAME', 'STORE_FAST': 'STORE_NAME'}
    bcode = bytearray(co_code)
    # dis.Bytecode wants the code object (or function), not the raw bytes;
    # code_obj here is whatever object co_code, varnames and names came from.
    for instr in dis.Bytecode(code_obj):
        try:
            newop = xchg[instr.opname]
        except KeyError:
            pass
        else:
            index = instr.arg
            char = varnames[index]
            names = names + (char,)
            index = names.index(char)
            bcode[instr.offset] = dis.opmap[newop]
            # Operands are 16-bit little-endian in this bytecode layout.
            bcode[instr.offset+1:instr.offset+3] = struct.pack('<H', index)

That does everything your big loop did, except that it doesn't generate crashing bytecode if you have 257 names or if any of your arguments are 124, 125, or [31744, 32256). (It still doesn't handle > 65536 names via EXTENDED_ARG, but last time I tested, albeit in 2.earlyish, neither did the interpreter itself, so that should be fine... If not, it's not too hard to add that too.)
Why are you applying these to a dict? I thought the whole point was to be able to run it inside a scope and affect that scope's variables? If you just leave out the ns, won't that be closer to what you want? (And it also means you don't need the get_sig thing in the first place, and I'm not sure what that adds. Using a function signature plus a call expression as a fancy way of writing a dict display seems like just obfuscation. Maybe with default values, *args, binding partials and then calling get_sig on them, etc. is interesting for something, but I'm not sure what?)
This was just a step in that direction. It obviously needs more work.
There are a number of interesting aspects and directions this can go.
* Ability to decompose functions into separate signature and body parts.
Not useful (or hard) in itself, but being able to reuse those parts may be good.
Again, why? If your goal is to be able to declare a body to be used inline in another scope, what will you ever need these signature objects for?
* Use those parts together.
This is a basic test that should just work. Again, it's not really that useful in itself, but it's a nice-to-have equivalence and helps to demonstrate how it works.
* Use one body with different signatures.
For example, you might have signatures with different default values, or signatures that interface to different data formats. Rather than having a function that converts different data formats to fit a single signature format, we can just use different signatures to interface with the data more directly. This is one of the things macros do in other languages.
All your signature objects can do is return a dict that you can eval a code block in, instead of evaling it in the current frame. If your goal is to eval it in the current frame, what good does that dict do you?
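To make that concrete, here is a rough sketch of a signature used only to build a dict, assuming it is done with inspect.signature (which may not be how get_sig actually works). bind_to_dict, sig and body are hypothetical names.

```python
import inspect


def bind_to_dict(sig_func, *args, **kwargs):
    """Bind call arguments to sig_func's signature and return them as a dict."""
    bound = inspect.signature(sig_func).bind(*args, **kwargs)
    bound.apply_defaults()  # fill in any defaults the call left out
    return dict(bound.arguments)


def sig(a, b=10):  # hypothetical signature-only function; its body is unused
    pass


body = compile("a + b", "<block>", "eval")  # hypothetical code block

ns = bind_to_dict(sig, 1)
result = eval(body, {}, ns)  # evaluated in the dict, not in the current frame
```

The result is the same as writing eval(body, {}, {'a': 1, 'b': 10}) by hand, and nothing in the calling frame is read or changed.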
* Use code objects as blocks to implement continuation like behaviours.
This is done by breaking an algorithm into composable parts, then applying them to data. It's not quite the same as continuations, or generators, but has some of the same benefits. If the blocks avoid parsing signatures and creating/destroying frames, it can be a fast way to translate data. Of course, it's very limited as you need to have strict naming conventions to do this. So it would be limited to within a scope that follows those conventions. (Doable now with compile and exec, but it's awkward in my opinion.)
Sure, but again, the (transformed) code object alone already does exactly that. If the signature object added something (like being able to avoid the strict naming conventions, maybe?), it would be helpful, but it doesn't; you can do exactly the same thing with just the code object that you can do with both objects.
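A small sketch of the compile-and-exec version, with code objects alone and no signature objects. The block names and the shared names value and total are invented for the example; they stand in for whatever naming convention the scope agrees on.

```python
# Hypothetical blocks that agree on the names `value` and `total`.
double = compile("value = value * 2", "<double>", "exec")
accumulate = compile("total = total + value", "<accumulate>", "exec")


def run_blocks(blocks, namespace):
    """Run each code object in order against one shared namespace dict."""
    for block in blocks:
        exec(block, {}, namespace)
    return namespace


ns = run_blocks([double, accumulate], {"value": 3, "total": 10})
```

Here ns ends up as {'value': 6, 'total': 16}. The blocks only compose because they share names, which is the strict naming convention mentioned above.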
* Use a body as a block in another function.
Yes, this requires getting the live namespace from the frame it's used in.
That's trivial. If you just call eval with the default arguments, it gets the live namespace from the frame it's used in. If you want to wrap up eval in a function that does the exact same thing eval does, then you need to manually go up one frame. I'm not sure why you want to do that, but it's easy.
f = sys._getframe(1) may work, but it's not that easy to do in this case.
    import sys

    def run_code_obj(code):
        # Depth 1 is the caller's frame (0 is this function's own frame).
        frame = sys._getframe(1)
        return eval(code, frame.f_globals, frame.f_locals)

How is that not easy?
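A runnable sketch of such a wrapper in use, assuming depth 1 is the frame you want (that is, run_code_obj is called directly from the function whose variables the block should see). The block and caller_func are hypothetical.

```python
import sys


def run_code_obj(code):
    """Evaluate code against the namespaces of whoever called this function."""
    caller = sys._getframe(1)  # depth 1 is the caller; depth 0 is this frame
    return eval(code, caller.f_globals, caller.f_locals)


block = compile("x + 1", "<block>", "eval")  # hypothetical block


def caller_func():
    x = 41
    return run_code_obj(block)


result = caller_func()
```

Reading the caller's x this way works and result is 42. Writing to it is the hard part, which is what the rest of this message is about.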
When exec is given a code object, it calls PyEval_EvalCodeEx in ceval.c directly with local and global dictionaries. (That should answer some of your comments as to why I use the dictionary.)
Yes, and running the block of code directly with the local and global dictionaries is exactly what you want it to do, so why are you telling it not to do that? For example:

    def macro():
        x += 1

    code = fix_code(macro.__code__)

    def f(code_obj):
        x = 1
        loc = locals()
        eval(code_obj)
        return loc['x']

(Or, if you prefer, use "run_code_obj" instead of "eval".) The problem here is that if you just "return x" at the end instead of "return loc['x']", you will likely see 1 instead of 2. It's the same problem you get if you "exec('x += 1')", exactly as described in the docs. That happens because f was compiled to look up x by index in the LOAD_FAST locals array, instead of by name in the locals dict, but your modified code objects mutate only the dict, not the array. That's the big problem you need to solve. Adding more layers of indirection doesn't get you any closer to fixing it.
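The same effect can be seen without any bytecode editing at all. This sketch uses plain exec on a source string; the two function names are made up for illustration.

```python
def bump_with_exec():
    x = 1
    exec("x += 1")   # updates a dict of the locals, not the fast-locals slot
    return x         # compiled as a fast-local load, so it sees the old value


def bump_with_dict():
    x = 1
    loc = locals()   # dict holding the current values of the locals
    exec("x += 1", globals(), loc)
    return x, loc["x"]


plain = bump_with_exec()
pair = bump_with_dict()
```

Here plain is 1 and pair is (1, 2): the increment lands in the dict, while the function's own x is untouched.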
It may be possible to call directly to PyEval_EvalFrameEx (like generators
do), with a frame object.
No. You've already got access to exactly the same information that you'd have that way. The problem is that you converted all the STORE_FAST instructions to STORE_NAME, and that means you're ignoring the array of fast locals and only using the dict, which means that the calling code won't see your changes. One way you could solve this is by applying the same code conversion to any function that wants to _use_ a code block that you apply to one that wants to be _used as_ a code block (except the former would return wraps(types.FunctionType(code, f.__globals__, ...), ...) instead of just returning code). It's ugly, and it's a burden on the user, and it makes everything slower (and it may break functions that use normal closures plus your code block things), but it's the only thing that could possibly work. If you want to use STORE_NAME, the caller has to use LOAD_NAME.
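A sketch of what the caller-side half of that could look like, with the actual bytecode rewrite left as a placeholder. transform_code and uses_blocks are hypothetical names; only types.FunctionType and functools.update_wrapper are real APIs here.

```python
import functools
import types


def transform_code(code):
    """Placeholder for the bytecode rewrite; here it returns code unchanged."""
    return code


def uses_blocks(func):
    """Rebuild func around a (hypothetically transformed) code object."""
    new_code = transform_code(func.__code__)
    rebuilt = types.FunctionType(
        new_code,
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )
    rebuilt.__kwdefaults__ = func.__kwdefaults__
    # Copy the name, docstring, etc. from the original, as wraps would.
    return functools.update_wrapper(rebuilt, func)


@uses_blocks
def add(a, b=2):
    return a + b


total = add(1)
```

With the identity transform the rebuilt function behaves like the original; a real transform_code would be where the FAST-to-NAME conversion goes.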
Some or all of these may/will require the code object to be in a form that
is more easily relocatable, but as you noted, it's not easy to do.
If you want to be able to run these code blocks in unmodified functions (without radically changing the interpreter), then yes, you need to affect the caller's LOAD_FAST variables, which means you need to do a STORE_FAST with the caller's index for the variable, and you don't have the caller's index until call time, which means you need relocation. It isn't really _necessary_ to make the code easily relocatable, it just makes the relocation (which is necessary) easier and more efficient. For example, at definition time, you can build a table like:

    {'x': (1, 8)}

So at call time, all you have to do is:

    # Fast-local indexes refer to the caller's co_varnames; depth 1 is the caller.
    caller = sys._getframe(1).f_code
    names = {name: index for index, name in enumerate(caller.co_varnames)}
    b = bytearray(c.co_code)
    for name, offsets in relocs.items():
        index = names[name]
        for offset in offsets:
            # Operands are 16-bit little-endian.
            b[offset:offset+2] = struct.pack('<H', index)
    code = types.CodeType(blah, blah, bytes(b), blah)

(Note that this will raise a KeyError if the called code block references a variable that doesn't exist in the calling scope; you may want to catch that and reraise it as a different exception. Also note that, as I explained before, you may want to map NAME/GLOBAL/CELL lookups to FAST lookups--almost the exact opposite of what you're doing--so that code like "def f(): return x+1" sees the calling function's local x, not the global or cell x at definition time, but that's tricky because you probably want "def f(): global x; return x+1" to see the global x...) _Now_ you face the problem that you need to run this on the actual calling frame, rather than what exec does (effectively, run it on a temporary frame with the same locals and globals dicts). And I think that will require extending the interpreter. But all the stuff you wrote above isn't a step in the direction of doing that, it's a step _away_ from that.
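The patching step on its own can be sketched against a synthetic byte string, so it runs on any Python version. It assumes the pre-3.6 layout discussed in this thread (one opcode byte followed by a 16-bit little-endian operand); later CPython versions encode instructions differently. relocate and the sample bytes are illustrative, not taken from a real code object.

```python
import struct


def relocate(code_bytes, relocs, caller_varnames):
    """Return a copy of code_bytes with operands set to the caller's indexes.

    relocs maps a variable name to the byte offsets of its 16-bit operands.
    """
    index_of = {name: i for i, name in enumerate(caller_varnames)}
    patched = bytearray(code_bytes)
    for name, offsets in relocs.items():
        try:
            index = index_of[name]
        except KeyError:
            # Reraise as something more meaningful than a bare KeyError.
            raise NameError("block needs %r, which the caller lacks" % name) from None
        for offset in offsets:
            struct.pack_into("<H", patched, offset, index)
    return bytes(patched)


# Synthetic stand-in for the start of "x += 1" in the old layout:
# LOAD_FAST 0, LOAD_CONST 1, INPLACE_ADD, STORE_FAST 0.
raw = bytes([124, 0, 0, 100, 1, 0, 55, 125, 0, 0])
patched = relocate(raw, {"x": (1, 8)}, ("a", "b", "x"))
```

In the caller's numbering x is index 2, so both operands are rewritten from 0 to 2 and the opcode bytes are left alone.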
Once you have that new functionality, you will not want a code object that's converted all FAST variables to NAME variables, or a signature object that gives you a different set of locals to use than the ones you want, or anything like that; you will want a code object that leaves FAST variables as FAST variables but renumbers them, and uses the frame's variables rather than a different namespace.
There are a lot of 'ifs' here, but I think it may be worth exploring.
I'm going to try and make the bytecode fixer function work better (using dis or parts of it.) And then put it up on github where this idea can be developed further.
(The other utilities I've found so far for editing bytecode aren't ported to python3 yet.)
As I said in my previous message, there are at least three incomplete ports of byteplay to 3.x. I think https://github.com/serprex/byteplay works on 3.2, but not 3.3 or 3.4. https://github.com/abarnert/byteplay works on 3.4, but mishandles certain constructions where 2.7+/3.3+ optimizes try/with statements (try running it on the stdlib, and you'll see exceptions on three modules) that I'm pretty sure won't affect your examples. At any rate, while fixing and using byteplay (or replacing it with something new that requires 3.4+ dis, or 2.7/3.3 with the dis 3.4 backport, and avoids all the hacky mucking around trying to guess at stack effects) might make your code nicer, I don't think it's necessary; what you need is a small subset of what it can do (e.g., you're not inserting new instructions and renumbering all the jump offsets, or adding wrapping statements in try blocks, etc.), so you could just cannibalize it to borrow the parts you need and ignore the rest.
I don't think editing the bytecode is the ideal solution in the long run, but it will identify the parts that need addressing and then other solutions for those could be looked into such as doing the needed alterations in the AST rather than the bytecode.
If you want to be able to convert functions to code blocks at runtime (which is inherent in using a decorator), the bytecode is all you have. If you want to capture the AST, you need to do it at import/compile time. If you're going to do that, MacroPy already does an amazing job of that, so why reinvent the wheel? (If there's some specific problem with MacroPy that you don't think can be solved without a major rearchitecture, I'll bet Haoyi Li would like to know about it...) And, more importantly, why put all this work into something completely different, which has a completely different set of problems to solve, if you're just going to throw it out later? For example, all the problems with renumbering variable indices or converting between different kinds of variables that you're solving here won't help you identify anything relevant to an AST-based solution, where variables are still just Name(id='x').
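To illustrate that last point, a short sketch using the stdlib ast module (not MacroPy): at the AST level there are no indexes and no FAST/NAME/GLOBAL distinction, only Name nodes with an id and a load/store context. names_used is a hypothetical helper.

```python
import ast


def names_used(source):
    """Return sorted (identifier, context) pairs for each ast.Name in source."""
    tree = ast.parse(source)
    return sorted(
        (node.id, type(node.ctx).__name__)
        for node in ast.walk(tree)
        if isinstance(node, ast.Name)
    )


found = names_used("x += 1\ny = x")
```

Here found is [('x', 'Load'), ('x', 'Store'), ('y', 'Store')]. Which kind of variable each name becomes is only decided later, when the compiler builds the symbol table.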