Re: [Python-ideas] History on proposals for Macros?

1 Apr 2015

      On 04/01/2015 10:25 AM, Andrew Barnert wrote:
...
...
...
xchg = [(124, 0x65),     #LOAD_FAST to LOAD_NAME
           (125, 0x5a)]     #STORE_FAST to STORE_NAME
The first problem here is that any 124 or 125 in an operand to any
opcode  except 124 or 125 will be matched and converted (although you'll usually
probably get an IndexError trying to treat the next two arbitrary bytes as
an index...).
Yes, it will only work for very simple cases. It was just enough to get the 
initial examples working.
...
To solve this, you need to iterate opcode by opcode, not byte by byte.
The dis module gives you the information to tell how many bytes to skip for
each opcode's operands. (It also maps between opcode numbers and names, so
you don't have to use magic numbers with comments.) Using it will solve
this problem (and maybe others I didn't spot) and also make your code a lot
simpler.
Unfortunately dis is written to give human-readable output for Python
bytecode, not to edit bytecode.  But it can help.  It needs a function to go
back to a code object after editing the instruction list.
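As a rough illustration of the opcode-by-opcode iteration suggested above, here is a minimal sketch using dis.get_instructions (Python 3.4+). The names find_fast_ops and sample are made up for this example, and it only reads the instruction list; it does not rebuild a code object. It matches on opcode names because the numeric values are not guaranteed to stay the same between CPython versions.

```python
import dis

def find_fast_ops(code):
    """Return (offset, opname, argrepr) for each local-variable instruction."""
    found = []
    for instr in dis.get_instructions(code):
        # Match on the opcode name, so an operand value is never
        # mistaken for an opcode.
        if "_FAST" in instr.opname:
            found.append((instr.offset, instr.opname, instr.argrepr))
    return found

def sample(a):
    b = a + 124      # 124 is only a constant here, never an opcode
    return b

# Look opcode numbers up by name instead of hard-coding them.
LOAD_NAME = dis.opmap["LOAD_NAME"]
STORE_NAME = dis.opmap["STORE_NAME"]

for offset, opname, argrepr in find_fast_ops(sample.__code__):
    print(offset, opname, argrepr)
```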
...
Another problem is that this will only catch local variables. Anything
you don't assign to in the function but do reference is liable to end up a
global or cell, which will still be a global or cell when you try to run it
later. I'm not sure exactly how you want to handle these (if you just
convert them unconditionally, then it'll be a bit surprising if someone
writes "global spam" and doesn't get a global...), but you have to do
something, or half your motivating examples (like "lambda: x+1") won't
work. (Or is that why you're doing the explicit namespace thing instead of
using the actual scope later on, so that this won't break, because the
explicit namespace is both your locals and your globals?)
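The local/global/cell distinction described above can be seen on the code object itself. This is a small sketch; describe, outer, assigns and reads_global are names invented for the illustration. Note that co_names also holds attribute names, not only globals.

```python
def outer():
    y = 1
    return lambda: y + 1          # y is a free (closure) variable

def assigns():
    x = 1                         # x is assigned, so it is a true local
    return x

reads_global = lambda: spam + 1   # spam is never assigned: compiled as a global

def describe(func):
    """Show which name table each variable of func ended up in."""
    code = func.__code__
    return {
        "locals": code.co_varnames,   # accessed with the *_FAST opcodes
        "names": code.co_names,       # globals and attribute names
        "free": code.co_freevars,     # closure cells
    }

for f in (assigns, reads_global, outer()):
    print(describe(f))
```

Only the first case is touched by a LOAD_FAST/STORE_FAST rewrite; the other two would stay a global and a cell.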
...
Why are you applying these to a dict? I thought the whole point was to
be able to run it inside a scope and affect that scope's variables? If
you just leave out the ns, won't that be closer to what you want? (And
it also means you don't need the get_sig thing in the first place, and
I'm not sure what that adds. Using a function signature plus a call
expression as a fancy way of writing a dict display seems like just
obfuscation. Maybe with default values, *args, binding partials and then
calling get_sig on them, etc. is interesting for something, but I'm not
sure what?)
This was just a step in that direction.  It obviously needs more work.

There are a number of interesting aspects and directions this can go.

     * Ability to decompose functions into separate
       signature and body parts.

Not useful (or hard) in itself, but being able to reuse those parts may be 
good.

     * Use those parts together.

This is a basic test that should just work.  Again it's not really that
useful in itself, but it's a nice-to-have equivalence and helps to
demonstrate how it works.

     * Use one body with different signatures.

For example you might have signatures with different default values, or 
signatures that interface to different data formats. Rather than having a 
function that converts different data formats to fit a single signature 
format, we can just use different signatures to interface with the data 
more directly.  This is one of the things macros do in other languages.
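A minimal sketch of the "one body, different signatures" idea, assuming the signature part only has to produce a namespace dict. It uses inspect.signature rather than the get_sig helper discussed in this thread; make_sig, run, plain and percent are hypothetical names.

```python
import inspect

def make_sig(func):
    """Return a callable that turns call arguments into a plain dict."""
    sig = inspect.signature(func)
    def bind(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()          # fill in defaults (Python 3.5+)
        return dict(bound.arguments)
    return bind

# Two signatures with different defaults; their bodies are never run.
def plain(x, scale=1, offset=0): pass
def percent(x, scale=100, offset=0): pass

# One shared body, compiled at module level so its names are looked up
# in whatever namespace it is executed in.
body = compile("result = x * scale + offset", "<body>", "exec")

def run(body, ns):
    exec(body, ns)
    return ns["result"]

print(run(body, make_sig(plain)(3)))
print(run(body, make_sig(percent)(0.5)))
```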

     * Use code objects as blocks to implement continuation-like behaviours.

This is done by breaking an algorithm into composable parts, then applying
them to data. It's not quite the same as continuations, or generators, but 
has some of the same benefits.  If the blocks avoid parsing signatures and 
creating/destroying frames, it can be a fast way to translate data.  Of 
course, it's very limited as you need to have strict naming conventions to 
do this.  So it would be limited to within a scope that follows those 
conventions.   (Doable now with compile and exec, but it's awkward in my 
opinion.)
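For reference, the compile-and-exec version mentioned above might look something like this. It is only a sketch; the block sources and the run_blocks helper are invented for the example, and the blocks cooperate purely through the agreed names values, total, mean and spread.

```python
# Each block is a code object that reads and writes agreed-upon names.
SOURCES = [
    "total = sum(values)",
    "mean = total / len(values)",
    "spread = max(values) - min(values)",
]
BLOCKS = [compile(src, "<block>", "exec") for src in SOURCES]

def run_blocks(blocks, ns):
    """Apply each block in order to the same namespace dict."""
    for block in blocks:
        exec(block, ns)     # no signature parsing, no new function frame
    return ns

stats = run_blocks(BLOCKS, {"values": [1, 2, 3]})
print(stats["mean"], stats["spread"])
```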

     * Use a body as a block in another function.

Yes, this requires getting the live namespace from the frame it's used in.
f = sys._getframe(1) may work, but it's not that easy to do in this
case.  When exec is given a code object, it calls PyEval_EvalCodeEx in
ceval.c directly with local and global dictionaries.  (That should answer
some of your comments as to why I use the dictionary.)

It may be possible to call directly to PyEval_EvalFrameEx (like generators 
do), with a frame object.
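A minimal sketch of the frame-based approach, assuming sys._getframe is available (it is a CPython implementation detail). run_block and demo are hypothetical names. The block runs against a copy of the caller's variables, so it can read them, but its assignments do not flow back into the calling function; that missing piece is the hard part described above.

```python
import sys

def run_block(block):
    """Execute a code object against a snapshot of the caller's namespace."""
    caller = sys._getframe(1)          # frame of whoever called run_block
    ns = dict(caller.f_globals)
    ns.update(caller.f_locals)         # locals shadow globals in the snapshot
    exec(block, ns)
    return ns

def demo():
    x = 20
    block = compile("y = x + 1", "<block>", "exec")
    result = run_block(block)
    return x, result["y"]

print(demo())
```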

Some or all of these may/will require the code object to be in a form that
is more easily relocatable, but as you noted, it's not easy to do.

There are a lot of 'ifs' here, but I think it may be worth exploring.

I'm going to try and make the bytecode fixer function work better (using 
dis or parts of it.)  And then put it up on github where this idea can be 
developed further.

(The other utilities I've found so far for editing bytecode aren't ported
to Python 3 yet.)

I don't think editing the bytecode is the ideal solution in the long run, 
but it will identify the parts that need addressing and then other 
solutions for those could be looked into such as doing the needed 
alterations in the AST rather than the bytecode.
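As a sketch of what the AST route could look like (assuming Python 3.8+ for the ast.Module signature): lift a function's body into a module-level tree and compile that, so the compiler emits name-based lookups by itself. The source string and variable names are made up for the example, and a body containing return would need extra handling.

```python
import ast
import dis

src = """
def f(x, y):
    total = x + y
    doubled = total * 2
"""

func = ast.parse(src).body[0]                    # the FunctionDef node
module = ast.Module(body=func.body, type_ignores=[])
ast.fix_missing_locations(module)
body = compile(module, "<body>", "exec")

# Compiled at module level, the assignments use STORE_NAME, not STORE_FAST.
opnames = {instr.opname for instr in dis.get_instructions(body)}

ns = {"x": 1, "y": 2}
exec(body, ns)
print(ns["doubled"], "STORE_NAME" in opnames)
```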

Cheers,
    Ron

Ron Adam