[Python-ideas] History on proposals for Macros?

Wed Apr 1 21:39:49 CEST 2015

On 04/01/2015 10:25 AM, Andrew Barnert wrote:
>> >    xchg = [(124, 0x65),     #LOAD_FAST to LOAD_NAME
>> >            (125, 0x5a)]     #STORE_FAST to STORE_NAME
> The first problem here is that any 124 or 125 in an operand to any
> opcode  except 124 or 125 will be matched and converted (although you'll usually
> probably get an IndexError trying to treat the next two arbitrary bytes as
> an index...).

Yes, it will only work for very simple cases. It was just enough to get the 
initial examples working.

> To solve this, you need to iterate opcode by opcode, not byte by byte.
> The dis module gives you the information to tell how many bytes to skip for
> each opcode's operands. (It also maps between opcode numbers and names, so
> you don't have to use magic numbers with comments.) Using it will solve
> this problem (and maybe others I didn't spot) and also make your code a lot
> simpler.

Unfortunately dis is written to give human-readable output for Python 
bytecode, not to edit it.  But it can help.  It needs a function to go back 
to a code object after editing the instruction list.
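
As a rough sketch of that round trip, assuming a Python with 
dis.get_instructions (3.4+) and code.replace (3.8+, so newer than this 
thread): walk a function instruction by instruction, then build a new 
function from a byte string.  The helper names list_instructions and 
rebuild are made up for illustration, and the bytes are passed through 
unchanged because opcode numbers and operand layout differ between CPython 
versions.

```python
import dis
import types


def list_instructions(func):
    """Return (offset, opname, argval) for each instruction in func."""
    # dis.get_instructions decodes whole instructions, so operand bytes
    # are never mistaken for opcodes.
    return [(ins.offset, ins.opname, ins.argval)
            for ins in dis.get_instructions(func)]


def rebuild(func, new_bytes):
    """Return a new function whose code object uses new_bytes as co_code."""
    new_code = func.__code__.replace(co_code=bytes(new_bytes))
    return types.FunctionType(new_code, func.__globals__, func.__name__,
                              func.__defaults__, func.__closure__)


def add_one(x):
    y = x + 1
    return y


# Identity round trip: the bytes are copied unchanged, so behaviour is kept.
same = rebuild(add_one, add_one.__code__.co_code)
```

A real fixer would edit the instruction list between those two steps and 
re-encode it, which is the part dis does not provide.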

> Another problem is that this will only catch local variables. Anything
> you don't assign to in the function but do reference is liable to end up a
> global or cell, which will still be a global or cell when you try to run it
> later. I'm not sure exactly how you want to handle these (if you just
> convert them unconditionally, then it'll be a bit surprising if someone
> writes "global spam" and doesn't get a global...), but you have to do
> something, or half your motivating examples (like "lambda: x+1") won't
> work. (Or is that why you're doing the explicit namespace thing instead of
> using the actual scope later on, so that this won't break, because the
> explicit namespace is both your locals and your globals?)

> Why are you applying these to a dict? I thought the whole point was to
> be able to run it inside a scope and affect that scope's variables? If
> you just leave out the ns, won't that be closer to what you want? (And
> it also means you don't need the get_sig thing in the first place, and
> I'm not sure what that adds. Using a function signature plus a call
> expression as a fancy way of writing a dict display seems like just
> obfuscation. Maybe with default values, *args, binding partials and then
> calling get_sig on them, etc. is interesting for something, but I'm not
> sure what?)

This was just a step in that direction.  It obviously needs more work.

There are a number of interesting aspects and directions this can go.

     * Ability to decompose functions into separate
       signature and body parts.

Not useful (or hard) in itself, but being able to reuse those parts may be 
good.

     * Use those parts together.

This is a basic test that should just work.  Again it's not really that 
useful in itself, but it's a nice-to-have equivalence and helps to 
demonstrate how it works.

     * Use one body with different signatures.

For example you might have signatures with different default values, or 
signatures that interface to different data formats. Rather than having a 
function that converts different data formats to fit a single signature 
format, we can just use different signatures to interface with the data 
more directly.  This is one of the things macros do in other languages.
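
A minimal sketch of one body used with two signatures, assuming the body 
is a code object compiled with compile() and the signatures come from 
empty functions.  The names call_with, sig_small and sig_large are 
hypothetical; inspect.signature does the argument binding here instead of 
any bytecode editing.

```python
import inspect

# One shared body, compiled once; it expects the names x and y to exist.
body = compile("result = x + y", "<body>", "exec")


def sig_small(x, y=1):
    """Signature only; this function is never called."""


def sig_large(x, y=100):
    """Signature only; this function is never called."""


def call_with(sig_func, body, *args, **kwargs):
    """Bind the arguments with sig_func's signature, then run body on them."""
    bound = inspect.signature(sig_func).bind(*args, **kwargs)
    bound.apply_defaults()             # fill in y when it was not passed
    namespace = dict(bound.arguments)  # the signature becomes a namespace
    exec(body, {}, namespace)
    return namespace["result"]
```

With this, call_with(sig_small, body, 1) and call_with(sig_large, body, 1) 
run the same body with different defaults for y.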

     * Use code objects as blocks to implement continuation-like behaviours.

This is done by breaking an algorithm into composable parts, then applying 
them to data. It's not quite the same as continuations, or generators, but 
has some of the same benefits.  If the blocks avoid parsing signatures and 
creating/destroying frames, it can be a fast way to translate data.  Of 
course, it's very limited as you need to have strict naming conventions to 
do this.  So it would be limited to within a scope that follows those 
conventions.   (Doable now with compile and exec, but it's awkward in my 
opinion.)
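
For reference, the compile-and-exec version looks roughly like this.  It 
is only a sketch: run_blocks and the two example blocks are invented for 
illustration, and the naming convention is simply that every block reads 
and writes the name text.

```python
# Each block is a code object that follows the same naming convention:
# it reads the name "text" and rebinds it.
strip_block = compile("text = text.strip()", "<strip>", "exec")
upper_block = compile("text = text.upper()", "<upper>", "exec")


def run_blocks(blocks, namespace):
    """Run each block in order against one shared namespace dict."""
    for block in blocks:
        # No signature parsing and no new function frame per block;
        # the blocks only communicate through the names in namespace.
        exec(block, {}, namespace)
    return namespace


result = run_blocks([strip_block, upper_block], {"text": "  spam "})
```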

     * Use a body as a block in another function.

Yes, this requires getting the live namespace from the frame it's used in. 
f = sys._getframe(1) may work, but it's not that easy to do in this case.  
When exec is given a code object, it calls PyEval_EvalCodeEx in ceval.c 
directly with local and global dictionaries.  (That should answer some of 
your comments as to why I use the dictionary.)

It may be possible to call directly to PyEval_EvalFrameEx (like generators 
do), with a frame object.
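
At the Python level, the closest approximation I know of is to snapshot 
the caller's locals and hand the copy to exec.  This is a sketch only: 
run_block and demo are made-up names, sys._getframe is CPython-specific, 
and the copy is one-way, so names the block assigns are not written back 
to the caller's real locals.

```python
import sys


def run_block(block):
    """Run a code object against a snapshot of the caller's namespace."""
    caller = sys._getframe(1)          # frame of whoever called run_block
    namespace = dict(caller.f_locals)  # copy, not the live locals
    exec(block, caller.f_globals, namespace)
    return namespace


def demo():
    x = 41
    block = compile("y = x + 1", "<block>", "exec")
    # The block sees x from this function, but y only exists in the copy.
    return run_block(block)["y"]
```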

Some or all of these may/will require the code object to be in a form that 
is more easily relocatable, but as you noted, it's not easy to do.

There are a lot of 'ifs' here, but I think it may be worth exploring.

I'm going to try to make the bytecode fixer function work better (using 
dis or parts of it), and then put it up on GitHub where this idea can be 
developed further.

(The other utilities I've found so far for editing bytecode aren't ported 
to Python 3 yet.)

I don't think editing the bytecode is the ideal solution in the long run, 
but it will identify the parts that need addressing, and then other 
solutions for those could be looked into, such as doing the needed 
alterations in the AST rather than the bytecode.
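
One possible shape for the AST route, as a sketch under assumptions: pull 
a function's body out of the parsed source and compile it as module-level 
code, so its names are looked up in whatever namespace exec is given.  
The helper body_as_block is hypothetical, ast.Module's type_ignores 
argument needs Python 3.8+, and a body containing return would not 
compile this way.

```python
import ast

SOURCE = """
def scale(x, factor):
    result = x * factor
"""


def body_as_block(source, name):
    """Compile the body of the function called name as a standalone block."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            # Re-wrap the statements as a module, dropping the signature.
            module = ast.Module(body=node.body, type_ignores=[])
            return compile(module, "<body of %s>" % name, "exec")
    raise LookupError(name)


block = body_as_block(SOURCE, "scale")
namespace = {"x": 3, "factor": 4}
exec(block, {}, namespace)  # namespace now also holds "result"
```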

Cheers,
    Ron