[Python-ideas] .pyu nicode syntax symbols (was Re: Empty set, Empty dict)

Andrew Barnert abarnert at yahoo.com
Tue Jul 1 01:48:14 CEST 2014


First, two quick side notes:

It might be nice if the compiler were as easy to hook as the importer. Alternatively, it might be nice if there were a way to do "inline bytecode assembly" in CPython, similar to the way you do inline assembly in many C compilers, so the answer to random's question is just "asm [('BUILD_SET', 0)]" or something similar. Either of those would make this problem trivial.

I doubt either of those would be useful often enough that anyone wants to put in the work. But then I doubt the empty-set literal would be either, so anyone who seriously wants to work on this might want to work on the inline assembly and/or hookable compiler first.
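For reference, here is why random's question needs bytecode at all: no expression spells BUILD_SET 0. A quick check with the dis module shows this. Exact instruction lists vary between CPython versions, so the small helper `ops` (a name made up for this example) just collects (opname, arg) pairs.

```python
import dis

def ops(source):
    """Return the (opname, arg) pairs CPython emits for an expression."""
    code = compile(source, "<expr>", "eval")
    return [(inst.opname, inst.arg) for inst in dis.get_instructions(code)]

# {} is the empty *dict* literal, so it compiles to BUILD_MAP 0.
print(ops("{}"))
# A non-empty set display does use BUILD_SET, with the element count as its arg.
print(ops("{1}"))
# set() is a name lookup plus a call, with no BUILD_SET anywhere.
print(ops("set()"))
```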

Anyway:

On Monday, June 30, 2014 3:12 PM, Chris Angelico <rosuav at gmail.com> wrote:
>On Tue, Jul 1, 2014 at 3:18 AM,  <random832 at fastmail.us> wrote:

>> On Sat, Jun 28, 2014, at 01:28, Chris Angelico wrote:
>>> empty_set_literal =
>>> type(lambda:0)(type((lambda:0).__code__)(0,0,0,3,67,b't\x00\x00d\x01\x00h\x00\x00\x83\x02\x00\x01d\x00\x00S',(None,"I'm

I think it makes more sense to use types.FunctionType and types.CodeType here than to generate two extra functions for each function, even if that means adding an import types line at the top of every munged source file.
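A sketch of that suggestion. The generated source and the name answer are made up for illustration. The constructor signature of types.CodeType changes between CPython versions, so on 3.8+ it is usually safer to derive new code objects with code.replace() than to call types.CodeType directly.

```python
import types

# The verbose spellings from the quoted snippet name exactly these types:
assert types.FunctionType is type(lambda: 0)
assert types.CodeType is type((lambda: 0).__code__)

# Compile a module, pull the nested function's code object out of its
# constants, and wrap it in a function object.
module_code = compile("def answer():\n    return 42\n", "<generated>", "exec")
func_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))
answer = types.FunctionType(func_code, globals(), "answer")
print(answer())  # 42
```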

>> If you're embedding the entire compiler (in fact, a modified one) in
>> your tool, why not just output a .pyc?
>
>I'm not, I'm calling on the normal compiler. Also, I'm not familiar
>with the pyc format, nor with any of the potential pit-falls of that
>approach. But if someone wants to make an "alternative front end that
>makes a .pyc file" kind of thing, they're most welcome to.


The tricky bit with making a .pyc file is generating the header information. Last I checked (quite a while ago, and not that deeply), that wasn't documented, and there were no helpers exported to Python.

But I think what he was suggesting is something like this: Let py_compile.compile generate the .pyc file as normal, then munge the bytecode in that file, instead of compiling each function, munging its bytecode, and emitting source that creates the munged functions.
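A minimal sketch of that .pyc round trip, assuming CPython 3.7 or later. There, PEP 552 fixed the header at 16 bytes: the magic number, a flags word, then either the source mtime and size or a source hash. The "munging" step here only swaps a constant via code.replace(); real bytecode patching would slot in at the same point. The file name demo.py and the GREETING constant are made up for the example.

```python
import importlib.util
import marshal
import pathlib
import py_compile
import tempfile

HEADER_SIZE = 16  # CPython 3.7+ (PEP 552); older versions used a shorter header

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "demo.py")
    src.write_text("GREETING = 'hello'\n")
    # Let py_compile produce the .pyc (header included) as normal.
    pyc = pathlib.Path(py_compile.compile(str(src), cfile=str(pathlib.Path(tmp, "demo.pyc")), doraise=True))

    data = pyc.read_bytes()
    header, body = data[:HEADER_SIZE], data[HEADER_SIZE:]
    assert header[:4] == importlib.util.MAGIC_NUMBER  # sanity-check the layout
    code = marshal.loads(body)

    # Munge the code object, then write it back behind the untouched header.
    munged = code.replace(co_consts=tuple("bye" if c == "hello" else c for c in code.co_consts))
    pyc.write_bytes(header + marshal.dumps(munged))

    ns = {}
    exec(marshal.loads(pyc.read_bytes()[HEADER_SIZE:]), ns)
    print(ns["GREETING"])  # bye
```

Because the header is copied verbatim, the importer should still consider the file up to date with its source.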


Besides being a lot less work, his version works for ∅ at top level, in class definitions, in lambda expressions, etc., not just for def statements. And it doesn't require finding and identifying all of the things to munge in a source file (which I assume you'd do bottom-up based on the ast.parse tree or something).
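If you do go the source-munging route, ∅ has to be substituted before ast.parse will accept the text, since it isn't a valid character in Python source outside strings and comments. After that, the bottom-up pass can be a plain post-order traversal. functions_bottom_up below is a hypothetical helper, not something the ast module provides.

```python
import ast

def functions_bottom_up(node):
    """Yield function and lambda nodes innermost-first (post-order traversal)."""
    for child in ast.iter_child_nodes(node):
        yield from functions_bottom_up(child)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
        yield node

source = (
    "def outer():\n"
    "    def inner():\n"
    "        return []\n"
    "    key = lambda: []\n"
    "    return inner, key\n"
)
tree = ast.parse(source)
names = [getattr(n, "name", "<lambda>") for n in functions_bottom_up(tree)]
print(names)  # ['inner', '<lambda>', 'outer']
```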

But either way, this still doesn't solve the big problem. Compiling a function by hand and then tweaking the bytecode is easy; doing it programmatically is more painful. You obviously need the function to compile, so you have to replace the ∅ with something else whose bytecode you can search-and-replace. But what? That something else has to be valid in an expression context (so it compiles), has to compile to a single 3-byte instruction (otherwise, replacing it will screw up any jump targets that point after it), can't add any globals, constants, etc. to the code object's tables (otherwise, removing it will screw up any LOAD_FOO instructions that refer to a higher-numbered foo), and can't otherwise appear anywhere in the code being compiled.
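To make the constraints concrete, compare a candidate placeholder that satisfies them ([]) with the obvious one that doesn't (set()). This sketch only inspects compiled code objects. The exact bytes differ between CPython versions, but the relationships checked here should hold.

```python
as_list = compile("x = []", "<placeholder>", "exec")
as_call = compile("x = set()", "<placeholder>", "exec")

# [] adds nothing to the name table; set() adds a lookup of the name "set",
# which shifts the index of every name after it.
print(as_list.co_names)
print(as_call.co_names)

# set() also compiles to more bytecode, so swapping it out would move
# every later offset (and break jump targets).
print(len(as_list.co_code), len(as_call.co_code))
```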

The only thing I can think of off the top of my head is to replace it with whichever of [], (), or {} doesn't appear anywhere in the code being compiled; then you can search-and-replace BUILD_LIST/BUILD_TUPLE/BUILD_MAP 0 with BUILD_SET 0. But what if all three appear in the code? Raise a SyntaxError('Cannot use all 4 kinds of empty literals in the same scope')?
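A minimal sketch of that trick against a modern CPython, with two adjustments. First, CPython 3.6 switched to 2-byte wordcode, so the "3-byte" requirement now reads "same size", which BUILD_LIST 0, BUILD_MAP 0 and BUILD_SET 0 all satisfy. Second, current CPython loads () as a constant rather than building it with BUILD_TUPLE 0, so the sketch only uses [] and {} as placeholders. pick_placeholder, patch_empty_literals and compile_with_empty_set are hypothetical helpers. The placeholder check is a naive substring test: it misses spacing variants like [ ] and also matches occurrences inside strings.

```python
import dis
import types

EMPTY_SET = "\u2205"  # the ∅ character

def pick_placeholder(source):
    """Hypothetical helper: choose an empty literal the source doesn't already use."""
    for placeholder, opname in (("[]", "BUILD_LIST"), ("{}", "BUILD_MAP")):
        if placeholder not in source:  # naive: misses "[ ]", matches inside strings
            return placeholder, opname
    raise SyntaxError("Cannot use [], {} and \u2205 in the same module")

def patch_empty_literals(code, opname):
    """Rewrite every `opname 0` instruction to BUILD_SET 0, recursing into nested code."""
    raw = bytearray(code.co_code)
    for inst in dis.get_instructions(code):
        if inst.opname == opname and inst.arg == 0:
            raw[inst.offset] = dis.opmap["BUILD_SET"]  # same size; the 0 arg byte is kept
    consts = tuple(
        patch_empty_literals(c, opname) if isinstance(c, types.CodeType) else c
        for c in code.co_consts
    )
    return code.replace(co_code=bytes(raw), co_consts=consts)

def compile_with_empty_set(source, filename="<munged>"):
    placeholder, opname = pick_placeholder(source)
    code = compile(source.replace(EMPTY_SET, placeholder), filename, "exec")
    return patch_empty_literals(code, opname)

ns = {}
exec(compile_with_empty_set("s = \u2205\ndef f():\n    return \u2205\n"), ns)
print(type(ns["s"]).__name__, type(ns["f"]()).__name__)
```

Unlike the per-function approach, this works on whole-module code, so ∅ at top level, in classes and in lambdas is handled by the same recursion.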

One more thing that I'm sure you thought of, but may not have thought through all the way: To make this generally useful, you can't just hardcode creating a zero-arg top-level function; you need to copy all of the code and function constructor arguments from the compiled function. 
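A sketch of what copying everything over involves, using a hypothetical clone_function helper. It passes the code, globals, name, positional defaults and closure to types.FunctionType, then copies the remaining attributes (keyword-only defaults, qualified name, docstring, annotations, and the function's __dict__) after construction. Because fn.__closure__ is reused, any replacement code object must have the same co_freevars.

```python
import types

def clone_function(fn, new_code=None):
    """Rebuild fn around a (possibly patched) code object, copying its other attributes."""
    code = fn.__code__ if new_code is None else new_code
    clone = types.FunctionType(code, fn.__globals__, fn.__name__,
                               fn.__defaults__, fn.__closure__)
    clone.__kwdefaults__ = fn.__kwdefaults__
    clone.__qualname__ = fn.__qualname__
    clone.__doc__ = fn.__doc__
    clone.__module__ = fn.__module__
    clone.__annotations__ = dict(fn.__annotations__)
    clone.__dict__.update(fn.__dict__)
    return clone

def greet(name, greeting="hello", *, punct="!"):
    """Say hi."""
    return f"{greeting}, {name}{punct}"

greet_copy = clone_function(greet)
print(greet_copy("world"), greet_copy.__qualname__, greet_copy.__kwdefaults__)
```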

So, if the function is a closure, how do you do that? You need to pass a tuple of closure cell objects that bind to the appropriate co_cellvars from the current frame, and I don't think there's a way to do that from Python. So, you need to do that by bytecode-hacking the outer function in the same way, just so it can build the inner function. And, even if you could build closure cells, once you've replaced the inner function definition with a function constructor from bytecode, when the resulting code gets compiled, it won't have any cellvars anymore.
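Later CPython versions (3.8+) do expose types.CellType, but a freshly made cell is not bound to any frame's variable, which is exactly the problem described here. A sketch of the difference, rebuilding a closure once with the original function's cells and once with a fresh cell (make_counter, bump and peek are illustrative names):

```python
import types

def make_counter():
    count = 0
    def bump():
        nonlocal count
        count += 1
        return count
    def peek():
        return count
    return bump, peek

bump, peek = make_counter()
code = bump.__code__
print(code.co_freevars)  # ('count',)

# Reusing the original cells: the rebuilt function shares `count` with peek().
shared = types.FunctionType(code, bump.__globals__, "bump", None, bump.__closure__)
# A fresh cell (types.CellType, 3.8+): a detached copy nobody else can see.
detached = types.FunctionType(code, bump.__globals__, "bump", None, (types.CellType(0),))

shared()
shared()
detached()
print(peek())  # 2: only the shared rebuild touched the enclosing variable
```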

And, going back to the top, all of these problems are why I think random's solution would be a lot easier than yours, why my solution (first build compiler hooks or inline assembly, then use them to implement the empty set trivially) would be no harder than either (and a lot more generally useful), and also why I think this really isn't worth doing.

