Before I get to the reply, because I couldn't find a 3.x-compatible bytecode assembler, I slapped one together at https://github.com/abarnert/cpyasm. I think it would be reasonably possible to use this to add inline assembly to a preprocessor, but I haven't tried, because I don't have a preprocessor I actually want, and this was the fun part. :)
On Monday, June 30, 2014 5:39 PM, Chris Angelico wrote:
On Tue, Jul 1, 2014 at 9:48 AM, Andrew Barnert wrote:
First, two quick side notes: It might be nice if the compiler were as easy to hook as the importer. Alternatively, it might be nice if there were a way to do "inline bytecode assembly" in CPython, similar to the way you do inline assembly in many C compilers, so the answer to random's question would just be "asm [('BUILD_SET', 0)]" or something similar. Either of those would make this problem trivial.
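To make the BUILD_SET point concrete, here's a small sketch using the stdlib dis module to show the asymmetry the thread is about: a non-empty set literal compiles to a BUILD_SET opcode, but an "empty set" has to be spelled as a call to set(). (The asm [...] syntax above is hypothetical; this only illustrates the bytecode gap it would fill.)

```python
import dis

# A non-empty set literal compiles to a BUILD_SET opcode.
# ({x} with a variable avoids any constant-folding of the literal.)
literal_ops = [ins.opname
               for ins in dis.get_instructions(compile("{x}", "<demo>", "eval"))]
assert "BUILD_SET" in literal_ops

# There is no empty set literal, so you pay for a name lookup and a
# call instead -- the opcode name is CALL_FUNCTION (<=3.10) or CALL (3.11+).
empty_ops = [ins.opname
             for ins in dis.get_instructions(compile("set()", "<demo>", "eval"))]
assert "BUILD_SET" not in empty_ops
assert any(op.startswith("CALL") for op in empty_ops)
```

An inline-asm hook would let you emit BUILD_SET 0 directly and skip the lookup-and-call pair.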
That would be interesting, but it raises the possibility of mucking up the stack. (Imagine if you put BUILD_SET 1 in there instead. What's it going to make a set of? What's going to happen to the rest of the stack? Do you REALLY want to debug that?)
The same thing that happens if you use bad inline assembly in C, or a bad C extension module in Python—bad things that you can't debug at source level. And yet, inline assembly in C and C extension modules in Python are still quite useful. Of course the difference is that you can drop from the source level to the machine level pretty easily in gdb, lldb, Microsoft's debugger, etc., while you can't as easily drop from the source level to the bytecode level in pdb. (I'm not sure that wouldn't be an interesting feature to add in itself, but that's getting even farther off topic, so forget it for now.)
Back when I did a lot of C and C++ programming, I used to make good use of a "drop to assembly" feature. There were two broad areas where I'd use it: either to access a CPU feature that the compiler and library didn't offer me (like CPUID, in its early days), or to hand-optimize some code. Then compilers got better and better, and the first set of cases got replaced with library functions... and the second lot ended up being no better than the compiler's output, and potentially a lot worse - particularly because they're non-portable. Allowing a "drop to bytecode" in CPython would have the exact same effects, I think.
I'll ignore the second case for the moment, because I think it's rarely if ever appropriate to Python, and just focus on the first. Those cases did not simply go away when CPUID got replaced with library functions. Those library functions (which are compiled with the same compiler you use for your own code) have inline assembly in them. (Or, on linux, those library functions read from a device file, but the device driver, which is compiled with the same compiler you use, has inline assembly in it.) So the compiler still needs to be able to compile it.

There are cases where that isn't true. For example, most modern compilers that care about x86 have extended the C language in some way so that you don't have to write LOCK CMPXCHG all over the place to do lock-free refcounting (and, even better, they've done so in a way that also does the right thing on ARM or SPARC or whatever else you care about). Or, in some cases, they've done something halfway in between, adding "compiler intrinsic" functions that look like functions but are compiled directly into inline asm. Either way, that didn't happen until a lot of people were publishing code that used that inline assembly; otherwise, the compiler vendors would have had no reason to believe the new feature was necessary. And people still needed to keep distributing code that used the inline asm for years, until the oldest compiler and library on every platform they supported incorporated the change they needed.

And, just as you say, I think it would have the exact same effects in CPython. If we added inline bytecode asm to 3.5, and there were actually something useful to do with it, people would start doing it, and that's how we'd know that something useful was worth adding to the language. When we then added that something useful in 3.7, people could eventually start using it, and it would be years before all of the projects that need that feature either die or require 3.7.
But that's not a problem; that's inline asm working exactly as it should. There is one good reason to reject the inline asm idea: If it's unlikely that there will be anything worth using it for (or if it might plausibly be useful, but not enough so that anyone's worth doing the work). Which I think is at least plausible, and maybe likely.
Some people would use it to create an empty set; others would use it to replace variable swapping with a marginally faster and *almost* identical stack-based swap.
Do you really think anyone would do the latter? Seriously, what kind of code can you imagine that's too slow in CPython, not worth rewriting in C or running in PyPy or whatever, but fast enough with the rot opcode removed? And if someone really _did_ need that, I doubt they'd care much that Python 3.8 makes it unnecessary; they obviously have a specific deployment platform that has to work and that needed that last 3% speedup under 3.6.2, and they're going to need that to keep working for years.

The former, maybe. Not just to allow ∅, but maybe someone would want to write a Unicode-math-ified Python dialect as an import-hook preprocessor that used inline asm among other tools. In which case… so what? That's not going to be something that people just randomly drop into their code; there will be a single project with however many users, which will be no worse for the Python community than Hylang. If their demonstration is just so cool that everyone decides we need Unicode symbols in Python core, then great. If not, and they still want to keep using it, well, a simpler preprocessor will be easier for the rest of us to understand than a ridiculously complicated one that does bytecode hackery, or than a hacked-up CPython compiler.
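For scale, here's how small the "simpler preprocessor" can be. This is only a sketch of the idea, using a naive text substitution before compiling; the source string and names are made up for illustration, and a real preprocessor would work at the tokenize level so that a ∅ inside a string literal survives.

```python
# A text-level ∅ preprocessor in miniature: rewrite the source before
# compiling, so the compiler never sees the non-identifier character.
# No bytecode hackery involved (the result calls set() rather than
# emitting BUILD_SET 0, so it trades that 3% back for simplicity).
source = "s = ∅\ns.add(1)\n"
translated = source.replace("∅", "set()")

namespace = {}
exec(compile(translated, "<pyu>", "exec"), namespace)
assert namespace["s"] == {1}
```

Hooked into an import-time loader, the same few lines would cover def, lambda, class, and top-level code alike, since the rewrite happens before compilation.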
So while an inline bytecode assembler might have some uses, I suspect it'd be an attractive nuisance more than anything else.
I honestly don't see it becoming an attractive nuisance. I can easily see it just not getting used for anything at all, beyond people playing with the interpreter. And now, on to your other replies:
On Monday, June 30, 2014 3:12 PM, Chris Angelico wrote:
On Tue, Jul 1, 2014 at 3:18 AM, wrote:
On Sat, Jun 28, 2014, at 01:28, Chris Angelico wrote:
empty_set_literal = type(lambda:0)(type((lambda:0).__code__)(0,0,0,3,67,b't\x00\x00d\x01\x00h\x00\x00\x83\x02\x00\x01d\x00\x00S',(None,"I'm
I think it makes more sense to use types.FunctionType and types.CodeType here than to generate two extra functions for each function, even if that means you have to put an import types at the top of every munged source file.
Sure. This is just a proof-of-concept anyway, and it's not meant to be good code. Either way works, I just tried to minimize name usage (and potential name collisions).
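For the record, the two spellings really do reach the same types; types.FunctionType and types.CodeType are just readable names for what type(lambda:0) and type((lambda:0).__code__) dig up. A quick sketch (the original function here is invented for illustration):

```python
import types

# The names in the types module are the same objects the lambda trick finds:
assert types.FunctionType is type(lambda: 0)
assert types.CodeType is type((lambda: 0).__code__)

# Rebuilding a function from an existing code object, the named way:
def original():
    return 42

rebuilt = types.FunctionType(original.__code__, globals())
assert rebuilt() == 42
```

So the trade-off is exactly as described: one `import types` at the top of the munged file, versus two throwaway lambdas per generated function.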
But I think what he was suggesting is something like this: Let py_compile.compile generate the .pyc file as normal, then munge the bytecode in that file, instead of compiling each function, munging its bytecode, and emitting source that creates the munged functions.
Besides being a lot less work, his version works for ∅ at top level, in class definitions, in lambda expressions, etc., not just for def statements. And it doesn't require finding and identifying all of the things to munge in a source file (which I assume you'd do bottom-up based on the ast.parse tree or something).
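A sketch of the first step of that munge-the-.pyc approach, assuming the 3.7+ .pyc layout (16-byte header: magic, flags, source mtime, source size) and using invented file names: compile a module normally, then pull the top-level code object back out with marshal. Nested functions live in co_consts, which is where a munging tool would recurse.

```python
import marshal
import os
import py_compile
import tempfile

# Write a tiny module and compile it the normal way.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "demo_mod.py")
with open(src_path, "w") as f:
    f.write("def f():\n    return set()\n")

pyc_path = py_compile.compile(src_path)

# Skip the .pyc header and unmarshal the module's top-level code object.
with open(pyc_path, "rb") as f:
    f.read(16)  # magic, flags, source mtime, source size (3.7+ layout)
    top_level = marshal.loads(f.read())

# The code object for f() is sitting in co_consts, ready to be rewritten.
assert any(isinstance(c, type(top_level)) for c in top_level.co_consts)
```

The munging tool would rewrite the bytecode of the top-level code and every code object found recursively in co_consts, re-marshal, and write the file back; that single pass is what covers module, class, lambda, and def bodies alike.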
Sure. But all I was doing was responding to the implied statement that it's not possible to write a .py file that makes a function with BUILD_SET 0 in it. Translating a .pyu directly into a .pyc is still possible, but was not the proposal.
Agreed, I just think it's an _easier_ proposal than yours, not a harder one (assuming you want to actually build the real thing, not just a proof of concept), which I think is why Random suggested it. Also, again, I don't think a real project that allowed ∅ in a def but not in a lambda, class, or top-level code would be acceptable to anyone, and I don't see how your solution can be easily adapted to those cases (well, except lambda). [snip, and everything below here condensed]
What I did was put in a literal string… It uses "∅ is set()" as a marker … and the resulting function has an unnecessary const in it.
I assumed that leaving the unnecessary const behind was unacceptable. After all, we're talking about (hypothetical?) people who find the cost of LOAD_GLOBAL set; CALL_FUNCTION 0 to be unacceptable… But you're right that fixing up all the other LOAD_CONST bytecodes' args is a feasible way to solve that.
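On Python 3.8+, CodeType.replace makes the marker-swapping variant less painful than positional reconstruction: since the const is replaced in place rather than removed, its index is unchanged and no LOAD_CONST args need fixing up. A sketch, with a made-up marker string standing in for whatever a preprocessor would leave behind:

```python
import types

def f():
    return "EMPTY_SET_MARKER"  # hypothetical marker left by a preprocessor

# Swap the marker for a set in co_consts; indices are preserved, so the
# existing LOAD_CONST bytecode still points at the right slot.
new_code = f.__code__.replace(co_consts=tuple(
    set() if const == "EMPTY_SET_MARKER" else const
    for const in f.__code__.co_consts
))
g = types.FunctionType(new_code, f.__globals__)
assert g() == set()

# The caveat, and the reason BUILD_SET 0 is the "right" fix: the const
# is one shared mutable object, so every call returns the very same set.
assert g() is g()
```

That aliasing caveat is exactly why an empty-set const can't fully substitute for BUILD_SET 0, which builds a fresh set on every call.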
So, if the function is a closure, how do you do that?
Ah, that part I've no idea about. But it wouldn't be impossible for someone to develop that a bit further.
Not impossible, but very hard, much harder than what you've done so far. Ultimately, I think that just backs up your larger point: This is doable, but it's going to be a lot of work, and the benefit isn't even nearly worth the cost. My point is that there are other ways to do it that would be less work and/or that would have more side benefits… but the benefit still isn't even nearly worth the cost, so who cares? :)