On Tuesday, July 1, 2014 1:39 AM, Chris Angelico email@example.com wrote:
On Tue, Jul 1, 2014 at 6:04 PM, Andrew Barnert firstname.lastname@example.org wrote:
On Monday, June 30, 2014 5:39 PM, Chris Angelico
That would be interesting, but it raises the possibility of mucking up the stack. (Imagine if you put BUILD_SET 1 in there instead. What's it
going to make a set of? What's going to happen to the rest of the stack? Do you REALLY want to debug that?)
The same thing that happens if you use bad inline assembly in C, or a bad C
extension module in Python—bad things that you can't debug at source level. And yet, inline assembly in C and C extension modules in Python are still quite useful.
Right, useful but it adds another set of problems. (Just out of curiosity, what protection _is_ there for a smashed stack? I just tried fiddling with it and didn't manage to crash stuff.)
I believe there are cases where the interpreter can detect that you've gone below 0 and raise an exception, but in general there's no protection, or at least nothing you can count on.
For example, assemble this code as a complete function:
    CALL_FUNCTION 1
    RETURN_VALUE
In 3.4.1, on my Mac, I get a bus error.
But, even when you don't manage to crash the interpreter, when you just confuse it at the bytecode level, there's still no way to debug that except by dropping to gdb/lldb/etc.
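To make the BUILD_SET 0 versus BUILD_SET 1 question concrete, here is a minimal sketch using dis.stack_effect (in the stdlib since 3.4). It only reports the net stack change of a single instruction. It does not check that the stack actually holds enough items, which is exactly the gap being discussed. The variable names are just for illustration.

```python
import dis

# Opcode number for BUILD_SET in whatever interpreter runs this.
build_set = dis.opmap["BUILD_SET"]

# BUILD_SET 0 pops nothing and pushes one new empty set: net +1.
effect_empty = dis.stack_effect(build_set, 0)

# BUILD_SET 1 pops one item and pushes a one-element set: net 0.
# If a hack swaps this in where nothing was pushed first, that
# "one item" is whatever happened to be on the stack already.
effect_one = dis.stack_effect(build_set, 1)

# BUILD_SET 3 pops three items and pushes one set: net -2.
effect_three = dis.stack_effect(build_set, 3)

print(effect_empty, effect_one, effect_three)
```

So replacing a one-push instruction with BUILD_SET 0 keeps the stack depth balanced, while BUILD_SET 1 silently consumes an item that belongs to someone else.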
I'll ignore the second case for the moment, because I think it's
rarely if ever appropriate to Python, and just focus on the first. Those cases did not go away because CPUID got replaced with library functions. Those library functions—which are compiled with the same compiler you use for your code—have inline assembly in them. (Or, if you're on linux, those library functions read from a device file, but the device driver, which is compiled with the same compiler you use, has inline assembly in it.) So, the compiler still needs to be able to compile it.
Or those library functions are written in assembly language directly. It's entirely possible to write something that uses CPUID and doesn't use inline assembly in a C source file. The equivalent here, I suppose, would be hand-rolling a .pyc file.
Yeah, that's entirely possible, but that's not how the linux device driver or the FreeBSD libc wrapper do it; they use inline assembly. Why? Well, for one thing, you get the function prolog and epilog code appropriate for your compiler automatically, instead of having to write it yourself. Also, you can do nice things like cast the result to a struct that you defined in C (which could be done with, e.g., a C macro wrapping the assembly source, but that's just making things more complicated for no benefit). And you don't need to know how to configure and run an assembler alongside the C compiler to build the device. And so on. Basically, the C versions of the exact same reasons you wouldn't want to hand-roll a .pyc file in Python…
- Do I think anyone would, if given the ability to tweak the
bytecode, go "Ah ha!" and proudly improve on what the compiler has done, and then brag about the performance improvement? Definitely. Someone will. It'll make some marginal difference to a microbenchmark, and if you don't believe that would cause people to warp their code into utter unreadability, you clearly don't hang out on python-list enough :)
Using ctypes to load Python.so to swap the pointers under the covers is already significantly faster, and would still be significantly faster than your optimized bytecode, and yes, people have suggested it on at least two StackOverflow questions. For that matter, you can already do exactly your optimization with a relatively simple bytecode hack, which would look a lot worse than the inline asm and have the same effect. Also, that bytecode hack could be factored out into a function, without any performance cost except a constant cost at .pyc time, while the inline asm obviously can't. That's another reason the inline asm (which would have to be written inline, and edited to fit the variables in question, each time) would be less of an attractive nuisance than what's already there. Sure, there may be a few people who are looking for horrible micro-optimizations like this, would know enough to figure out how to do this with inline asm, would not know how to do it with bytecode hacks, would not know any of the better (as in much worse, to anyone but them) alternatives, etc., but I think that number is vanishingly small.
What I did was put in a literal string…
It uses "∅ is set()" as a marker … and the resulting function has an unnecessary const in it.
I assumed that leaving the unnecessary const behind was unacceptable. After
all, we're talking about (hypothetical?) people who find the cost of LOAD_GLOBAL set; CALL_FUNCTION 0 to be unacceptable… But you're right that fixing up all the other LOAD_CONST bytecodes' args is a feasible way to solve that.
I'm not sure whether the problem is the cost of LOAD_GLOBAL followed by CALL_FUNCTION (and, by the way, one unnecessary constant in the function won't have anything like that cost - a bit of wasted RAM, but not a function call), or the fact that such a style is vulnerable to shadowing of the name 'set', which admittedly is a very useful name. But in any case, it's quite solvable.
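For anyone following along, both halves of that can be seen from the stdlib. This is a rough sketch, and the function names (make_empty, shadowed) are made up for illustration. The exact call opcode differs between CPython versions, so the sketch only looks at the global lookup.

```python
import dis

def make_empty():
    return set()

# The name 'set' is looked up at run time through LOAD_GLOBAL;
# it is stored in co_names, not resolved when the code is compiled.
opnames = [ins.opname for ins in dis.get_instructions(make_empty)]
names = make_empty.__code__.co_names

# Because of that run-time lookup, rebinding 'set' in the function's
# globals changes what the same source text returns.
namespace = {"set": list}
exec("def shadowed():\n    return set()", namespace)
shadowed_result = namespace["shadowed"]()

print(opnames)
print(names, shadowed_result)
```

A BUILD_SET 0 in the bytecode would sidestep the name lookup entirely, which is the only reason shadowing stops mattering with a real literal.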
I realize the cost of an extra LOAD_GLOBAL is much smaller than an extra CALL_FUNCTION, it's just that I think in 99.9999% of real cases neither will make a difference, and anyone who's objecting to the latter on principle will probably also object to the former on principle…
So, if the function is a closure, how do you do that?
Ah, that part I've no idea about. But it wouldn't be impossible
for someone to develop that a bit further.
Not impossible, but very hard, much harder than what you've done so far.
Ultimately, I think that just backs up your larger point: This is doable,
but it's going to be a lot of work, and the benefit isn't even nearly worth the cost. My point is that there are other ways to do it that would be less work and/or that would have more side benefits… but the benefit still isn't even nearly worth the cost, so who cares? :)
Yep. Maybe someone (great, that probably means me) should write this up into a PEP for immediate rejection or withdrawal, just to be a document to point to - if you want an empty set literal, answer these objections.
I think Terry Reedy actually had a better answer: just tell people to implement it, polish it up, put it on PyPI, and come back to us when they're ready to show off their tons of users who can't live without it. Random objected that wasn't possible, in which case Terry's idea is more of a dismissal than a helpful suggestion, but I think https://github.com/abarnert/emptyset proves that it is possible, and even pretty easy.