
I have absolutely *no* intention of pushing any of this for 2.3. Good lord no. For a start, these would be major feature changes ...
True. I'm ambivalent about that myself. But in that case, I would argue instead that there should not be any option to remove asserts.
Yep - I know this. I would actually suggest removing .pyo entirely and simply having the info held in the .pyc.
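For reference, the behaviour under discussion (assert statements being stripped by the -O option, which is also what currently produces .pyo files) can be seen with a small script. This is only an illustrative sketch of existing CPython behaviour, not part of any proposal; the file name and values are made up:

    # demo_assert.py -- run "python demo_assert.py" and then
    # "python -O demo_assert.py" to compare.  Under -O, __debug__ is
    # False and both the assert and the "if __debug__:" block are
    # compiled out entirely.
    def check(x):
        assert x >= 0, "negative input"
        if __debug__:
            print("debug checks enabled")
        return x * 2

    print(check(5))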
One of my options is to create a PEP specifically to have it rejected. However, I think there are definitely a couple of useful things in here. In particular, it provides a path for introducing optimisations. One of the complaints I have seen recently is that all optimisations are being added to both paths.

Perhaps this could be reduced to a process PEP with the following major points:

1. Any new optimisation must be introduced on the optimised path.
2. Optimisations may be promoted from the optimised path to the vanilla path at BDFL discretion.
3. Experimental optimisations in general will require at least one complete release before being promoted from the optimised path to the vanilla path.

Tim Delaney

Before everyone gets too far, are there actually concrete, separate optimizations we are talking about here? Or is this just "in case someone comes up with an optimization that helps"?

I'm a compiler hacker by hobby and job (technically, I'm a 2nd year law student by trade, who works for IBM's TJ Watson Research Center as a GCC hacker), and I've looked at most optimizing Python compilers that have existed in the past 4-5 years (geez, have I been lurking on python-dev that long? Wow. I used to actively contribute now and then, then stopped for a few years). The only one that makes any appreciable difference is Psyco (unsurprising, actually), and measurements I did (and I think this was the idea behind it) show this is because of two things:

1. Removal of Python overhead (i.e. bytecode execution vs. direct machine code).
2. Removal of temporary objects (which is more powerful than it sounds, because of how it's done. Psyco simply doesn't emit code to compute something at runtime until forced; it does as much as it can at compile time, when possible. In this way, one can view it as a very powerful symbolic execution engine.)

In terms of improvements, starting with Psyco as your base (to be honest, doing something completely different isn't a smart idea; he's got the right idea, and there's no other real way you are going to get more speed), the best you can do are the following:

1. Improve the generated machine code (i.e. better register allocation, better scheduling, a peephole optimizer). As for register allocation, I've never measured how often Psyco spills right now. Some platforms are all about spill code generation (x86); others are more about coalescing registers.
2. Teach it how to execute more operations at compile time (i.e. improve the symbolic execution engine).
3. Improve the profiling done at runtime.

That's about all you can do. I've lumped all classical compiler optimizations into "improve generated machine code", since that is where you'd be able to do them (unless you want to introduce a new middle IR, which will complicate matters greatly and probably not significantly speed things up).

Number 1 can become expensive quickly for a JIT, for rapidly diminishing gains. Number 2 has the natural limit that once you've taught it how to virtualize every base Python object and operation, it should be able to compute everything not in a C module given the input, and your limit becomes how good at profiling you are to choose what to specialize. Number 3 doesn't become important until you start hitting negative gains due to choosing the wrong functions to specialize.

Any useful thing not involving specialization is some combination of:

1. Not going to be applicable without specialization and compilation to machine code. (I can think of no useful optimization that will make a significant difference at the Python code level that wouldn't be easier and faster to do at the machine code level. Python does not give enough guarantees to make it better to optimize Python bytecode.)
2. Already covered by the way it does compilation.
3. Too expensive.

Couple all of this with the fact that there are a limited number of operations performed at the Python level already that aren't taken care of by making a better symbolic execution engine.
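As a rough illustration of the per-operation interpreter overhead and temporary objects described above (not anything Psyco-specific), the standard dis module shows what the bytecode interpreter dispatches even for a trivial function:

    import dis

    def add(a, b):
        return a + b

    # Each opcode printed below goes through the interpreter's dispatch
    # loop, and BINARY_ADD typically allocates a fresh result object --
    # the overhead and temporaries that compiling specialized machine
    # code removes.
    dis.dis(add)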
In short, I believe that if you want to seriously talk about "adding this optimization" or "adding that optimization", that time would be better served doing something like Psyco (if Psyco itself isn't acceptable or can't be made acceptable), where the main thing is specialization of functions and compilation of the specialized functions to machine code. These are your only real options for speeding up Python code. Diddling around at the Python source or bytecode level will buy you *less* (since you still have the interpreter overhead) and be just as difficult (since you will still need to specialize to be able to know the types involved).

If you want something to look at besides Psyco, see LLVM's runtime abilities (http://llvm.cs.uiuc.edu). It might also make a good backend machine-code optimizer to replace Psyco's hard-coded x86 output, because it can exploit type information.

To put all of this in context, I'm assuming you aren't looking for 5-10% gains, total. Instead, I'm assuming you are looking for very significant speedups (100% or greater). If you only want 5-10%, that's easy to do at just the bytecode level, but you eventually hit the limit of the speed of bytecode execution, and from experience, you will hit it rather quickly.

--Dan
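For concreteness, here is a minimal sketch of the "specialize the hot functions and compile them to machine code" approach using the classic Psyco API (psyco.bind). The function and numbers are hypothetical; the point is only to show the shape of the approach, not to benchmark it:

    # Hypothetical example: bind one hot function so Psyco compiles
    # machine-code versions specialized to the argument types it
    # actually sees at run time.  Falls back to plain bytecode
    # execution if Psyco is not installed.
    try:
        import psyco
    except ImportError:
        psyco = None

    def hot_loop(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    if psyco is not None:
        psyco.bind(hot_loop)

    print(hot_loop(100000))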

participants (2)
- Daniel Berlin
- Delaney, Timothy C (Timothy)