[Python-Dev] Thoughts on -O

Daniel Berlin dberlin@dberlin.org
Tue, 29 Apr 2003 02:48:01 -0400


>>
>
> Yep - I know this. I would actually suggest removing .pyo and simply 
> having the info held in the .pyc.
>
>>> Anyway, any thoughts, rebuttals, etc would be of interest. I'd like
>>> to get some discussion before I create a PEP.
>>
>> I'm not convinced that we need anything, given the minimal effect of
>> most currently available optimizations.
>
> One of my options is to create a PEP specifically to have it rejected.
>
> However, I think there are definitely a couple of useful things in 
> here. In particular, it provides a path for introducing optimisations. 
> One of the complaints I have seen recently is that all optimisations 
> are being added to both paths.
>
> Perhaps this could be reduced to a process PEP with the following 
> major points:
>
> 1. Any new optimisation must be introduced on the optimised path.
>
> 2. Optimisations may be promoted from the optimised path to the 
> vanilla path at BDFL discretion.
>
> 3. Experimental optimisations in general will require at least one 
> complete release before being promoted from the optimised path to the 
> vanilla path.

Before everyone gets too far, are there actually concrete separate 
optimizations we are talking about here?
Or is this just "in case someone comes up with an optimization that 
helps"?
I'm a compiler hacker by hobby and job (technically, I'm a 2nd-year law 
student by trade who works for IBM's TJ Watson Research Center as a 
GCC hacker), and I've looked at most optimizing Python compilers that 
have existed in the past 4-5 years (geez, have I been lurking on 
python-dev that long? Wow. I used to actively contribute now and then, 
but stopped for a few years).
The only one that makes any appreciable difference is Psyco 
(unsurprising, actually), and measurements I did (and I think this was 
the idea behind it) show this is because of two things:
1. Removal of Python overhead (i.e. bytecode execution vs. direct 
machine code).
2. Removal of temporary objects (which is more powerful than it sounds, 
because of how it's done. Psyco simply doesn't emit code to compute 
something at runtime until forced; it does as much as it can at compile 
time, when possible. In this way, one can view it as a very powerful 
symbolic execution engine).
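
To give a rough flavor of that second point, here is a toy sketch of 
the idea (this is nothing like Psyco's actual machinery, just an 
illustration): operations on already-known values are folded on the 
spot, and everything else is deferred until something forces a concrete 
result, so no intermediate object exists in the meantime.

# Toy illustration of "don't compute until forced" (not Psyco's real
# implementation). Known values are folded immediately; unknown ones
# are wrapped and only materialized on demand.
class Virtual(object):
    def __init__(self, compute):
        self._compute = compute    # thunk producing the real value
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._compute()
            self._forced = True
        return self._value

def _concrete(x):
    if isinstance(x, Virtual):
        return x.force()
    return x

def add(a, b):
    # Both operands known: do the work "at compile time".
    if not isinstance(a, Virtual) and not isinstance(b, Virtual):
        return a + b
    # Otherwise defer; no temporary is built until someone forces it.
    return Virtual(lambda: _concrete(a) + _concrete(b))

print(add(2, 3))                 # folded immediately -> 5
d = add(Virtual(lambda: 40), 2)
print(d.force())                 # computed only when forced -> 42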

In terms of improvements, starting with Psyco as your base (to be 
honest, doing something completely different isn't a smart idea; he's 
got the right idea, and there's no other real way you are going to get 
more speed), the best you can do is the following:
1. Improve the generated machine code (i.e. better register allocation, 
better scheduling, a peephole optimizer). As for register allocation, 
I've never measured how often Psyco spills right now. Some platforms 
are all about spill code generation (x86), others are more about 
coalescing registers.
2. Teach it how to execute more operations at compile time (i.e. 
improve the symbolic execution engine).
3. Improve the profiling done at runtime.

That's about all you can do. I've lumped all classical compiler 
optimizations into "improve generated machine code", since that is 
where you'd be able to do them (unless you want to introduce a new 
middle IR, which would complicate matters greatly and probably not 
speed things up significantly). Number 1 can become expensive quickly 
for a JIT, for rapidly diminishing gains. Number 2 has the natural 
limit that once you've taught it how to virtualize every base Python 
object and operation, it should be able to compute everything not in a 
C module given the input, and your limit becomes how good your 
profiling is at choosing what to specialize. Number 3 doesn't become 
important until you start hitting negative gains due to choosing the 
wrong functions to specialize.
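
To make number 1 slightly more concrete, here is a toy peephole pass 
over a made-up instruction list (again, just an illustration, not 
Psyco's code generator); the real thing works on actual machine 
instructions, but the shape is the same:

# Toy peephole optimizer over a fake instruction stream: drop
# moves-to-self, and drop a load that immediately follows a store of
# the same register to the same location.
def peephole(insns):
    out = []
    for insn in insns:
        op = insn[0]
        # ("mov", dst, src) with dst == src does nothing.
        if op == "mov" and insn[1] == insn[2]:
            continue
        # ("store", loc, reg) then ("load", reg, loc): the value is
        # already in the register, so the load is redundant.
        if (op == "load" and out and out[-1][0] == "store"
                and out[-1][1] == insn[2] and out[-1][2] == insn[1]):
            continue
        out.append(insn)
    return out

code = [("mov", "eax", "eax"),
        ("store", "tmp0", "eax"),
        ("load", "eax", "tmp0"),
        ("add", "eax", "ebx")]
print(peephole(code))  # [('store', 'tmp0', 'eax'), ('add', 'eax', 'ebx')]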

Any useful thing not involving specialization is some combination of:
1. Not going to be applicable without specialization and compilation to 
machine code (I can think of no useful optimization that would make a 
significant difference at the Python code level that wouldn't be 
easier and faster to do at the machine code level; Python does not give 
enough guarantees that make optimizing Python bytecode the better 
choice).
2. Already covered by the way it does compilation.
3. Too expensive.

Couple all of this with the fact that there are already only a limited 
number of operations performed at the Python level that aren't taken 
care of by making a better symbolic execution engine.

In short, I believe that if you want to seriously talk about "adding 
this optimization" or "adding that optimization", the time would be 
better spent doing something like Psyco (if Psyco itself isn't 
acceptable or can't be made acceptable), where the main work is 
specialization of functions and compilation of the specialized 
functions to machine code. These are your only real options for 
speeding up Python code.
Diddling around at the Python source or bytecode level will buy you 
*less* (since you still have the interpreter overhead), and be just as 
difficult (since you will still need to specialize to be able to know 
the types involved). If you want something to look at besides Psyco, 
see LLVM's runtime abilities (http://llvm.cs.uiuc.edu). It might also 
make a good back-end machine code optimizer, replacing Psyco's 
hard-coded x86 output, because it can exploit type information.
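
For reference, driving Psyco is already just a couple of calls (this is 
from memory, so check the Psyco documentation for the exact API): you 
either bind individual hot functions or let its profiler pick them.

import psyco

def inner_loop(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

psyco.bind(inner_loop)   # specialize and compile just this function
# psyco.profile()        # or let Psyco's profiler find the hot spots
# psyco.full()           # or compile everything it can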

To put all of this in context, I'm assuming you aren't looking for 
5-10% gains total. Instead, I'm assuming you are looking for very 
significant speedups (100% or greater).

If you only want 5-10%, that's easy to do at just the bytecode level, 
but you eventually hit the limit of the speed of bytecode execution, 
and from experience, you will hit it rather quickly.
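
As a rough illustration of where that floor sits, compare the same 
summation done in a Python-level loop and pushed down into C (timings 
are obviously machine-dependent; the point is the gap that bytecode 
tweaks can't recover, because the loop overhead itself is the cost):

import time

N = 1000000

def python_loop():
    total = 0
    for i in range(N):
        total += i
    return total

start = time.time()
python_loop()
print("Python-level loop: %.3f seconds" % (time.time() - start))

start = time.time()
sum(range(N))            # the same work, with the loop done in C
print("C-level loop:      %.3f seconds" % (time.time() - start))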
--Dan