[Python-Dev] Thoughts on -O
Daniel Berlin
dberlin@dberlin.org
Tue, 29 Apr 2003 02:48:01 -0400
>>
>
> Yep - I know this. I would actually suggest removing .pyo and simply
> have the info held in the .pyc.
>
>>> Anyway, any thoughts, rebuttals, etc would be of interest. I'd like
>>> to get some discussion before I create a PEP.
>>
>> I'm not convinced that we need anything, given the minimal effect of
>> most currently available optimizations.
>
> One of my options is to create a PEP specifically to have it rejected.
>
> However, I think there are definitely a couple of useful things in
> here. In particular, it provides a path for introducing optimisations.
> One of the complaints I have seen recently is that all optimisations
> are being added to both paths.
>
> Perhaps this could be reduced to a process PEP with the following
> major points:
>
> 1. Any new optimisation must be introduced on the optimised path.
>
> 2. Optimisations may be promoted from the optimised path to the
> vanilla path at BDFL discretion.
>
> 3. Experimental optimisations in general will require at least one
> complete release before being promoted from the optimised path to the
> vanilla path.
Before everyone gets too far, are there actually concrete, separate
optimizations we are talking about here?
Or is this just "in case someone comes up with an optimization that
helps"?
I'm a compiler hacker by hobby and job (technically, I'm a second-year
law student by trade who works for IBM's TJ Watson Research Center as a
GCC hacker), and I've looked at most of the optimizing Python compilers
that have existed in the past 4-5 years (geez, have I been lurking on
python-dev that long? Wow. I used to actively contribute now and then,
then stopped for a few years).
The only one that makes any appreciable difference is Psyco
(unsurprising, actually), and measurements I did (and I think this was
the idea behind it) show this is because of two things:
1. Removal of Python overhead (i.e. bytecode execution vs. direct
machine code).
2. Removal of temporary objects (which is more powerful than it sounds,
because of how it's done: Psyco simply doesn't emit code to compute
something at runtime until forced. It does as much as it can at compile
time, when possible. In this way, one can view it as a very powerful
symbolic execution engine).
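To make point 2 concrete, here's a toy, hand-written sketch of the same
idea (nothing like Psyco's real machinery; the names are invented): when
part of a computation is already known at specialization time, do it once
up front, so the specialized code never builds those intermediate objects
at runtime.

    def make_scaler(factor):
        # 'factor' is known when we specialize, so the string->float
        # conversion happens once here, not on every call.
        factor = float(factor)
        def scale(values):
            # The specialized body only does work that truly depends on
            # runtime input; the temporary produced by float(factor) is
            # never created on this path.
            return [v * factor for v in values]
        return scale

    scale_by_3 = make_scaler("3")   # all the folding happens at this point
    print(scale_by_3([1, 2, 4]))    # [3.0, 6.0, 12.0]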
In terms of improvements, starting with Psyco as your base (to be
honest, doing something completely different isn't a smart idea; he's
got the right idea, and there's no other real way you are going to get
more speed), the best you can do is the following:
1. Improve the generated machine code (i.e. better register allocation,
better scheduling, a peephole optimizer; a toy sketch of a peephole pass
follows this list). As for register allocation, I've never measured how
often Psyco spills right now. Some platforms are all about spill code
generation (x86), others are more about coalescing registers.
2. Teach it how to execute more operations at compile time (i.e.
improve the symbolic execution engine).
3. Improve the profiling done at runtime.
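For what I mean by a peephole pass, here's a toy example over a made-up
instruction list (Psyco emits real x86, not tuples like these); the point
is just the local pattern-match-and-rewrite shape of it:

    def peephole(insns):
        # Drop adjacent "push X; pop X" pairs, a classic redundant pattern
        # a peephole optimizer looks for in a small window of instructions.
        out = []
        i = 0
        while i < len(insns):
            if (i + 1 < len(insns)
                    and insns[i][0] == "push" and insns[i + 1][0] == "pop"
                    and insns[i][1] == insns[i + 1][1]):
                i += 2          # skip both instructions entirely
                continue
            out.append(insns[i])
            i += 1
        return out

    code = [("push", "eax"), ("pop", "eax"), ("add", "ebx")]
    print(peephole(code))       # [('add', 'ebx')]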
That's about all you can do. I've lumped all classical compiler
optimizations into "improve the generated machine code", since that is
where you'd be able to do them (unless you want to introduce a new
middle IR, which would complicate matters greatly and probably not
speed things up significantly). Number 1 can become expensive quickly
for a JIT, for rapidly diminishing gains. Number 2 has the natural
limit that once you've taught it how to virtualize every base Python
object and operation, it should be able to compute everything not in a
C module given the input, and your limit becomes how good your
profiling is at choosing what to specialize. Number 3 doesn't become
important until you start hitting negative gains from choosing the
wrong functions to specialize.
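The profiling question is really just "which functions are hot enough to
be worth specializing?". A crude sketch of that decision (the threshold
and the hook are invented for illustration; a real system would also look
at the argument types it has been seeing):

    HOT_THRESHOLD = 1000        # arbitrary cutoff for this sketch

    def make_profiled(func):
        state = {"calls": 0, "hot": False}
        def wrapper(*args):
            if not state["hot"]:
                state["calls"] += 1
                if state["calls"] >= HOT_THRESHOLD:
                    # A real system would specialize/compile 'func' here;
                    # picking the wrong functions is where you start
                    # paying instead of winning.
                    state["hot"] = True
            return func(*args)
        return wrapper

    hot_len = make_profiled(len)    # wrap any function you like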
Any useful thing not involving specialization is some combination of:
1. Not applicable without specialization and compilation to machine
code (I can think of no useful optimization that would make a
significant difference at the Python code level and wouldn't be easier
and faster to do at the machine code level; Python does not give enough
guarantees to make it better to optimize Python bytecode).
2. Already covered by the way it does compilation.
3. Too expensive.
Couple all of this with the fact that there is only a limited number of
operations performed at the Python level that aren't already taken care
of by a better symbolic execution engine.
In short, I believe that if you want to seriously talk about "adding
this optimization" or "adding that optimization", the time would be
better spent doing something like Psyco (if Psyco itself isn't
acceptable or can't be made acceptable), where the main thing is
specialization of functions and compilation of the specialized
functions to machine code. These are your only real options for
speeding up Python code.
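As a hand-written illustration of what "specialization" buys you (a real
system would generate the guarded fast path, and the machine code for it,
automatically; this is just the shape of the idea):

    def generic_sum(items):
        total = items[0]
        for x in items[1:]:
            total = total + x           # full generic dispatch every time
        return total

    def int_specialized_sum(items):
        # Guard: the specialized body is only valid for plain ints.
        for x in items:
            if type(x) is not int:
                return generic_sum(items)   # bail out to the generic path
        total = 0
        for x in items:
            total += x                  # known-int arithmetic, no surprises
        return total

    print(int_specialized_sum([1, 2, 3]))     # fast path -> 6
    print(int_specialized_sum([1.5, 2.5]))    # falls back -> 4.0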
Diddling around at the Python source or bytecode level will buy you
*less* (since you still have the interpreter overhead), and it will be
just as difficult (since you will still need to specialize to be able
to know the types involved). If you want something to look at besides
Psyco, see LLVM's runtime abilities (http://llvm.cs.uiuc.edu). It might
also make a good replacement for Psyco's hard-coded x86 output, as a
backend machine code optimizer that can exploit type information.
To put all of this in context, I'm assuming you aren't looking for
5-10% gains total. Instead, I'm assuming you are looking for very
significant speedups (100% or greater).
If you only want 5-10%, that's easy to do at just the bytecode level,
but you eventually hit the limit of the speed of bytecode execution,
and from experience, you will hit it rather quickly.
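Disassembling even a trivial function shows where that ceiling comes
from: every bytecode below still pays the full dispatch cost of the
interpreter loop, no matter how cleverly the sequence itself gets
rewritten.

    import dis

    def add_one(x):
        return x + 1

    dis.dis(add_one)
    # Prints a few LOAD/arithmetic/RETURN instructions; a bytecode-level
    # optimizer can shorten sequences like this a little, but it can't
    # remove the per-opcode interpreter overhead itself.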
--Dan