[summerofcode] High-level python optimizer.
Vova
physics at list.ru
Wed Jun 8 17:08:19 CEST 2005
Hello !
This is my project proposal: High-level python optimizer.
1. Pre-history.
Python optimization is difficult task because python's dynamic nature.
Applying many of optimization techniques used in other (static typed)
languages to python is almost impossible. There was an attempt to create
python optimizer that works on the byte-code
(http://www.foretec.com/python/workshops/1998-11/proceedings/papers/montanaro/montanaro.html),
but it speeds-up execution only by a few percent.
Different approach is taken in the psyco project. It implements JIT
compilation using run-time specialization. The results is very impressive.
But psyco does only simple local run-time optimizations, it can't look at the
whole program.
2. Project proposal.
My proposal is to create high-level optimizer, that will work on the source
tree of the whole program. It will transform the source tree to data-flow
graph that represents the flow of the data from program input (in general
sence) to output. The goal is elimination of all unnecessarily dynamics i.e.
making assumptions about actual types (and may be values) of variables and
their life-times. Of course this can't be done for any variable in the
program, but (usually) for most of them (just try to open any python program
and traverse it from the very beginning visiting any called function). Once
we have got this information we can apply a lot of traditional well-known
optimization techniques as well as python-specific ones. Then we can compile
and run optimized tree, or transform it back to python program. We can still
use psyco for JIT, and maybe we can even provide psyco with gathered
information about variable types.
3. Implementation notes.
This approach for optimization will be the most efficient if applied to
complete python program (not the stand-alone library) and if the sources of
all used modules will be available to optimizer. The former is needed to
extract the maximum information from input data, stand-alone modules can be
optimized too if it's acceptable input types are somehow specified. The
source of used modules is needed to extract information from used functions
such as possible return types, exceptions and side-effects and to make
partial evaluated version of module's functions. To work better with C
modules one can specify needed information about module's function to the
optimizer.
So the actual implementation of such optimizer is divided into three parts:
(a) creating general framework that will handle building data-flow graph,
gathering information and then applying (pluggable) optimizer
(b) Writing optimizers
(c) describing information about standard modules written in C.
Part (a) is the most important and interesting part of the project, parts (b)
and (c) can be always simply extended later (of course simplicity depends on
good design of (a)).
If someone from PSF agreed to be my mentor, please contact me.
--
Best regards,
Vladimir
More information about the summerofcode
mailing list