[summerofcode] High-level python optimizer.

Brett C. bac at OCF.Berkeley.EDU
Wed Jun 8 21:59:20 CEST 2005


Vova wrote:
> Hello !
> This is my project proposal: High-level python optimizer.
> 
> 1. Pre-history.
> Python optimization is difficult task because python's dynamic nature. 
> Applying many of optimization techniques used in other (static typed) 
> languages to python is almost impossible. There was an attempt to create 
> python optimizer that works on the byte-code 
> (http://www.foretec.com/python/workshops/1998-11/proceedings/papers/montanaro/montanaro.html), 
> but it speeds-up execution only by a few percent.
> Different approach is taken in the psyco project. It implements JIT 
> compilation using run-time specialization. The results is very impressive. 
> But psyco does only simple local run-time optimizations, it can't look at the 
> whole program.
> 

The other issue with Psyco is it only works with x86.  Don't forget Python does
run on other platforms so getting optimizations for other architectures is a
very nice thing.

> 2. Project proposal.
> My proposal is to create high-level optimizer, that will work on the source 
> tree of the whole program. It will transform the source tree to data-flow 
> graph that represents the flow of the data from program input (in general 
> sence) to output. The goal is elimination of all unnecessarily dynamics i.e. 
> making assumptions about actual types (and may be values) of variables and 
> their life-times. Of course this can't be done for any variable in the 
> program, but (usually) for most of them (just try to open any python program 
> and traverse it from the very beginning visiting any called function). Once 
> we have got this information we can apply a lot of traditional well-known 
> optimization techniques as well as python-specific ones. Then we can compile 
> and run optimized tree, or transform it back to python program. We can still 
> use psyco for JIT, and maybe we can even provide psyco with gathered 
> information about variable types.
> 

How are you planning to actually implement performance enhancements?  It is one
thing to get complete control flow information in a CFG and do the data
analysis, but Python's bytecode is in no way going to help you in terms of
performance improvement from this information with its lack of type-specific
bytecodes short of generating built-in types and LIST_APPEND.

Are you going to add new bytecodes?  Are you going to do post-compile
transformations like Raymond Hettinger's decorator to make built-ins act as
constants?

And I hope you do realize it sounds like you are breaking semantics of the
language by assuming the code you analyze is the code that gets executed.  You
can read my thesis (http://www.drifty.org/thesis.pdf ; chapter 3) to get the
details.  It's fine if you are, but you should make sure you make that clear in
your proposal.

To answer Mark Dufour's response here as well, Starkiller is not under active
development anymore by Mike (at least to my knowledge).  And Starkiller in no
way optimized Python but just performed a type inference for an entire Python
application.

-Brett


More information about the summerofcode mailing list