[Python-Dev] Rattlesnake progress

Tue, 19 Feb 2002 08:14:20 -0500 (EST)

On Mon, 18 Feb 2002, Tim Peters wrote:

> [Neil Schemenauer]
> > I've been working on Skip's rattlesnake and have made some progress.
> 
> Cool!  I encourage this.  That and 2 dollars will buy you a cup of coffee.
> 
> > Right now I'm trying to hack the compiler package and am looking for a
> > good reference on code generation.  I have the "New Dragon book" as well
> > as "Essentials of Programming Lanuages" but neither one seem to be
> > telling me want I want to know.  Any suggestions?
> 
> Write it in Python because you won't be happy with your first 2 or 3
> attempts.
> 
> Compiler books have historically been very heavy on front end issues, in
> large part because the theory of parsing is well understood.  What
> relatively little you can find on back end issues tends to be sketchy and
> severely limited by the author's personal experience.  In large part this is
> because almost no interesting optimization problem can be solved in linear
> time (whether it's optimal instruction selection, optimal register
> assignment, optimal instruction ordering, ...), so real-life back ends are a
> mountain of idiosyncratic heuristics.

This is true.
In fact, it's actually worse than "can be solved in linear time", it's 
"are currently thought/proved to be in NP".  For graph coloring register 
allocation algorithms, it's even worse (if you thought that was possible).  
You can't even approximate the chromatic number of the graph (IE, the 
number of colors, and therefore, registers, it would take to color it) 
to more than a certain degree in an absurd time bound.

However, you've missed the middle end, where a lot of 
interesting optimizations *can* be done in linear time or n log n time, 
and where most people now concentrate their time.

On an SSA graph, you can do at least the following in linear time or n 
log n time:

Partial Redundancy Elimination
Conditional Constant Propagation
Copy propagation
Dead store elimination
Dead load elimination
Global code motion
Global value numbering
Store motion
Load motion
Dead code elimination
Lots of loop optimizations
Lots of memory hiearchy optimizations

I've ignored interprocedural optimizations, including various pointer 
analyses that are linear time or close to it, because it would be harder 
to apply them to python.

 > 
> Excellent advice that almost nobody follows <0.5 wink>:  choose a flexible
> intermediate representation, then structure all your transformations as
> independent passes, such that the output of every pass is acceptable as the
> input to every pass. 

Everyone tries to do this these days, actually.
At least, from my working on gcc and looking at the source to tons of 
compilers each year.
You really need more than one level of IR to do serious optimization.
Tradeoff between losing valueable info (such as array indexing operations) 
vs. simplicity of writing optimization passes usually causes people to do 
some types optimization on higher level IR's (particularly, loop 
optimizations), while other optimization passes on lower IR's.

GCC is moving towards 3 IR's, a language independent tree IR, a mid-level 
RTL, and the current low-level RTL.

>  Then keep each pass focused, as simple as possible
> (for example, if a transformation may create regions of dead code, don't
> dare try to clean it up in the same pass, or contort the logic even a little
> bit to try to avoid creating dead code -- instead let it create all the dead
> code it wants, and (re)invoke a "remove dead code" pass afterwards).

Usually you don't hit this problem inside a single pass like DCE, because 
they iterate until nothing changes.

> One compiler I read about (but didn't have the pleasure of using) actually
> allowed you to specify the sequence of back end transformations on the
> cmdline, using a regular expression notation where, e.g.
> 
>    (hoist dead)+
> 
> meant "run the hoist pass followed by the dead code removal pass, one or
> more times, until a fixed point is reached".
> 
> Since none of that told you want to know either <wink>, what do you want to
> know?  Sounds like a legit topic for python-dev.