Simple question about how the optimizer works

Fri May 10 09:24:59 EDT 2002

Robin Becker wrote:

> >No, I was using "pattern" in it's most general sense, meaning to include both
> >global and local optimizers.  In this overall thread I have alluded to examples
> >in both extremes.
> .....
> well I disagree with the term pattern in this reply as it implies
> something that matches anything which is optimisable ie such a pattern
> must match all inputs.

Personally I very much prefer the term "pattern" as that's how I learned to think
about this class of algorithms back when I studied AI.  There are strong parallels
between global optimization and a lot of AI algorithms.

I don't disagree with anything you say below.  It's a good summary of the true,
theoretical objective of optimization, maximizing a "utility" metric.  That very
terminology -- utility or cost functions -- is a signature feature of many AI
algorithms.

So I was talking at the next level down in abstraction from you, about the actual
algorithms used in practice by many optimizing compilers.  Without contradicting
anything you say (other than your rejection of my use of "pattern") I am attempting
to portray in more detail the actual implementation that often gets used.

You're right to point out that different utility functions may apply and, strictly
speaking, that would rule out any single "pattern".   However, in practice
"optimization" does not consist of just one pattern that either succeeds or fails.
Rather there is a collection of patterns and a way to select which subset is to be
applied in any given circumstance.

The optimizer visits successive nodes in a structure that represents the original
code.  "Patterns" define subsets of node structures that can be compared with to
prospective matches in the code tree.  Nodes in patterns and the parse tree are
annotated with information about semantics that further constrains the match.  When
there's a match, the parse tree is transformed according to rules associated with
each pattern.  Of course, once an optimization is applied, conceiveably the entire
tree needs to be re-examined.  Also some possible optimizations may conflict and
it's necessary to consider both alternatives to squeze the last bit of speed.  Of
course this implies a lot of work as a small number of alternatives quickly produces
a large number of total combinations to consider.  So a lot of compilers have a
switch to enable "full" optimization.  This switch remains off during much of the
development phase, as it can significantly increase build times.

In practice, you usually have a big global switch for speed vs. space.  Some
"patterns" save BOTH speed and space and might as well always be applied, though
most line up either on the speed or on the space side of the hall.  For better or
worse, compilers also usually provide a list of individual specific options, such as
'assume no aliases across function calls', 'keep inline functions', 'no function
common sub expression elimination', etc.  Each is a pattern or a family of patterns
that may selectively searched for over the entire parse tree.

As others pointed out, this is a very complex problem domain with many, many
opportunities to screw things up.

Again, I don't mean to contradict anything you say below.

Regards

--jb

> The problem of code optimisation problem is
> approximately
>
> choose c to optimise F(c,X(c,d))
>       such that X(c,d)=S(p,d)
> c=code
> p=program
> d=program data
> F=utility function ie code size/speed etc
> X=execution function
> S=semantic function
>
> expressed in this form looking for patterns in p is clearly wrong since
> S(p,d) is fixed for a particular program only up to the program data
> which we don't normally know. Classic example is all that loop hoisting
> which is wasteful if the loop never actually gets used.
>
> Real experts on optimisation will also tell us that there are extended
> versions of the above eg vector valued F etc etc and will almost always
> be very cautious about solvability.
>
> Clearly the most important thing here is to ensure feasibility ie
> X(c,d)=S(p,d) for all d; I suppose that's what we mean by safe.
>
> --
> Robin Becker
> --
> http://mail.python.org/mailman/listinfo/python-list

--
James J. Besemer  503-280-0838 voice
http://cascade-sys.com  503-280-0375 fax
mailto:jb at cascade-sys.com