[summerofcode] High-level python optimizer.

Armin Rigo arigo at tunes.org
Sat Jun 11 16:11:09 CEST 2005


Hi Mark, hi Vova,

Vova: I think you are missing some crucial points on how Python works.
If some parts of the program are not completely known or analyzable,
then it doesn't just mean that these parts cannot be analyzed and
optimized.  It means that all the information about the complete program
is potentially wrong.  Consider the following simple example: in module
'foo.py', two functions f() and g() call each other.  Now in some
completely unrelated part of the program, there is an 'exec' statement.
Most probably, it's not possible to know statically what the 'exec'
statement will do.  Maybe the string that reaches the 'exec' will be at
run-time  "import foo; foo.g=some_other_function".  In this case, your
call graph suddenly becomes invalid.

This is only an example, though.  It could contain much more funny
stuff.  Here are a few other examples in which I tried to have at least
one invalidating example for each invariant you talked about:

    math.sin = newfunction
    foo.factorial = newfunction
    instance.__class__ = newclass
    SomeClass.__add__ = newmethod
    SomeClass.__bases__ = (NewBaseClass,)
    reload(mymodule)
    somefunction.func_defaults = ('yadda',)
    somefunction.func_code = newcodeobject
    __builtin__.len = newfunction
    setattr(instance, name, value)  # where 'name' shadows a method

Some of these examples are pretty silly, but they just show that you
cannot guarantee correctness.  Brett's thesis was about as far as it's
possible to go in this direction.  (You are falling in a common pitfall,
though: it's not the first time someone puts forwards arguments like
yours.)


On Fri, Jun 10, 2005 at 03:39:44PM +0200, Mark Dufour wrote:
> I am just worried that if you plan on doing a _global_ dataflow
> analysis, it will be difficult to remain precise, when all sort of
> dynamic things happen in the code. I'm not sure, but I think the
> people of PyPy are already working heavily on the kinds of
> optimizations as you are talking about, but applying it during
> run-time, so they can be somewhat precise for dynamic code - ie work
> for any Python program (Armin? :))

We have a two-stage approach; right now, we are working on static
analysis of a subset of Python.  Later, we will extend these techniques
to perform an approximative analysis of full Python, but at run-time.
Psyco does extreme run-time-only analysis; we will use similar ideas,
while trying to combine the advantages of static and run-time analysis
(e.g. saving some information as hints between sessions).  PyPy has been
built from the start with the idea of enabling such a balance between
compile- and run-time optimizations.


A bientot,

Armin.


More information about the summerofcode mailing list