[summerofcode] High-level python optimizer.
Vova
physics at list.ru
Sun Jun 12 13:23:58 CEST 2005
Hello Armin !
On Saturday 11 June 2005 18:11, you wrote:
> Hi Mark, hi Vova,
>
> Vova: I think you are missing some crucial points on how Python works.
> If some parts of the program are not completely known or analyzable,
> then it doesn't just mean that these parts cannot be analyzed and
> optimized. It means that all the information about the complete program
> is potentially wrong. Consider the following simple example: in module
> 'foo.py', two functions f() and g() call each other. Now in some
> completely unrelated part of the program, there is an 'exec' statement.
> Most probably, it's not possible to know statically what the 'exec'
> statement will do. Maybe the string that reaches the 'exec' will be at
> run-time "import foo; foo.g=some_other_function". In this case, your
> call graph suddenly becomes invalid.
Ok, you are right, exec is a bad thing for my algorithm. When control flow
reaches exec statement (or any other unknown code without annotation)
analyser must forget all assumptions. But before exec that assumptions are
right. And after exec there may be new assumptions (of course only about
local scopes).
The best what can be done is inspecting new code at run-time, modifying
assumptions appropriately and recompiling if they had changed.
Let's try to see how often exec (and compile) are used in real programs. I've
checked some big python projects (I hope the list was random chosen):
BitTorrent: not used
Mailman: not used
PIL: not used (actually used once in unit-tests framework)
Plone: used a lot
PyRex: not used
Portage: not used
SCons: used
With Plone situation is really bad. It uses a lot of external scripts and does
a lot of dynamic tricks (even more interesting than your examples below), so
without user interaction it can't be optimized a lot.
Let's look inside SCons. It uses exec in the file Script/SConscript.py for
executing user-supplied scripts. But these script are executed in specially
created scope, and not intended to modify anything outside from this scope.
If this information can be passed to analyser manually it can still work.
And finally let's look for some timings:
from timeit import Timer
print Timer('pass', 'pass').timeit()
print Timer('f()', 'def f(): pass').timeit()
print Timer('for i in xrange(5): pass', 'pass').timeit()
print Timer('for i in range(5): pass', 'pass').timeit()
this code prints on my machine:
0.0886490345001
0.522771120071
1.64868998528
2.10483407974
Comparing lines 1 and 2 gives some estimation how much can we can get from
function inlining (for object methods it's even more actual). Just look at
any object-oriented program and count how many it has one-line functions and
how often they are called.
Comparing lines 1 and 3 gives estimation for loop-unrolling.
This is only random examples.
> This is only an example, though. It could contain much more funny
> stuff. Here are a few other examples in which I tried to have at least
> one invalidating example for each invariant you talked about:
>
> math.sin = newfunction
> foo.factorial = newfunction
> instance.__class__ = newclass
> SomeClass.__add__ = newmethod
> SomeClass.__bases__ = (NewBaseClass,)
> reload(mymodule)
> somefunction.func_defaults = ('yadda',)
> somefunction.func_code = newcodeobject
> __builtin__.len = newfunction
> setattr(instance, name, value) # where 'name' shadows a method
As I said before these all can be tracked.
> Some of these examples are pretty silly, but they just show that you
> cannot guarantee correctness. Brett's thesis was about as far as it's
> possible to go in this direction. (You are falling in a common pitfall,
> though: it's not the first time someone puts forwards arguments like
> yours.)
>
> On Fri, Jun 10, 2005 at 03:39:44PM +0200, Mark Dufour wrote:
> > I am just worried that if you plan on doing a _global_ dataflow
> > analysis, it will be difficult to remain precise, when all sort of
> > dynamic things happen in the code. I'm not sure, but I think the
> > people of PyPy are already working heavily on the kinds of
> > optimizations as you are talking about, but applying it during
> > run-time, so they can be somewhat precise for dynamic code - ie work
> > for any Python program (Armin? :))
>
> We have a two-stage approach; right now, we are working on static
> analysis of a subset of Python. Later, we will extend these techniques
> to perform an approximative analysis of full Python, but at run-time.
> Psyco does extreme run-time-only analysis; we will use similar ideas,
> while trying to combine the advantages of static and run-time analysis
> (e.g. saving some information as hints between sessions). PyPy has been
> built from the start with the idea of enabling such a balance between
> compile- and run-time optimizations.
Yes, this is promising. But for run-time optimization it's really important
that optimizer must take not more time that it saves from program execution.
It prohibits some optimizations at all.
--
Best regards,
Vladimir
More information about the summerofcode
mailing list