[Python-Dev] Thoughts on -O

Delaney, Timothy C (Timothy) tdelaney@avaya.com
Tue, 29 Apr 2003 17:16:01 +1000


> From: Daniel Berlin [mailto:dberlin@dberlin.org]
> >
> > 1. Any new optimisation must be introduced on the optimised path.
> >
> > 2. Optimisations may be promoted from the optimised path to the
> > vanilla path at BDFL discretion.
> >
> > 3. Experimental optimisations in general will require at least one
> > complete release before being promoted from the optimised path to the
> > vanilla path.
>
> Before everyone gets too far, are there actually concrete separate
> optimizations we are talking about here?
> Or is this just "in case someone comes up with an optimization that
> helps"

One I had in mind would be the CALL_ATTR patch, which Guido
explicitly mentioned as having been implemented on the main
path, not on the optimised path, and pointed out that if it
had been implemented only on the optimised path a number of
issues with it would have been discovered much earlier.

> The only one that makes any appreciable difference is Psyco

Indeed. I would love Psyco to eventually be part of Python, but
suspect it will only be so in the PyPy implementation.

> To put all of this in context, i'm assuming you aren't looking for
> 5-10% gains, total. Instead, i'm assuming you are looking for very
> significant speedups (100% or greater).

Many of the recent optimisation patches have involved 5% speedups in
some cases. If they all worked without impacting each other (cache
effects, etc) we could probably approach 50% improvement in some
cases.
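
As a rough illustration of that arithmetic, here is a minimal sketch. It
assumes each patch gives an independent 5% throughput gain and that the
gains multiply cleanly, which is exactly the idealisation that cache
effects and the like tend to break. The function names are made up for
this example.

```python
import math


def combined_speedup(per_patch, count):
    """Overall speedup factor if `count` patches each give a fractional
    gain of `per_patch` and the gains compose multiplicatively
    (an idealised assumption, not a measurement)."""
    return (1.0 + per_patch) ** count


def patches_needed(per_patch, target):
    """Smallest number of such idealised patches whose combined gain
    reaches `target` (e.g. 0.50 for a 50% improvement)."""
    return math.ceil(math.log(1.0 + target) / math.log(1.0 + per_patch))


if __name__ == "__main__":
    for n in (1, 4, 8, 9):
        print(n, "patches ->", round(combined_speedup(0.05, n), 4))
    print("patches needed for 50%:", patches_needed(0.05, 0.50))
```

Under those assumptions it takes about nine non-interfering 5% patches
to reach a 50% improvement; eight fall just short.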

I have no problem if someone can get a 5% speedup across the
board without introducing incredibly hairy code. I would like such
optimisations to eventually become part of the main path - but I
would prefer that they not become part of the main path until they
have been exposed to many different environments, and only if neither
the implementor nor anyone else can come up with one or more cases
where they become a pessimisation.

> If you only want 5-10%, that's easy to do at just the bytecode level,
> but you eventually hit the limit of the speed of bytecode execution,
> and from experience, you will hit it rather quickly.

Indeed. Every attempt so far has yielded an improvement of 5% or
less standalone, and most have resulted in worse performance when
combined.

Tim Delaney