[pypy-dev] More on optimization

Armin Rigo arigo at tunes.org
Tue Nov 2 17:19:05 CET 2004

Hi Samuele,

So LLVM comes again under the focus; we should really give it a serious try at
some point.

There are clearly two point of views that we can take on RPython.  The first
point of view is that RPython is a kind of nice syntax over a C- or Java-like
language (in this view integers are primitive, and type inference is used to
fill in the missing type declarations).  The alternative is to see RPython as
semantically close to Python, and all the variables contain objects, but we
can obtain better performance via optimization.  The current
annotation-and-Pyrex-or-Lisp approach follows the 1st approach, but I feel
like the 2nd one can give better results.  Well, I guess you guessed :-)

Let me try to explore a few consequences of such a choice. A general critique
I have regarding a lot of research (including the paper you cited) is that the
techniques are flow-insensitive.  I'm not sure if it is a problem for the 1st
approach but it is for the 2nd one: the low-level flow graph of, say, "a+b"

   if type(a) is int and type(b) is int:
       x = make_int_object(add_long(a.ob_ival, b.ob_ival))

It is essential to follow the exact control path in this function, which tells
us that a and b only have a field 'ob_ival' to read when they are actually of
type int.

There are still good ideas to consider from the paper you cited (which
references did you think would be most relevant?).  One is that this algorithm
is nicely context-sensitive (i.e. it is almost like all the functions were
systematically inlined, which is great for optimization -- but which is
impossible in practice, so this algorithm gives an efficient way to compute
what *would* occur if all functions were inlined).  This looks like an
alternative to the traditional way of making functions polymorphic (where we
try to guess, when a function gets called with various kinds of arguments, if
we should analyse the function several times with each specialized set of
arguments or if we should generalize and analyse the function only once).

I tend now to think that the "structure-inlining" problem is best attacked at
the more general level, independently of RPython.  It's quite possible that it
could be done with an adaptation of the techniques that we already have in the
annotation subdirectory.  It would be simpler because we have fewer, more
elementary types.  It would be harder because we'd have to focus on the basic
problem of tracking references more precisely.  Trying to sketch the basics by
putting together some equivalents of the SomeXxx classes and factories, it
looks like it works...  (but some more thinking needed)

Re function pointers: in Java we'd probably have to group functions by
families (two functions are in the same family as soon as there is a single
variable that could point to either one), and then replace function pointers
by integers and use switches...  Lots of one-method classes with a common
interface look like a waste...  But the integer tricks seems independent of
the optimization techniques.  Did you have better tricks in mind that would
require more care?


More information about the Pypy-dev mailing list