
Hi!

Sorry for focusing the next sprint so much on translation. This might have put some people off. Well, lesson learned. It doesn't mean we should stop talking about translation :-) Pushing the previously discussed ideas to their conclusion, we get an interesting point of view...

First, for context, the current situation in the repository: RPython code can be turned into a flow graph. Then the annotation pass on the flow graph infers types, with the side effect that it infers which functions call which functions, so it helps to build the complete list of functions that should be translated. Finally, genc.py produces naive C code using PyObject* only. The inferred types are not really used; the annotation phase is currently useful only to build the complete call graph. (Let's ignore the Lisp and Pyrex generators for the moment.)

Here is an example, with the flow graph's operation on the left and the corresponding C code (after macro expansion) on the right:

    v3 = add(v1, v2)                v3 = PyNumber_Add(v1, v2);

Quite obvious, and quite slow.

Imagine instead that there is a C-ish low-level language, whose input "syntax" is of course flow graphs, with only operations like the following ones:

    * llcall(func, arg1, arg2...)   # calls another function
    * lladd_int(x, y)               # adds two ints and returns an int
    * lleq_ptr(x, y)                # compares two pointers for equality
    * llstruct('name')              # creates an empty structure in the heap
    * llget_int(x, 'key')           # reads the field 'key' of struct object x
    * llset_int(x, 'key', value)    # writes the field 'key' of struct object x
    * llget_ptr(x, 'key')           # \
    * llset_ptr(x, 'key', value)    # /  the same for fields containing a pointer

The only data types would be "pointer to structure" and the atomic types like "int" and "char". A structure would be essentially like a dictionary, with either strings or integers as keys (a dict with int keys is much like a list, without the ability to insert or remove elements easily).

This very limited set of operations allows interesting global analysis and optimizations, like removing the allocation of structures in the heap altogether, or not writing some fields of some structures when they have a known constant value, or (most commonly) not using a hash table but just a 'C struct' with fields when all the keys are known.

It is easy to modify our regular flow graphs until they only use the above operations: we replace high-level operations like 'add' with calls to (or inlined copies of) support functions like:

    def rpython_add(v, w):
        if type(v) is int and type(w) is int:
            a = llget_int(v, 'ob_ival')
            b = llget_int(w, 'ob_ival')
            x = llstruct()
            llset(x, 'ob_type', int)
            llset(x, 'ob_ival', lladd_int(a, b))
            return x
        else:
            return llcall('PyNumber_Add', v, w)

Only such support functions can use the ll*() functions, which stand directly for the corresponding ll* operation. The ll*() functions themselves are not executable in the above Python source (actually they don't have to be callable objects at all). They are only meant as placeholders. If you wish, it's just an easy way to write down a complicated flow graph using ll* operations: instead of inventing a syntax and writing a parser for it, we use the Python syntax and the flowobjspace to build our flow graphs. Note that the type() used above is a function defined as:

    def type(v):
        return llget_ptr(v, 'ob_type')

(Actually, we might want to directly execute such code, for testing, with Python dictionaries in most variables and (say) llget_ptr() defined as dict.__getitem__(), but that's not the main goal here.)
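To make that parenthetical concrete, here is a minimal, test-only sketch (written for this illustration, not code from the repository) that gives the ll*() placeholders dictionary-based definitions so that a support function like rpython_add() can actually be executed; the type() helper is renamed rpython_type() here only to avoid shadowing the builtin that the 'is int' check relies on:

    # Hypothetical test-only definitions of the ll* placeholders, with
    # structures represented as plain Python dictionaries.
    def llstruct(name=None):
        return {}

    def llget_int(x, key):
        return x[key]

    def llset_int(x, key, value):
        x[key] = value

    llget_ptr = llget_int          # same representation in this toy model
    llset_ptr = llset_int
    llset = llset_int              # generic variant used in rpython_add()

    def lladd_int(a, b):
        return a + b

    def llcall(funcname, *args):
        raise NotImplementedError("would call the CPython API: %s" % funcname)

    def rpython_type(v):
        # plays the role of the type() helper defined above
        return llget_ptr(v, 'ob_type')

    def rpython_add(v, w):
        if rpython_type(v) is int and rpython_type(w) is int:
            a = llget_int(v, 'ob_ival')
            b = llget_int(w, 'ob_ival')
            x = llstruct()
            llset(x, 'ob_type', int)
            llset(x, 'ob_ival', lladd_int(a, b))
            return x
        else:
            return llcall('PyNumber_Add', v, w)

    v = {'ob_type': int, 'ob_ival': 2}
    w = {'ob_type': int, 'ob_ival': 3}
    assert rpython_add(v, w)['ob_ival'] == 5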
Let's come back to PyPy. We can now replace each high-level operation with a call to a support function like the above one. Initially these support functions could contain only the llcall() to the CPython API; then we can progressively add more cases to cover all of the RPython behavior. The goal would be that eventually the support functions and the optimizer become good enough to remove all calls to the CPython API.

The optimizer can do that using global analysis: e.g. in the above rpython_add(), if both 'v' and 'w' come from a previous llstruct() where 'ob_type' was set to 'int', then we know that the first branch is taken. This is a kind of generalized type inference using global constant propagation; it would supersede half of the annotation pass' job.

Also, the incremental nature of this approach is nice (and very Psyco-like, btw). It is good for testing along the way, and it also means that when we reach the goal of supporting enough for RPython, the result is still nicely useful for non-RPython code; in that case, when types cannot be inferred, there are remaining calls to the CPython API. I think that this would be a widely useful tool! I also like a lot the idea of a low-level language and optimizer which is entirely independent of Python and PyPy. We could even (still) devise an assembly-like syntax and a parser for it, and make it a separate release.

Such an optimizer is obviously not easy to write, but I think it's quite worth the effort. Most of it would be independent of the target language: it would e.g. replace operations with still more precise operations describing the usage of the structs, like:

    llheapalloc('C struct name')
    llheapget_int(x, 'field', 'C struct name')

Also, we should look closely for related work. It looks like it *has to* have been done somewhere before... But the kind of global optimizations that we need might be different from the ones any reasonable language would need, because no one would reasonably write code like the rpython_add() above when they actually mean to do a simple integer addition :-)

What's essential here, after constant propagation, is to do liveness and alias analysis for the heap structures, so that they need not be allocated in the heap at all as often as possible. Also, if we want to do explicit memory management (i.e. we would have to use an 'llfree' operation) with reference counters, then the above example of rpython_add() becomes even longer -- but of course it should still be optimized away.

A bientôt,
Armin.
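To illustrate the kind of simplification meant here, this is a hand-worked sketch of what rpython_add() could reduce to once global constant propagation has (hypothetically) proved that both arguments carry 'ob_type' == int; it reuses the toy ll*() definitions sketched earlier and is not actual optimizer output:

    # hypothetical residual code once the 'ob_type is int' guards fold to True
    def rpython_add_int_int(v, w):
        a = llget_int(v, 'ob_ival')
        b = llget_int(w, 'ob_ival')
        x = llstruct()
        llset(x, 'ob_type', int)             # a constant store; a later pass could drop it
        llset(x, 'ob_ival', lladd_int(a, b))
        return x                             # structure inlining could avoid the heap allocation

A further pass of the liveness/alias kind described above could then also remove the llstruct() allocation whenever all references to x can be tracked.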

Armin Rigo wrote:
yes, given that RPython is still OO, call-graph construction and type inference are related.

> Finally, genc.py produces naive C code using PyObject* only. The inferred types are not really used;

I still hope that for less C-like target languages, the annotated graph is still directly useful.

It seems that along this path we forget the assumption that ints are primitive in RPython, only to get that information back through analysis. Just to point this out.

It seems a goal similar to part/some aspect of the LLVM project itself.

> Such an optimizer is obviously not easy to write

yes

> but I think it's quite worth the effort.

I'm still thinking (no answer yet) whether the major problem you called structure inlining can be more easily resolved in some RPython-specific way, or whether it is best to attack it in this general way. The relevant question seems to be: to which program memory locations ((other) struct fields, function locals, inter-function parameters) does a value from this particular struct field propagate?

> Most of it would be independent of the target language:

yes, depending a bit on whether it will introduce/need function pointers, which are not natural for some possible target languages.

This LLVM paper (I just skimmed it) http://llvm.cs.uiuc.edu/pubs/2003-04-29-DataStructureAnalysisTR.html may contain some relevant info, at the very least the bibliography.

Hi Samuele,

So LLVM comes again under the focus; we should really give it a serious try at some point.

There are clearly two points of view that we can take on RPython. The first is that RPython is a kind of nice syntax over a C- or Java-like language (in this view integers are primitive, and type inference is used to fill in the missing type declarations). The alternative is to see RPython as semantically close to Python, where all the variables contain objects, but we can obtain better performance via optimization. The current annotation-and-Pyrex-or-Lisp approach follows the 1st point of view, but I feel like the 2nd one can give better results. Well, I guess you guessed :-) Let me try to explore a few consequences of such a choice.

A general critique I have regarding a lot of research (including the paper you cited) is that the techniques are flow-insensitive. I'm not sure if it is a problem for the 1st approach, but it is for the 2nd one: the low-level flow graph of, say, "a+b" is:

    if type(a) is int and type(b) is int:
        x = make_int_object(add_long(a.ob_ival, b.ob_ival))
    else:
        ...

It is essential to follow the exact control path in this function, which tells us that a and b only have a field 'ob_ival' to read when they are actually of type int.

There are still good ideas to consider from the paper you cited (which references did you think would be most relevant?). One is that this algorithm is nicely context-sensitive (i.e. it is almost as if all the functions were systematically inlined, which is great for optimization -- but which is impossible in practice, so this algorithm gives an efficient way to compute what *would* occur if all functions were inlined). This looks like an alternative to the traditional way of making functions polymorphic (where we try to guess, when a function gets called with various kinds of arguments, whether we should analyse the function several times with each specialized set of arguments or whether we should generalize and analyse the function only once).

I now tend to think that the "structure-inlining" problem is best attacked at the more general level, independently of RPython. It's quite possible that it could be done with an adaptation of the techniques that we already have in the annotation subdirectory. It would be simpler because we have fewer, more elementary types. It would be harder because we'd have to focus on the basic problem of tracking references more precisely. Trying to sketch the basics by putting together some equivalents of the SomeXxx classes and factories, it looks like it works... (but some more thinking is needed).

Re function pointers: in Java we'd probably have to group functions into families (two functions are in the same family as soon as there is a single variable that could point to either one), and then replace function pointers by integers and use switches... Lots of one-method classes with a common interface look like a waste... But the integer trick seems independent of the optimization techniques. Did you have better tricks in mind that would require more care?

Armin
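As an aside, here is a tiny sketch of the "group functions into families, replace function pointers by integers and use switches" idea; it is written in Python purely for illustration (the actual concern above is Java output) and all names are invented:

    # two functions that could both be targets of the same "function pointer"
    def add_impl(x, y):
        return x + y

    def sub_impl(x, y):
        return x - y

    # instead of a function pointer, the code carries a small integer tag
    ADD_TAG, SUB_TAG = 0, 1

    def call_arith_family(tag, x, y):
        # the "switch" that replaces an indirect call within this family
        if tag == ADD_TAG:
            return add_impl(x, y)
        elif tag == SUB_TAG:
            return sub_impl(x, y)
        raise ValueError("unknown tag %d" % tag)

    assert call_arith_family(ADD_TAG, 5, 3) == 8
    assert call_arith_family(SUB_TAG, 5, 3) == 2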

Armin Rigo wrote:
I see, I just think that maybe we should try to get as far as possible with what we constructed so far, for example even ignoring structure inlining for the first approximation. In Java, Lisp etc. it is not that relevant a problem, because for example Java arrays carry a length anyway and are heap-allocated. One thing with approach 2 is that we need to rewrite a fair chunk of Python's semantics in the compiler that way. I'm thinking that as long as we depend on/use CPython ref counting, ownership issues will crop up; they will likely not mix easily with trying to be clever with structures. In general I think that likely both 1 and 2 are worth pursuing, because of differences in target languages and characteristics of the produced code. OTOH I'm not sure whether going full force with 2 is the best thing to get the first prototype interpreter running as an extension in CPython, especially thinking about ref counting. But you worked more on that, so you can best judge what is necessary.

yes, that is what I was thinking about: you need to track "types" for things like an object/structure coming from a specific creation point or from reading a specific field, and track their propagation.

Jython right now uses the switch thing in some cases, but I suspect it is not JIT-inlining friendly, so a lot of inner classes are preferable.

Hi Samuele, On Tue, Nov 02, 2004 at 06:09:57PM +0100, Samuele Pedroni wrote:
> In general I think that likely both 1 and 2 are worth pursuing, because of differences in target languages and characteristics of the produced code.
Makes sense. We must set priorities, and trying to go as far as possible with the existing approach certainly looks like a good idea. This means that we shouldn't put too much optimization effort into the current C backend. I suggest that the next goal would be to get something that works, even slowly, and try to keep the Lisp backend up-to-date. I think that the type annotations are not used much in the Lisp backend so far, but it should be easy to generate type declarations. It might be a good way to get a "fast enough" translation.
Indeed, it looks like structure inlining ("option 2") is not really useful for all target languages.
> One thing with approach 2 is that we need to rewrite a fair chunk of Python's semantics in the compiler that way.
Yes, though these semantics have to be somewhere anyway; currently they are distributed among the type inference, the backend itself, and manually written support code for the target language (i.e. the C macro file genc.h, or the snippets of Lisp code included in gencl.py). Approach 2 would put them in a central place, making the meta-programming Psyco-ish approach more explicit.
I don't think that reference counting is a major problem. The way I see approach 2 so far is that we'd have basically two kinds of annotations for the low-level graphs, SomeInlinedStructure and SomeHeapStructure. We'd use the former as long as possible, i.e. as long as we can track all references to the structure; and when we no longer can, we mark the structure's creation point as "too complex" and reflow from there using SomeHeapStructure instead. Only the latter would have reference counters and a CPython-compatible layout in memory.
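A rough sketch of what these two kinds of annotations could look like; the class and function names below are only illustrative, not actual annotation code:

    class SomeInlinedStructure:
        """A structure for which every reference can still be tracked; its
        fields may end up as plain locals, with no heap allocation at all."""
        def __init__(self, creation_point, fields=None):
            self.creation_point = creation_point
            self.fields = dict(fields or {})   # field name -> annotation

    class SomeHeapStructure:
        """Fallback: a real heap object with a reference counter and a
        CPython-compatible layout in memory."""
        def __init__(self, creation_point):
            self.creation_point = creation_point

    def demote_to_heap(ann, reflow):
        """Called when some reference can no longer be tracked: mark the
        creation point as 'too complex' and re-annotate from there using
        the heap fallback."""
        heap_ann = SomeHeapStructure(ann.creation_point)
        reflow(ann.creation_point, heap_ann)
        return heap_ann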
I agree with you now for the reasons above, even if I think that the refcounts are not a major issue (but maybe I'm overlooking something).
> Jython right now uses the switch thing in some cases, but I suspect it is not JIT-inlining friendly, so a lot of inner classes are preferable.
Good to know. Well, I guess we'll have to try both and compare, anyway. A bientôt, Armin

On Tue, Nov 02, 2004 at 06:09:57PM +0100, Samuele Pedroni wrote:
Oops, by the way: we can't use a Java array to implement an RPython list, because you can't resize Java arrays. So there is an extra indirection even in Java, and there is a need to remove it with enough analysis and optimization... Or am I missing something? In Lisp you can declare resizable arrays, though that may mean an extra indirection too. Armin

Samuele Pedroni wrote:
oops, sorry, now I see: you want RPython lists to be lists, which is probably different from what we wanted originally. listobject is still written assuming that ob_item is a chunk of memory (i), but in other parts we are assuming that RPython lists can be extended and appended to (ii). (Btw, the question is whether the resizing policy should live in listobject or in the RPython list semantics.) With approach 2 we would go from all the fields necessary for a general list to just len+items in the first case, because some things are constants. It's still true that both an ad-hoc approach, given knowledge about the specific types that exist in RPython, and a general one are possible. In Java, as a first approximation, you would use an array for (i) and an ArrayList for (ii). I still think that a baseline interpreter with straightforward optimizations is a good first goal, also because we can use it as a reference to track our progress and see what gives the biggest payoffs.

Hi Samuele, On Thu, Nov 04, 2004 at 03:51:50PM +0100, Samuele Pedroni wrote:
No, I'm thinking about the following problem: suppose that an RPython list is meant to represent an array that can be resized, but only at some expense. The problem with implementing that using a bare Java array is that when you reallocate and copy, all the references to the array would have to be updated to point to the new array. This is impossible unless you either track all references using complex analysis, or encapsulate the array in another Java object. It's like, say, in Python where you cannot use bare strings to implement mutable strings: although you can use slices and concatenation to build a modified string, you cannot update all the places that pointed to the old string to now point to the new string. So you have to encapsulate the string in an instance. Armin
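A tiny Python illustration of the encapsulation point (an invented example, not PyPy code): a bare string cannot be updated in place, but a wrapper object holding the current string can be, and every holder of a reference to the wrapper sees the update:

    class MutableString:
        """Encapsulates an immutable string so that 'mutation' is visible
        through every reference to the wrapper object."""
        def __init__(self, value):
            self.value = value

        def replace_slice(self, start, stop, new_part):
            # build a new string, but store it back into the same wrapper
            self.value = self.value[:start] + new_part + self.value[stop:]

    s = MutableString("hello world")
    alias = s                               # a second reference to the same wrapper
    s.replace_slice(0, 5, "goodbye")
    assert alias.value == "goodbye world"   # the alias sees the change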

Armin Rigo wrote:
that's what I was implicitly thinking of with "be lists". My point anyway was simply that for Java the first relevant thing is to distinguish constant-size lists (the code in listobject uses such things) vs. variable-size lists, and both then have natural representations. In a more subtle approach, lists that always have just _one_ reference to them in the whole heap (and perhaps are not passed as parameters) can likely also be implemented as a direct array reference. It is true that the generalization and understanding of these kinds of criteria is what we need for general structure inlining.

Hi Samuele, On Sat, Nov 06, 2004 at 07:47:58PM +0100, Samuele Pedroni wrote:
Ah: I didn't realize that listobject's "ob_item" was never resized. I thought that _list_resize() would extend the existing list object, but instead it creates a new one and assigns it to "w_self.ob_item". I was insisting on this point based on this misunderstanding, sorry. Indeed, it looks much easier to just detect that "ob_item" never changes its size, and use a plain Java array because of that. Armin
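A simplified sketch of the pattern being described (not the actual listobject code): the chunk referenced by 'ob_item' is never resized in place; growing the list allocates a fresh chunk and rebinds the field, which is why a backend could map the chunk to a plain fixed-size array:

    class W_ListSketch:
        def __init__(self):
            self.ob_size = 0
            self.ob_item = [None] * 4           # fixed-size chunk, never resized in place

    def _list_resize_sketch(w_self, newsize):
        if newsize > len(w_self.ob_item):
            newchunk = [None] * (2 * newsize)   # allocate a brand-new chunk
            for i in range(w_self.ob_size):
                newchunk[i] = w_self.ob_item[i]
            w_self.ob_item = newchunk           # rebind the field; the old chunk is dropped
        w_self.ob_size = newsize

    w = W_ListSketch()
    _list_resize_sketch(w, 10)
    assert len(w.ob_item) >= 10 and w.ob_size == 10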

Hi Armin, hi Samuele, [Armin Rigo Tue, Nov 02, 2004 at 04:19:05PM +0000]
> So LLVM comes again under the focus; we should really give it a serious try at some point.
indeed. We may also give joeq (http://joeq.sourceforge.net) a look although development seems to be stalled.
To me the pyrex approach was just a short-cut to produce what your new C-backend now produces directly.
the choice of the backend should not influence the flow/annotation analysis, should it? I guess i don't see the connection you seem to imply between these two parts of the translation process.
Does this relate to a previous idea of generating two flow branches for the above if/else that would not get merged at the next point after the if/else? IIRC, we currently merge the "framestates" right after the if/else. cheers, holger

Hi Holger, On Tue, Nov 02, 2004 at 07:36:53PM +0100, holger krekel wrote:
> To me the pyrex approach was just a short-cut to produce what your new C-backend now produces directly.
Sure, but the issue under discussion is the next step: how the type information should be used. The C backend doesn't use it at the moment.
There are two options. What I called "option 1" is to use the existing annotation analysis, and then make use of this information in the backends. This is a good approach for generating Java code, but maybe not so much for C, where "option 2" would give better results. This "option 2" would not use the existing annotation analysis at all. Instead, it would replace the (unannotated) high-level operations with low-level operations and then do a slightly different kind of analysis on these low-level operations, and finally the back-end would turn that into C. As far as I can tell, only "option 2" can give the best results to translate, say, our W_ListObject class, with its 'ob_item' field containing a list that is "owned" by the W_ListObject instance and never visible to other parts of the code.
No, it's not about merging or not the two flow branches (or the results of the analysis) after the if/else: in both cases, flow-sensitive or flow-insensitive, they are merged. The difference between flow-sensitive and flow-insensitive is that in the former case, if we discover that we know the types of a and b statically, then we only explore the first branch (so there is no merging to be done, because the 2nd branch is never analysed). This is what we do currently in our annotations. By contrast, flow-insensitive algorithms like the one in the paper Samuele pointed out (or the one done by Starkiller) don't work like that: they just look at all instructions in the function and accumulate their possible effects, without worrying about the control flow. A bientôt, Armin

Hi Armin, [Armin Rigo Wed, Nov 03, 2004 at 02:10:21PM +0000]
So we do already keep possibly different function versions for differently typed sets of input parameters to this function? I guess i am mixing up flow analysis and annotation somewhat here. Or even have a deeper misunderstanding :-)
right. They have to do that because they look at the static source code and not at a representation obtained from abstract execution like we do in objspace/flow. cheers, holger

Hi Holger, On Wed, Nov 03, 2004 at 03:28:42PM +0100, holger krekel wrote:
> So we do already keep possibly different function versions for differently typed sets of input parameters to this function?
Er, no...
> I guess i am mixing up flow analysis and annotation somewhat here. Or even have a deeper misunderstanding :-)
Hum. We have the flow analysis which, using merge point detection, produces a flow graph that follows the control flow of the input functions (one graph per function). This works fine and we don't plan to change anything there. Then we have the annotation phase, which follows the existing flow graphs by abstract interpretation, and which merges the discovered information exactly when two branches meet in the flow graph. This is a "flow-sensitive" kind of analysis. It is good because in this kind of code:

    if x:
        y = 1
    else:
        y = 2

when the annotation follows the flow graph of this piece of code, if 'x' is not a constant, then both paths are followed, and when the two paths merge, the two incompatible results ("y is the constant 1" and "y is the constant 2") are merged into less precise information ("y is an integer"). But if 'x' is a constant, then only one path will be followed, and we keep the precise knowledge of the value of 'y' even after the if/else statement.

Flow-insensitive algorithms just look at the whole function and see two statements: "y=1" and "y=2". From that they conclude that 'y' must be an integer no matter what. They can't tell you the value of 'y' even if they knew the value of 'x'. This difference is in the algorithm used; it's not related to:
No, that's not related. It's just that there are big advantages if you don't care about the precise flow for some kinds of algorithms. We could also collect all the operations that appear in a flow graph and deduce things in a flow-insensitive manner if we wanted to; and conversely Starkiller could also look at the static source and still infer things in a more flow-sensitive way. A bientôt, Armin.

Hi again, [Armin Rigo Wed, Nov 03, 2004 at 02:50:46PM +0000]
me hum, too :-)  This means that http://codespeak.net/pypy/index.cgi?doc/architecture.html is not right when it says:

    The translation process is done in three steps:

    - producing a flow graph representation of the standard interpreter.
      A combination of a plain interpreter and a flow object space performs
      "abstract interpretation" to record the flow of objects and execution
      throughout a python program into such a flow graph.

    - ...

because your current prescribed notion of abstract interpretation starts with already existing flow graphs. Hum. I am ready to accept your current use of the terms, but it is confusing that we keep changing notions and terms quite a bit ... I guess that when i get confused about this, other people could get confused about this as well.
But if one part of the program uses the one branch and another part uses the other branch, then you will generalize the function to annotate 'y' as being an integer. This was what i was referring to when i asked, surprised: "So we do already keep possibly different function versions for differently typed sets of input parameters to this function?", to which the answer apparently is "no". Only when just one branch is used will the flow-sensitive annotation analysis regard 'y' as being a constant.
But in order to do that, Starkiller would probably need to use a representation that reflects the actual possible flows of execution. I didn't intend to claim that looking at source code couldn't be done in a flow-sensitive manner. Such a claim wouldn't make much sense, as source code and its execution flow are mostly equivalent. cheers, holger

Hi Holger, On Wed, Nov 03, 2004 at 04:40:01PM +0100, holger krekel wrote:
Abstract interpretation is the name of a very general technique: we use it twice. We use it to produce the flow graph (i.e. the architecture document is correct), and then we also use it to produce annotations on the graph (the architecture document doesn't say how we do that, so it's not wrong for not saying that we do it with another kind of abstract interpretation too). Now I see where the confusion comes from. The e-mails in this thread have nothing to do with the flowobjspace or the flow graph construction itself. What you said about it is correct but off topic: the topic is annotations and backends. All the "abstract interpretation" I talked about is the one we do to infer the annotations.
Right. Now I understand your question:
> So we do already keep possibly different function versions for differently typed sets of input parameters to this function?
The original code example with "if type(a) is int and ..." was supposed to be inlined in a lot of places, i.e. to replace verbatim all the high-level "add" operations. You're right, I was quite vague. If that code example were put in a separate function, then probably all the branches would have to be analysed anyway, unless we kept different versions of the function for differently typed sets of inputs. Instead, to avoid the problem, it's better to create an inlined copy of the code everywhere and then solve the code bloat by optimizing away all branches but one. A bientôt, Armin

Hi Armin, [Armin Rigo Wed, Nov 03, 2004 at 04:58:31PM +0000]
Thanks for the clarification.
Right. However, in the past we have sometimes been mixing the flow analysis and annotation parts, both in talking and coding. I do recall a discussion regarding the determination of "merge points" in the flow analysis that we _could_ connect to type/annotation analysis (during a screen session where we hacked together on the flow/* code), but let's postpone this as i agree it doesn't necessarily relate to the current topic. Btw, it will be interesting to come up with a formal definition of abstract interpretation at some point, along with somewhat more formal definitions of the other terms we often use (not necessarily completely mathematical definitions, mind you). Kind of a brief "glossary" thingie to allow us and others to understand what we are talking about. Maybe you or Samuele could even start a brief but current "glossary.txt" before the Vilnius sprint ... ? cheers, holger
