[pypy-dev] LLVM backend

Carl Friedrich Bolz cfbolz at gmx.de
Sun Feb 6 17:53:11 CET 2005


Hi pypy-list!

I just checked my LLVM backend in (I hope I did nothing wrong). It resides
in pypy/translator/llvm. It is still a bit rough but should work for most
functions that use just ints and bools (I started adding some list and
string support, but at the moment only lists of length one work and
strings are not tested at all).

To try it out, use the function 'llvmcompile' in the module
pypy.translator.llvm.genllvm on an annotated flowgraph:
%~/pypy/translator/llvm> python -i genllvm.py
>>> t = Translator(test.my_gcd) 
>>> a = t.annotate([int, int])
>>> f = llvmcompile(t)
>>> f(10, 2)
2
>>>


You need to have the LLVM executables (llvm-as, llc, llvmc) in your path, as
well as 'as' and gcc. The LLVM C/C++ frontend is not needed for genllvm to
work (which eases installation a bit).

llvmcompile tries to produce LLVM assembly out of the entry function of the
translator and of all the functions the entry function calls. The assembly
is then optimized by LLVM and native code (a .o file) is generated.
Additionally, the LLVM backend produces a Pyrex wrapper for the generated
function so that the LLVM-produced version can be used from Python. Since
none of the dynamic features of Python are supported yet (and since the
LLVM optimizers seem to be very good), it is not surprising that the
produced functions are pretty fast (faster than C? ;-).
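
To give an idea of what happens behind the scenes, the tool chain does
roughly the following. This is only a sketch: the concrete commands, flags
and file names here are simplified for illustration and are not necessarily
the ones genllvm actually runs.

import subprocess

def assemble_to_object(base):
    # LLVM assembly (.ll) -> LLVM bitcode (.bc)
    subprocess.check_call(["llvm-as", base + ".ll", "-o", base + ".bc"])
    # bitcode -> native assembly (.s)
    subprocess.check_call(["llc", base + ".bc", "-o", base + ".s"])
    # native assembly -> object file (.o); this is what the Pyrex
    # wrapper module is later linked against
    subprocess.check_call(["as", base + ".s", "-o", base + ".o"])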


How genllvm works:

For every function, constant and variable of the flowgraph an object is
generated that knows how to represent that object in LLVM. These
LLVMRepr objects can have dependencies on other LLVMRepr objects: for
example, a variable depends on the representation of its type, a function
depends on all its variables and constants, and so on. (This seems to be a
bit similar to the nameof* functions of genc and genjava, but I'm not sure
about that.)
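
To make this a bit more concrete, here is a minimal sketch of the idea. The
class and method names are invented for illustration and don't match the
real interface in pypy/translator/llvm exactly.

class LLVMRepr(object):
    # knows how to represent one flowgraph object in LLVM
    def get_dependencies(self):
        # Repr objects that have to be emitted before this one
        return []
    def get_globals(self):
        # this object's contribution to the global declarations
        return ""
    def get_functions(self):
        # this object's contribution to the function definitions
        return ""

class VariableRepr(LLVMRepr):
    def __init__(self, name, typerepr):
        self.name = name
        self.typerepr = typerepr
    def get_dependencies(self):
        # a variable depends on the representation of its type
        return [self.typerepr]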

To generate LLVM code this dependency tree is walked depth-first and every
object is asked for its global declarations and then for the functions it
needs. For example, in the global declaration phase the used types return
their own declarations. In the function phase the used types return
the space-ops that are defined for them and the functions return their own
LLVM code. The code generation for the functions to be translated takes
place in the FunctionRepr class. This is quite straightforward since the
structure of flowgraphs corresponds very well to the control flow
mechanisms of LLVM, which has, for example, direct support for phi nodes.
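
Using the invented interface from the sketch above, the two-phase walk
could look roughly like this (again simplified, not the actual code in
genllvm.py):

def walk(repr, seen):
    # depth-first: dependencies are emitted before the objects needing them
    if repr in seen:
        return []
    seen.add(repr)
    ordered = []
    for dep in repr.get_dependencies():
        ordered.extend(walk(dep, seen))
    ordered.append(repr)
    return ordered

def generate_module(entry_repr):
    ordered = walk(entry_repr, set())
    # phase 1: global declarations (e.g. the used types declare themselves)
    decls = [o.get_globals() for o in ordered]
    # phase 2: function bodies (space-op implementations, translated code)
    funcs = [o.get_functions() for o in ordered]
    return "".join(decls) + "".join(funcs)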

So far this works quite well but a lot of issues remain open:

  - Does my approach make sense at all? I never did anything even remotely
    similar, so I might be doing lots of stupid things.

  - I think there should be some more intelligent way to produce the
    necessary LLVM implementations for the space operations of more
    complex types than just writing them in LLVM assembler, which can be
    quite tedious (it's no fun writing programs in SSA form). One possible
    direction is sketched after this list.

  - Lists and strings should be relatively easy to implement with arrays.
    I'm not quite sure whether I'll manage to do it; I'll just ask
    questions if I run into problems.

  - Are tuples really only used for returning multiple values from a
    function? If yes, they could be avoided altogether by using additional
    pointer arguments that point to where the return values should be
    stored.

  - I don't know how exactly 'interned strings' work in CPython, so I
    don't know what to do with dicts yet.

  - Classes, GC, exceptions and all that...
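
Regarding the hand-written LLVM assembler mentioned above: one direction I
can imagine is generating the implementations of simple space operations
from string templates, roughly like the sketch below. The template text,
names and type spellings are made up for illustration only.

BINARY_OP_TEMPLATE = """\
%(rettype)s %%std.%(opname)s(%(argtype)s %%a, %(argtype)s %%b) {
    %%result = %(llvminstr)s %(argtype)s %%a, %%b
    ret %(rettype)s %%result
}
"""

def binary_space_op(opname, llvminstr, argtype="int", rettype="int"):
    # fill in the template, e.g. binary_space_op("int_add", "add") or
    # binary_space_op("int_lt", "setlt", rettype="bool")
    return BINARY_OP_TEMPLATE % {"opname": opname, "llvminstr": llvminstr,
                                 "argtype": argtype, "rettype": rettype}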


Regards, 

Carl Friedrich


