[pypy-dev] Hi!

Thu Feb 3 03:59:27 CET 2005

Hi pypy-list!

I've [1] been following the pypy-dev mailing list for quite some time now
and am really excited about this project. This weekend I checked out the
code and started to play around with it a bit. Since there has been some
talk about adding an LLVM backend, and since this doesn't seem to have
happened yet, I decided to take a stab at it. I installed LLVM (which really
is a pain), read the LLVM documentation and started to write a (very
rudimentary) genllvm.py. It can already generate LLVM-assembler for
simple functions (e.g. only ints, no function calls, no default
arguments...). A Pyrex wrapper is then generated for these functions so
that they can be imported.

For the function snippet.my_gcd the following LLVM-assembler code is
generated:

int %my_gcd(int %a_2, int %b_3) {
block0:
	%r_7 = call int %mod(int %a_2, int %b_3)
	br label %block1
block3:
	%a_29 = phi int [%a_8, %block1]
	%b_30 = phi int [%b_9, %block1]
	%r_31 = phi int [%r_10, %block1]
	%v32 = phi bool [%v11, %block1]
	%r_21 = call int %mod(int %b_30, int %r_31)
	br label %block1
block2:
	%v4 = phi int [%b_9, %block1]
	ret int %v4
block1:
	%a_8 = phi int [%a_2, %block0], [%b_30, %block3]
	%b_9 = phi int [%b_3, %block0], [%r_31, %block3]
	%r_10 = phi int [%r_7, %block0], [%r_21, %block3]
	%v11 = call bool %is_true(int %r_10)
	br bool %v11, label %block3, label %block2
}
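A rough sketch of how the phi nodes above can be produced from a block's
incoming links. It assumes each block is described by its input variables
(name plus LLVM type) and by a list of incoming links, each given as the
source label and the values passed along that link. `phi_nodes` and this
data layout are hypothetical illustrations, not the genllvm.py code or
PyPy's flowgraph API:

```python
def phi_nodes(inputargs, incoming):
    """Return one phi instruction per input variable of a block.

    inputargs -- list of (variable_name, llvm_type), e.g. [("a_8", "int")]
    incoming  -- list of (source_label, [values passed along that link]);
                 the i-th value on each link feeds the i-th input variable
    """
    lines = []
    for i, (name, llvm_type) in enumerate(inputargs):
        # One [value, label] pair per predecessor block.
        pairs = ", ".join("[%%%s, %%%s]" % (values[i], label)
                          for label, values in incoming)
        lines.append("%%%s = phi %s %s" % (name, llvm_type, pairs))
    return lines


if __name__ == "__main__":
    # Reproduces the first two phi nodes of block1 above.
    for line in phi_nodes([("a_8", "int"), ("b_9", "int")],
                          [("block0", ["a_2", "b_3"]),
                           ("block3", ["b_30", "r_31"])]):
        print(line)
    # %a_8 = phi int [%a_2, %block0], [%b_30, %block3]
    # %b_9 = phi int [%b_3, %block0], [%r_31, %block3]
```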

Note that this exactly mirrors the flowgraph: every block becomes an LLVM
basic block, and the input variables of a block become phi nodes. I just
use function calls for all SpaceOperations (though some will probably have
to be special-cased later). It is not necessary to rename these functions,
since LLVM considers functions to be different if their signatures differ.
The implementation of these functions is:

int %mod(int %a, int %b) {
        %r = rem int %a, %b
        ret int %r
}

bool %is_true(int %a) {
        %b = cast int %a to bool
        ret bool %b
}
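Translating a SpaceOperation into such a call is then mostly string
formatting: the operation name becomes the callee, and (relying on the
signature-based distinction mentioned above) it does not need to be
mangled. A minimal sketch, assuming each argument is given as a
(name, LLVM type) pair; `call_op` is a hypothetical helper:

```python
def call_op(opname, args, result, result_type):
    """Emit one SpaceOperation as an LLVM call instruction.

    opname      -- operation name, used directly as the callee, e.g. "mod"
    args        -- list of (variable_name, llvm_type) pairs
    result      -- name of the variable that receives the result
    result_type -- LLVM type of the result, e.g. "int" or "bool"
    """
    arglist = ", ".join("%s %%%s" % (llvm_type, name)
                        for name, llvm_type in args)
    return "%%%s = call %s %%%s(%s)" % (result, result_type, opname, arglist)


if __name__ == "__main__":
    print(call_op("mod", [("a_2", "int"), ("b_3", "int")], "r_7", "int"))
    # %r_7 = call int %mod(int %a_2, int %b_3)
    print(call_op("is_true", [("r_10", "int")], "v11", "bool"))
    # %v11 = call bool %is_true(int %r_10)
```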

LLVM optimizes the above code to:

int %my_gcd(int %a_2, int %b_3) {
block0:
        %r.i = rem int %a_2, %b_3
        br label %block1

block3:
        %r.i1 = rem int %b_9, %r_10
        br label %block1

block2:
        ret int %b_9

block1:
        %b_9 = phi int [ %b_3, %block0 ], [ %r_10, %block3 ]
        %r_10 = phi int [ %r.i, %block0 ], [ %r.i1, %block3 ]
        %b.i = seteq int %r_10, 0
        br bool %b.i, label %block2, label %block3
}
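For comparison, here is a Python function with the same control flow as
the graph, reconstructed from the generated code (the actual source of
snippet.my_gcd may be written differently). It assumes positive inputs:
for negative operands Python's % (which floors) and a C-style truncating
remainder, which is what LLVM's rem is assumed to be here, can differ.

```python
def my_gcd(a, b):
    r = a % b                  # block0: %r_7 = mod(a, b)
    while r:                   # block1: branch on is_true(r)
        # block3: compute b mod r, then pass (b, r, b mod r) back to block1
        a, b, r = b, r, b % r
    return b                   # block2: return b


if __name__ == "__main__":
    print(my_gcd(12, 18))  # 6
```

Written this way, it is visible that a is assigned but never read inside
the loop, which is why the optimized code no longer has a phi node for it.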

This is then compiled to native code. At the moment I'm not using the
LLVM API to generate this code, since it was simpler to just do the string
shuffling in Python than to wrap and learn the LLVM API.
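To give an idea of what the string-shuffling approach amounts to, a
minimal sketch that assembles a whole function once its blocks have been
rendered to instruction strings. `emit_function` and its argument layout
are hypothetical, not taken from genllvm.py:

```python
def emit_function(name, ret_type, params, blocks):
    """Assemble the text of an LLVM function.

    params -- list of (parameter_name, llvm_type) pairs
    blocks -- list of (label, [instruction strings]) in emission order
    """
    signature = ", ".join("%s %%%s" % (llvm_type, pname)
                          for pname, llvm_type in params)
    lines = ["%s %%%s(%s) {" % (ret_type, name, signature)]
    for label, instructions in blocks:
        lines.append("%s:" % label)
        # Indent instructions with a tab, as in the output above.
        lines.extend("\t" + instr for instr in instructions)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_function("mod", "int", [("a", "int"), ("b", "int")],
                        [("entry", ["%r = rem int %a, %b", "ret int %r"])]))
```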

In my opinion this can be extended to nearly all of Python's data types,
as long as the annotation succeeds. I cannot yet judge whether other things
like classes, exception handling, garbage collection etc. will be easy, but
we shall see.

As for the code: it is quite convoluted and ad hoc. I need to clean it up,
write some more tests (I already wrote some) and extend it a bit before it
is fit for someone else to see. Should I just post it, or apply for check-in
rights?

What do you all think? Does my approach make sense, or are there some
obvious problems that I didn't see?

Regards,
  Carl Friedrich 

[1] To introduce myself briefly: my name is Carl Friedrich Bolz, I'm 21 and
    studying maths and physics in Heidelberg, Germany, currently in my 3rd
    semester. I've been using Python for four years, mostly for my own
    projects. In addition, I have done (and still do) a bit of C/C++
    programming, mostly for 3D graphics and high energy physics data
    analysis.