Hi pypy-list!

I've [1] been following the pypy-dev mailing list for quite some time now and am really excited about this project. This weekend I checked out the code and started to play around with it a bit. Since there has been some talk about adding an LLVM backend, and since this doesn't seem to have happened, I decided to take a stab at it. I installed LLVM (which really is a pain), read the LLVM documentation and started to write a (very rudimentary) genllvm.py.

It can already generate LLVM-assembler for simple functions (e.g. just ints, no function calls, no default arguments...). Then a Pyrex-wrapper for the functions is generated so that they can be imported.

For the function snippet.my_gcd the following LLVM-assembler code is generated:

    int %my_gcd(int %a_2, int %b_3) {
    block0:
        %r_7 = call int %mod(int %a_2, int %b_3)
        br label %block1
    block3:
        %a_29 = phi int [%a_8, %block1]
        %b_30 = phi int [%b_9, %block1]
        %r_31 = phi int [%r_10, %block1]
        %v32 = phi bool [%v11, %block1]
        %r_21 = call int %mod(int %b_30, int %r_31)
        br label %block1
    block2:
        %v4 = phi int [%b_9, %block1]
        ret int %v4
    block1:
        %a_8 = phi int [%a_2, %block0], [%b_30, %block3]
        %b_9 = phi int [%b_3, %block0], [%r_31, %block3]
        %r_10 = phi int [%r_7, %block0], [%r_21, %block3]
        %v11 = call bool %is_true(int %r_10)
        br bool %v11, label %block3, label %block2
    }

Note that this exactly mirrors the flowgraph. I just use function calls for all SpaceOperations (though some probably have to be special-cased later). It is not necessary to rename these functions since LLVM considers functions to be different if their signatures differ.
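The source of snippet.my_gcd is not shown in this thread. A Python function consistent with the generated blocks above would look roughly like the following sketch; it is a reconstruction from the assembler, so treat it as an assumption rather than the actual snippet. The comments map each statement to the block it appears to correspond to.

```python
def my_gcd(a, b):
    # Assumed reconstruction of snippet.my_gcd, derived from the LLVM output.
    r = a % b          # block0: %r_7 = call int %mod(a, b)
    while r:           # block1: is_true(r) selects block3 (loop) or block2 (exit)
        a = b          # block1 phi: a takes the previous b
        b = r          # block1 phi: b takes the previous r
        r = a % b      # block3: %r_21 = call int %mod(previous b, previous r)
    return b           # block2: ret int b


if __name__ == "__main__":
    print(my_gcd(12, 18))
```

For non-negative operands Python's % and LLVM's rem give the same result, so the comparison only holds in that range.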
The implementation of these functions is:

    int %mod(int %a, int %b) {
        %r = rem int %a, %b
        ret int %r
    }

    bool %is_true(int %a) {
        %b = cast int %a to bool
        ret bool %b
    }

LLVM optimizes the above code to:

    int %my_gcd(int %a_2, int %b_3) {
    block0:
        %r.i = rem int %a_2, %b_3
        br label %block1
    block3:
        %r.i1 = rem int %b_9, %r_10
        br label %block1
    block2:
        ret int %b_9
    block1:
        %b_9 = phi int [ %b_3, %block0 ], [ %r_10, %block3 ]
        %r_10 = phi int [ %r.i, %block0 ], [ %r.i1, %block3 ]
        %b.i = seteq int %r_10, 0
        br bool %b.i, label %block2, label %block3
    }

This is then compiled to native code. At the moment I'm not using the LLVM-API to generate this code since it was simpler to just do the string shuffling in Python than to wrap and learn the LLVM-API.

In my opinion this can be extended to nearly all of Python's data types as long as the annotation succeeds. I cannot yet judge whether other things like classes, exception handling, garbage collection etc. will be easy but we shall see.

As for the code: It is quite convoluted and ad-hoc. I need to clean it up, write some more tests (I already wrote some) and extend it a bit before it is fit for someone else to see. Should I just post it or apply for checkin rights?

What do you all think? Does my approach make sense or are there some obvious problems that I didn't see?

Regards,

Carl Friedrich

[1] To introduce myself briefly: My name is Carl Friedrich Bolz, I'm 21 and studying maths and physics in Heidelberg, Germany, currently in my 3rd semester. I've been using Python for four years, mostly for my own projects. In addition I have done and am doing a bit of C/C++ programming, mostly for 3D graphics and high energy physics data analysis.
Hi Carl!

On Thu, Feb 03, 2005 at 03:59 +0100, Carl Friedrich Bolz wrote:
> I've [1] been following the pypy-dev mailing list for quite some time now and am really excited about this project. This weekend I checked out the code and started to play around with it a bit. Since there has been some talk about adding an LLVM backend, and since this doesn't seem to have happened, I decided to take a stab at it. I installed LLVM (which really is a pain), read the LLVM documentation and started to write a (very rudimentary) genllvm.py. It can already generate LLVM-assembler for simple functions (e.g. just ints, no function calls, no default arguments...). Then a Pyrex-wrapper for the functions is generated so that they can be imported.
Sounds good. Before i comment further, a disclaimer: i think that Armin, Samuele, Michael, Christian or others can probably provide more in-depth comments since i haven't worked much on the current translator/annotator codebase.

One of the obstacles regarding LLVM is indeed its installation process, as is often the case with large C++ codebases not packaged by the distributions. If we want to use LLVM more then we should try to provide supplemental installation instructions, i guess.
> For the function snippet.my_gcd the following LLVM-assembler code is generated:
>
>     int %my_gcd(int %a_2, int %b_3) {
>     block0:
>         %r_7 = call int %mod(int %a_2, int %b_3)
>         br label %block1
>     block3:
>         %a_29 = phi int [%a_8, %block1]
>         %b_30 = phi int [%b_9, %block1]
>         %r_31 = phi int [%r_10, %block1]
>         %v32 = phi bool [%v11, %block1]
>         %r_21 = call int %mod(int %b_30, int %r_31)
>         br label %block1
>     block2:
>         %v4 = phi int [%b_9, %block1]
>         ret int %v4
>     block1:
>         %a_8 = phi int [%a_2, %block0], [%b_30, %block3]
>         %b_9 = phi int [%b_3, %block0], [%r_31, %block3]
>         %r_10 = phi int [%r_7, %block0], [%r_21, %block3]
>         %v11 = call bool %is_true(int %r_10)
>         br bool %v11, label %block3, label %block2
>     }
>
> Note that this exactly mirrors the flowgraph. I just use function calls for all SpaceOperations (though some probably have to be special-cased later). It is not necessary to rename these functions since LLVM considers functions to be different if their signatures differ.
(... nice example ...)
> This is then compiled to native code. At the moment I'm not using the LLVM-API to generate this code since it was simpler to just do the string shuffling in python than to wrap and learn the LLVM-API.
hehe.
> In my opinion this can be extended to nearly all of Python's data types as long as the annotation succeeds. I cannot yet judge whether other things like classes, exception handling, garbage collection etc. will be easy but we shall see.
Oh don't worry, the other backends don't care too much about this, either :-) The Pyrex and GenC backends still cooperate with the CPython runtime and borrow its garbage collection, among other things. Once we target a standalone version (without the CPython runtime), garbage collection needs to be done. (Usually at this point someone drops in the two words "Boehm collector" :-) Exceptions get analyzed by the flow space in a way that makes generation of low-level code rather straightforward.
> As for the code: It is quite convoluted and ad-hoc, I need to clean it up, write some more tests (I already wrote some) and extend it a bit before it is fit for someone else to see. Should I just post it or apply for checkin rights?
Apply for checkin rights, i'd say. I guess you are aware of at least our coding-style document

    http://codespeak.net/pypy/index.cgi?doc/coding-style

and of the fact that we generally want MIT-licensed (BSD-licensed) code. If so, how about you send me privately your desired account name?
> What do you all think? Does my approach make sense or are there some obvious problems that I didn't see?
I think it makes sense. It would be great to have you at one of our next sprints to further explore the LLVM backend. Btw, Armin and Christian intend to do cleanup work on the translator backends and it is sensible to already have LLVM in mind.

What i would like to find out is whether we could use a stripped-down version of LLVM, because i also guess that many supplemental (code generation) tasks are better done in Python than by wrapping and using some of the LLVM API. I guess i am going to download and try installing the thing again :-)

cheers,

holger
Hi Holger!

On Thu Feb 3 09:46:08 MET 2005, Holger Krekel wrote:
> One of the obstacles regarding LLVM is indeed its installation process, as is often the case with large C++ codebases not packaged by the distributions. If we want to use LLVM more then we should try to provide supplemental installation instructions i guess.

I can provide you with some pointers (maybe I should write these installation instructions at some point): One of the biggest problems for me was that LLVM needs a fairly recent GCC version to even compile. I think they recommend GCC 3.3.3/3.4.0 or later -- earlier versions give "Internal compiler errors" or something like that. If you have a recent version of GCC, installing the main LLVM package is still relatively easy. What's more difficult is the cfrontend package, which is needed if you want to be able to compile C/C++ code to LLVM. This package consists of the ripped-off GCC frontends for these languages. A precompiled version of this package exists for some platforms. If these don't work (which they did for me eventually) you have to compile cfrontend. Quote from the docs:

"""This is currently a somewhat fragile, error-prone process, and you should _only_ try to do it if:
1. you really, really, really can't use the binaries we distribute
2. you are an elite GCC hacker
"""

On the other hand, the cfrontend is not needed to use LLVM. You need it only if you want to compile C code with it, which is not necessary for my LLVM backend for PyPy (but might become so in the future).

>> In my opinion this can be extended to nearly all of Python's data types as long as the annotation succeeds. I cannot yet judge whether other things like classes, exception handling, garbage collection etc. will be easy but we shall see.
> Oh don't worry, the other backends don't care too much about this, either :-) The Pyrex and GenC backends still cooperate with the CPython runtime and borrow its garbage collection, among other things. Once we target a standalone version (without the CPython runtime), garbage collection needs to be done. (Usually at this point someone drops in the two words "Boehm collector" :-) Exceptions get analyzed by the flow space in a way that makes generation of low-level code rather straightforward.
I don't know enough about the memory management of LLVM to decide whether we can use the "Boehm collector". As far as I can see, it is a C/C++ library, which could be difficult to use from within LLVM. LLVM has some GC hooks though (and I think even a very simple GC to show how to use them), so maybe we can work with that.
> I think it makes sense. It would be great to have you at one of our next sprints to further explore the LLVM backend. Btw, Armin and Christian intend to do cleanup work on the translator backends and it is sensible to already have LLVM in mind.
I won't be able to go to PyCon but I hope I can make it to one of the sprints after the one at PyCon. Is there some sprint planning that goes a bit beyond just the next one? I'm even thinking about organizing a sprint here in Heidelberg this summer/fall, though I don't know whether this will fly.
> What i would like to find out is whether we could use a stripped-down version of LLVM, because i also guess that many supplemental (code generation) tasks are better done in Python than by wrapping and using some of the LLVM API. I guess i am going to download and try installing the thing again :-)
We can probably try not to use the API at all. Then we would only need the command line tools. The APIs would be useful for something Psyco-like, though, as they make it possible to emit code, JIT-compile it and use it on the fly without writing anything to a file first.

Regards,

Carl Friedrich
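As a rough illustration of the API-free "string shuffling" approach discussed in this thread, the sketch below renders a basic block as LLVM assembler text using plain string formatting. The helper names (emit_phi, emit_call, emit_block) and the tuple-based inputs are hypothetical and are not taken from genllvm.py or from PyPy's flow-graph classes; the output syntax follows the 2005-era assembler shown earlier in the thread.

```python
# Hypothetical helpers: a minimal sketch, not the actual genllvm.py code.

def emit_phi(target, typ, incoming):
    """Render a phi node; incoming is a list of (value, predecessor_block)."""
    pairs = ", ".join("[%s, %%%s]" % (value, block) for value, block in incoming)
    return "%s = phi %s %s" % (target, typ, pairs)


def emit_call(target, ret_type, op_name, args):
    """Render an operation as a call to a helper function named after it.

    args is a list of (type, name) pairs; the argument types are written
    out in the call, as in the examples earlier in the thread.
    """
    arg_list = ", ".join("%s %s" % (typ, name) for typ, name in args)
    return "%s = call %s %%%s(%s)" % (target, ret_type, op_name, arg_list)


def emit_block(name, lines):
    """Render a labelled basic block with indented instructions."""
    return "%s:\n%s" % (name, "\n".join("    " + line for line in lines))


if __name__ == "__main__":
    # Rebuild block1 of the my_gcd example from its parts.
    block1 = emit_block("block1", [
        emit_phi("%a_8", "int", [("%a_2", "block0"), ("%b_30", "block3")]),
        emit_phi("%b_9", "int", [("%b_3", "block0"), ("%r_31", "block3")]),
        emit_phi("%r_10", "int", [("%r_7", "block0"), ("%r_21", "block3")]),
        emit_call("%v11", "bool", "is_true", [("int", "%r_10")]),
        "br bool %v11, label %block3, label %block2",
    ])
    print(block1)
```

The resulting text could then be written to a file and handed to the LLVM command line tools, which is all the backend would need if the API is avoided.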
On Thursday 3 February 2005 15.00, Carl Friedrich Bolz wrote:
> I won't be able to go to PyCon but I hope I can make it to one of the sprints after the one at PyCon. Is there some sprint planning that goes a bit beyond just the next one? I'm even thinking about organizing a sprint here in Heidelberg this summer/fall, though I don't know whether this will fly.
Yes, these are the sprint plans so far:

8. 19-22 March, before Pycon, Washington DC, USA
9. 24 April - 1 May, after ACCU UK, Oxford, UK
10. 30 June - 7 July, after Europython, Göteborg, Sweden
11. End August/Beginning of September
12. October
13. December (somewhere warm, focused on reports for EU project)
14. February 2006
15. March 2006, probably close to Pycon
16. May 2006
17. June 2006, probably close to Europython (CERN?)
18. August 2006
19. October 2006
20. November 2006 (Focused on reports for EU projects)

Dates are still quite flexible for sprints beyond sprint 10.

If you are willing to host a sprint in Heidelberg, we should make sure that we can get all the logistics to work and then ask the team if they want to do a sprint in Heidelberg. I'm the contact point for logistics.

Best regards

Jacob Hallén
hpk wrote:
> One of the obstacles regarding LLVM is indeed its installation process, as is often the case with large C++ codebases not packaged by the distributions. If we want to use LLVM more then we should try to provide supplemental installation instructions i guess.
LLVM is currently being packaged for Debian. See http://bugs.debian.org/239415

A preliminary package does work; the 1.4 release is currently packaged. Here are the installation instructions:

1. Add "deb http://toolchain.org/~ahs3 /" to /etc/apt/sources.list
2. apt-get install llvm
3. Happy hacking!

Seo Sanghyeon
participants (4)
- Carl Friedrich Bolz
- hpk@trillke.net
- Jacob Hallén
- Sanghyeon Seo