
<sorry I'm joining in late here> Carl Friedrich Bolz wrote:
I just checked my LLVM-backend in (I hope I did nothing wrong). It resides in pypy/translator/llvm.
Wow cool, I had no idea that you guys were this far along! Here are some answers to LLVM related questions:
I just removed the usage of llvmc so if that was really the problem it could work now. The only llvm tools that genllvm now uses are llvm-as (which works in your case) and llc (produces native code).
Yup, unfortunately, 'llvmc' was not quite stable when the 1.4 release was put out. It's a bit better in LLVM CVS now, but is still a work in progress. Just not using it is a good way to go for now.
- I think there should be some more intelligent way to produce the necessary LLLVM-implementations for the space operations of more complex types than just writing them in LLVM-assembler, which can be quite tedious (it's no fun writing programs in SSA form).
Oh yeah, generating SSA is quite a pain. The traditional way that LLVM front-ends deal with this is to use 'alloca's for local variables and use explicit loads/stores to them. This generates really gross looking code, but the LLVM optimizers (specifically mem2reg) rip them up. For example, for something simple like: X = 1; ... = X; ... X = 2; You can generate code that looks like this: %X = alloca int ;; in the entry block for the fn ... store int 1, int* %X ... = load int* %X ... store int 2, int* %X ... If you run this sort of code through the LLVM "-mem2reg" optimization, it will promote all of these to SSA values, so you don't have to do it yourself. If the "..."'s contain control flow, this is a non-trivial task. :)
- List and Strings should be relatively easy to implement with arrays. I'm not quite shure wether I manage to do it, I'll just ask questions if I run into problems.
If you have any LLVM specific question, please feel free to contact the llvmdev list (http://mail.cs.uiuc.edu/mailman/listinfo/llvmdev), the people on the list are quite helpful and would love to help you in any way they can.
- Are tuples really only used for returning multiple values from a function? If yes they could be avoided altogether using additional pointer arguments that point to where the return value should be stored.
On the LLVM side of things, the only way for a function to return multiple values is (as you mention) to pass a pointer that indicates where to store the result. In the future (say 3-4 months out), LLVM will be extended to allow functions to return multiple values in registers. Another thought: I see that you're currently using llc to build your programs, have you considered using the LLVM JIT? Anyway, if you have any questions or run into problems, again, we'd love to help, just let us know. :) -Chris -- http://nondot.org/sabre/ http://llvm.cs.uiuc.edu/

Hey Chris! On Wed, Feb 16, 2005 at 23:29 -0600, Chris Lattner wrote:
<sorry I'm joining in late here>
it is great that you joined the pypy-dev list! Welcome! We have been following LLVM with interest for quite some time and are at a stage where we are seriously considering how we (re-)organize our translation backends and LLVM is a primary contender, (if we can make sure that the installation of the very basic tools is feasible :-) Special thanks, btw, to Carl Friedrich Bolz which went ahead with LLVM integration where the rest of us will have to catch up, soon. We are currently buried in refactorings to make type inference and some internal mechanisms on the pypy side easier ... When PyPy follows the LLVM route i personally hope that we can contribute something like a "(restricted) python frontend" for LLVM back (which would likely require an install of Python but that should be easier than a multitude of C++ libraries :-) cheers, holger

On Thu, 17 Feb 2005, holger krekel wrote:
Hey Chris!
On Wed, Feb 16, 2005 at 23:29 -0600, Chris Lattner wrote:
<sorry I'm joining in late here>
it is great that you joined the pypy-dev list! Welcome!
Thanks! :)
We have been following LLVM with interest for quite some time and are at a stage where we are seriously considering how we (re-)organize our translation backends and LLVM is a primary contender, (if we can make sure that the installation of the very basic tools is feasible :-) Special thanks, btw, to Carl Friedrich Bolz which went ahead with LLVM integration where the rest of us will have to catch up, soon. We are currently buried in refactorings to make type inference and some internal mechanisms on the pypy side easier ...
Ok. :)
When PyPy follows the LLVM route i personally hope that we can contribute something like a "(restricted) python frontend" for LLVM back (which would likely require an install of Python but that should be easier than a multitude of C++ libraries :-)
That would be great. C/C++ are pretty ugly, I'm sure a (even restricted) python front-end would be much nicer. :) -Chris -- http://nondot.org/sabre/ http://llvm.cs.uiuc.edu/

Chris Lattner wrote:
Another thought: I see that you're currently using llc to build your programs, have you considered using the LLVM JIT?
It would be interesting to have e Boost Python wrap of LLVM for playing with... One of the most valuable feature of LLVM is the generation of new code 'on-the-fly' with the JIT... My regret is that I have no time to play with that beautiful piece of code (pypy and, now also llvm) *sob* --- Paolo Invernizzi

Hi Chris, On Wed, 16 Feb 2005 23:29:01 -0600 (CST), Chris Lattner <sabre@nondot.org> wrote:
<sorry I'm joining in late here>
Carl Friedrich Bolz wrote:
I just checked my LLVM-backend in (I hope I did nothing wrong). It resides in pypy/translator/llvm.
Wow cool, I had no idea that you guys were this far along!
Depends on what you call 'this' :-). At the moment I can only compile a small subset of RPython. RPython itself is an informally defined restricted subset of Python that makes it easy to be compiled (Disclaimer: I'm not really qualified to explain all this since I started with pypy rather recently). For example a variable should only hold values of the same type. The RPython source code is then converted to a flow graph in SSA form. The types of the variables in the flow graph can then be inferred, if the types of the entry functions are given. This is all done by some parts of pypy. The LLVM-backend has now a very easy job: A flow graph in SSA form with type-annotations maps directly to LLVM code. [snip]
- I think there should be some more intelligent way to produce the necessary LLLVM-implementations for the space operations of more complex types than just writing them in LLVM-assembler, which can be quite tedious (it's no fun writing programs in SSA form).
Oh yeah, generating SSA is quite a pain. The traditional way that LLVM front-ends deal with this is to use 'alloca's for local variables and use explicit loads/stores to them. This generates really gross looking code, but the LLVM optimizers (specifically mem2reg) rip them up. For example, for something simple like:
X = 1; ... = X; ... X = 2;
You can generate code that looks like this:
%X = alloca int ;; in the entry block for the fn ... store int 1, int* %X ... = load int* %X ... store int 2, int* %X ...
If you run this sort of code through the LLVM "-mem2reg" optimization, it will promote all of these to SSA values, so you don't have to do it yourself. If the "..."'s contain control flow, this is a non-trivial task. :)
The problem is not to transform RPython to SSA since the pypy-tools do all the difficult work. I was talking about the implementation of Python's more interesting types like lists and dictionaries. All the methods of these types have to be implemented somehow, which I did from hand in LLVM assembler at first. This is what I meant when I talked about 'pain' ;-). I still have no good solution for this. At the moment I do the following: The methods of the list objects are implemented in C as arrays of pointers to "object" and turned to LLVM code (by compiling and disassembling it). The result is used as a kind of template: All the occurences of the pointers to "object" are replaced by the type of the values the list is supposed to hold. This sounds rather brittle but works quite well at the moment. I will probably run into limitations whith this later. For example if I implement exceptions (which should not be too complicated using invoke and unwind) I can't raise them from within the C code that produces the list implementation. [snip]
Another thought: I see that you're currently using llc to build your programs, have you considered using the LLVM JIT?
At the moment I don't produce standalone programs but rather shared libraries that can be loaded into Python as modules to get access to the LLVM-compiled functions. So I really need to use llc.
Anyway, if you have any questions or run into problems, again, we'd love to help, just let us know. :)
-Chris
I'll do that. Thanks a lot. Regards, Carl Friedrich

On Thu, 17 Feb 2005, Carl Friedrich Bolz wrote:
Wow cool, I had no idea that you guys were this far along!
Depends on what you call 'this' :-). At the moment I can only compile a small subset of RPython. RPython itself is an informally defined restricted subset of Python that makes it easy to be compiled (Disclaimer: I'm not really qualified to explain all this since I started with pypy rather recently). For example a variable should only hold values of the same type. The RPython source code is then converted to a flow graph in SSA form. The types of the variables in the flow graph can then be inferred, if the types of the entry functions are given. This is all done by some parts of pypy. The LLVM-backend has now a very easy job: A flow graph in SSA form with type-annotations maps directly to LLVM code.
Okay, that makes sense, a lot of sense in fact :)
The problem is not to transform RPython to SSA since the pypy-tools do all the difficult work.
Ok, sorry :(
I was talking about the implementation of Python's more interesting types like lists and dictionaries. All the methods of these types have to be implemented somehow, which I did from hand in LLVM assembler at first. This is what I meant when I talked about 'pain' ;-).
Ok.
I still have no good solution for this. At the moment I do the following: The methods of the list objects are implemented in C as arrays of pointers to "object" and turned to LLVM code (by compiling and disassembling it). The result is used as a kind of template: All the occurences of the pointers to "object" are replaced by the type of the values the list is supposed to hold. This sounds rather brittle but works quite well at the moment.
So let me see if I understand correctly: you're writing the code you want in C, compiling it with the LLVM C compiler, and using void*'s. This means you get things like "[100 x sbyte*]" instead of "[100 x dictionary*]" as you would like. To get back the correct type, you're having the compiler "copy and paste" this code, inserting the correct types as appropriate. To me, I don't think it's worth doing this. Instead, why not just compile these function to LLVM bytecode and link them into the program as you need them, inserting cases to/from sbyte* as appropriate? Sure, some amount of type-safety will be "lost" at the LLVM level, but I don't think that will significantly impeed the optimizers. The other nice thing about this approach is that you can modify the runtime library more easily, and performance won't be bad from the result: the LLVM inliner can inline these methods where it makes sense. Am I missing something?
I will probably run into limitations whith this later. For example if I implement exceptions (which should not be too complicated using invoke and unwind) I can't raise them from within the C code that produces the list implementation.
Yeah, that is annoying. One trick that works well for doing the 'unwind' is to just write a simple llvm function that unwinds, and call it from C. Invoke is a bit more tricky unfortunately.
[snip]
Another thought: I see that you're currently using llc to build your programs, have you considered using the LLVM JIT?
At the moment I don't produce standalone programs but rather shared libraries that can be loaded into Python as modules to get access to the LLVM-compiled functions. So I really need to use llc.
Ah ok. For some reason, I thought you were doing things at runtime, sorry about that. :)
Anyway, if you have any questions or run into problems, again, we'd love to help, just let us know. :)
I'll do that. Thanks a lot.
Sounds good, -Chris -- http://nondot.org/sabre/ http://llvm.cs.uiuc.edu/

Hi Chris, On Thu, Feb 17, 2005 at 10:50:12AM -0600, Chris Lattner wrote:
So let me see if I understand correctly: you're writing the code you want in C, compiling it with the LLVM C compiler, and using void*'s. This means you get things like "[100 x sbyte*]" instead of "[100 x dictionary*]" as you would like. To get back the correct type, you're having the compiler "copy and paste" this code, inserting the correct types as appropriate.
To me, I don't think it's worth doing this. Instead, why not just compile these function to LLVM bytecode and link them into the program as you need them, inserting cases to/from sbyte* as appropriate?
At some point, I can imagine that we will need a wider choice than just "a pointer". For example, an array of integers would hold "int" instead of a pointer. I guess you can cast between ints and pointers, but that's only the start of the trouble. An array of tuples (int, int) would ideally be stored as an array of struct { int a,b; }, and you can't fit such a struct into a single void*. Conversely, an array of void* is wasteful to implement strings as an array of characters... Armin

Hi Chris, On Thu, 17 Feb 2005 10:50:12 -0600 (CST), Chris Lattner wrote: [snip]
So let me see if I understand correctly: you're writing the code you want in C, compiling it with the LLVM C compiler, and using void*'s. This means you get things like "[100 x sbyte*]" instead of "[100 x dictionary*]" as you would like. To get back the correct type, you're having the compiler "copy and paste" this code, inserting the correct types as appropriate.
To me, I don't think it's worth doing this. Instead, why not just compile these function to LLVM bytecode and link them into the program as you need them, inserting cases to/from sbyte* as appropriate? Sure, some amount of type-safety will be "lost" at the LLVM level, but I don't think that will significantly impeed the optimizers. The other nice thing about this approach is that you can modify the runtime library more easily, and performance won't be bad from the result: the LLVM inliner can inline these methods where it makes sense. Am I missing something?
Well, it was extremely easy to implement which justifies the approach a bit ;-). Since it works quite well I think I'll stick with it a bit and change it if problems come up (for example if there are lists of so many different types that the code bloat gets unbearable).
I will probably run into limitations whith this later. For example if I implement exceptions (which should not be too complicated using invoke and unwind) I can't raise them from within the C code that produces the list implementation.
Yeah, that is annoying. One trick that works well for doing the 'unwind' is to just write a simple llvm function that unwinds, and call it from C.
Invoke is a bit more tricky unfortunately.
Ok, maybe it is enough to unwind from within those functions. I can't think of any string, list, dict... method that actually catches an exception. So it seems best to set some global variables that indicate the error and then call a function written in llvm that unwinds. Does this sound ok? [snip] Thanks for your help! Regards, Carl Friedrich

On Sun, 20 Feb 2005, Carl Friedrich Bolz wrote:
easily, and performance won't be bad from the result: the LLVM inliner can inline these methods where it makes sense. Am I missing something?
Well, it was extremely easy to implement which justifies the approach a bit ;-). Since it works quite well I think I'll stick with it a bit and change it if problems come up (for example if there are lists of so many different types that the code bloat gets unbearable).
ok :)
Invoke is a bit more tricky unfortunately.
Ok, maybe it is enough to unwind from within those functions. I can't think of any string, list, dict... method that actually catches an exception. So it seems best to set some global variables that indicate the error and then call a function written in llvm that unwinds. Does this sound ok?
Yup, makes sense, -Chris -- http://nondot.org/sabre/ http://llvm.cs.uiuc.edu/

On Thu, 17 Feb 2005, Carl Friedrich Bolz wrote:
... The problem is not to transform RPython to SSA since the pypy-tools do all the difficult work. I was talking about the implementation of Python's more interesting types like lists and dictionaries. All the methods of these types have to be implemented somehow, which I did from hand in LLVM assembler at first. This is what I meant when I talked about 'pain' ;-).
I still have no good solution for this. At the moment I do the following: The methods of the list objects are implemented in C as arrays of pointers to "object" and turned to LLVM code (by compiling and disassembling it). The result is used as a kind of template: All the occurences of the pointers to "object" are replaced by the type of the values the list is supposed to hold. This sounds rather brittle but works quite well at the moment.
I will probably run into limitations whith this later. For example if I implement exceptions (which should not be too complicated using invoke and unwind) I can't raise them from within the C code that produces the list implementation. ...
Previous attempts to generate C code simply have inlined the calls the Python VM would have made. I wonder if you aren't making things too difficult by chosing not to leverage more of the Python C API. In PyFront, I wanted to use type inference to move more of the data flow into native C types. This model of just generating C code that does what the Python VM would have done has other benefits. Namely, the exception handling was simple, even if it was done in a brute force fashion. Each Python C API call was checked for an exception and goto's were used to goto exception handling code when NULL was returned. Anyway, I look forward to looking at what's new here, and hopefully have another RPython straw man (and implementation) to pitch at the PyCon sprint. -Jon

Hi Jon, On Thu, 17 Feb 2005 13:43:07 -0600 (CST) Jonathan David Riehl wrote:
Previous attempts to generate C code simply have inlined the calls the Python VM would have made. I wonder if you aren't making things too difficult by chosing not to leverage more of the Python C API. In PyFront, I wanted to use type inference to move more of the data flow into native C types.
I think that's basically what genc (the C code generator) does: It replaces the RPython code by equivalent calls to the Python C API (to the best of my knowledge: I don't really know genc very well). With genllvm I explicitely try not to use the Python C API. I don't think that that's making things too dificult: RPython is designed to make translation to a low level language easy.
This model of just generating C code that does what the Python VM would have done has other benefits. Namely, the exception handling was simple, even if it was done in a brute force fashion. Each Python C API call was checked for an exception and goto's were used to goto exception handling code when NULL was returned.
Oh, I think Exception handling should be relatively easy in LLVM too since it has excellent exception support. There is an 'unwind' instruction that unwinds the stack and and 'invoke' instruction: invoke calls a function and specifies two basic blocks. If the called function returns normally, execution is continued in the first basic block. If callee (or some subsequent function) calls unwind, execution is continued in the second basic block.
Anyway, I look forward to looking at what's new here, and hopefully have another RPython straw man (and implementation) to pitch at the PyCon sprint.
I won't come to PyCon (but maybe I can hang around on IRC, if someone wants to take a look a genllvm and has questions). Regards, Carl Friedrich
participants (6)
-
Armin Rigo
-
Carl Friedrich Bolz
-
Chris Lattner
-
hpk@trillke.net
-
Jonathan David Riehl
-
Paolo Invernizzi