[pypy-dev] RE: LLVM backend

Thu Feb 17 17:50:12 CET 2005

On Thu, 17 Feb 2005, Carl Friedrich Bolz wrote:
>> Wow cool, I had no idea that you guys were this far along!
>
> Depends on what you call 'this' :-). At the moment I can only compile a
> small subset of RPython. RPython itself is an informally defined restricted
> subset of Python that is easy to compile (Disclaimer: I'm not
> really qualified to explain all this since I started with pypy rather
> recently). For example a variable should only hold values of the same type.
> The RPython source code is then converted to a flow graph in SSA form. The
> types of the variables in the flow graph can then be inferred, if the types
> of the entry functions are given. This is all done by some parts of pypy.
> The LLVM backend now has a very easy job: a flow graph in SSA form with
> type-annotations maps directly to LLVM code.

Okay, that makes sense, a lot of sense in fact :)

> The problem is not to transform RPython to SSA since the pypy-tools do all
> the difficult work.

Ok, sorry :(

> I was talking about the implementation of Python's more interesting 
> types like lists and dictionaries. All the methods of these types have 
> to be implemented somehow, which I did by hand in LLVM assembler at
> first. This is what I meant when I talked about 'pain' ;-).

Ok.

> I still have no good solution for this. At the moment I do the following:
> The methods of the list objects are implemented in C as arrays of pointers
> to "object" and turned to LLVM code (by compiling and disassembling it). The
> result is used as a kind of template: All the occurrences of the pointers to
> "object" are replaced by the type of the values the list is supposed to
> hold. This sounds rather brittle but works quite well at the moment.

So let me see if I understand correctly: you're writing the code you want 
in C, compiling it with the LLVM C compiler, and using void*'s.  This 
means you get things like "[100 x sbyte*]" instead of "[100 x 
dictionary*]" as you would like.  To get back the correct type, you're 
having the compiler "copy and paste" this code, inserting the correct 
types as appropriate.

I don't think this is worth doing.  Instead, why not just compile these
functions to LLVM bytecode and link them into the program as you need
them, inserting casts to/from sbyte* as appropriate?  Sure, some amount of
type-safety will be "lost" at the LLVM level, but I don't think that will
significantly impede the optimizers.  The other nice thing about this
approach is that you can modify the runtime library more easily, and
the resulting performance shouldn't suffer: the LLVM inliner can inline
these methods where it makes sense.  Am I missing something?
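
As a rough sketch of what I mean (all names here, such as rt_list and
point, are made up for illustration and are not part of pypy or LLVM):
the list is written once in C over untyped pointers, and only the
generated code at the call sites casts to and from the element type.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical generic list: elements are untyped pointers, which show
   up as sbyte* once this file is compiled to LLVM bytecode. */
typedef struct {
    void **items;
    size_t length;
    size_t capacity;
} rt_list;

rt_list *rt_list_new(void) {
    rt_list *l = malloc(sizeof *l);
    if (l == NULL)
        return NULL;
    l->items = NULL;
    l->length = 0;
    l->capacity = 0;
    return l;
}

/* Appends one element, growing the array as needed.
   Returns 0 on success, -1 if memory could not be allocated. */
int rt_list_append(rt_list *l, void *item) {
    if (l->length == l->capacity) {
        size_t new_cap = l->capacity ? l->capacity * 2 : 4;
        void **grown = realloc(l->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        l->items = grown;
        l->capacity = new_cap;
    }
    l->items[l->length++] = item;
    return 0;
}

/* Returns the element at index i, or NULL if i is out of range. */
void *rt_list_get(const rt_list *l, size_t i) {
    return i < l->length ? l->items[i] : NULL;
}

void rt_list_free(rt_list *l) {
    if (l != NULL) {
        free(l->items);
        free(l);
    }
}

/* Stand-in for a type the compiler knows about. */
typedef struct { int value; } point;

/* What generated code would do: the only type-specific part is the
   cast on the way out of the list. */
int demo_sum(void) {
    point a = {10}, b = {20}, c = {30};
    rt_list *l = rt_list_new();
    int sum = 0;
    size_t i;
    if (l == NULL)
        return -1;
    if (rt_list_append(l, &a) || rt_list_append(l, &b) ||
        rt_list_append(l, &c)) {
        rt_list_free(l);
        return -1;
    }
    for (i = 0; i < l->length; i++) {
        point *p = (point *)rt_list_get(l, i);
        sum += p->value;
    }
    rt_list_free(l);
    return sum;
}
```

The list code itself never changes when a new element type comes along;
only the casts at the call sites do, and small accessors like
rt_list_get are the sort of thing the inliner can fold into the caller.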

> I will probably run into limitations with this later. For example if I
> implement exceptions (which should not be too complicated using invoke and
> unwind) I can't raise them from within the C code that produces the list
> implementation.

Yeah, that is annoying.  One trick that works well for doing the 'unwind' 
is to just write a simple llvm function that unwinds, and call it from C.

Invoke is a bit more tricky unfortunately.

> [snip]
>> Another thought: I see that you're currently using llc to build your
>> programs, have you considered using the LLVM JIT?
>
> At the moment I don't produce standalone programs but rather shared
> libraries that can be loaded into Python as modules to get access to the
> LLVM-compiled functions. So I really need to use llc.

Ah ok.  For some reason, I thought you were doing things at runtime, sorry 
about that.  :)

>> Anyway, if you have any questions or run into problems, again, we'd love
>> to help, just let us know. :)
>
> I'll do that. Thanks a lot.

Sounds good,

-Chris

-- 
http://nondot.org/sabre/
http://llvm.cs.uiuc.edu/