Hi Arthur! Arthur Peters wrote:
Hello, I am Arthur. I've been lurking on the list for a month or so and I've had a good bit of fun playing with the code.
Welcome to PyPy! I'll try to take a stab at some of the questions. Corrections are welcome.
My questions are:
- Why did you start a second LLVM backend? Was the first one badly misdesigned?
When I wrote the first LLVM backend the RTyper did not exist yet. This resulted in a big duplication of work: I basically implemented a lot of things at the LLVM level (like lists, tuples, classes...) which would have been useful for other C-ish backends as well. Later the RTyper was written, which made all that work superfluous, because the RTyper did exactly that: implement all those things on a level where every other C-ish backend could use them. Eric and I tried to adapt the old LLVM backend to the new model, but it didn't work very well (because the assumptions were totally different). Thus Holger and I started the new LLVM backend on a weekend.
- So I take it the initial goal will be to translate pypy into C (or LLVM) and create a binary executable. At that point will it be possible to translate any given program or will the translator still only work with a subset of the python language? Will the final goal be a JIT VM for python or will I at some point be able to statically compile python to machine code?
The translator will always only work on a subset of Python. It's probably not really possible to statically compile arbitrary Python code to machine code (at least not without cheating and embedding the whole interpreter in the machine code again :-) ).
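To illustrate that point with a hand-written example (this is not code from the translator, just a sketch of the kind of thing that separates the compilable subset from full Python): the return type of the first function below depends on a runtime value, so no single statically-typed machine-code version can cover it, while the second always maps int to int, which a type-inferring translator can handle.

```python
def flexible(flag):
    # The static type of the result cannot be inferred ahead of time:
    # it is int on one path and str on the other.
    if flag:
        return 42
    return "forty-two"

def restricted(n):
    # In contrast, this always maps int -> int, so a type-inferring
    # translator can compile it to native code.
    return n * 2

print(type(flexible(True)).__name__)   # int
print(type(flexible(False)).__name__)  # str
print(restricted(21))                  # 42
```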
- Will there be support for multiple specialized versions of functions like in psyco? I know this is a long way down the road but I'm curious what people think.
The idea is to integrate Psyco's ideas, yes. Although there are not really multiple specialized versions of a function in Psyco, it's rather one function that does different things depending on the type of the arguments.
- I read some discussion of the GIL as it relates to pypy and I agree that the GIL needs to be implemented and that a different thread model might be a good way to go. However, I thought of the following: would it be possible to detect when a section of code only uses local variables and unlock the GIL there? This seems possible because local variables cannot be shared between threads anyway. In addition, local variables will likely be translated into more basic types than a python object (native ints and floats, for instance) that would not require any interaction with the object-space to manipulate (not sure about that usage of the term "object-space"). Thoughts?
First a comment about the usage of the term "object space": I think you are mixing levels here (and I might be misunderstanding you). The PyPy interpreter (meaning the bytecode interpreter plus the standard object space) gets translated to low-level code. This "interpreter-level code" deals with the standard object space as a regular class instance that is in principle no different from any other class. The classes that appear on interpreter level are all translated to more basic types (what would be the alternative? there is no layer below that could deal with anything else), probably to something like a struct. Thus the operations on objects at this level never need to go through the object space -- the object space is just a regular object itself.

The object space /is/ used if you interpret a Python program with PyPy's interpreter. The bytecode interpreter does not know how to deal with /any/ object (except for True and False); it has to delegate to the object space for every single operation, even for basic types like ints and such -- at this level ("application level") there isn't any type inference or anything like that!

Now on threading: I'm not really the right person to say much about it. As far as I understand it, the general idea is that we don't want to clutter our whole interpreter implementation with threading details. Instead, threading is supposed to become a translation aspect: the translation process is meant to "weave" the threading model into the translated interpreter. This would have a lot of advantages: instead of being stuck with a single threading model which is deeply integrated into all parts of our interpreter and hard to get rid of again, we can change it by changing a small, localized part (probably of the translator). Thus it would be possible to translate an interpreter which uses a GIL -- appropriate for an environment where threads are rarely used.
Or we could translate an interpreter with, say, more finely grained locking, which would be slower for a single-threaded program but could speed up applications with multiple threads.

[snip]

Hope that helped a bit and wasn't too confusing :-).

Regards,
Carl Friedrich
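P.S. Here is a minimal sketch of the object-space delegation described above (the names are illustrative, not PyPy's real classes): the interpreter never looks inside the objects it manipulates; every single operation is delegated to the space.

```python
class StdObjSpace:
    # Stands in for the standard object space: it alone knows how to
    # actually perform operations on objects.
    def add(self, w_a, w_b):
        return w_a + w_b
    def mul(self, w_a, w_b):
        return w_a * w_b

class Interpreter:
    # A tiny stack machine standing in for the bytecode interpreter.
    def __init__(self, space):
        self.space = space

    def run(self, code, stack):
        for op in code:
            b = stack.pop()
            a = stack.pop()
            # The interpreter delegates every operation to the space.
            stack.append(getattr(self.space, op)(a, b))
        return stack[-1]

interp = Interpreter(StdObjSpace())
# Computes 2 * (3 + 4): "add" pops 3 and 4, "mul" pops 2 and 7.
print(interp.run(["add", "mul"], [2, 3, 4]))  # 14
```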