PyPy to generate C/C++ code
To be very clear, this is not a question about PyPy's RPython itself. :-)) But I had another thought and wanted to run it by the PyPy team.
As I understand it, PyPy is first and foremost a language development framework. It is about implementing the Python interpreter in RPython, plus additional hints to assist in JIT generation.
If the Python language implementation in RPython has enough information to create a Python interpreter and do JIT compilation, I am thinking it should also have enough information to generate C/C++ code, the kind that Shedskin has under shedskin/lib/.
Basically: port the type inference engine from Shedskin over to PyPy and reuse the bulk of the Shedskin C++ code, but use the PyPy language framework to implement the Python compiler that Shedskin implements. In other words, can PyPy be the language framework in which Shedskin is implemented, or onto which it is ported?
Looking at Shedskin and PyPy, do y'all have a rough feel for how difficult this would be?
Why the question? I am planning to fund some prize money for an undergraduate/graduate school project back in India and am looking for ideas. This means we would be able to motivate a team of 2-5 smart young engineers for about 6 months to do something interesting for them but beneficial for the Python community.
One area I am obviously looking at is compiling Python code. I was thinking the project could be to:
1. Take your C++ code under shedskin/lib as is.
2. Have them implement/port the Shedskin type inference engine onto the PyPy framework and create a PyPy backend that generates the C++ code from shedskin/lib.
What would y'all think of such an idea? Estimates? Feasibility? Do you see any benefits to this work for Shedskin or PyPy or both?
Sarvi
2010/9/14 Saravanan Shanmugham <sarvi@yahoo.com>:
To be very clear this is not a question on PyPy RPython itself. :-))
But I had another thought and wanted to run it by PyPy team.
As I understand it PyPy is foremost a language development framework. It is about implementing the python interpreter in RPython, plus additional hints to assist in JIT generation.
If the Python language implementation in RPython has enough information to create a python interpreter and do JIT compilation. I am thinking it should have enough information to generate C/C++ code.
Creating a JIT compiler is completely different from statically compiling code. In a JIT, you use runtime information to optimize the code. You can't do anything about this in C. -- Regards, Benjamin
I don't expect this Python compiler to handle full Python, only the restricted, statically typed subset of Python that Shedskin defines.
Yes, the JIT annotations may not serve the purpose of generating a compiler. Hence the idea of porting the type inference engine, and perhaps using the JIT annotations where they can help.
Sarvi
----- Original Message ----
From: Benjamin Peterson <benjamin@python.org> To: Saravanan Shanmugham <sarvi@yahoo.com> Cc: pypy-dev@codespeak.net Sent: Tue, September 14, 2010 2:26:31 PM Subject: Re: [pypy-dev] PyPy to generate C/C++ code
2010/9/14 Saravanan Shanmugham <sarvi@yahoo.com>:
To be very clear this is not a question on PyPy RPython itself. :-))
But I had another thought and wanted to run it by PyPy team.
As I understand it PyPy is foremost a language development framework. It is about implementing the python interpreter in RPython, plus additional hints to assist in JIT generation.
If the Python language implementation in RPython has enough information to create a python interpreter and do JIT compilation. I am thinking it should have enough information to generate C/C++ code.
Creating a JIT compiler is completely different from statically compiling code. In a JIT, you use runtime information to optimize the code. You can't do anything about this in C.
-- Regards, Benjamin
2010/9/15 Saravanan Shanmugham <sarvi@yahoo.com>:
I don't expect this python compiler to be for full python but just a Restricted statically typed subset of python as defined by Shedskin.
Yes. JIT annotation may not serve the purpose of generating a compiler. Hence the porting of the type inference engine and may be use JIT notations if it can be.
I've downloaded and read the source code of shedskin.
From what I understand, here are some differences between PyPy and Shedskin.
- Shedskin analyses and generates code directly by walking the AST of a python module. (there are two passes: the first to grab information about global types and functions, the second to emit code)
- Shedskin does very little type inference. Shedskin's type system is based on C++ templates, and once a variable's type has been determined, generic code is emitted and the C++ compiler will select the correct implementation. Other inference engines also work on the AST; Logilab's pylint, for example, works much harder to check all instructions and the type of all variables. Shedskin does not seem to need such power.
- On the other hand, PyPy analyzes imported modules, and works on the bytecode of functions living in memory. It does a complete type inference and emits low-level C code or Java intermediate representation.
- PyPy has its own way to write generic code and templates, the language for meta-programming is Python itself! [I'm referring to loops that generate classes and functions, and things like "specialize:argtype(0)", "unrolling_iterable" combined with constant propagation].
In most cases, PyPy does not generate better code than Shedskin. When Shedskin compiles code, it does it well. And its restrictions are easier to work with; RPython is really tricky to get right sometimes.
Of course, PyPy's goal is different: it does not only generate low-level C code, it also generates a JIT compiler that can optimize calls at runtime - in the context of an interpreter. I can't see which computations made there could be applied to static code.
Bottom line: if you want to generate efficient C code from python, use (and improve) Shedskin. If you want python code to run faster, don't translate anything, and use the PyPy interpreter.
-- Amaury Forgeot d'Arc
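The two-pass, AST-walking structure described in the first bullet above can be illustrated with a small sketch. This is not Shedskin's code: the function names `collect_functions` and `emit_declarations`, the sample source, and the C++-like output format are all hypothetical, and the sketch uses only Python's standard `ast` module. It only shows the shape of "first gather global information, then emit code".

```python
import ast

# Hypothetical input module, kept as a string so the sketch is self-contained.
SOURCE = '''
def area(w, h):
    return w * h

def main():
    return area(3, 4)
'''

def collect_functions(tree):
    """Pass 1: record every top-level function and its parameter names."""
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }

def emit_declarations(tree, functions):
    """Pass 2: walk the tree again and emit one C++-like pseudo-declaration
    per function, using the information gathered in pass 1."""
    lines = []
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            params = ", ".join(
                "T%d %s" % (i, name)
                for i, name in enumerate(functions[node.name])
            )
            lines.append("auto %s(%s);" % (node.name, params))
    return lines

tree = ast.parse(SOURCE)
functions = collect_functions(tree)
declarations = emit_declarations(tree, functions)
```

The placeholder types `T0`, `T1` stand in for the template parameters that a real generator would leave for the C++ compiler to resolve; no actual type inference happens here.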
Hi Amaury, On 09/15/2010 01:11 AM, Amaury Forgeot d'Arc wrote:
2010/9/15 Saravanan Shanmugham<sarvi@yahoo.com>:
I don't expect this python compiler to be for full python but just a Restricted statically typed subset of python as defined by Shedskin.
Yes. JIT annotation may not serve the purpose of generating a compiler. Hence the porting of the type inference engine and may be use JIT notations if it can be.
I've downloaded and read the source code of shedskin.
From what I understand, here are some differences between PyPy and Shedskin.
- Shedskin analyses and generates code directly by walking the AST of a python module. (there are two passes: the first to grab information about global types and functions, the second to emit code)
- Shedskin does very little type inference. Shedskin's type system is based on C++ templates, and once a variable's type has been determined, generic code is emitted and the C++ compiler will select the correct implementation. Other inference engines also work on the AST; Logilab's pylint, for example, works much harder to check all instructions and the type of all variables. Shedskin does not seem to need such power.
- On the other hand, PyPy analyzes imported modules, and works on the bytecode of functions living in memory. It does a complete type inference and emits low-level C code or Java intermediate representation.
- PyPy has its own way to write generic code and templates, the language for meta-programming is Python itself! [I'm referring to loops that generate classes and functions, and things like "specialize:argtype(0)", "unrolling_iterable" combined with constant propagation].
In most cases, PyPy does not generate better code than Shedskin. When Shedskin compiles code, it does it well. And its restrictions are easier to work with; RPython is really tricky to get right sometimes.
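The points about working on live function objects and about meta-programming in Python itself can be made concrete with plain CPython. The sketch below is only an illustration under stated assumptions: it does not use RPython or any PyPy translation API, and `make_scaler` and `scalers` are hypothetical names. It shows a loop generating a family of functions at import time, and why a tool that inspects functions in memory sees each generated function, while a walker of the source text would see only the factory.

```python
import dis

def make_scaler(factor):
    """Return a function specialised for one constant factor."""
    def scale(x):
        return x * factor
    return scale

# Meta-programming at import time: a plain loop builds several functions.
scalers = {}
for name, factor in [("double", 2), ("triple", 3)]:
    scalers[name] = make_scaler(factor)

# Each generated function is a live object carrying its own captured constant.
captured = {
    name: fn.__closure__[0].cell_contents for name, fn in scalers.items()
}

# Its bytecode can be inspected directly, without consulting the source text.
opnames = [ins.opname for ins in dis.get_instructions(scalers["double"])]
```

Here `captured` maps each generated function to the constant it closed over, which is the kind of information that is only available once the module has actually been imported and executed.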
Nice analysis and description, thank you! Carl Friedrich
At a higher level of abstraction, Python is a dynamic language. The dynamicity is what makes it slow. There are simply so many things that might occur at runtime that have to be taken into account in the code. The JIT is designed to find cases where the dynamic properties of the language are not being used in that particular instance of execution, and generate faster code for that bit of the program.
This has almost nothing in common with trying to generate C or machine code from a static language that superficially looks like Python. Square Peg, Round Hole.
Jacob Hallén
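As a rough illustration of the idea of generating a fast path for the cases where the dynamic features are not being used, here is a toy sketch. It is not how PyPy's JIT is implemented; `generic_add` and `make_guarded_add` are hypothetical names, and the "fast path" is ordinary Python standing in for specialised machine code. The point is only the structure: a cheap runtime check (a guard) protects a specialised path, and everything else falls back to the fully general behaviour.

```python
def generic_add(a, b):
    """Slow path: full dynamic dispatch, as the language semantics require."""
    return a + b

def make_guarded_add():
    """Build an add function with a guarded fast path and usage counters."""
    stats = {"fast": 0, "slow": 0}

    def add(a, b):
        # Guard: the fast path is only valid while both operands are ints.
        if type(a) is int and type(b) is int:
            stats["fast"] += 1
            return int.__add__(a, b)
        # Guard failed: fall back to the fully general behaviour.
        stats["slow"] += 1
        return generic_add(a, b)

    return add, stats

add, stats = make_guarded_add()
results = [add(1, 2), add("a", "b"), add(3, 4)]
```

The guard depends on the types actually seen at runtime, which is exactly the information a static translator does not have unless the source language is restricted enough to fix those types in advance.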
participants (5)
- Amaury Forgeot d'Arc
- Benjamin Peterson
- Carl Friedrich Bolz
- Jacob Hallén
- Saravanan Shanmugham