[pypy-dev] How to translate 300000 lines of C
Armin Rigo
arigo at tunes.org
Mon Jan 20 11:28:02 CET 2003
Hello Christian,
On Mon, Jan 20, 2003 at 03:27:31AM +0100, Christian Tismer wrote:
> I believe that it is possible to automate this translation
> process!
Yes! I think it is a very good idea. I would certainly be much happier
with keeping a reasonable-sized translator up-to-date than having to do so
with the huge C code base.
Let's be clear: we cannot automate the whole translation process, and setting
this up might take as long as manually translating most of CPython, but I am
confident that it will be a big win afterwards (and I am sure you know it
better than me, having discovered it the hard way).
The point is not to blindly translate the C code into Python code that is
guaranteed to do the same thing. Instead, we need to discover the high-level
structure of the C code and map this to Python. It should be relatively easy
given that the whole CPython code follows consistent style guidelines. All we
need is a C parser; translation could be done from the resulting syntax tree.
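To make "translation from the syntax tree" a bit more concrete, here is a minimal sketch. The nested-tuple tree format, the node kinds and the names emit_expr/emit_stmt are all invented for this example; a real C parser would hand us something far richer, and the real translator would have to recognise much higher-level patterns than this.

```python
# Hypothetical toy syntax tree: nested tuples tagged by node kind.
# This format is an assumption made for illustration only.

def emit_expr(node):
    """Return Python source text for a toy expression node."""
    kind = node[0]
    if kind in ("name", "const"):
        return str(node[1])
    if kind == "binop":
        _, op, left, right = node
        return "(%s %s %s)" % (emit_expr(left), op, emit_expr(right))
    raise ValueError("unknown expression node: %r" % (kind,))


def emit_stmt(node, indent=0):
    """Return a list of Python source lines for a toy statement node."""
    pad = "    " * indent
    kind = node[0]
    if kind == "assign":
        return [pad + "%s = %s" % (node[1], emit_expr(node[2]))]
    if kind == "return":
        return [pad + "return " + emit_expr(node[1])]
    if kind == "if":
        _, cond, then_body, else_body = node
        lines = [pad + "if %s:" % emit_expr(cond)]
        for stmt in then_body:
            lines.extend(emit_stmt(stmt, indent + 1))
        if else_body:
            lines.append(pad + "else:")
            for stmt in else_body:
                lines.extend(emit_stmt(stmt, indent + 1))
        return lines
    if kind == "func":
        _, name, params, body = node
        lines = [pad + "def %s(%s):" % (name, ", ".join(params))]
        for stmt in body:
            lines.extend(emit_stmt(stmt, indent + 1))
        return lines
    raise ValueError("unknown statement node: %r" % (kind,))


# Toy tree for the C function:
#   int abs_diff(int a, int b) { if (a > b) return a - b; else return b - a; }
TREE = ("func", "abs_diff", ["a", "b"], [
    ("if", ("binop", ">", ("name", "a"), ("name", "b")),
        [("return", ("binop", "-", ("name", "a"), ("name", "b")))],
        [("return", ("binop", "-", ("name", "b"), ("name", "a")))]),
])

SOURCE = "\n".join(emit_stmt(TREE))
```

SOURCE then holds the text of a Python function definition that can be written to a file or passed to exec().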
> For every switch statement, create an according number of
> local functions (indeed making use of the new scopes), and
> prepare a dispatcher table for all the functions.
Maybe just write a chain of if:elif:. New scopes are not completely
sufficient because they won't let us modify a variable from the parent scope.
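A small sketch of the scoping problem, using a made-up two-case switch (the function and case names are invented for the example). A nested function that assigns to a name gets its own local variable, so the local-function version has to keep shared state in a mutable container, while the if:elif: chain can assign directly. (Python 3 later added a `nonlocal` statement for this; it is not used here.)

```python
def broken_rebinding():
    # Assigning to `result` inside the nested function makes it local
    # to that function, so this raises UnboundLocalError when called.
    result = 0

    def case_inc():
        result += 1

    case_inc()
    return result


def run_with_local_functions(opcode):
    # One nested function per switch case, plus a dispatch table.
    # Shared state lives in a list that the cases mutate in place.
    state = [0]

    def case_inc():
        state[0] += 1

    def case_dec():
        state[0] -= 1

    dispatch = {"INC": case_inc, "DEC": case_dec}
    dispatch[opcode]()
    return state[0]


def run_with_if_chain(opcode):
    # The same switch as an if/elif chain: every case shares the
    # function's scope, so plain assignment works.
    result = 0
    if opcode == "INC":
        result += 1
    elif opcode == "DEC":
        result -= 1
    else:
        raise KeyError(opcode)
    return result
```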
> For every macro constant, use a constant notation.
> for every macro function, provide a Python function.
Yes. In no case should we preprocess the C code to replace the macros by
their definition. This would be losing essential high-level information.
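For instance, a macro constant and a macro function might come out roughly as below. This is only a sketch: the two C macros shown in the comments are made up for the example and are not taken from CPython.

```python
# C:  #define BUFFER_SIZE 1024
BUFFER_SIZE = 1024


# C:  #define IS_POWER_OF_TWO(x)  ((x) != 0 && ((x) & ((x) - 1)) == 0)
def IS_POWER_OF_TWO(x):
    return x != 0 and (x & (x - 1)) == 0


# A translated call site keeps the macro names instead of their
# expansions, so the intent of the original C code stays visible.
def needs_padding(n):
    return n < BUFFER_SIZE and not IS_POWER_OF_TWO(n)
```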
> Addition:
> For every C module, provide an extra Python module that is
> able to override some of the automatic decisions above.
Yes. Never change the emitted Python code directly, it would prevent us from
keeping up-to-date with CPython. We need some way to give hints to the
translator (small hints or whole hand-tuned versions of some functions).
Attaching a Python module to each C source file looks like a good way to do
it, although we might also consider adding the hints directly into the C
source at the point where they apply, as C comments (or #ifdef'ed-away lines).
An advantage of this is that CVS will warn us in case of conflicts between our
hints and CPython updates. Well, maybe there is a need for both inline hints
and attached Python modules.
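A rough sketch of how the two kinds of hints could be combined. Everything here is hypothetical: the `PYHINT` comment syntax, the directive names, the C function names and the rule that the attached module wins over inline hints are all assumptions made for the example.

```python
import re

# Hypothetical inline hint syntax:  /* PYHINT <function_name>: <directive> */
HINT_RE = re.compile(r"/\*\s*PYHINT\s+(\w+)\s*:\s*(.*?)\s*\*/")


def collect_inline_hints(c_source):
    """Map function names to the directive given in a PYHINT comment."""
    return {name: directive for name, directive in HINT_RE.findall(c_source)}


def merge_hints(inline_hints, module_hints):
    """Combine both sources; here the attached module takes priority."""
    merged = dict(inline_hints)
    merged.update(module_hints)
    return merged


# Made-up C source carrying two inline hints.
C_SOURCE = """
/* PYHINT demo_sort: hand-written */
static PyObject *demo_sort(PyObject *self) { /* ... */ }
/* PYHINT demo_len: translate */
static int demo_len(PyObject *self) { /* ... */ }
"""

# Stand-in for the contents of the attached Python hint module.
MODULE_HINTS = {"demo_len": "skip"}

HINTS = merge_hints(collect_inline_hints(C_SOURCE), MODULE_HINTS)
```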
> For ceval.c, overwrite all the specialized opcode implementations
> which try to optimize integer operations. These should not
> be written by hand any longer, but they are the objective of
> Psyco's specializing features.
Yes, although I would say that the main loop deserves some special treatment.
There is no need, for example, to copy the code that calls
Py_MakePendingCalls() every _Py_CheckInterval bytecode instructions. This is
a parallel aspect that we might want to add or not later, like reference
counting. The big switch should be special-cased into a bundle of frame
methods with the dispatch table. The Python-in-Python interpreter main loop
should be hand-written. Each opcode function is itself produced by the
C-to-Python translator unless otherwise specified.
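As a sketch of what "a bundle of frame methods with the dispatch table" could look like, here is a toy version. The Frame layout, the (opname, argument) instruction format and the three opcode bodies are simplified assumptions for illustration, not the real ceval.c semantics.

```python
class Frame:
    """Toy frame: a value stack plus one method per opcode."""

    def __init__(self, code):
        self.code = code        # list of (opname, argument) pairs
        self.stack = []
        self.next_instr = 0

    # Methods like these would be produced by the translator,
    # one per case of the big switch.
    def op_LOAD_CONST(self, arg):
        self.stack.append(arg)

    def op_BINARY_ADD(self, arg):
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append(left + right)

    def op_RETURN_VALUE(self, arg):
        return self.stack.pop()

    # The dispatch table replaces the C switch statement.
    dispatch_table = {
        "LOAD_CONST": op_LOAD_CONST,
        "BINARY_ADD": op_BINARY_ADD,
        "RETURN_VALUE": op_RETURN_VALUE,
    }

    # The main loop is written by hand: no pending-calls check and
    # no reference counting, only fetch and dispatch.
    def run(self):
        while True:
            opname, arg = self.code[self.next_instr]
            self.next_instr += 1
            result = self.dispatch_table[opname](self, arg)
            if opname == "RETURN_VALUE":
                return result


PROGRAM = [
    ("LOAD_CONST", 2),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
```

Calling Frame(PROGRAM).run() pushes the two constants, adds them and returns the sum.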
> My proposal right now is: Let's write (or change) such a
> compiler which emits fairly good scripts, and then let's
> add modifications which make these into really good scripts.
I believe you are absolutely right.
See you soon,
Armin.