[pypy-dev] How to translate 300000 lines of C

Armin Rigo arigo at tunes.org
Mon Jan 20 11:28:02 CET 2003


Hello Christian,

On Mon, Jan 20, 2003 at 03:27:31AM +0100, Christian Tismer wrote:
> I believe that it is possible to automate this translation
> process!

Yes!  I think it is a very good idea.  I would certainly be much happier
keeping a reasonably sized translator up-to-date than having to do so
with the huge C code base.

Let's be clear: we cannot automate the whole translation process, and setting
this up might take as long as manually translating most of CPython, but I am
confident that it will be a big win afterwards (and I am sure you know it
better than me, having discovered it the hard way).

The point is not to blindly translate the C code into Python code that is
guaranteed to do the same thing.  Instead, we need to discover the high-level
structure of the C code and map this to Python.  It should be relatively easy
given that the whole CPython code follows consistent style guidelines.  All we
need is a C parser; translation could be done from the resulting syntax tree.

> For every switch statement, create an according number of
> local functions (indeed making use of the new scopes), and
> prepare a dispatcher table for all the functions.

Maybe just write a chain of if:elif:.  New scopes are not completely
sufficient because they won't let us modify a variable from the parent scope.
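
To make the scoping problem concrete, here is a minimal sketch using a
hypothetical two-case switch (the function names and the cases are invented
for illustration, not taken from CPython).  The closure version has to route
its state through a mutable container, while the if/elif chain can update the
variable directly:

```python
def step_if_chain(op, acc):
    """Translation of: switch (op) { case 0: acc += 1; break;
                                     case 1: acc *= 2; break; }"""
    if op == 0:
        acc += 1
    elif op == 1:
        acc *= 2
    else:
        raise ValueError("unknown op %r" % (op,))
    return acc


def step_closures(op, acc):
    """Same switch, as local functions plus a dispatch table."""
    # A plain assignment to `acc` inside an inner function would bind a new
    # local there, so the shared state lives in a one-element list instead.
    state = [acc]

    def case_0():
        state[0] += 1

    def case_1():
        state[0] *= 2

    table = {0: case_0, 1: case_1}
    if op not in table:
        raise ValueError("unknown op %r" % (op,))
    table[op]()
    return state[0]
```

Both functions compute the same result; the second only shows the extra
indirection that the local-function approach would force on every variable
shared between the cases.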

> For every macro constant, use a constant notation.
> for every macro function, provide a Python function.

Yes.  In no case should we preprocess the C code to replace the macros by
their definitions.  This would lose essential high-level information.
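
As an illustration, here is a hypothetical pair of macros (invented for this
example, not taken from CPython) and what the translator might emit for them.
The names and the one-to-one mapping are assumptions; the point is only that
the call site keeps the macro's name instead of its expansion:

```python
# Hypothetical C input:
#   #define MAX_DEPTH 100
#   #define IS_TOO_DEEP(d) ((d) > MAX_DEPTH)

MAX_DEPTH = 100  # macro constant kept as a named constant


def IS_TOO_DEEP(d):
    """Macro function kept as a Python function, so the name survives."""
    return d > MAX_DEPTH


def check_depth(d):
    # Translated call site: it still reads like the C source,
    # rather than like the preprocessed form `d > 100`.
    if IS_TOO_DEEP(d):
        raise RuntimeError("maximum depth exceeded")
    return d + 1
```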

> Addition:
> For every C module, provide an extra Python module that is
> able to override some of the automatic decisions above.

Yes.  Never change the emitted Python code directly; it would prevent us from
keeping up-to-date with CPython.  We need some way to give hints to the
translator (small hints or whole hand-tuned versions of some functions).
Attaching a Python module to each C source file looks like a good way to do
it, although we might also consider adding the hints directly into the C
source at the point where they apply, as C comments (or #ifdef'ed-away lines).
An advantage of this is that CVS will warn us in case of conflicts between our
hints and CPython updates.  Well, maybe there is a need for both inline hints
and attached Python modules.
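
A minimal sketch of how inline hints could be collected, assuming a purely
hypothetical comment syntax `/* PYPY-HINT: ... */` (nothing of the sort exists
in the CPython sources; the marker and the function name are invented here):

```python
import re

# Hypothetical hint syntax: a C comment of the form /* PYPY-HINT: <text> */
HINT_RE = re.compile(r"/\*\s*PYPY-HINT:\s*(.*?)\s*\*/", re.DOTALL)


def collect_inline_hints(c_source):
    """Return a list of (line_number, hint_text) pairs found in the C source."""
    hints = []
    for match in HINT_RE.finditer(c_source):
        # Line numbers are 1-based, counted from the start of the file.
        line_number = c_source.count("\n", 0, match.start()) + 1
        hints.append((line_number, match.group(1)))
    return hints


SAMPLE = (
    "int f(void)\n"
    "{\n"
    "    /* PYPY-HINT: skip-refcount */\n"
    "    return 0;\n"
    "}\n"
)
```

Calling `collect_inline_hints(SAMPLE)` gives the translator the hint text
together with the line it applies to.  Hints that replace a whole function
would more naturally live in the attached Python module.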

> For ceval.c, overwrite all the specialized opcode implementations
> which try to optimize integer operations. These should not
> be written by hand any longer, but they are the objective of
> Psyco's specializing features.

Yes, although I would say that the main loop deserves some special treatment.
There is no need, for example, to copy the code that calls
Py_MakePendingCalls() every _Py_CheckInterval bytecode instructions.  This is
a parallel aspect that we might or might not want to add later, like reference
counting.  The big switch should be special-cased into a bundle of frame
methods with the dispatch table.  The Python-in-Python interpreter main loop
should be hand-written.  Each opcode function is itself produced by the
C-to-Python translator unless otherwise specified.
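
The shape I have in mind can be sketched as follows.  This is a toy, not
CPython's real instruction set: the opcode names, the `Frame` layout and the
(opcode, argument) encoding are all invented for illustration.

```python
class Frame:
    """Toy frame; opcode names and semantics are hypothetical."""

    def __init__(self, code):
        self.code = code      # list of (opcode, argument) pairs
        self.stack = []
        self.pc = 0
        self.result = None
        self.running = True

    # Each opcode is one method.  In the scheme described in this mail,
    # these bodies would be emitted by the C-to-Python translator.
    def op_push(self, arg):
        self.stack.append(arg)

    def op_add(self, arg):
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append(left + right)

    def op_return(self, arg):
        self.result = self.stack.pop()
        self.running = False

    # Dispatch table replacing the big switch: opcode -> plain function.
    dispatch = {"PUSH": op_push, "ADD": op_add, "RETURN": op_return}

    def run(self):
        """Hand-written main loop: fetch, look up, call."""
        while self.running:
            opcode, arg = self.code[self.pc]
            self.pc += 1
            self.dispatch[opcode](self, arg)
        return self.result
```

Note that the loop contains no periodic check for pending calls and no
reference counting; such aspects could be layered on later without touching
the opcode methods.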

> My proposal right now is: Let's write (or change) such a
> compiler which emits fairly good scripts, and then let's
> add modifications which make these into really good scripts.

I believe you are absolutely right.


A bientot,

Armin.

