[pypy-dev] Flow graphs, backends and JIT

Tue Sep 18 21:55:43 CEST 2012

Hi Haael,

Here is a high-level overview again.  Although we use the term
"backend" for both, there are two completely unrelated components: the
JIT backends and the translation backends.

The translation backends are part of the static translation of PyPy
(with or without the JIT) to C code or another target.  The translation
backends turn control flow graphs into, say, C source code representing
them.  These control flow graphs are roughly at the same level as Java
VM opcodes, except that, depending on the backend, they may either still
contain GC operations (e.g. when translating to Java or CLI) or no
longer contain them (e.g. when translating to C).  We have one control
flow graph for each RPython function in the source code of PyPy;
together, these graphs describe an interpreter for Python.
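To make the idea of a translation backend concrete, here is a toy sketch in Python; it is not PyPy's actual code.  The `Block`, `Link` and `Operation` classes and the `emit_c()` function are hypothetical and much simpler than the real flow graphs.  Each block becomes a C label, values travel along links by assignment, and every variable is assumed to be a C `long`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical, much simplified model of a control flow graph.  It only
# illustrates the idea; these are not PyPy's real flow-graph classes.
Value = Union[str, int]          # a variable name or an integer constant

@dataclass
class Operation:
    opname: str                  # e.g. "int_add", "int_lt"
    args: List[Value]
    result: str                  # variable that receives the result

@dataclass
class Link:
    target: "Block"
    args: List[Value]            # values passed to target.inputargs

@dataclass
class Block:
    inputargs: List[str]
    operations: List[Operation] = field(default_factory=list)
    exitswitch: Optional[str] = None                 # variable choosing the exit, if any
    exits: List[Link] = field(default_factory=list)  # [link] or [if_false, if_true]
    returnvar: Optional[str] = None                  # set only on the final block

C_BINOPS = {"int_add": "+", "int_sub": "-", "int_mul": "*", "int_lt": "<"}

def emit_jump(link, label):
    """Copy the link's values into the target's inputargs (via temporaries), then jump."""
    temps = ", ".join(f"t{k} = {a}" for k, a in enumerate(link.args))
    moves = " ".join(f"{v} = t{k};" for k, v in enumerate(link.target.inputargs))
    copy = f"{{ long {temps}; {moves} }} " if link.args else ""
    return [f"    {copy}goto {label[id(link.target)]};"]

def emit_c(funcname, blocks):
    """Toy 'translation backend': one C label per block, 'long' for every variable."""
    label = {id(b): f"block{i}" for i, b in enumerate(blocks)}
    start = blocks[0]
    local_vars = []
    for b in blocks:
        names = (b.inputargs if b is not start else []) + [op.result for op in b.operations]
        local_vars += [n for n in names if n not in local_vars]
    params = ", ".join(f"long {v}" for v in start.inputargs)
    out = [f"long {funcname}({params}) {{"]
    if local_vars:
        out.append("    long " + ", ".join(local_vars) + ";")
    for b in blocks:
        out.append(f"  {label[id(b)]}:")
        for op in b.operations:
            x, y = op.args
            out.append(f"    {op.result} = {x} {C_BINOPS[op.opname]} {y};")
        if b.returnvar is not None:
            out.append(f"    return {b.returnvar};")
        elif b.exitswitch is None:
            out += emit_jump(b.exits[0], label)
        else:
            if_false, if_true = b.exits
            out.append(f"    if ({b.exitswitch}) {{")
            out += ["    " + line for line in emit_jump(if_true, label)]
            out.append("    }")
            out += emit_jump(if_false, label)
    out.append("}")
    return "\n".join(out)

# Graph of:  def sum_below(n): i = acc = 0; while i < n: acc += i; i += 1; return acc
start = Block(["n0"])
head = Block(["n1", "i1", "acc1"], [Operation("int_lt", ["i1", "n1"], "c1")], exitswitch="c1")
body = Block(["n2", "i2", "acc2"], [Operation("int_add", ["acc2", "i2"], "acc3"),
                                    Operation("int_add", ["i2", 1], "i3")])
ret = Block(["r"], returnvar="r")
start.exits = [Link(head, ["n0", 0, 0])]
head.exits = [Link(ret, ["acc1"]), Link(body, ["n1", "i1", "acc1"])]
body.exits = [Link(head, ["n2", "i3", "acc3"])]
blocks = [start, head, body, ret]
print(emit_c("sum_below", blocks))
```

The printed C has one label per block and one `goto` per link.  Copying link values through temporaries keeps the sketch correct even when a link passes variables to a block in a permuted order.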

Now, the JIT is an optional part of that.  It is written as more
RPython code, and gets statically translated into more control flow
graphs; but these graphs describe only the JIT itself, not any JITted
code.  JITted code (in the form of machine code) is of course produced
at runtime, but using different techniques.  It is the job of the JIT
backends to produce this machine code in memory.  This is unrelated to
the translation backends: a JIT backend takes as input something that
is not a control flow graph (but a linear "trace" of operations), works
at runtime (so is itself written in RPython), and outputs machine code
in memory (rather than writing C sources into a file).
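Here is a toy sketch of a JIT backend's job, in Python.  The trace format (a `label` starting the loop, guards, a closing `jump`) and the name `compile_loop()` are assumptions for illustration only.  A real JIT backend writes machine code into memory; this sketch substitutes generated Python source plus `exec()` purely as a stand-in, since the point is the input (a linear list of operations, not a graph) and the fact that the output is produced at runtime.

```python
# Hypothetical trace format: (opname, args, result).  "label" starts the loop,
# "jump" goes back to it, and a guard's args are [condition, *values to hand
# back if the guard fails].
TRACE = [
    ("label", ["n", "i", "acc"], None),
    ("int_lt", ["i", "n"], "c"),
    ("guard_true", ["c", "n", "i", "acc"], None),
    ("int_add", ["acc", "i"], "acc2"),
    ("int_add", ["i", 1], "i2"),
    ("jump", ["n", "i2", "acc2"], None),
]

PY_BINOPS = {"int_add": "+", "int_sub": "-", "int_mul": "*", "int_lt": "<"}

def compile_loop(trace):
    """Toy 'JIT backend': turn a linear trace into executable code at runtime.

    Stand-in only: instead of writing machine code into memory, it generates
    Python source and exec()s it."""
    label_args = trace[0][1]
    src = [f"def loop({', '.join(label_args)}):", "    while True:"]
    for opname, args, result in trace[1:]:
        if opname in PY_BINOPS:
            src.append(f"        {result} = {args[0]} {PY_BINOPS[opname]} {args[1]}")
        elif opname in ("guard_true", "guard_false"):
            cond, *failargs = args
            test = f"not {cond}" if opname == "guard_true" else cond
            src.append(f"        if {test}:")
            # guard failure: leave the compiled loop, handing back the live values
            src.append(f"            return ({', '.join(failargs)},)")
        elif opname == "jump":
            # rebind the loop's input variables, then start the next iteration
            targets = ", ".join(label_args)
            src.append(f"        {targets} = {', '.join(map(str, args))}")
        else:
            raise NotImplementedError(opname)
    namespace = {}
    exec("\n".join(src), namespace)
    return namespace["loop"]

loop = compile_loop(TRACE)
print(loop(10, 0, 0))   # → (10, 10, 45)
```

When the guard fails, control leaves the compiled loop; here the live values are simply returned as a tuple, whereas a real JIT would continue execution outside the compiled code from that point.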

The input for the JIT backend comes from a front-end component: the
tracing JIT "metacompiler".  It works by following what the
interpreter would do for some specific input (i.e. the precise Python
code we see at runtime).  This means that the JIT front-end starts
with the control flow graphs of the interpreter and produces a linear
trace out of them, which is fed to the JIT backend.  The control flow
graphs in question must be available at runtime, so we need to
serialize them.  The precise format in which the flow graphs are
serialized is called "JitCodes".  JitCodes are very similar to the flow
graphs, but everything that is unnecessary for the JIT has been removed,
most importantly the details of the type information: e.g. integer
variables of all sizes and signedness are represented as a single "int"
type, because the JIT has no use for more detail; similarly, any GC
pointer to any object is represented as a single "GC pointer" type.
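The front-end's job can also be sketched in a few lines of Python, again as an illustration under stated assumptions rather than PyPy's real metacompiler.  The instruction list `JITCODE`, its register names, and the helpers `erase_type()` and `trace_loop()` are all hypothetical.  `erase_type()` shows the type erasure described above (every integer type becomes "int", every GC pointer becomes "ref", standing for the "GC pointer" type).  `trace_loop()` runs the JitCode on concrete input values, records each operation it executes, turns each conditional branch into a guard on the direction actually taken, and stops once execution returns to the loop header.

```python
# Hypothetical serialized "JitCode" for the loop of a toy interpreter
# function: (opname, args, result) over named registers.
JITCODE = [
    ("int_lt", ("i", "n"), "c"),          # 0
    ("goto_if_not", ("c", 5), None),      # 1: leave the loop when c is false
    ("int_add", ("acc", "i"), "acc"),     # 2
    ("int_add", ("i", 1), "i"),           # 3
    ("goto", (0,), None),                 # 4: back to the loop header
    ("return", ("acc",), None),           # 5
]

def erase_type(lltype):
    """Collapse a detailed low-level type into the coarse kind a JitCode keeps."""
    if lltype.startswith("gcptr"):
        return "ref"                      # any GC pointer, whatever it points to
    if lltype in ("bool", "char", "int8", "uint8", "int16", "uint16",
                  "int32", "uint32", "int64", "uint64"):
        return "int"                      # every integer size and signedness
    raise ValueError(f"no kind for {lltype!r}")

# The register types below are made up; only their kinds survive serialization.
REGISTER_KINDS = {reg: erase_type(t) for reg, t in
                  {"n": "int64", "i": "uint32", "acc": "int64", "c": "bool"}.items()}

BINOPS = {"int_add": lambda a, b: a + b, "int_lt": lambda a, b: a < b}

def trace_loop(jitcode, env):
    """Toy tracing front-end: run `jitcode` on the concrete values in `env`
    and record, as a linear trace, what it actually does for one iteration."""
    env = dict(env)                       # concrete register values
    live = list(env)                      # registers live at the loop header
    name = {reg: reg for reg in live}     # register -> current name in the trace
    trace = [("label", list(live), None)]
    counter, pc = 0, 0
    while True:
        opname, args, result = jitcode[pc]
        if opname in BINOPS:
            env[result] = BINOPS[opname](*[env[a] if isinstance(a, str) else a for a in args])
            counter += 1
            new = f"{result}{counter}"    # fresh name: each trace variable is assigned once
            trace.append((opname, [name[a] if isinstance(a, str) else a for a in args], new))
            name[result] = new
            pc += 1
        elif opname == "goto_if_not":
            cond, target = args
            taken = bool(env[cond])       # the branch this particular input takes
            guard = "guard_true" if taken else "guard_false"
            # the guard checks that later runs take the same branch
            trace.append((guard, [name[cond]] + [name[r] for r in live], None))
            pc = pc + 1 if taken else target
        elif opname == "goto" and args[0] == 0:
            trace.append(("jump", [name[r] for r in live], None))  # loop closed
            return trace
        else:
            raise NotImplementedError(opname)  # e.g. leaving the loop

print(trace_loop(JITCODE, {"n": 10, "i": 0, "acc": 0}))
```

For `n = 10, i = 0, acc = 0` the result is straight-line code: `int_lt`, `guard_true`, two `int_add` and a closing `jump`, with no branches left.  Fresh names such as `acc2` keep every trace variable assigned once, even though the JitCode reuses the register `acc`.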

I hope this helps :-)

See you soon,

Armin.