# [pypy-svn] r63475 - pypy/extradoc/talk/icooolps2009

cfbolz at codespeak.net cfbolz at codespeak.net
Wed Apr 1 10:14:00 CEST 2009

Author: cfbolz
Date: Wed Apr  1 10:13:58 2009
New Revision: 63475

Modified:
Log:
Some tweaks here and there, start with the related work.

==============================================================================
+++ pypy/extradoc/talk/icooolps2009/paper.tex	Wed Apr  1 10:13:58 2009
@@ -142,7 +142,8 @@
\section{The PyPy Project}
\label{sect:pypy}

-The PyPy project\footnote{http://codespeak.net/pypy} was started to implement a
+The PyPy project\footnote{http://codespeak.net/pypy}
+\cite{rigo_pypys_2006,carl_friedrich_bolz_to_2007} was started to implement a
new Python interpreter in Python but has now extended its goals to be an
environment where flexible implementations of dynamic languages can be written.
To implement a dynamic language with PyPy, an interpreter for that language has
@@ -181,9 +182,9 @@
machine code at runtime. The techniques were then successfully applied to Java
VMs \cite{gal_hotpathvm:effective_2006}. It also turned out that they are a
relatively simple way to implement a JIT compiler for a dynamic language
-\cite{XXX}. The technique is now used by both and are now being used by both Mozilla's
-TraceMonkey JavaScript VM \cite{XXX} and Adobe's Tamarin ActionScript VM
-\cite{XXX}.
+\cite{mason_chang_efficient_2007}. The technique is now being used by both
+Mozilla's TraceMonkey JavaScript VM \cite{XXX} and Adobe's Tamarin
+ActionScript VM \cite{XXX}.

Tracing JITs are built on the following basic assumptions:

@@ -244,6 +245,8 @@
tracing is started or already existing assembler code entered; during tracing
they are the place where the check for a closed loop is performed.

+XXX write somewhere here on which level the RPython tracer operates
+
Let's look at a small example. Take the following (slightly contrived) RPython
code:
\begin{verbatim}
@@ -293,8 +296,6 @@

\subsection{Applying a Tracing JIT to an Interpreter}

-XXX \cite{sullivan_dynamic_2003} somewhere
-
The tracing JIT of the PyPy project is atypical in that it is not applied to the
user program, but to the interpreter running the user program. In this section
we will explore what problems this brings, and how to solve them (at least
@@ -540,11 +541,13 @@
translation process. In this way, the result of the translation is the same as
if no hints were present in the interpreter at all.

-If the JIT is enabled, things are more interesting. A classical tracing JIT will
+If the JIT is enabled, things are more interesting. At the moment the JIT can
+only be enabled when translating the interpreter to C, but we hope to lift that
+restriction in the future. A classical tracing JIT will
interpret the program it is running until a common loop is identified, at which
point tracing and ultimately assembler generation starts. The tracing JIT in
PyPy is operating on the language interpreter, which is written in RPython. But
-RPython programs are translatable to C. This means that interpreting the
+RPython programs are statically translatable to C anyway. This means that interpreting the
language interpreter before a common loop is found is clearly not desirable,
since the overhead of this double-interpretation would be significantly too big
to be practical.
@@ -597,13 +600,14 @@
interface to an assembler backend for code generation. This makes it possible to
easily port the tracing JIT to various architectures (including, we hope, to
virtual machines such as the JVM where the backend could generate bytecode at
-runtime).
+runtime). At the moment the only implemented backend is a simple 32-bit
+Intel-x86 backend.

\textbf{Trace Trees:} This paper ignored the problem of guards that fail in a
large percentage of cases because there are several equally likely paths through
-a loop. This of course is not always practicable. Therefore we also start
-tracing from guards that failed many times and produce assembler code for that
-path, instead of always falling back to interpretation.
+a loop. Just falling back to interpretation in this case is not practicable.
+Therefore we also start tracing from guards that failed many times and produce
+machine code for that path, instead of always falling back to interpretation.
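
The trace-tree mechanism described above can be sketched in Python. This is a
hypothetical illustration, not PyPy's actual implementation; the class names,
the threshold value, and the `compile_side_trace` stand-in are all assumptions
made for the example:

```python
# Hypothetical sketch of tracing from a hot guard: each guard counts its
# failures, and once a threshold is crossed a side trace is compiled for
# the failing path instead of always falling back to interpretation.

HOT_GUARD_THRESHOLD = 10  # assumed value, for illustration only

class Guard:
    def __init__(self, check, fallback):
        self.check = check          # predicate recorded during tracing
        self.fallback = fallback    # interpreter continuation for this exit
        self.failures = 0
        self.side_trace = None      # compiled code for the failing path

    def run(self, value):
        if self.check(value):
            return "stay-on-trace"
        self.failures += 1
        if self.side_trace is not None:
            return self.side_trace(value)  # fast path: compiled side exit
        if self.failures >= HOT_GUARD_THRESHOLD:
            # The guard fails often: produce machine code for this path too.
            self.side_trace = compile_side_trace(self.fallback)
            return self.side_trace(value)
        return self.fallback(value)  # rare failure: fall back to interpreting

def compile_side_trace(fallback):
    # Stand-in for tracing the exit path and generating code for it.
    return fallback

guard = Guard(check=lambda v: v >= 0, fallback=lambda v: -v)
for _ in range(12):
    guard.run(-1)
assert guard.side_trace is not None  # the guard became hot and was compiled
```

After enough failures, subsequent exits through this guard run compiled code
rather than re-entering the interpreter, which is the point of trace trees.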

\textbf{Allocation Removal:} A key optimization for making the approach
produce good code for more complex dynamic languages is to perform escape
@@ -624,17 +628,55 @@
the code generated by the JIT.

\section{Evaluation}
-
\label{sect:evaluation}
+
+In this section we try to evaluate the work done so far by looking at some
+benchmark numbers. Since the work is not finished, these benchmarks can only be
+preliminary. All benchmarking was done on a machine with a 1.4 GHz Pentium M
+processor and 1GiB RAM, using Linux 2.6.27.
+
%- benchmarks
%    - running example
%    - gameboy?

\section{Related Work}

-% dynamorio stuff
-% partial evaluation
-% XXX
+Applying a trace-based optimizer to an interpreter, adding hints to help the
+tracer produce better results has been tried before in the context of the DynamoRIO
+project \cite{sullivan_dynamic_2003}. This work is conceptually very close to
+ours. They achieve the same unrolling of the interpreter loop so that the
+unrolled version corresponds to the loops in the user program. However, the
+approach is greatly hindered by the fact that they trace on the machine code
+level and thus have no high-level information available about the interpreter.
+This makes it necessary to add quite a large number of hints, because at the
+assembler level it is not really visible anymore that e.g. a bytecode string is
+really immutable. Also more advanced optimizations like allocation removal would
+not be possible with that approach.
+
+The standard approach for automatically producing a compiler for a programming
+language given an interpreter for it is that of partial evaluation \cite{XXX},
+\cite{XXX}. Conceptually there are some similarities to our work. In partial
+evaluation some arguments of the interpreter function are known (static) while
+the rest are unknown (dynamic). This separation of arguments is related to our
+separation of variables into those that should be part of the position key and
+the rest. In partial evaluation all parts of the interpreter that rely only on
+static arguments can be constant-folded so that only operations on the dynamic
+arguments remain.
+
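
The static/dynamic split of partial evaluation can be illustrated with a toy
interpreter in Python. This is a hedged sketch made up for this example (the
instruction set and function names are not from any of the cited systems): the
program argument is static, the input is dynamic, and specialisation folds away
all dispatch on the static argument:

```python
# Hypothetical illustration of partial evaluation: specialise an
# interpreter on its static 'program' argument, leaving only the
# operations on the dynamic argument 'x'.

def interp(program, x):
    # A tiny accumulator interpreter; 'program' is static, 'x' is dynamic.
    acc = x
    for op, arg in program:
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
    return acc

def specialize(program):
    # All dispatch on the static argument is folded away here, at
    # specialisation time; the residual code only touches dynamic data.
    ops = []
    for op, arg in program:
        if op == "add":
            ops.append(lambda acc, a=arg: acc + a)
        elif op == "mul":
            ops.append(lambda acc, a=arg: acc * a)
    def residual(x):
        acc = x
        for f in ops:
            acc = f(acc)
        return acc
    return residual

prog = [("add", 2), ("mul", 3)]
compiled = specialize(prog)
assert compiled(5) == interp(prog, 5) == 21
```

The residual function plays the role of the compiled code: the bytecode-dispatch
loop has disappeared, analogous to how the position key separates green
(constant-foldable) from red (runtime) variables in the tracer.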
+Classical partial evaluation has failed to be useful for dynamic languages for
+much the same reasons why ahead-of-time compilers cannot compile them to
+efficient code. If the partial evaluator knows only the program it simply does
+not have enough information to produce good code. Therefore some work has been
+done to do partial evaluation at runtime. One of the earliest works on runtime
+specialisation is Tempo for C \cite{XXX}. However, it is essentially a normal
+partial evaluator packaged as a library; decisions about what can be
+specialised and how are pre-determined. Another work in this direction is DyC
+\cite{grant_dyc_2000}, another runtime specialiser for C. Both of these projects
+have a problem similar to DynamoRIO's: targeting the C language makes
+higher-level specialisation difficult (e.g.\ \texttt{malloc} cannot be
+optimized).
+
+XXX what else?

\section{Conclusion and Next Steps}

@@ -646,7 +688,6 @@
% - advantages are that the complex operations that occur in dynamic languages
%   are accessible to the tracer
\cite{bolz_back_2008}
-\cite{Psyco}

\bigskip