cfbolz at codespeak.net
Wed Apr 8 13:25:25 CEST 2009

Author: cfbolz
Date: Wed Apr  8 13:25:22 2009
New Revision: 63823

Modified:
Log:
a number of fixes

==============================================================================
+++ pypy/extradoc/talk/icooolps2009/paper.tex	Wed Apr  8 13:25:22 2009
@@ -8,7 +8,7 @@
\usepackage[utf8]{inputenc}

{\newcommand{\nb}[2]{
\fbox{\bfseries\sffamily\scriptsize#1}
@@ -208,7 +208,7 @@
Tracing JITs are an idea initially explored by the Dynamo project
\cite{bala_dynamo:transparent_2000} in the context of dynamic optimization of
machine code at runtime. The techniques were then successfully applied to Java
-VMs \cite{gal_hotpathvm:effective_2006}. It also turned out that they are a
+VMs \cite{gal_hotpathvm:effective_2006, andreas_gal_incremental_2006}. It also turned out that they are a
relatively simple way to implement a JIT compiler for a dynamic language
\cite{mason_chang_efficient_2007}. The technique is now
being used by both Mozilla's TraceMonkey JavaScript VM
@@ -223,10 +223,10 @@

The basic approach of a tracing JIT is to only generate machine code for the hot
code paths of commonly executed loops and to interpret the rest of the program.
-The code for those common loops however should be highly optimized, including
+The code for those common loops, however, is highly optimized, including
aggressive inlining.
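
As a rough illustration of how "hot" loops might be identified, the following Python sketch counts how often a loop head is reached and reports it as hot once a threshold is crossed. The class name, the method name and the threshold value are assumptions made for this example; the text does not specify how the profiler counts.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value, chosen only for this example


class Profiler:
    """Toy loop profiler: counts visits to loop heads (illustrative only)."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()

    def loop_head_reached(self, position):
        """Record one visit to `position`; return True once it is hot."""
        self.counts[position] += 1
        return self.counts[position] >= self.threshold


profiler = Profiler()
hot_flags = [profiler.loop_head_reached(4) for _ in range(4)]
```

Only positions that are reported as hot would be handed to the tracer; everything else stays in the interpreter.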

-Typically, programs executed by a tracing VMs goes through various phases:
+Typically, programs executed by a tracing VM go through various phases:
\begin{itemize}
\item Interpretation/profiling
\item Tracing
@@ -310,8 +310,8 @@
return result
\end{verbatim}
}
-\toon{next sentence is strange} To trace this, a bytecode form of these functions needs to be introduced that
-the tracer understands. The tracer interprets a bytecode that is an encoding of
+
+The tracer interprets these functions in a bytecode that is an encoding of
the intermediate representation of PyPy's translation toolchain after type
inference has been performed.
When the profiler discovers
@@ -409,9 +409,13 @@
\end{figure}

An example is given in Figure \ref{fig:tlr-basic}. It shows the code of a very
-simple bytecode interpreter with 256 registers and an accumulator. The
+simple bytecode interpreter with 256 registers and an accumulator.  The
\texttt{bytecode} argument is a string of bytes, all registers and the
-accumulator are integers. A program for this interpreter that computes
+accumulator are integers.\footnote{The
+chain of \texttt{if}, \texttt{elif}, ... instructions that check the various
+opcodes is transformed into a \texttt{switch} statement by one of PyPy's
+optimizations. Python does not have a \texttt{switch} statement.}
+A program for this interpreter that computes
the square of the accumulator is shown in Figure \ref{fig:square}. If the
tracing interpreter traces the execution of the \texttt{DECR\_A} opcode (whose
integer value is 7), the trace would look as in Figure \ref{fig:trace-normal}.
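
For readers without the figures at hand, an interpreter of the kind described could be sketched in plain Python as follows. This is not the code of Figure \ref{fig:tlr-basic}: only the value 7 for \texttt{DECR\_A} is taken from the text, while the other opcode names, their numbers and the encoding of the square program are assumptions made for this example.

```python
# Hypothetical opcode numbering; only DECR_A = 7 is taken from the text.
JUMP_IF_A = 1
MOV_A_R = 2
MOV_R_A = 3
ADD_R_TO_A = 5
DECR_A = 7
RETURN_A = 8


def interpret(bytecode, a):
    """Run `bytecode` (a bytes object) with the accumulator starting at `a`."""
    regs = [0] * 256  # 256 integer registers
    pc = 0
    while True:
        opcode = bytecode[pc]
        pc += 1
        if opcode == JUMP_IF_A:
            target = bytecode[pc]
            pc += 1
            if a:
                pc = target
        elif opcode == MOV_A_R:
            regs[bytecode[pc]] = a
            pc += 1
        elif opcode == MOV_R_A:
            a = regs[bytecode[pc]]
            pc += 1
        elif opcode == ADD_R_TO_A:
            a += regs[bytecode[pc]]
            pc += 1
        elif opcode == DECR_A:
            a -= 1
        elif opcode == RETURN_A:
            return a
        else:
            raise ValueError("unknown opcode %d" % opcode)


# Assumed encoding of a program that squares a positive accumulator.
SQUARE = bytes([
    MOV_A_R, 0,      # pc 0:  i = a
    MOV_A_R, 1,      # pc 2:  n = a
    MOV_R_A, 2,      # pc 4:  loop head, a = res
    ADD_R_TO_A, 1,   # pc 6:  a += n
    MOV_A_R, 2,      # pc 8:  res = a
    MOV_R_A, 0,      # pc 10: a = i
    DECR_A,          # pc 12: a -= 1
    MOV_A_R, 0,      # pc 13: i = a
    JUMP_IF_A, 4,    # pc 15: repeat while i != 0
    MOV_R_A, 2,      # pc 17: a = res
    RETURN_A,        # pc 19
])

result = interpret(SQUARE, 5)
```

In this sketch the loop of the square program starts at \texttt{pc} 4, and each execution of \texttt{DECR\_A} passes once through the dispatch loop of \texttt{interpret}. The program assumes a positive starting value; with an accumulator of 0 the loop counter never reaches zero again.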
@@ -442,7 +446,7 @@
relevant variables of the language interpreter with the help of a \emph{hint}.
The tracing interpreter will then effectively add the values of these variables
to the position key. This means that the loop will only be considered to be
-closed if these variables that are making up program counter at the language
+closed if these variables that are making up the program counter at the language
interpreter level are the same a second time.  Loops found in this way are, by
definition, user loops.
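
The effect of adding the hinted variables to the position key can be modelled with a small Python sketch. The class and method names are hypothetical and the model ignores everything except loop closing; it is meant only to show why the dispatch loop of the language interpreter no longer counts as closed on every iteration.

```python
class LoopDetector:
    """Toy model of closing a loop on a position key (names are hypothetical)."""

    def __init__(self):
        self.seen = set()

    def closes_loop(self, tracer_position, green_values):
        # The position key combines the tracing interpreter's own position
        # with the values of the hinted variables of the language interpreter.
        key = (tracer_position, tuple(green_values))
        if key in self.seen:
            return True
        self.seen.add(key)
        return False


detector = LoopDetector()
program = b"\x02\x00\x07"  # stand-in bytecode string, contents irrelevant here

# The tracer is at the head of the dispatch loop every time, but the loop is
# only considered closed when the language-level pc repeats as well.
closed = [detector.closes_loop("dispatch", (program, pc)) for pc in (4, 6, 8, 4)]
```

Here the fourth check is the first one to report a closed loop, because only then are both the tracer position and the pair of \texttt{bytecode} and \texttt{pc} the same a second time.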

@@ -535,16 +539,16 @@
language interpreter, it would still be an improvement if some of these operations could
be removed.

-\toon{very difficult to read (actually so is the whole paragraph; rephrase)}
-The simple insight how to improve the situation is that most of the
-operations in the trace are actually concerned with manipulating the
-bytecode and the program counter. Those are stored in variables that are part of
-the position key (they are ``green''), that means that the tracer checks that they
-are some fixed value at the beginning of the loop (they may well change over the
-course of the loop, though). In the example the check
-would be that the \texttt{bytecode} variable is the bytecode string
-corresponding to the square function and that the \texttt{pc} variable is
-\texttt{4}. Therefore it is possible to constant-fold computations on them away,
+The simple insight for improving the situation is that most of the operations
+in the trace are actually concerned with manipulating the bytecode string and
+the program counter. Those are stored in variables that are ``green'' (i.e. they
+are part of the position key).  This means that the tracer checks that those
+variables have some fixed value at the beginning of the loop (they may well
+change over the course of the loop, though). In the example of Figure
+\ref{fig:trace-no-green-folding} the check would be that at the beginning of the
+trace the \texttt{pc} variable is \texttt{4} and the \texttt{bytecode} variable
+is the bytecode string corresponding to the square function. Therefore it is
+possible to constant-fold computations on them away,
as long as the operations are side-effect free. Since strings are immutable in
RPython, it is possible to constant-fold the \texttt{strgetitem} operation. The
@@ -595,9 +599,7 @@
all. It is possible to choose when the language interpreter is translated to C
whether the JIT should be built in or not. If the JIT is not enabled, all the
hints that are possibly in the interpreter source are just ignored by the
-translation process. In this way, the result of the translation is identical to
-that when no hints were present in the interpreter at all. \toon{strange
-sentence}
+translation process.

If the JIT is enabled, things are more interesting. At the moment the JIT can
only be enabled when translating the interpreter to C, but we hope to lift that
@@ -729,13 +731,9 @@
Python).

The results show that the tracing JIT speeds up the execution of this Python
-function significantly, even outperforming CPython. \sout{by a bit. The tracer needs to
-trace through quite a bit of dispatching machinery of the Python interpreter to
-achieve this, XXX.}
-\anto{
-To achieve this, the tracer traces through the whole Python dispatching
-machinery, automatically inlining only the relevant fast paths.
-}
+function significantly, even outperforming CPython. To achieve this, the tracer
+traces through the whole Python dispatching machinery, automatically inlining
+the relevant fast paths.

\begin{figure}
\label{fig:bench-example}