antocuni at codespeak.net
Wed Apr 1 11:23:45 CEST 2009

Author: antocuni
Date: Wed Apr  1 11:23:44 2009
New Revision: 63477

Modified:
Log:

==============================================================================
+++ pypy/extradoc/talk/icooolps2009/paper.tex	Wed Apr  1 11:23:44 2009
@@ -370,32 +370,36 @@
\label{fig:trace-normal}
\end{figure}

-% YYY: (anto) I reviewd until here
-
To improve this situation, the tracing JIT could trace the execution of several
opcodes, thus effectively unrolling the bytecode dispatch loop. Ideally, the
bytecode dispatch loop should be unrolled just far enough that the unrolled version
-corresponds to a loop on the level of the user program. A loop in the user
-program occurs when the program counter of the language interpreter has the
-same value many times. This program counter is typically one or several
+corresponds to a \emph{user loop}. User loops
+occur when the program counter of the language interpreter has the
+same value many times. This program counter is typically stored in one or several
variables in the language interpreter, for example the bytecode object of the
currently executed function of the user program and the position of the current
-bytecode within that.
+bytecode within that.  In the example above, the program counter is represented by
+the \texttt{bytecode} and \texttt{pair} variables.
+\anto{XXX: why ``many times''? Twice is enough to have a loop, though it is not hot}

Since the tracing JIT cannot know which parts of the language interpreter are
the program counter, the author of the language interpreter needs to mark the
relevant variables of the language interpreter with the help of a \emph{hint}.
The tracing interpreter will then effectively add the values of these variables
-to the position key. This means, that the loop will only be considered to be
-closed, if these variables that are making up program counter at the language
-interpreter level are the same a second time. Such a loop is a loop of the user
-program. The program counter of the language interpreter can only be the same a
+to the position key. This means that the loop will only be considered to be
+closed if these variables that are making up the program counter at the language
+interpreter level are the same a second time. \sout{Such a loop is a loop of the user
+program.} \anto{Loops found in this way are, by definition, user loops}.
+
+The program counter of the language interpreter can only be the same a
second time after an instruction in the user program sets it to an earlier
value. This happens only at backward jumps in the language interpreter. That
means that the tracing interpreter needs to check for a closed loop only when it
encounters a backward jump in the language interpreter. Again the tracing JIT
cannot know where the backward branch is located, so it needs to be told with
the help of a hint by the author of the language interpreter.
+\anto{XXX: the backward jumps are in the user program, not in the language
+  interpreter. Am I missing something?}

The condition for reusing already existing machine code needs to be adapted to
this new situation. In a classical tracing JIT there is at most one piece of
@@ -410,6 +414,7 @@
check again only needs to be performed at the backward branches of the language
interpreter.

+\sout{
There is a similar conceptual problem about the point where tracing is started.
Tracing starts when the tracing interpreter sees one particular loop often
enough. This loop is always going to be the bytecode dispatch loop of the
@@ -420,6 +425,15 @@
program. Therefore profiling is also done at the backward branches of the
language interpreter, using one counter per seen program counter of the language
interpreter.
+}
+
+\anto{I find the previous para a bit confusing. What about something more
+  lightweight, like the following?}
+
+\anto{Similarly, the interpreter uses the same techniques to detect \emph{hot
+    user loops}: the profiling is done at the backward branches of the user
+  program, using one counter per seen program counter of the language
+  interpreter.}

\begin{figure}
\input{code/tlr-paper-full.py}
@@ -438,6 +452,9 @@
\texttt{pc} variable is meaningless without the knowledge of which bytecode
string is currently being interpreted. All other variables are red.

+\anto{XXX: the driver does not list \emph{all} the variables; e.g. \texttt{n}
+  is not listed.  But maybe we can just ignore this issue}
+
In addition to the classification of the variables, there are two methods of
\texttt{JitDriver} that need to be called. Both of them get as arguments the
current values of the variables listed in the definition of the driver. The
@@ -507,6 +524,10 @@
bad anyway (in fact we have an experimental optimization that does exactly that,
but it is not finished).

+\anto{I propose to show also the trace with the malloc removal enabled, as it
+  is much nicer to see. Maybe we can say that the experimental optimization we
+  are working on would generate this and that}
+
\begin{figure}
\input{code/full.txt}
\caption{Trace when executing the Square function of Figure \ref{fig:square},
@@ -515,7 +536,9 @@
\label{fig:trace-full}
\end{figure}

-
+\anto{Once we get the highly optimized trace, we can pass it to the \emph{JIT
+    backend}, which generates the corresponding machine code. XXX: do we want
+  to say something more about backends?}

%- problem: typical bytecode loops don't follow the general assumption of tracing
%- needs to unroll bytecode loop
@@ -527,12 +550,18 @@
%- constant-folding of operations on green things
%    - similarities to BTA of partial evaluation

+% YYY (anto)
+
\section{Implementation Issues}
\label{sect:implementation}

-In this section we will describe some of the practical issues when implementing
-the scheme described in the last section in PyPy. In particular we will describe
-some of the problems of integrating the various parts with each other.
+In this section we will describe some of the practical issues when
+implementing the scheme described in the last section in PyPy. In particular
+we will describe some of the problems of integrating the various parts with
+each other.
+
+\anto{XXX: We should clarify the distinction between translation/compilation
+  somewhere in the introduction}

The first integration problem is how to \emph{not} integrate the tracing JIT at
all. It should be possible to choose when the interpreter is translated to C