antocuni at codespeak.net
Tue Mar 31 15:25:24 CEST 2009

Author: antocuni
Date: Tue Mar 31 15:25:22 2009
New Revision: 63448

Modified:
Log:
finished the review of the paper until section 3.1 (excluded)

==============================================================================
+++ pypy/extradoc/talk/icooolps2009/paper.bib	Tue Mar 31 15:25:22 2009
@@ -158,3 +158,15 @@
year = {2004},
pages = {15--26}
}
+
+@InProceedings{AACM-DLS07,
+  Author         = {Ancona, D. and Ancona, M. and Cuni, A. and Matsakis, N.},
+  Title          = {R{P}ython: a {S}tep {T}owards {R}econciling
+                   {D}ynamically and {S}tatically {T}yped {OO} {L}anguages},
+  BookTitle      = {O{OPSLA} 2007 {P}roceedings and {C}ompanion, {DLS}'07:
+                   {P}roceedings of the 2007 {S}ymposium on {D}ynamic
+                   {L}anguages},
+  Pages          = {53--64},
+  Publisher      = {ACM},
+  year           = 2007
+}

==============================================================================
+++ pypy/extradoc/talk/icooolps2009/paper.tex	Tue Mar 31 15:25:22 2009
@@ -3,6 +3,26 @@
\usepackage{ifthen}
\usepackage{fancyvrb}
\usepackage{color}
+\usepackage{ulem}
+
+  {\newcommand{\nb}[2]{
+    \fbox{\bfseries\sffamily\scriptsize#1}
+    {\sf\small$\blacktriangleright$\textit{#2}$\blacktriangleleft$}
+   }
+   \newcommand{\version}{\emph{\scriptsize$-$Id: main.tex 19055 2008-06-05 11:20:31Z cfbolz $-$}}
+  }
+  {\newcommand{\nb}[2]{}
+   \newcommand{\version}{}
+  }
+
+\newcommand\cfbolz[1]{\nb{CFB}{#1}}
+\newcommand\anto[1]{\nb{ANTO}{#1}}
+\newcommand\arigo[1]{\nb{AR}{#1}}
+\newcommand{\commentout}[1]{}
+

\let\oldcite=\cite

@@ -12,7 +32,7 @@

\title{Tracing the Meta-Level: PyPy's JIT Compiler}

-\numberofauthors{2}
+\numberofauthors{3}
\author{
\alignauthor Carl Friedrich Bolz\\
@@ -22,6 +42,11 @@
\email{cfbolz at gmx.de}
+\alignauthor Antonio Cuni\\
+       \email{cuni at disi.unige.it}
\alignauthor Armin Rigo\\
\email{arigo at tunes.org}
}
@@ -75,7 +100,7 @@
other dynamic languages as well. The general approach is to implement an
interpreter for the language in a subset of Python. This subset is chosen in
such a way that programs in it can be compiled into various target environments,
-such as C/Posix, the CLR or the JVM. The PyPy project is described in more
+such as C/Posix, the CLI or the JVM. The PyPy project is described in more
detail in Section \ref{sect:pypy}.

In this paper we discuss ongoing work in the PyPy project to improve the
@@ -122,12 +147,17 @@
new Python interpreter in Python but has now extended its goals to be an
environment where flexible implementations of dynamic languages can be written.
To implement a dynamic language with PyPy, an interpreter for that language has
-to be written in RPython. RPython ("Restricted Python") is a subset of Python
+to be written in RPython \cite{AACM-DLS07}. RPython ("Restricted Python") is a subset of Python
chosen in such a way that type inference can be performed on it. The language
interpreter can then be translated with the help of PyPy into various target
-environments, such as C/Posix, the CLR and the JVM. This is done by a component
+environments, such as C/Posix, the CLI and the JVM. This is done by a component
of PyPy called the \emph{translation toolchain}.

+
+\anto{XXX: are the following paragraphs really needed? I don't think these
+  details are needed in Section \ref{sect:implementation}.  I would say
+  something much shorter, see below.}
+
The central idea of this way to implement VMs is that the interpreter
implementation in RPython should be as free as possible of low-level
implementation details, such as memory management strategy, threading model or
@@ -158,6 +188,15 @@
than what would be produced after compilation to C. These low-level graphs are
also what the tracing JIT takes as input, as we will see later.

+\anto{By writing VMs in a high-level language, we keep the implementation of
+  the language free of low-level details such as memory management strategy,
+  during the translation process which consists of a series of steps, each
+  step transforming the representation of the program produced by the previous
+  one until we get the final executable.  As we will see later, this internal
+  low-level representation of the program is also used as an input for the
+  tracing JIT.}
+

%- original goal: Python interpreter in Python
%- general way to write flexible VMs for dynamic languages
@@ -194,15 +233,16 @@
The code for those common loops however should be highly optimized, including
aggressive inlining.

-The generation of loops works as follows: At first, everything is interpreted.
+\sout{The generation of loops works as follows: at first, everything is interpreted.}
+\anto{At first, when the program starts, everything is interpreted.}
The interpreter does a bit of lightweight profiling to figure out which loops
are run often. This lightweight profiling is usually done by having a counter on
each backward jump instruction that counts how often this particular backward jump
was executed. Since loops need a backward jump somewhere, this method finds
loops in the user program.
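
The profiling step described above can be illustrated with a minimal sketch. It is not PyPy's implementation: the toy bytecode (`dec`, `jump_if_nonzero`), the function `interpret`, and the threshold value are all hypothetical and chosen only for illustration. The sketch keeps one counter per backward jump instruction and reports a jump as hot once its counter reaches the threshold.

```python
from collections import defaultdict

HOT_THRESHOLD = 3  # hypothetical value, chosen only for this example


def interpret(program, acc, threshold=HOT_THRESHOLD):
    """Run a toy bytecode program on one accumulator.

    Returns the final accumulator and the list of program counters of
    backward jumps that became hot during the run.
    """
    counters = defaultdict(int)  # one counter per backward jump instruction
    hot = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "dec":
            acc -= 1
            pc += 1
        elif op == "jump_if_nonzero":
            if acc != 0:
                if arg <= pc:  # the jump goes backward, so it closes a loop
                    counters[pc] += 1
                    if counters[pc] == threshold:
                        hot.append(pc)  # a real JIT would start tracing here
                pc = arg
            else:
                pc += 1
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return acc, hot


# A loop that counts the accumulator down to zero.
countdown = [("dec", None), ("jump_if_nonzero", 0)]
```

With this program, a run that takes the backward jump at least three times reports the jump at position 1 as hot, while a short run never reaches the threshold and stays purely interpreted.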

-When a common loop is identified, the interpreter enters a
-special mode (called tracing mode). When in tracing mode, the interpreter
+When a \sout{common}\anto{hot} loop is identified, the interpreter enters a
+special mode, called \emph{tracing mode}. When in tracing mode, the interpreter
records a history (the \emph{trace}) of all the operations it executes, in addition
to actually performing the operations. During tracing, the trace is repeatedly
(XXX make this more precise: when does the check happen?)
@@ -213,6 +253,12 @@
of all the operations in it. The machine code can then be immediately executed,
as it represents exactly the loop that is being interpreted at the moment anyway.

+\anto{XXX I think it's worth spending one more paragraph to explain what a
+  trace really is, i.e. that it's a list of \textbf{sequential} operations,
+  intermixed with guards which guarantee that this particular sequence is still
+  valid.  At the moment, the definition of trace is not given explicitly and
+  it's mixed with the details of how the JIT works}
+
This process assumes that the path through the loop that was traced is a
"typical" example of possible paths (which is statistically likely). Of course
it is possible that later another path through the loop is taken, therefore the
@@ -230,7 +276,7 @@
does not need to check all the time whether the position key already occurred
earlier, but only at instructions that are able to change the position key
to an earlier value, e.g. a backward branch instruction. Note that this is
-already the second place where backward branches are treated specially: During
+already the second place where backward branches are treated specially: during
interpretation they are the place where the profiling is performed and where
tracing is started or already existing assembler code entered; during tracing
they are the place where the check for a closed loop is performed.
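
The role of backward branches during tracing can be sketched in the same spirit. The code below is an illustration under stated assumptions, not the mechanism used by PyPy: the toy bytecode, the names `trace_loop` and `run_trace`, and the use of the program counter as position key are all hypothetical. Each executed operation is recorded, each conditional branch leaves a guard in the trace, and the check for a closed loop happens only when a backward branch is taken.

```python
def trace_loop(program, start_pc, acc):
    """Interpret from start_pc while recording a trace.

    Returns (trace, acc) once a taken backward branch leads back to
    start_pc, or (None, acc) if the program ends before the loop closes.
    """
    trace = []
    pc = start_pc
    while pc < len(program):
        op, arg = program[pc]
        if op == "dec":
            acc -= 1  # the operation is performed and recorded
            trace.append(("dec",))
            pc += 1
        elif op == "jump_if_nonzero":
            taken = acc != 0
            # The guard records which way the branch went during tracing.
            trace.append(("guard_nonzero",) if taken else ("guard_zero",))
            if taken and arg <= pc and arg == start_pc:
                return trace, acc  # position key seen before: loop closed
            pc = arg if taken else pc + 1
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return None, acc


def run_trace(trace, acc):
    """Execute the trace repeatedly until a guard fails; return acc."""
    while True:
        for step in trace:
            if step == ("dec",):
                acc -= 1
            elif step == ("guard_nonzero",) and acc == 0:
                return acc  # guard failed: leave the trace
            elif step == ("guard_zero",) and acc != 0:
                return acc  # guard failed: leave the trace


countdown = [("dec", None), ("jump_if_nonzero", 0)]
```

In this simplified sketch a failing guard just returns the accumulator; a real tracing JIT would instead resume interpretation at the corresponding position in the user program.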