antocuni at codespeak.net
Fri Dec 19 11:39:18 CET 2008

Author: antocuni
Date: Fri Dec 19 11:39:16 2008
New Revision: 60585

Modified:
Log:

==============================================================================
+++ pypy/extradoc/talk/ecoop2009/benchmarks.tex	Fri Dec 19 11:39:16 2008
@@ -1,8 +1,11 @@
\section{Benchmarks}

-In section \ref{sec:tlc-features}, we saw that TLC provides most of the
+\anto{We should say somewhere that flexswitches are slow but benchmarks are so
+  good because they are not involved in the inner loops}
+
+In section \ref{sec:tlc-properties}, we saw that TLC provides most of the
features that usually make dynamically typed languages so slow, such as
-\emph{stack-based VM}, \emph{boxed arithmetic} and \emph{dynamic lookup} of
+\emph{stack-based interpreter}, \emph{boxed arithmetic} and \emph{dynamic lookup} of
methods and attributes.

In the following sections, we present some benchmarks that show how our
@@ -12,9 +15,9 @@

\begin{enumerate}
\item By plain interpretation, without any jitting.
-\item With the jit enabled: this run includes the time spent by doing the
+\item With the JIT enabled: this run includes the time spent by doing the
compilation itself, plus the time spent by running the produced code.
-\item Again with the jit enabled, but this time the compilation has already
+\item Again with the JIT enabled, but this time the compilation has already
been done, so we are actually measuring how good the produced code is.
\end{enumerate}

@@ -50,7 +53,7 @@
much better.  At the first iteration, the classes of the two operands of the
multiplication are promoted; then, the JIT compiler knows that both are
integers, so it can inline the code to compute the result.  Moreover, it can
-\emph{virtualize} all the temporary objects, because they never escape from
+\emph{virtualize} (see section \ref{sec:virtuals}) all the temporary objects, because they never escape from
the inner loop.  The same remarks apply to the other two operations inside
the loop.

@@ -182,7 +185,7 @@

The computation \emph{per se} is trivial, as it calculates either $-n$ or
$1+2+\dots+n-1$, depending on the sign of $n$. The interesting part is the
-polymorphic call to \lstinline{accumulate} inside the loop, because the VM has
+polymorphic call to \lstinline{accumulate} inside the loop, because the interpreter has
no way to know in advance which method to call (unless it does flow analysis,
which could be feasible in this case but not in general).  The equivalent C\#
code we wrote uses two classes and a \lstinline{virtual} method call to
@@ -191,7 +194,7 @@
However, our generated JIT does not compile the whole function at
once. Instead, it compiles and executes code chunk by chunk, waiting until it
knows enough information to generate highly efficient code.  In particular,
-at the time when it emits the code for the inner loop it exactly knows the
+at the time it emits the code for the inner loop it knows exactly the
type of \lstinline{obj}, thus it can remove the overhead of dynamic dispatch
and inline the method call.  Moreover, since \lstinline{obj} never escapes the
function, it is \emph{virtualized} and its field \lstinline{value} is stored

==============================================================================
+++ pypy/extradoc/talk/ecoop2009/clibackend.tex	Fri Dec 19 11:39:16 2008
@@ -52,17 +52,13 @@
Since in .NET methods are the basic units of compilation, a possible
solution consists in creating a new method
any time a new case has to be added to a flexswitch.
-\dacom{comment for Antonio: I am not sure this is the best solution. This cannot work for Java where classes are the basic
-  units. Closures will be available only with Java Dolphin and I do
-  not know how much efficient will be}
In this way, whereas flow graphs without flexswitches are translated
to a single method, the translation of flow graphs which can dynamically grow because of
flexswitches will be scattered over several methods.
Summarizing, the backend behaves in the following way:
\begin{itemize}
\item Each flow graph is translated in a collection of methods which
-  can grow dynamically. \dacom{I propose primary/secondary instead of
-    the overloaded terms main/child} Each collection contains at least one
+  can grow dynamically. Each collection contains at least one
method, called \emph{primary}, which is the first to be created.
All other methods, called \emph{secondary}, are added dynamically
whenever a new case is added to a flexswitch.
@@ -71,12 +67,13 @@
number of blocks, all belonging to the same flow graph. Among these blocks
there always exists an initial block whose input variables are
parameters of the method; the input variables of all other blocks
-  are local variables of the method.
+  are local variables of the method. \anto{This is wrong: the signature of the secondary methods is fixed, and input args are passed inside the InputArgs class, not as method parameters}
\end{itemize}

When  a new case is added to a flexswitch, new blocks are generated
and translated by the backend in a new single method pointed
-by a delegate which is stored in the code implementing the flexswitch,
+by a delegate\footnote{\emph{Delegates} are the .NET equivalent of function pointers.}
+which is stored in the code implementing the flexswitch,
so that the method can be invoked later.

@@ -88,7 +85,7 @@
the corresponding code fragment in the same method is emitted
to execute the new block, whereas the appropriate local variables are
-used for passing arguments.
+used for passing arguments. \anto{this is wrong for the same reason as above}
Also following an external link whose target is the initial block of a
method is not difficult: the corresponding method has to be invoked
with the appropriate arguments.
@@ -106,7 +103,7 @@
determine which block has to be executed.
This is done by passing to the method a 32-bit number, called
\emph{block id}, which uniquely identifies the next block of the graph to be executed.
-The high word of a block id is the id of the method to which the block
+The high word \anto{a word is 32 bit, block num and method id are 16 bit each} of a block id is the id of the method to which the block
belongs, whereas the low word is a progressive number uniquely identifying
each block implemented by the method.

@@ -145,7 +142,7 @@
If the next block to be executed is implemented in the same method
({\small\lstinline{methodid == MY_METHOD_ID}}), then the appropriate
-can be managed efficiently.
+can be managed efficiently. \anto{wrong: internal links don't go through the dispatcher}
Otherwise, the \lstinline{jump_to_ext}
part of the dispatcher has to be executed.
The code that actually jumps to an external block is contained in
@@ -267,10 +264,10 @@
the link and jumps to the right block by performing a linear search in
array \lstinline{values}.

-Recall that the first argument of delegate \lstinline{FlexSwitchCase}
-is the block id to jump to; since the target of an external jump is
-always the initial block of the method, the first argument will be
-always 0.
+Recall that the first argument of delegate \lstinline{FlexSwitchCase} is the
+block id to jump to. By construction, the target block of a flexswitch is
+always the first in a secondary method, and we use the special value
+\lstinline{0} to signal this.

The value returned by method \lstinline{execute} is the next block id
to be executed;

==============================================================================
+++ pypy/extradoc/talk/ecoop2009/jitgen.tex	Fri Dec 19 11:39:16 2008
@@ -60,7 +60,7 @@
\label{fig:tlc-main}
\begin{center}
\input{tlc-simplified.py}
-\caption{The main loop of the TLC interpreter}
+\caption{The main loop of the TLC interpreter, written in RPython}
\end{center}
\end{figure}

@@ -194,7 +194,7 @@
The binding-time analyzer of our translation tool-chain is using a simple
abstract-interpretation based analysis. It is based on the
same type inference engine that is used on the source RPython program,
-the annotator.  In this mode, it is called the \emph{hint-annotator}; it
+the annotator \anto{I'm not sure we should mention the annotator, as it is not referred to anywhere else}.  In this mode, it is called the \emph{hint-annotator}; it
RPython-level, and propagates annotations that do not track types but
value dependencies and manually-provided binding time hints.

==============================================================================
+++ pypy/extradoc/talk/ecoop2009/rainbow.tex	Fri Dec 19 11:39:16 2008
@@ -140,6 +140,7 @@

\section{Automatic Unboxing of Intermediate Results}
+\label{sec:virtuals}

XXX the following section needs a rewriting to be much more high-level and to
compare more directly with classical escape analysis
@@ -151,7 +152,7 @@
residual code as long as possible. The idea is to try to keep new
run-time structures "exploded": instead of a single run-time object allocated on
the heap, the object is "virtualized" as a set
-of fresh variables, one per field. Only when the object can be accessed by from
+of fresh local variables, one per field. Only when the object can be accessed from
somewhere else is it actually allocated on the heap. The effect of this is similar to that of
escape analysis \cite{XXX}, which also prevents allocations of objects that can
be proven to not escape a method or set of methods.

==============================================================================
+++ pypy/extradoc/talk/ecoop2009/tlc.tex	Fri Dec 19 11:39:16 2008
@@ -21,7 +21,8 @@
Objects represent a collection of named attributes (much like JavaScript or
Self) and named methods.  At creation time, it is necessary to specify the set
of attributes of the object, as well as its methods.  Once the object has been
-created, it is not possible to add/remove attributes and methods.
+created, it is possible to call methods and read or write attributes, but not

The interpreter for the language is stack-based and uses bytecode to represent
the program. It provides the following bytecode instructions:
@@ -47,10 +48,8 @@
the VM needs to do all these checks at runtime; in case one of the checks
fails, the execution is simply aborted.

-\subsection{TLC features}
-\label{sec:tlc-features}
-\cfbolz{calling this sections "features" is a bit obscure, since it is more
-properties of the implementation}
+\subsection{TLC properties}
+\label{sec:tlc-properties}

Despite being very simple and minimalistic, \lstinline{TLC} is a good
candidate as a language to test our JIT generator, as it has some of the