hakanardo noreply at buildbot.pypy.org
Thu Jun 16 16:31:28 CEST 2011

Author: Hakan Ardo <hakan at debian.org>
Changeset: r3703:52402b10e18d
Date: 2011-06-16 16:34 +0200

Log:	fixes

diff --git a/talk/iwtc11/paper.tex b/talk/iwtc11/paper.tex
--- a/talk/iwtc11/paper.tex
+++ b/talk/iwtc11/paper.tex
@@ -177,7 +177,7 @@
For the purpose of this paper, we are going to use a tiny interpreter for a dynamic language with
a very simple object
model, that just supports an integer and a float type (this example has been taken from a previous paper \cite{bolz_allocation_2011}). The objects support only
-two operations, \lstinline{add}, which adds two objects (promoting ints to floats in a
+one operation, \lstinline{add}, which adds two objects (promoting ints to floats in a
mixed addition). The implementation of \lstinline{add} uses classical Smalltalk-like
double-dispatching.
%These classes could be part of the implementation of a very
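The two-type object model with double dispatch described above (taken from \cite{bolz_allocation_2011}) can be sketched in Python roughly as follows. The class and method names (`BoxedInteger`, `add__int`, etc.) follow the cited paper's style but are illustrative, not the exact code:

```python
# Illustrative sketch of the tiny object model: two boxed types and a
# single 'add' operation using Smalltalk-style double dispatch, with
# ints promoted to floats in mixed additions.

class Base:
    def add(self, other):
        raise NotImplementedError

class BoxedInteger(Base):
    def __init__(self, intval):
        self.intval = intval
    def add(self, other):
        # first dispatch: reveal that self is an integer
        return other.add__int(self.intval)
    def add__int(self, intother):
        return BoxedInteger(intother + self.intval)
    def add__float(self, floatother):
        # mixed addition: promote the int to a float
        return BoxedFloat(floatother + float(self.intval))

class BoxedFloat(Base):
    def __init__(self, floatval):
        self.floatval = floatval
    def add(self, other):
        # first dispatch: reveal that self is a float
        return other.add__float(self.floatval)
    def add__int(self, intother):
        return BoxedFloat(float(intother) + self.floatval)
    def add__float(self, floatother):
        return BoxedFloat(floatother + self.floatval)
```

Adding two `BoxedInteger`s yields a `BoxedInteger`; any addition involving a `BoxedFloat` yields a `BoxedFloat`.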
@@ -278,7 +278,7 @@
corresponding to the stack level of the function that contains the traced
operation. The trace is in single-assignment form, meaning that each variable is
assigned a value exactly once. The arguments $p_0$ and $p_1$ of the loop correspond
-to the live variables \lstinline{y} and \lstinline{res} in the while-loop of
+to the live variables \lstinline{y} and \lstinline{step} in the while-loop of
the original function.

The label of the loop is $l_0$ and is used by the jump instruction to
@@ -339,8 +339,8 @@

XXX find reference of prior work on this

-Loop peeling is achieved by appending a copy of the traced iteration at
-the end of the loop. See Figure~\ref{fig:overview}
+Loop peeling is achieved by appending an inlined copy of the traced iteration at
+the end of itself. See Figure~\ref{fig:overview}.
The first part (called the \emph{preamble}) finishes with the jump to the second part
(called the \emph{peeled loop}). The second part ends with the jump to itself. This way
the preamble will be executed only once while the peeled loop will
@@ -364,7 +364,7 @@
$J=\left(J_1, J_2, \cdots, J_{|J|}\right)$, that are passed as the input variables of the target loop. After
loop peeling there will be a second copy of this trace with input
variables equal to the jump arguments of the preamble, $J$, and jump
-arguments $K$. Looking back at our example we have
+arguments $K$. Looking at the peeled version of our example in Figure~\ref{fig:peeled-trace} we have

%\left\{
\begin{array}{lcl}
@@ -470,7 +470,7 @@
No special care needs to be taken when implementing redundant
guard removal together with loop peeling. The guards from
the preamble might make the guards of the peeled loop
-redundant, allowing them to be removed. Therefore the net effect of combining redundant
+redundant, allowing them to be removed. Therefore one effect of combining redundant
guard removal with loop peeling is that loop-invariant guards are moved out of the
loop. The peeled loop of the example reduces to

@@ -488,11 +488,15 @@
jump($l_1$, $p_{0}$, $p_{9}$)
\end{lstlisting}

-The guard on $p_5$ on line 17 of Figure~\ref{fig:unopt-trace} can be
+The guard on $p_5$ on line 17 of Figure~\ref{fig:peeled-trace} can be
removed since $p_5$ is allocated on line 10 with a known class. The
guard on $p_0$ on line 20 can be removed since it is identical to the
guard on line 6.
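The removal rule above amounts to a single forward pass over the trace: a guard is dropped if an identical guard on the same argument has already been emitted. A minimal sketch in Python, assuming traces are lists of tuples (this format is an assumption for illustration, not the paper's implementation):

```python
# Minimal sketch of redundant guard removal: drop a guard if an
# identical guard on the same variable was already emitted earlier
# in the trace. The tuple-based trace format is an assumption.

def remove_redundant_guards(ops):
    seen = set()   # guards already active at this point in the trace
    out = []
    for op in ops:                  # op = (opname, arg, ...)
        if op[0].startswith('guard'):
            if op in seen:          # identical guard already emitted
                continue
            seen.add(op)
        out.append(op)
    return out

trace = [
    ('guard_class', 'p0', 'BoxedInteger'),
    ('get', 'i1', 'p0', 'intval'),
    ('guard_class', 'p0', 'BoxedInteger'),   # redundant: dropped
    ('get', 'i2', 'p0', 'intval'),
]
print(remove_redundant_guards(trace))
```

With loop peeling, the preamble's guards populate `seen` before the peeled loop is processed, which is what moves loop-invariant guards out of the loop.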

+Note that the guard on $p_5$ is removed even though $p_5$ is not loop
+invariant, which shows that loop invariant code motion is not the only
+effect of loop peeling.
+
\subsection{Heap Caching}

XXX gcc calls this store-sinking and I'm sure there are some
@@ -507,7 +511,7 @@

The issue at hand is to keep the peeled loop a proper
trace. Consider the \lstinline{get} operation on line 19 of
-Figure~\ref{fig:unopt-trace}. The result of this operation can be
+Figure~\ref{fig:peeled-trace}. The result of this operation can be
deduced to be $i_4$ from the \lstinline{set} operation on line
12. Also, the result of the \lstinline{get} operation on line 22 can
be deduced to be $i_3$ from the \lstinline{get} operation on line
@@ -531,7 +535,7 @@

In general what is needed is for the heap optimizer to keep track of
which variables from the preamble it reuses in the peeled loop.
-It has to construct a vector of such variables $H$ which
+It has to construct a vector, $H$, of such variables which
can be used to update the input and jump arguments using

\hat J = \left(J_1, J_2, \cdots, J_{|J|}, H_1, H_2, \cdots, H_{|H|}\right)
@@ -544,7 +548,7 @@
\label{eq:heap-jumpargs}
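The heap caching described here can be sketched as a pass that remembers known (pointer, attribute) values, replaces later \lstinline{get} operations with the remembered variable, and collects into $H$ the preamble variables that the peeled loop ends up reusing. A hedged Python sketch, assuming a simple tuple-based trace format:

```python
# Sketch of heap caching across loop peeling: known (pointer, attribute)
# values let later 'get' operations be replaced by earlier results.
# Preamble variables reused this way are collected into the vector H,
# which extends the input/jump arguments. Trace format is an assumption.

def cache_heap_ops(ops, preamble_vars):
    cache = {}    # (pointer, attribute) -> variable holding its value
    H = []        # preamble variables reused in the peeled loop
    rename = {}   # removed result variable -> replacement variable
    out = []
    for op in ops:
        op = tuple(rename.get(a, a) for a in op)   # apply replacements
        name = op[0]
        if name == 'set':                          # ('set', ptr, attr, value)
            _, ptr, attr, val = op
            cache[(ptr, attr)] = val
            out.append(op)
        elif name == 'get':                        # ('get', res, ptr, attr)
            _, res, ptr, attr = op
            if (ptr, attr) in cache:
                val = cache[(ptr, attr)]
                rename[res] = val                  # drop the get, reuse val
                if val in preamble_vars and val not in H:
                    H.append(val)                  # must be passed along the jump
                continue
            cache[(ptr, attr)] = res
            out.append(op)
        else:
            out.append(op)
    return out, H
```

For example, a \lstinline{get} following a \lstinline{set} on the same attribute is removed and its result replaced by the value that was stored.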

In the optimized trace $I$ is replaced by $\hat I$ and $K$ by $\hat
-K$. The trace from Figure~\ref{fig:unopt-trace} will be optimized to:
+K$. The trace from Figure~\ref{fig:peeled-trace} will be optimized to:

\begin{lstlisting}[mathescape,numbers = right,basicstyle=\setstretch{1.05}\ttfamily\scriptsize]
$l_0$($p_{0}$,$p_{1}$):
@@ -572,6 +576,10 @@
jump($l_1$,$p_{0}$,$p_{9}$,$i_3$,$i_8$)
\end{lstlisting}

+Note how the loop invariant \lstinline{get} on $p_0$ was moved out of
+the loop, and how the non loop invariant \lstinline{get} on $p_5$ was
+removed entirely.
+
\subsection{Common Subexpression Elimination}
If a pure operation appears more than once in the trace with the same input
arguments, it only needs to be executed the first time and then the result
@@ -599,7 +607,7 @@
Consider again the original unoptimized trace of
Figure~\ref{fig:peeled-trace}. Line 10 contains the first
allocation. It is removed and $p_5$ is marked as virtual. This means
-that it refers to an virtual object that was not yet been
+that it refers to a virtual object that has not yet been
(and might never be) allocated. Line 12 sets the \lstinline{intval}
attribute of $p_5$. This operation is also removed and the optimizer
registers that the attribute \lstinline{intval} of $p_5$ is $i_4$.
@@ -608,7 +616,7 @@
arguments of the \lstinline{jump} operation, which contains the virtual
reference $p_5$. This can be achieved by exploding $p_5$ into its
attributes. In this case there is only one attribute and its value is
-$i_4$, which means the $p_5$ is replaced with $i_4$ in the jump
+$i_4$, which means that $p_5$ is replaced with $i_4$ in the jump
arguments.

In the general case, each virtual in the jump arguments is exploded into a
@@ -641,8 +649,8 @@
\right)
.
-and the arguments of the \lstinline{jump} operation of the peeled loop,
-$K$, constructed by inlining $\hat J$,
+The arguments of the \lstinline{jump} operation of the peeled loop,
+$K$, is constructed by inlining $\hat J$,

\hat K = \left(m\left(\hat J_1\right), m\left(\hat J_2\right),
\cdots, m\left(\hat J_{|\hat J|}\right)\right)

@@ -678,7 +686,7 @@
Note that virtuals are only exploded into their attributes when
constructing the arguments of the jump of the preamble. This
explosion can't be repeated when constructing the arguments of the
-jump of the peeled loop as it has to mach the first. This means
+jump of the peeled loop as it has to match the first jump. This means
that the objects that were passed as pointers (non virtuals) from the
first iteration to the second (from preamble to peeled loop) also have
to be passed as pointers from the second iteration to the third (from peeled
@@ -687,7 +695,7 @@
before the jump. With the simple objects considered in this
paper, that is not a problem. However in more complicated interpreters
such an allocation might, in combination with other optimizations, lead
-to additional variables from the first iteration being imported into
+to additional variables from the preamble being imported into
the second. This extends both $\hat J$ and $\hat K$, which means that
some care has to be taken, when implementing this, to allow $\hat J$ to
grow while inlining it into $\hat K$. XXX: Maybe we can skip this?
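The allocation-removal step can be sketched as a pass that marks each \lstinline{new} result as virtual, records its attribute writes instead of emitting them, and explodes any virtual that reaches the jump into its attribute values. A minimal Python sketch under simplifying assumptions (tuple-based trace format, no reads of virtuals, single-attribute objects):

```python
# Sketch of allocation removal: each 'new' result is marked virtual and
# its attribute writes are recorded rather than emitted; a virtual that
# reaches the jump is exploded into its attribute values. Trace format
# and the absence of 'get' on virtuals are simplifying assumptions.

def remove_allocations(ops, jump_args):
    virtuals = {}    # variable -> {attribute: value}; not (yet) allocated
    out = []
    for op in ops:
        name = op[0]
        if name == 'new':                           # ('new', res, cls)
            virtuals[op[1]] = {}                    # allocation removed
        elif name == 'set' and op[1] in virtuals:   # ('set', ptr, attr, val)
            _, ptr, attr, val = op
            virtuals[ptr][attr] = val               # remember, don't emit
        else:
            out.append(op)
    exploded = []
    for arg in jump_args:
        if arg in virtuals:
            # explode the virtual into its attribute values
            exploded.extend(virtuals[arg].values())
        else:
            exploded.append(arg)
    return out, exploded

ops = [
    ('new', 'p5', 'BoxedInteger'),
    ('set', 'p5', 'intval', 'i4'),
    ('int_add', 'i5', 'i4', 'i4'),
]
print(remove_allocations(ops, ['p0', 'p5']))
```

In the paper's example this is exactly what happens to $p_5$: its allocation and \lstinline{set} disappear, and $i_4$ takes its place in the jump arguments.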
@@ -798,8 +806,9 @@
optimizations is during the construction of the jump arguments
connecting the peeled-off iteration (the preamble) with the loop body. This
approach
-turns standard optimizations such as redundant guard removal, heap
-caching, pure operation reuse and allocation removals into loop
+improves the effect of standard optimizations such as redundant guard removal, heap
+caching, common subexpression elimination and allocation removals. The
+most prominent effect is that they all become loop
invariant code motion optimizations.

XXX: is ``loop body'' or ``peeled loop'' the preferable term?
@@ -809,7 +818,7 @@
improve the run time of small loops containing numerical
calculations, at least in cases where there are not too many guard
-failures. The standard way of handling guards that fail often is to
+failures. A common way of handling a guard that fails often is to
trace a bridge from it back to the start of some previously compiled
loop. This is applicable here too. However the bridge will have to end
with a jump to the preamble, which lessens the impact of the