[pypy-commit] extradoc extradoc: More small improvements
bivab
noreply at buildbot.pypy.org
Wed Aug 15 14:48:54 CEST 2012
Author: David Schneider <david.schneider at picle.org>
Branch: extradoc
Changeset: r4584:86c25059ca33
Date: 2012-08-15 14:48 +0200
http://bitbucket.org/pypy/extradoc/changeset/86c25059ca33/
Log: More small improvements
diff --git a/talk/vmil2012/paper.tex b/talk/vmil2012/paper.tex
--- a/talk/vmil2012/paper.tex
+++ b/talk/vmil2012/paper.tex
@@ -122,6 +122,7 @@
%___________________________________________________________________________
\todo{mention somewhere that it is to be expected that most guards do not fail}
+\todo{better formatting for lstinline}
\section{Introduction}
\todo{the introduction needs some work}
@@ -258,7 +259,7 @@
path, tracing is started thus recording all operations that are executed on this
path. This includes inlining function calls.
As in most compilers, tracing JITs use an intermediate representation to
-store the recorded operations, which is typically in SSA
+store the recorded operations, typically in SSA
form~\cite{cytron_efficiently_1991}. Since tracing follows the actual execution,
the recorded code
represents only one possible path through the control flow graph. Points of
@@ -273,9 +274,9 @@
When the check of a guard fails, the execution of the machine code must be
stopped and the control is returned to the interpreter, after the interpreter's
-state has been restored. If a particular guard fails often a new trace is
-recorded starting from the guard. We will refer to this kind of trace as a
-\emph{bridge}. Once a bridge has been traced it is attached to the
+state has been restored. If a particular guard fails often a new trace
+starting from the guard is recorded. We will refer to this kind of trace as a
+\emph{bridge}. Once a bridge has been traced and compiled it is attached to the
corresponding guard by patching the machine code. The next time the guard fails
the bridge will be executed instead of leaving the machine code.
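The guard/bridge mechanism described in this hunk can be sketched as a toy model (illustrative Python only, not PyPy's implementation; the threshold and all names are invented):

```python
# Hypothetical sketch: a guard counts its failures and, once it has failed
# often enough, a bridge is traced, compiled and attached to it. Later
# failures then run the bridge instead of leaving the machine code.

HOT_THRESHOLD = 3  # assumed number of failures before a bridge is traced

class Guard:
    def __init__(self, condition):
        self.condition = condition   # fast check executed on-trace
        self.failures = 0
        self.bridge = None           # attached by "patching" after tracing

    def check(self, value):
        if self.condition(value):
            return "stay-on-trace"
        self.failures += 1
        if self.bridge is not None:
            return self.bridge(value)          # guard was patched: run bridge
        if self.failures >= HOT_THRESHOLD:
            # trace a new path starting from this guard and attach it
            self.bridge = lambda v: "bridge-result"
        return "fall-back-to-interpreter"

guard = Guard(lambda v: v >= 0)
results = [guard.check(v) for v in [1, -1, -2, -3, -4]]
print(results)
```

The last failing value takes the attached bridge rather than falling back to the interpreter, mirroring the patching step described above.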
@@ -324,21 +325,21 @@
This information is called the \emph{resume data}.
To do this reconstruction it is necessary to take the values of the SSA
-variables of the trace and build interpreter stack frames. Tracing
+variables in the trace to build interpreter stack frames. Tracing
aggressively inlines functions; therefore the reconstructed state of the
interpreter can consist of several interpreter frames.
If a guard fails often enough, a trace is started from it
-forming a trace tree.
+to create a bridge, forming a trace tree.
When that happens, another use case of resume data
-is to construct the tracer state.
+is to reconstruct the tracer state.
After the bridge has been recorded and compiled it is attached to the guard.
If the guard fails later the bridge is executed. Therefore the resume data of
that guard is no longer needed.
There are several forces guiding the design of resume data handling.
Guards are very common operations in traces.
-However, a large percentage of all operations
+However, as will be shown, a large percentage of all operations
are optimized away before code generation.
Since there are a lot of guards
the resume data needs to be stored in a very compact way.
@@ -355,14 +356,14 @@
The stack contains only those interpreter frames seen by the tracer.
The frames are symbolic in that the local variables in the frames
do not contain values.
-Instead, every local variables contains the SSA variable of the trace
+Instead, every local variable contains the SSA variable of the trace
where the value would later come from, or a constant.
\subsection{Compression of Resume Data}
\label{sub:compression}
After tracing has been finished the trace is optimized.
-During optimization a large percentage of operations can be removed.
+During optimization a large percentage of operations can be removed.\todo{add a reference to the figure showing the optimization rates?}
In the process the resume data is transformed into its final, compressed form.
The rationale for not compressing the resume data during tracing
is that a lot of guards will be optimized away.
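The symbolic frames and the sharing-based compression described here can be sketched roughly as follows (a toy model; PyPy's actual data structures differ, and all names are invented):

```python
# Sketch: each local variable slot in a symbolic frame holds either an SSA
# variable name from the trace or a constant. Frames that do not change
# between consecutive guards are shared rather than copied, which is the
# compression idea mentioned in the text.

class SymbolicFrame:
    def __init__(self, locals_):
        self.locals = locals_          # e.g. {"x": "i4", "n": 10}

def resume_data_for(guards_frames):
    """Share identical frame lists between consecutive guards."""
    shared = []
    prev = None
    for frames in guards_frames:
        if prev is not None and frames == prev:
            shared.append(shared[-1])  # reuse the previous entry
        else:
            shared.append(frames)
        prev = frames
    return shared

f1 = SymbolicFrame({"x": "i4", "n": 10})
guard1_frames = [f1]
guard2_frames = [f1]                   # state unchanged between the guards
rd = resume_data_for([guard1_frames, guard2_frames])
print(rd[0] is rd[1])                  # the frame list is shared
```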
@@ -407,7 +408,7 @@
Using many classical compiler optimizations the JIT tries to remove as many
operations, and therefore guards, as possible.
In particular guards can be removed by subexpression elimination.
-If the same guard is encountered a second time in the trace,
+If the same guard is encountered a second time in a trace,
the second one can be removed.
This also works if a later guard is weaker
and hence implied by an earlier guard.
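The duplicate-guard removal described above can be illustrated with a small sketch (the trace format and operation names are invented for the example):

```python
# Sketch of guard subexpression elimination: if an identical guard is
# encountered a second time in a trace, the second occurrence is dropped.
# (Handling weaker, implied guards would need a subsumption check on top.)

def optimize_guards(trace):
    seen = set()
    out = []
    for op in trace:
        if op[0] == "guard":
            _, var, cond = op              # e.g. ("guard", "i1", "nonnull")
            if (var, cond) in seen:
                continue                   # same guard seen before: remove
            seen.add((var, cond))
        out.append(op)
    return out

trace = [
    ("guard", "i1", "nonnull"),
    ("int_add", "i1", "i2"),
    ("guard", "i1", "nonnull"),            # duplicate, removed
    ("int_mul", "i1", "i3"),
]
optimized = optimize_guards(trace)
print(len(optimized))                      # 3
```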
@@ -432,7 +433,7 @@
Consequently the resume data needs to store enough information
to make this reconstruction possible.
-Adding this additional information is done as follows:
+Storing this additional information is done as follows:
So far, every variable in the symbolic frames
contains a constant or an SSA variable.
After allocation removal the variables in the symbolic frames can also contain
@@ -451,8 +452,8 @@
During the storing of resume data virtual objects are also shared
between subsequent guards as much as possible.
The same observation as about frames applies:
-Quite often a virtual object does not change from one guard to the next.
-Then the data structure is shared.
+Quite often a virtual object does not change from one guard to the next,
+allowing the data structure to be shared.
A related optimization is the handling of heap stores by the optimizer.
The optimizer tries to delay stores into the heap as long as possible.
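The delayed-heap-store idea can be sketched like this (an invented, minimal API; not the optimizer's real interface): stores are kept pending instead of being emitted, and any guard emitted in the meantime snapshots the pending stores into its resume data so the bailout path can replay them.

```python
# Sketch: the optimizer delays setfield operations; resume data for a guard
# records the stores that have not yet happened at that point in the trace.

class Optimizer:
    def __init__(self):
        self.pending_stores = {}   # (object, field) -> value not yet stored

    def record_store(self, obj, field, value):
        self.pending_stores[(obj, field)] = value   # delay the heap store

    def emit_guard(self):
        # snapshot the pending stores into this guard's resume data
        return {"pending_stores": dict(self.pending_stores)}

opt = Optimizer()
opt.record_store("p0", "value", "i3")
guard_resume = opt.emit_guard()
print(guard_resume["pending_stores"])
```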
@@ -495,7 +496,7 @@
\end{figure}
-After optimization the resulting trace is handed over to the platform specific
+After the recorded trace has been optimized it is handed over to the platform specific
backend to be compiled to machine code. The compilation phase consists of two
passes over the lists of instructions, a backwards pass to calculate live
ranges of IR-level variables and a forward pass to emit the instructions. During
@@ -508,9 +509,9 @@
emitted. Guard instructions are transformed into fast checks at the machine
code level that verify the corresponding condition. In cases where the value being
checked by the guard is not used anywhere else, the guard and the operation
-producing the value can often be merged, further reducing the overhead of the guard.
-Figure \ref{fig:trace-compiled} shows how the \texttt{int\_eq} operation
-followed by a \texttt{guard\_false} from the trace in Figure~\ref{fig:trace-log} are compiled to
+producing the value can be merged, further reducing the overhead of the guard.
+Figure \ref{fig:trace-compiled} shows how the \lstinline{int_eq} operation
+followed by a \lstinline{guard_false} from the trace in Figure~\ref{fig:trace-log} are compiled to
pseudo-assembler if the operation and the guard are compiled separately or if
they are merged.
@@ -554,11 +555,11 @@
First a special data
structure called \emph{backend map} is created. This data structure encodes the
-mapping from the IR-variables needed by the guard to rebuild the state to the
+mapping from IR-variables needed by the guard to rebuild the state to the
low-level locations (registers and stack) where the corresponding values will
be stored when the guard is executed.
This data
-structure stores the values in a succinct manner using an encoding that uses
+structure stores the values in a succinct manner using an encoding that requires
8 bits to store 7 bits of information, ignoring leading zeros. This encoding is efficient to create and
provides a compact representation of the needed information in order
to maintain an acceptable memory profile.
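An encoding that spends 8 bits to store 7 bits of information, ignoring leading zeros, has the same shape as the well-known varint/LEB128 scheme; the sketch below shows that generic scheme, not necessarily PyPy's exact backend-map format:

```python
# Varint-style sketch: each byte carries 7 payload bits; the high bit marks
# whether more bytes follow. Small numbers (all-zero high bits) take fewer
# bytes, which keeps the backend map compact.

def encode(n):
    """Encode a non-negative integer, 7 payload bits per byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # continuation bit set: more to come
        else:
            out.append(byte)          # last byte: high bit clear
            return bytes(out)

def decode(data):
    n, shift = 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return n

print(len(encode(5)), len(encode(300)))   # 1 2
```

Encoding is a simple shift-and-mask loop, which matches the claim that the encoding is efficient to create.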
@@ -570,18 +571,18 @@
backend map is loaded and after storing the current execution state
(registers and stack) execution jumps to a generic bailout handler, also known
as \emph{compensation code},
-that is used to leave the compiled trace in case of a guard failure.
+that is used to leave the compiled trace.
Using the encoded location information the bailout handler reads from the
-saved execution state the values that the IR-variables had at the time of the
+stored execution state the values that the IR-variables had at the time of the
guard failure and stores them in a location that can be read by the frontend.
-After saving the information the control is passed to the frontend signaling
-which guard failed so the frontend can read the information passed and restore
+After saving the information the control is returned to the frontend signaling
+which guard failed so the frontend can read the stored information and rebuild
the state corresponding to the point in the program.
-As in previous sections the underlying idea for the design of guards is to have
-a fast on-trace profile and a potentially slow one in the bailout case where
-the execution has to return to the interpreter due to a guard failure. At the same
+As in previous sections the underlying idea for the low-level design of guards is to have
+a fast on-trace profile and a potentially slow one in case
+the execution has to return to the interpreter. At the same
time the data stored in the backend, required to rebuild the state, should be as
compact as possible to reduce the memory overhead produced by the large number
of guards; the numbers in Figure~\ref{fig:backend_data} illustrate that the
@@ -600,9 +601,9 @@
main difference is the setup phase. When compiling a trace we start with a clean
slate. The compilation of a bridge is started from a state (register and stack
bindings) that corresponds to the state during the compilation of the original
-guard. To restore the state needed to compile the bridge we use the encoded
-representation created for the guard to rebuild the bindings from IR-variables
-to stack locations and registers used in the register allocator. With this
+guard. To restore the state needed to compile the bridge we use the backend map
+created for the guard to rebuild the bindings from IR-variables
+to stack locations and registers. With this
reconstruction all bindings are restored to the state as they were in the
original loop up to the guard. This means that no register/stack reshuffling is
needed before executing a bridge.
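Seeding the register allocator from the guard's backend map can be sketched as follows (a hypothetical, simplified model; location names and the allocator interface are invented):

```python
# Sketch: when compiling a bridge, the allocator starts from the
# variable-to-location bindings recorded for the guard, so the bridge can
# be entered without any register/stack reshuffling.

class RegisterAllocator:
    def __init__(self, bindings=None):
        # IR-variable -> low-level location ("r0", "stack[0]", ...)
        self.bindings = dict(bindings or {})

backend_map = {"i0": "r0", "i1": "r1", "p0": "stack[0]"}  # from the guard
bridge_allocator = RegisterAllocator(backend_map)
print(bridge_allocator.bindings["p0"])
```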
@@ -639,8 +640,8 @@
micro-benchmarks and larger programs.\footnote{\url{http://speed.pypy.org/}} The
benchmarks were taken from the PyPy benchmarks repository using revision
\texttt{ff7b35837d0f}.\footnote{\url{https://bitbucket.org/pypy/benchmarks/src/ff7b35837d0f}}
-The benchmarks were run on a version of PyPy based on the
-revision~\texttt{0b77afaafdd0} and patched to collect additional data about the
+The benchmarks were run on a version of PyPy based on
+revision~\texttt{0b77afaafdd0} and patched to collect additional data about
guards in the machine code
backends.\footnote{\url{https://bitbucket.org/pypy/pypy/src/0b77afaafdd0}} The
tools used to run and evaluate the benchmarks including the patches applied to
@@ -686,7 +687,7 @@
\item Guard failures are local and rare.
\end{itemize}
-All measurements presented in this section do not take garbage collection of machine code into account. Pieces
+All measurements presented in this section do not take garbage collection of resume data and machine code into account. Pieces
of machine code can be globally invalidated or just become cold again. In both
cases the generated machine code and the related data is garbage collected. The
figures show the total amount of operations that are evaluated by the JIT and