[pypy-commit] extradoc extradoc: merge

hakanardo noreply at buildbot.pypy.org
Tue Aug 14 10:46:44 CEST 2012


Author: Hakan Ardo <hakan at debian.org>
Branch: extradoc
Changeset: r4557:492cb04d2ac5
Date: 2012-08-14 10:46 +0200
http://bitbucket.org/pypy/extradoc/changeset/492cb04d2ac5/

Log:	merge

diff --git a/talk/vmil2012/paper.tex b/talk/vmil2012/paper.tex
--- a/talk/vmil2012/paper.tex
+++ b/talk/vmil2012/paper.tex
@@ -123,6 +123,7 @@
 %___________________________________________________________________________
 \todo{find a better name for \texttt{low-level resume data}}
 \todo{find better names for JIT front- and backend}
+\todo{mention somewhere that it is to be expected that most guards do not fail}
 \section{Introduction}
 
 \todo{the introduction needs some work}
@@ -633,7 +634,6 @@
 \section{Evaluation}
 \label{sec:evaluation}
 \todo{improve the table formatting}
-\todo{give a reference to the benchmark scripts to make things repeatable}
 
 The results presented in this section are based on numbers gathered by running
 a subset of the standard PyPy benchmarks. The PyPy benchmarks are used to
@@ -644,7 +644,11 @@
 The benchmarks were run on a version of PyPy based on the
 tag~\texttt{0b77afaafdd0} and patched to collect additional data about the
 guards in the machine code
-backends.\footnote{\url{https://bitbucket.org/pypy/pypy/src/0b77afaafdd0}} All
+backends.\footnote{\url{https://bitbucket.org/pypy/pypy/src/0b77afaafdd0}} The
+tools used to run and evaluate the benchmarks, including the patches applied to
+the PyPy source code, can be found in the repository for this
+paper.\footnote{\url{https://bitbucket.org/pypy/extradoc/src/tip/talk/vmil2012}}
+All
 benchmark data was collected on a MacBook Pro 64 bit running Mac OS X 10.8 with
 the loop unrolling optimization disabled.\footnote{Since loop unrolling
 duplicates the body of loops it would no longer be possible to meaningfully
@@ -677,7 +681,7 @@
 From the mentioned benchmarks we collected different datasets to evaluate the
 frequency, the overhead and overall behaviour of guards, the results are
 summarized in the remainder of this section. We want to point out three
-aspects of guards in particular
+aspects of guards in particular:
 \begin{itemize}
   \item Guards are very common operations in traces.
   \item There is overhead associated with guards.
@@ -699,15 +703,14 @@
     \label{fig:benchmarks}
 \end{figure*}
 
-Figure~\ref{fig:benchmarks} summarizes the total number of operations that were
+Figure~\ref{fig:benchmarks} extends Figure~\ref{fig:guard_percent} and summarizes the total number of operations that were
 recorded during tracing for each of the benchmarks and what percentage of these
 operations are guards. The number of operations was counted on the unoptimized
 and optimized traces. The Figure shows that the overall optimization rate for
-operations which is between 69.4\% and 83.89\% of the traced operations and the
+operations, which is between 69.4\% and 83.89\% of the traced operations, and the
 optimization rate of guards, which is between 65.8\% and 86.2\% of the
-operations, are very similar, as could be assumed based on
-Figure~\ref{fig:guard_percent}. This indicates that the optimizer can remove
-most of the guards, but after the optimization pass guards still account for
+operations, are very similar. This indicates that the optimizer can remove
+most of the guards, but after the optimization pass these still account for
 15.2\% to 20.2\% of the operations being compiled and later executed.
 The frequency of guard operations makes it important to store the associated
 information efficiently and also to make sure that guard checks are executed
@@ -756,7 +759,7 @@
 \end{figure}
 
 Why the efficient storing of the \texttt{resume data} is a central concern in the design
-of guards is illustrated by Figure~\ref{fig:backend_data}. This figure shows
+of guards is illustrated by Figure~\ref{fig:resume_data_sizes}. This figure shows
 the size of the compressed \texttt{resume data}, the approximated size of
 storing the \texttt{resume data} without compression and
 an approximation of the best possible compression of the resume data by
@@ -767,11 +770,11 @@
 The results show that the current approach of compression and data sharing only
 requires 18.3\% to 31.1\% of the space compared to a naive approach. This
 shows that large parts of the resume data are redundant and can be stored more
-efficiently through using the techniques described above. On the other hand
+efficiently using the techniques described earlier. On the other hand,
 comparing the results to the xz compression which only needs between 17.1\%
 and 21.1\% of the space required by our compression shows that the compression
 is not optimal but a trade-off between the required space and the time needed
-to build a good compressed representation of the compressed resume data for the
+to build a good compressed representation of the resume data for the
 large amount of guards present in the traces.
 
 \subsection{Guard Failures}
@@ -784,9 +787,7 @@
     good results for long-running programs.
 }
 
-After the guard is patched
-failures execute the new bridge instead of jumping to the trampoline and returning to the interpreter. Hence the
-numbers presented for guards that have a bridge represent the
+The numbers presented for guards that have a bridge represent the
 failures up to the compilation of the bridge and all subsequent executions of
 the attached bridge.
 
@@ -800,7 +801,7 @@
 of all the guards in the optimized traces ever fail. This amount varies between
 2.4\% and 5.7\% of all guards. As can be expected, even fewer guards fail often
 enough that a bridge is compiled for them, only 1.2\% to 3.6\% of all guards
-fail often enough that a bridge is compiled. Also of all failing guards a few fail extremely often
+do so. Also, of all failing guards a few fail extremely often
 and most fail rarely. The results emphasize that as most of the guards never
 fail it is important to make sure that the successful execution of a guard does
 not have unnecessary overhead.
@@ -812,7 +813,7 @@
 \subsection{Guards in Other Tracing JITs}
 \label{sub:Guards in Other Tracing JITs}
 
-Guards as described are a concept associated with tracing just-in-time
+Guards, as described, are a concept associated with tracing just-in-time
 compilers to represent possible divergent control flow paths.
 
 SPUR~\cite{bebenita_spur:_2010} is a tracing JIT compiler
@@ -830,7 +831,7 @@
 about how to rebuild the state from a guard failure using the information in the
 snapshot and the machine execution state. According to Pall~\cite{Pall:2009}
 snapshots for guards in LuaJIT are associated with a large memory footprint.
-The solution used in there is to store sparse snapshots, avoiding the creation
+The solution used there is to store sparse snapshots, avoiding the creation
 of snapshots for every guard to reduce memory pressure. Snapshots are only
 created for guards after updates to the global state, after control flow points
 from the original program and for guards that are likely to fail. As an outlook
@@ -842,11 +843,12 @@
 Linking side exits to pieces of later compiled machine code was described first
 in the context of Dynamo~\cite{Bala:2000wv} under the name of Fragment Linking.
 Once a new hot trace is emitted into the fragment cache it is linked to side
-exit that led to the compilation. Fragment Linking avoids the performance
-penalty involved in leaving the compiled and it to remove the compensation
-code used when restoring the machine state on a side exit.
+exit that led to the compilation of the fragment. Fragment Linking avoids the
+performance penalty involved in leaving the compiled code. It also makes it
+possible to remove the compensation code associated with the linked fragments
+that would otherwise be required to restore the execution state on the side exit.
 
-Gal et. al~\cite{Gal:2006} describe that in the HotpathVM they experimented
+Gal et al.~\cite{Gal:2006} describe how in HotpathVM they experimented
 with having one generic compensation code block, like the RPython JIT, that
 uses a register variable mapping to restore the interpreter state. Later this
 was replaced by generating compensation code for each guard which produced a
@@ -909,14 +911,14 @@
 
 \section{Conclusion}
 \label{sec:Conclusion}
-In this paper we have concentrated on guards, an operation typically found in
+In this paper we have concentrated on guards, an operation found in
 tracing just-in-time compilers and used to denote points of possible control
 flow divergence in recorded traces.
 Based on the observation that guards are a frequent operation in traces and
 that they do not fail often, we described how they have been implemented in the
 high and low level components of RPython's tracing JIT compiler.
 
-Finally we have presented experimental data collected using the standard PyPy
+Additionally, we have presented experimental data collected using the standard PyPy
 benchmark set to evaluate previous observations and assumptions. Our
 experiments confirmed that guards are a very common
 operation in traces. At the same time guards are associated with a high
@@ -928,7 +930,7 @@
 guards that fail at all, and even fewer that fail very often.
 These numbers validate the design decision of reducing the overhead of
 successful guard checks as much as possible while paying a higher price in the
-case of bailout due to having to decode compressed state representation.
+case of bailout due to having to decode a compressed state representation.
 The compressed state representation reduces the memory footprint of rarely
 used data.
 

