[pypy-commit] extradoc extradoc: add all actionable reviewer comments to the paper

cfbolz noreply at buildbot.pypy.org
Mon Aug 6 10:57:53 CEST 2012

Author: Carl Friedrich Bolz <cfbolz at gmx.de>
Branch: extradoc
Changeset: r4415:971e5766953f
Date: 2012-08-06 09:05 +0200

Log:	add all actionable reviewer comments to the paper

diff --git a/talk/iwtc11/licm.pdf b/talk/iwtc11/licm.pdf
index ff2a7bf547f542771702ac86ea8531f8ba16cc28..434a60986afded368291319f00f30fcc2467c90a
GIT binary patch


diff --git a/talk/iwtc11/paper.tex b/talk/iwtc11/paper.tex
--- a/talk/iwtc11/paper.tex
+++ b/talk/iwtc11/paper.tex
@@ -82,6 +82,7 @@
 \newcommand\reva[1]{\nb{Reviewer 1}{#1}}
 \newcommand\revb[1]{\nb{Reviewer 2}{#1}}
 \newcommand\revc[1]{\nb{Reviewer 3}{#1}}
+\newcommand\revd[1]{\nb{Reviewer 4}{#1}}
 \newcommand{\ignore}[1]{} % {{\tt \small ignore(#1)}}
@@ -141,6 +142,11 @@
+You often use the word simple. While it might make sense to use it,
+its exact meaning in that context remains unclear.
 One of the advantages that tracing JIT compilers have above traditional
 JITs is that their optimizers are much easier to write. Because a tracing JIT
@@ -208,6 +214,11 @@
+Don't break code listings across pages, as at the start of section 3.  It makes
+them very hard to follow.
 To motivate the approach we propose here, let's look at a trivial (unrealistic)
 trace which corresponds to an infinite loop:
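The trace listing itself is elided by the diff hunk above. As an illustration only, an infinite counting loop might trace to something of this shape (the SSA-style operation names here are assumptions, not the paper's actual listing):

```python
# Hypothetical SSA-style trace for an infinite counting loop.
# Each tuple is (opname, result, *args); the jump's argument is
# fed back into the label's variable on every iteration.
trace = [
    ("label", "i0"),              # loop header
    ("int_add", "i1", "i0", 1),   # i1 = i0 + 1
    ("jump", "i1"),               # back-edge: i1 becomes the new i0
]

def run_trace(start, back_edges):
    """Interpret the trace above for a bounded number of back-edge jumps."""
    i = start
    for _ in range(back_edges):
        i = i + 1  # the int_add operation
    return i
```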
@@ -304,6 +315,15 @@
 \section{Running Example}
+I think the motivation section is great, in particular for readers
+who are less familiar with compiler/JIT optimizations. However,
+section 4 starts with "yet another example" - at least this was my
+impression when reading it. I understand the differences and
+everything, but still, you might consider improving the transition
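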
+between sections 3 and 4.
 For the purpose of this paper, we are going to use a tiny interpreter for a dynamic language with
  a very simple object
model that supports just an integer and a float type (this example was taken from a previous paper \cite{bolz_allocation_2011}). The objects support only
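A minimal sketch of such a two-type object model (simplified and with illustrative method names; the cited paper's version differs in detail):

```python
# Two boxed types sharing a common base; addition dispatches on the
# dynamic type of the operands and promotes int + float to float.
class Base(object):
    def add(self, other):
        raise NotImplementedError

class BoxedInteger(Base):
    def __init__(self, intval):
        self.intval = intval
    def add(self, other):
        if isinstance(other, BoxedInteger):
            return BoxedInteger(self.intval + other.intval)
        return BoxedFloat(self.intval + other.floatval)

class BoxedFloat(Base):
    def __init__(self, floatval):
        self.floatval = floatval
    def add(self, other):
        if isinstance(other, BoxedInteger):
            return BoxedFloat(self.floatval + other.intval)
        return BoxedFloat(self.floatval + other.floatval)
```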
@@ -438,6 +458,14 @@
 \section{Making Trace Optimizations Loop Aware}
+In general, the paper is over-long on generalities and too short on details.
+For example, the description of the basic technique at the beginning of section
+5 is the third time the idea is explained at basically the same level of detail
+(the others are in section 2 and section 4).  In contrast, the optimizations
+applied rely on a simple type analysis, but this is only briefly alluded to.
 Before a trace is passed to the backend compiling it into machine code
 it is optimized to achieve better performance.
 One goal of that is to move 
@@ -803,8 +831,62 @@
 variables \lstinline{step} and \lstinline{y}, and the overhead of
 using boxed values is removed.
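As a hedged sketch of what that result means (illustrative code, not the paper's trace): before the optimization each iteration allocates a fresh box; afterwards the loop variables live as raw machine values.

```python
class BoxedInteger(object):
    def __init__(self, intval):
        self.intval = intval

def sum_boxed(n):
    # Before optimization: a new BoxedInteger is allocated on
    # every iteration just to carry the loop counter.
    y = BoxedInteger(0)
    step = BoxedInteger(1)
    for _ in range(n):
        y = BoxedInteger(y.intval + step.intval)
    return y.intval

def sum_unboxed(n):
    # After optimization: y and step are raw integers inside the
    # loop; boxes would only be recreated if the values escape.
    y = 0
    step = 1
    for _ in range(n):
        y = y + step
    return y
```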
+This paper presents an elegant, if simple, technique, and demonstrates that
+it's effective in small cases.  The worked example is particularly helpful, and
+would be better if it were worked more thoroughly.  Some of the omitted steps
+are not entirely obvious, and the paper would be improved by making them
+clearer.  In particular, the final program presented on the bottom of page 5,
+first column, still has memory access, boxing, and type checks, which the paper
+then claims can be removed.  There's enough space to show this.
+A nit: Section 7 says that loop peeling never makes runtime
+performance worse, but generating more code can potentially slow
+performance. I assume that non-numeric benchmarks show no slowdown in
+practice, and that might be worth noting.
+Section 7 also mentions performance improvements for a Prolog
+interpreter. Consider adding a brief explanation of the benefit, since
+that example stands out as a non-numeric (I assume) performance
+improvement.
+Providing source code for the benchmarks measured is needed for others to
+reproduce and build on your results.  I believe this should be the minimum
+standard for publishing measurements such as these.
+I would have liked to have benchmark results for some larger applications.
+When is this optimization effective on a large scale, if ever?
+It isn't clear from the paper, but a reader might conclude that the bulk of the
+time savings are from removing boxing/unboxing operations.
+The benchmark results appear quite impressive -- especially the comparison with
+GCC -- but without additional information, I have no idea what is being
+compared.  Are these results from the same sizes of integers and/or floating
+point values?
+This paper is relatively short, and could be significantly improved with a
+couple of pages of additional information about the details of the benchmarks
+-- both on the Python and on the C side.
 The loop peeling optimization was implemented in the PyPy
framework in about 450 lines of RPython code. That means that the JIT compilers generated for all
interpreters implemented within PyPy can now take advantage of
@@ -899,6 +981,10 @@
 implemented as a C++ class. The other benchmarks are implemented in
 plain C. 
+Include the OS with your benchmark specifications.
Benchmarks were run on an Intel i7 M620 @2.67GHz with 4M cache and 8G of RAM in
 32bit mode.
 The machine was otherwise unoccupied. We use the following software
@@ -942,11 +1028,28 @@
 \section{Related Work}
+First sentence of the related work section is kind of
+unfortunate. It is unclear what the reference at the end of the
+sentence is good for. To support the meaning of the entire sentence?
+Or is it just a reference to the standard loop invariant code motion
+techniques? The contribution of your paper seems much smaller
+in the former case than in the latter one. While I have not
+checked the content of the book, I believe the latter is the correct
+interpretation. You should remove this opportunity for misinterpretation.
 The effect of combining a one pass optimization with loop peeling gives
 completely standard loop invariant code motion optimizations
 \cite{muchnick_advanced_1997}. We do not claim any novelty in the effect, but
 think that our implementation scheme is a very simple one.
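The effect can be sketched in plain Python (an illustration of the transformation's result, not of the JIT's actual mechanism): peeling one iteration lets a single forward pass see that an in-loop computation already happened in the peeled copy, so the remaining loop reuses it.

```python
def loop_unoptimized(a, b, n):
    x = 0
    for _ in range(n):
        t = a * b          # loop-invariant, recomputed every iteration
        x += t
    return x

def loop_peeled(a, b, n):
    # One iteration is peeled off; the invariant a * b is computed
    # there once, and the remaining loop body only reuses the result.
    x = 0
    if n > 0:
        t = a * b          # peeled first iteration
        x += t
        for _ in range(n - 1):
            x += t         # invariant hoisted out of the loop body
    return x
```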
+The discussion of LuaJIT is unsatisfying.  It's not clear to me from that one
+quote that Mike is doing the same thing.  It might be worth including LuaJIT in
+the benchmarks, and/or examining the actual implementation of LuaJIT.
 Mike Pall, the author of LuaJIT\footnote{\texttt{http://luajit.org/}} seems to
have developed the described technique independently. There are no papers about
 LuaJIT but the author of it writes on a mailing list: ``The LOOP pass does

More information about the pypy-commit mailing list