[pypy-svn] extradoc extradoc: more reference, a sentence about inlining

cfbolz commits-noreply at bitbucket.org
Fri Mar 25 23:56:58 CET 2011


Author: Carl Friedrich Bolz <cfbolz at gmx.de>
Branch: extradoc
Changeset: r3405:d1c265ea1d9f
Date: 2011-03-25 22:12 +0100
http://bitbucket.org/pypy/extradoc/changeset/d1c265ea1d9f/

Log:	more reference, a sentence about inlining

diff --git a/talk/icooolps2011/paper.tex b/talk/icooolps2011/paper.tex
--- a/talk/icooolps2011/paper.tex
+++ b/talk/icooolps2011/paper.tex
@@ -93,7 +93,7 @@
 It has long been an objective of the partial evaluation community to
 automatically produce compilers from interpreters. There has been a recent
 renaissance of this idea using the different technique of tracing just-in-time
-compilers. A number of projects have attempted this approach. SPUR \cite{XXX} is
+compilers. A number of projects have attempted this approach. SPUR \cite{bebenita_spur:_2010} is
 a tracing JIT for .NET together with a JavaScript implementation in C\#. PyPy
 \cite{armin_rigo_pypys_2006} contains a tracing JIT for RPython (a restricted
 subset of Python). This JIT is then used to trace a number of languages
@@ -182,14 +182,18 @@
 
 A recently popular approach to JIT compilers is that of tracing JITs. Tracing
 JITs have their origin in the Dynamo project which used the technique for dynamic
-assembler optimization \cite{XXX}. Later they were used for to implement
-a lightweight JIT for Java \cite{XXX} and for dynamic languages such as
-JavaScript \cite{XXX}.
+assembler optimization \cite{bala_dynamo:_2000}. Later they were used to implement
+a lightweight JIT for Java \cite{gal_hotpathvm:_2006} and for dynamic languages such as
+JavaScript \cite{gal_trace-based_2009}.
 
 A tracing JIT works by recording traces of concrete execution paths through the
 program. Those
 traces are therefore linear lists of operations, which are optimized and then
-get turned into machine code. To be able to do this recording, VMs with a
+get turned into machine code. This recording automatically inlines functions:
+when a function call is encountered, the operations of the called function are
+simply put into the trace too.
+
+To be able to do this recording, VMs with a
 tracing JIT typically contain an interpreter. After a user program is
 started the interpreter is used until the most important paths through the user
 program are turned into machine code. The tracing JIT tries to produce traces

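To make the inlining-during-tracing point in the paper text above concrete, here is a
minimal Python sketch of a toy trace recorder; the record() helper and the operation
names are made up for illustration and are not PyPy's actual tracing machinery:

    # Toy illustration: recording a concrete execution path inlines calls,
    # because the callee's operations simply land in the same linear trace.
    trace = []          # the linear list of recorded operations

    def record(op, *args):
        trace.append((op, args))

    def square(x):
        record("mul", x, x)
        return x * x

    def f(a, b):
        record("add", a, b)
        return square(a + b)    # no "call" operation is recorded; the operations
                                # executed inside square() go into the trace directly

    f(2, 3)
    print(trace)                # [('add', (2, 3)), ('mul', (5, 5))]
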
diff --git a/talk/icooolps2011/paper.bib b/talk/icooolps2011/paper.bib
--- a/talk/icooolps2011/paper.bib
+++ b/talk/icooolps2011/paper.bib
@@ -69,7 +69,87 @@
 	author = {Michael Bebenita and Florian Brandner and Manuel Fahndrich and Francesco Logozzo and Wolfram Schulte and Nikolai Tillmann and Herman Venter},
 	year = {2010},
 	keywords = {cil, dynamic compilation, javascript, just-in-time, tracing},
-	pages = {708--725}
+	pages = {708--725},
+	annote = {Comparing SPUR to PyPy (http://morepypy.blogspot.com/2010/07/comparing-spur-to-pypy.html)
+
+Recently, I've become aware of the SPUR project (http://research.microsoft.com/en-us/projects/spur/) of Microsoft Research and read some of their papers (the tech report "SPUR: A Trace-Based JIT Compiler for CIL" is very cool). I found the project to be very interesting and since their approach is in many ways related to what PyPy is doing, I now want to compare and contrast the two projects.
+
+A Tracing JIT for .NET
+
+SPUR consists of two parts: On the one hand it is a VM for CIL, the bytecode of the .NET VM. This VM uses a tracing JIT compiler to compile the programs it is running to machine code. As opposed to most existing VMs that have a tracing JIT it does not use an interpreter at all. Instead it contains various variants of a JIT compiler that produce different versions of each method. Those are:
+
+- a profiling JIT, which produces code that does lightweight profiling when running the compiled method
+- a tracing JIT, which produces code that produces a trace when running the compiled method
+- a transfer-tail JIT, which is used to produce code which is run to get from a failing guard back to the normal profiling version of a method
+- an optimizing JIT, which actually optimizes traces and turns them into machine code
+
+Optimizations Done by the Optimizing JIT
+
+SPUR's optimizing JIT does a number of powerful optimizations on the traces before it turns them into machine code. Among them are usual compiler optimizations such as register allocation, common subexpression elimination, loop-invariant code motion, etc. It also performs some optimizations that are specific to the tracing context and are thus not commonly found in "normal" compilers:
+
+- guard implication: if a guard is implied by an earlier guard, it is removed
+- guard strengthening: if there is a sequence of guards that become stronger and stronger (i.e. each guard implies the previous one), the first guard in the sequence is replaced by the last one, and all others are removed. This can greatly reduce the number of guards and is generally safe. It can shift a guard failure to an earlier point in the trace, but the failure would have occurred at some point in the trace anyway.
+- load/store optimizations: an optimization for memory reads/writes. If several loads from the same memory location occur without writes in between, all but the first one are removed. Similarly, if a write to a memory location is performed, this write is delayed as much as possible. If there is a write to the same location soon afterwards, the first write can be removed.
+- escape analysis: for allocations that occur in a loop, the optimizer checks whether the resulting object escapes the loop. If not, the allocation is moved before the loop, so that only one object needs to be allocated, instead of one every loop iteration.
+- user-controlled loop unrolling: not exactly an optimization, but an interesting feature anyway. It is possible to annotate a CIL method with a special decorator [TraceUnfold] and then the tracing JIT will fully unroll the loops it contains. This can be useful for loops that are known to run a small and fixed number of iterations for each call site.
+- user-controlled tracing: the user can also control tracing up to a point. Methods can be annotated with [NativeCall] to tell the tracer to never trace their execution. Instead they appear as a direct call in the trace.
+
+A JavaScript Implementation
+
+In addition to the tracing JIT just described, SPUR also contains a JavaScript implementation for .NET. The approach of this implementation is to translate JavaScript to CIL bytecode, doing some amount of type inference to detect variables that have fixed types. All operations where no precise type could be determined are implemented with calls to a JavaScript runtime system, which does the necessary type dispatching. The JavaScript runtime is implemented in C\#.
+
+The JavaScript implementation and the CLI VM with a tracing JIT sound quite unrelated at first, but together they amplify each other. The tracing JIT traces the JavaScript functions that have been translated to CLI bytecode. Since the JavaScript runtime is in C\#, it exists as CLI bytecode too. Thus it can be inlined into the JavaScript functions by the tracer. This is highly beneficial, since it exposes the runtime type dispatching of the JavaScript operations to the optimizations of the tracing JIT. Particularly the common subexpression elimination helps the JavaScript code. If a series of operations is performed on the same object, the operations will all do the same type checks. All but the type checks of the first operation can be removed by the optimizer.
+
+Performance Results
+
+The speed results of the combined JavaScript implementation and tracing JIT are quite impressive. It beats TraceMonkey for most benchmarks in SunSpider (apart from some string-heavy benchmarks that are quite slow) and can compete with V8 in many of them. However, all this is steady-state performance and it seems SPUR's compile time is rather bad currently.
+
+Further Possibilities
+
+A further (so far still hypothetical) advantage of SPUR is that the approach can optimize cases where execution crosses the border of two different systems. If somebody wrote an HTML layout engine and a DOM in C\# to get a web browser and integrated it with the JavaScript implementation described above, the tracing JIT could optimize DOM manipulations performed by JavaScript code as well as callbacks from the browser into JavaScript code.
+
+Of course the approach SPUR takes to implement JavaScript is completely generalizable. It should be possible to implement other dynamic languages in the same way as JavaScript using SPUR. One would have to write a runtime system for the language in C\#, as well as a compiler from the language into CIL bytecode. Given these two elements, SPUR's tracing JIT compiler would probably do a reasonable job at optimizing this other language (of course in practice, the language implementation would need some tweaking and annotations to make it really fast).
+
+Comparison With PyPy
+
+The goals of PyPy and SPUR are very similar. Both projects want to implement dynamic languages in an efficient way by using a tracing JIT. Both apply the tracing JIT "one level down", i.e. the runtime system of the dynamic language is visible to the tracing JIT. This is the crucial point of the approach of both projects. Since the runtime system of the dynamic language is visible to the tracing JIT, the JIT can optimize programs in that dynamic language. It does not itself need to know about the semantics of the dynamic language. This makes the tracing JIT usable for a variety of dynamic languages. It also means that the two halves can be implemented and debugged independently.
+
+In SPUR, C\# (or another language that is compilable to CIL) plays the role of RPython, and CIL is equivalent to the intermediate format that PyPy's translation toolchain uses. Both formats operate on a similar abstraction level, they are quite close to C, but still have support for the object system of their respective language and are garbage-collected.
+
+SPUR supports only a JavaScript implementation so far, which could maybe change in the future. Thus JavaScript in SPUR corresponds to Python in PyPy, which was the first dynamic language implemented in PyPy (and is also the reason for PyPy's existence).
+
+There are obviously also differences between the two projects, although many of them are only skin-deep. The largest difference is the reliance of SPUR on compilers on all levels. PyPy takes the opposite approach of using interpreters almost everywhere. The parts of PyPy that correspond to SPUR's compilers are (I will use the Python implementation of PyPy as an example):
+
+- the JavaScript-to-CIL compiler corresponds to the Python interpreter of PyPy
+- the profiling JIT corresponds to a part of PyPy's translation toolchain which adds some profiling support in the process of turning RPython code into C code
+- the tracing JIT corresponds to a special interpreter in the PyPy JIT which executes an RPython program and produces a trace of the execution
+- the transfer-tail JIT corresponds to PyPy's blackhole interpreter (http://morepypy.blogspot.com/2010/06/blackhole-interpreter.html), also called fallback interpreter
+- the optimizing JIT corresponds to the optimizers and backends of PyPy's JIT
+
+PyPy's Optimizations
+
+Comparing the optimizations that the two projects perform, the biggest difference is that PyPy does "trace stitching" instead of fully supporting trace trees. The difference between the two concerns what happens when a new trace gets added to an existing loop. The new trace starts from a guard in the existing loop that was observed to fail often. Trace stitching means that the loop is just patched with a jump to the new trace. SPUR instead recompiles the whole trace tree, which gives the optimizers more opportunities, but also makes compilation a lot slower. Another difference is that PyPy does not perform loop-invariant code motion yet.
+
+Many of the remaining optimizations are very similar. PyPy supports guard implication as well as guard strengthening. It has some load/store optimizations, but PyPy's alias analysis is quite rudimentary. On the other hand, PyPy's escape analysis is very powerful. PyPy also has support for the annotations that SPUR supports, using some decorators in the pypy.rlib.jit module. User-controlled loop unrolling is performed using the unroll\_safe decorator, and tracing of a function can be disabled with the dont\_look\_inside decorator.
+
+PyPy has a few more annotations that were not mentioned in the SPUR tech report. Most importantly, it is possible to declare a function as pure, using the purefunction decorator. PyPy's optimizers will remove calls to a function decorated that way if the arguments to the call are all constant. In addition it is possible to declare instances of classes to be immutable, which means that field accesses on constant instances can be folded away. Furthermore there is the promote hint, which is spelled x = hint(x, promote=True). This will produce a guard in the trace, to turn x into a constant after the guard.
+
+Summary
+
+Given the similarity between the projects' goals, it is perhaps not so surprising to see that PyPy and SPUR have co-evolved and reached many similar design decisions. It is still very good to see another project that does many things in the same way as PyPy.}
+},
+
+ at inproceedings{gal_trace-based_2009,
+	address = {New York, {NY,} {USA}},
+	series = {{PLDI} '09},
+	title = {Trace-based just-in-time type specialization for dynamic languages},
+	isbn = {978-1-60558-392-1},
+	location = {Dublin, Ireland},
+	doi = {10.1145/1542476.1542528},
+	abstract = {Dynamic languages such as {JavaScript} are more difficult to compile than statically typed ones. Since no concrete type information is available, traditional compilers need to emit generic code that can handle all possible type combinations at runtime. We present an alternative compilation technique for dynamically-typed languages that identifies frequently executed loop traces at run-time and then generates machine code on the fly that is specialized for the actual dynamic types occurring on each path through the loop. Our method provides cheap inter-procedural type specialization, and an elegant and efficient way of incrementally compiling lazily discovered alternative paths through nested loops. We have implemented a dynamic compiler for {JavaScript} based on our technique and we have measured speedups of 10x and more for certain benchmark programs.},
+	booktitle = {{ACM} {SIGPLAN} Notices},
+	publisher = {{ACM}},
+	author = {Andreas Gal and Brendan Eich and Mike Shaver and David Anderson and David Mandelin and Mohammad R Haghighat and Blake Kaplan and Graydon Hoare and Boris Zbarsky and Jason Orendorff and Jesse Ruderman and Edwin W Smith and Rick Reitmaier and Michael Bebenita and Mason Chang and Michael Franz},
+	year = {2009},
+	note = {{ACM} {ID:} 1542528},
+	keywords = {code generation, design, dynamically typed languages, experimentation, incremental compilers, languages, measurement, performance, run-time environments, trace-based compilation},
+	pages = {465{\textendash}478}
 },
 
 @article{bolz_allocation_2011,
@@ -156,14 +250,6 @@
 	annote = {{{\textless}p{\textgreater}The} paper evaluates the various ways in which a number of Java papers do their Java benchmarks. It then proposes a statistically correct way to do this and compares common approaches against the statistically correct way. Especially if the results of two alternatives are very close together, many common approaches can lead to systematic errors.{\textless}/p{\textgreater}}
 },
 
- at inproceedings{andreas_gal_trace-based_2009,
-	title = {Trace-based {Just-in-Time} Type Specialization for Dynamic Languages},
-	booktitle = {{PLDI}},
-	author = {Andreas Gal and Brendan Eich and Mike Shaver and David Anderson and Blake Kaplan and Graydon Hoare and David Mandelin and Boris Zbarsky and Jason Orendorff and Michael Bebenita and Mason Chang and Michael Franz and Edwin Smith and Rick Reitmaier and Mohammad Haghighat},
-	year = {2009},
-	keywords = {toappear}
-},
-
 @inproceedings{bolz_tracing_2009,
 	address = {Genova, Italy},
 	title = {Tracing the meta-level: {PyPy's} tracing {JIT} compiler},


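The same annotation also names the hints from the pypy.rlib.jit module (unroll_safe,
dont_look_inside, purefunction, and hint(x, promote=True)). Below is a brief sketch of
how such hints are typically applied in RPython interpreter code; the classes and helper
names are invented for illustration:

    from pypy.rlib.jit import purefunction, unroll_safe, dont_look_inside, hint

    @purefunction
    def lookup_method(cls, name):
        # calls with constant arguments can be constant-folded out of the trace
        return cls.methods[name]

    @unroll_safe
    def init_fields(obj, values):
        # the loop is fully unrolled while tracing (small, fixed iteration count)
        for i in range(len(values)):
            obj.fields[i] = values[i]

    @dont_look_inside
    def log_access(log, name):
        # never traced; shows up as a residual call in the trace
        log.append(name)

    def getattr_slow_path(obj, name):
        cls = hint(obj.cls, promote=True)   # guard on the concrete class, then
        return lookup_method(cls, name)     # treat it as a constant afterwards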