[pypy-svn] r80287 - in pypy/extradoc/talk/microsoft-jan-2011: . figures
cfbolz at codespeak.net
cfbolz at codespeak.net
Sat Feb 5 04:29:27 CET 2011
Author: cfbolz
Date: Sat Feb 5 04:29:25 2011
New Revision: 80287
Added:
pypy/extradoc/talk/microsoft-jan-2011/
pypy/extradoc/talk/microsoft-jan-2011/figures/
pypy/extradoc/talk/microsoft-jan-2011/figures/add.dot
pypy/extradoc/talk/microsoft-jan-2011/figures/add.pdf (contents, props changed)
pypy/extradoc/talk/microsoft-jan-2011/talk.pdf (contents, props changed)
pypy/extradoc/talk/microsoft-jan-2011/talk.tex
Log:
talk that I gave at Microsoft Research on Monday
Added: pypy/extradoc/talk/microsoft-jan-2011/figures/add.dot
==============================================================================
--- (empty file)
+++ pypy/extradoc/talk/microsoft-jan-2011/figures/add.dot Sat Feb 5 04:29:25 2011
@@ -0,0 +1,31 @@
+digraph G {
+ a -> b_int [label="Integer"];
+ a -> b_float [label="Float"];
+ a -> b_long [label="Long"];
+ a -> b_string [label="String"];
+ a -> b_list [label="List"];
+ a -> b_tuple [label="Tuple"];
+ a -> b_user [label="User defined"];
+
+ a [label="a", shape=box];
+ b_int [label="b", shape=box];
+ b_float [label="b", shape=box];
+ b_long [label="b", shape=box];
+ b_string [label="b", shape=box];
+ b_list [label="b", shape=box];
+ b_tuple [label="b", shape=box];
+ b_user [label="b", shape=box];
+
+ b_int -> int [label="Integer"];
+ b_int -> float [label="Float"];
+ b_int -> long [label="Long"];
+ b_int -> user [label="User defined"];
+
+ b_float -> float [label="Integer"];
+ b_long -> long [label="Integer"];
+
+ int [label="Integer"];
+ float [label="Float"];
+ long [label="Long"];
+ user [label="User defined"];
+}
Added: pypy/extradoc/talk/microsoft-jan-2011/figures/add.pdf
==============================================================================
Binary file. No diff available.
Added: pypy/extradoc/talk/microsoft-jan-2011/talk.pdf
==============================================================================
Binary file. No diff available.
Added: pypy/extradoc/talk/microsoft-jan-2011/talk.tex
==============================================================================
--- (empty file)
+++ pypy/extradoc/talk/microsoft-jan-2011/talk.tex Sat Feb 5 04:29:25 2011
@@ -0,0 +1,1067 @@
+\documentclass[utf8x]{beamer}
+
+\mode<presentation>
+{
+ \usetheme{Warsaw}
+
+ %\setbeamercovered{transparent}
+}
+
+
+\usepackage[english]{babel}
+
+\usepackage[utf8x]{inputenc}
+
+\usepackage{times}
+\usepackage[T1]{fontenc}
+\usepackage{ifthen}
+\usepackage{fancyvrb}
+\usepackage{color}
+\usepackage{ulem}
+\usepackage{listings}
+\usepackage{amssymb}
+\usepackage{booktabs}
+\usepackage{wrapfig}
+\usepackage{url}
+\usepackage{alltt}
+\usepackage{tikz}
+\usetikzlibrary{arrows,positioning,shadows,shapes,calc,backgrounds,fit}
+
+%\input{pygmentheader.tex}
+
+\title[PyPy Implements Dynamic Languages Using a Tracing JIT]{
+PyPy's Approach to \\
+Implementing Dynamic Languages \\
+Using a Tracing JIT Compiler}
+
+\author{Carl Friedrich Bolz}
+
+\institute[Heinrich-Heine-Universität Düsseldorf]
+{
+ Institut für Informatik\\
+ Heinrich-Heine-Universität Düsseldorf
+}
+
+\date{
+%\includegraphics[scale=0.25]{figures/hpi_logo_kl.jpg}\\
+%IBM Watson Research Center, February 9th, 2011
+Microsoft Research, January 31st, 2011
+}
+
+%\pgfdeclareimage[height=0.5cm]{pypy-logo}{image/py-web.png}
+%\logo{\pgfuseimage{pypy-logo}}
+
+
+
+% Delete this, if you do not want the table of contents to pop up at
+% the beginning of each subsection:
+%\AtBeginSubsection[]
+%{
+% \begin{frame}<beamer>
+% \frametitle{Outline}
+% \tableofcontents[currentsection,currentsubsection]
+% \end{frame}
+%}
+
+
+% If you wish to uncover everything in a step-wise fashion, uncomment
+% the following command:
+
+%\beamerdefaultoverlayspecification{<+->}
+
+
+\begin{document}
+
+\begin{frame}
+ \titlepage
+\end{frame}
+
+
+\begin{frame}
+ \frametitle{Scope}
+ This talk is about:
+
+ \begin{itemize}
+ \item implementing dynamic languages \par(with a focus on complicated ones)
+ \item in a context of limited resources \par(academic, open source, or
+ domain-specific)
+ \item imperative, object-oriented languages
+ \item single-threaded implementations
+ \end{itemize}
+ \pause
+ \begin{block}{Goals}
+ Reconciling:
+ \begin{itemize}
+ \item flexibility, maintainability (because languages evolve)
+ \item simplicity (because teams are small)
+ \item performance
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Outline}
+ \tableofcontents[pausesections]
+ % You might wish to add the option [pausesections]
+\end{frame}
+
+\section{The Difficulties of Implementing Dynamic Languages}
+
+\subsection{Technical Factors}
+
+\begin{frame}
+ \frametitle{What is Needed Anyway}
+ A lot of things are not really different from other languages:
+ \begin{itemize}
+ \item lexer, parser
+ \item (bytecode) compiler
+ \item garbage collector
+ \item object system
+ \end{itemize}
+\end{frame}
+
+% say, but don't write:
+% from pypy perspective, but trying to do a general analysis
+% feedback/discussion welcome
+% XXX random thought: a lot of semantics implemented on VM level, operations
+% XXX other direction: PyPy has granularity issues in opposite direction
+
+
+\begin{frame}
+ \frametitle{Control Flow}
+ \begin{itemize}
+ \item every language implementation needs a way to implement the control flow of the language
+ \item trivially and slowly done in interpreters with AST or bytecode
+ \item technically very well understood
+ \item sometimes small difficulties, like generators in Python
+ \pause
+ \item some languages have more complex demands\\
+ but this is rare
+ \item examples: Prolog
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Late Binding}
+ \begin{itemize}
+ \item lookups can be done only at runtime
+ \item historically, dynamic languages have moved to ever later binding times
+ \item a large variety of mechanisms exist in various languages
+ \item mechanism are often very ad-hoc because \\
+ "it was easy to do in an interpreter"
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Late Binding in Python}
+ In Python, the following things are late bound
+ \begin{itemize}
+ \item global names
+ \item modules
+ \item instance variables
+ \item methods
+ \pause
+ \item the class of objects
+ \item class hierarchy
+ \end{itemize}
+\end{frame}
+
+
+\begin{frame}
+ \frametitle{Dispatching}
+ \begin{itemize}
+ \item dispatching is a very important special case of late binding
+ \item how are the operations on objects implemented?
+ \item this is usually very complex, and different between languages
+ \pause
+ \item operations are internally split up into one or several lookup and call steps
+ \item a huge space of paths ensues
+ \item most of the paths are uncommon
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Example: Attribute Reads in Python}
+ What happens when an attribute \texttt{x.m} is read? (simplified)
+ \begin{itemize}
+ \item check for the presence of \texttt{x.\_\_getattribute\_\_}, if there, call it
+ \pause
+ \item look for the name of the attribute in the object's dictionary, if it's there, return it
+ \pause
+ \item walk up the MRO of the object and look in each class' dictionary for the attribute
+ \pause
+ \item if the attribute is found, call its \texttt{\_\_get\_\_} attribute and return the result
+ \pause
+ \item if the attribute is not found, look for \texttt{x.\_\_getattr\_\_}, if there, call it
+ \pause
+ \item raise an \texttt{AttributeError}
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Example: Addition in Python}
+ \includegraphics[scale=0.5]{figures/add.pdf}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Dependencies Between Subsequent Dispatches}
+ \begin{itemize}
+ \item one dispatch operation is complex
+ \item many in a sequence are worse
+ \item take \texttt{(a + b) + c}
+ \item the dispatch decision of the first operation influences the second
+ \end{itemize}
+\end{frame}
+
+
+
+
+\begin{frame}
+ \frametitle{Boxing of Primitive Values}
+ \begin{itemize}
+ \item primitive values often need to be boxed,\\
+ to ensure uniform access
+ \item a lot of pressure is put on the GC by arithmetic
+ \item need a good GC (clear anyway)
+ \item in arithmetic, lifetime of boxes is known
+ \end{itemize}
+% \pause
+% \begin{block}{Escape Analysis?}
+% \begin{itemize}
+% \item should help in theory
+% \item nearly always there are some unlikely paths,\\
+% that lead to escapes
+% \item this is often due to user-overriding of operations
+% \end{itemize}
+% \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Escaping Paths}
+ \begin{itemize}
+ \item considering again \texttt{(a + b) + c}
+ \item assume \texttt{a} and \texttt{b} are ints
+ \item then the result should not be allocated
+ \item escaping path: if \texttt{c} has a user-defined class
+ \end{itemize}
+\end{frame}
+
+
+
+\begin{frame}
+ \frametitle{(Frames)}
+ \begin{itemize}
+ \item side problem:
+ \item many languages have reified frame access
+ \item e.g. Python, Smalltalk, Ruby, ...
+ \item support for in-language debuggers
+ \item in an interpreter these are trivial,\\
+ because the interpreter needs them anyway
+ \item how should reified frames work efficiently when a compiler is used?
+ \end{itemize}
+\end{frame}
+
+%\begin{frame}
+% \frametitle{(Fast Access to Native Libraries)}
+% \begin{itemize}
+% \item another side problem
+% \item how can native libraries be accessed efficiently?
+% \item important in dynamic languages, because they are often used as glue
+% \item even more important in browser, to interact with the DOM
+% \item can be solved either by compiling extensions, or via an FFI
+% \item both not very efficient
+% \end{itemize}
+%\end{frame}
+
+\subsection{Requirements}
+
+\begin{frame}
+ \frametitle{Summarizing the Requirements}
+ \begin{enumerate}
+ \item control flow
+ \item late binding
+ \item \textbf{dispatching}
+ \item \textbf{dependencies between subsequent dispatches}
+ \item boxing
+ \item (reified frames)
+% \item (access to native libraries)
+ \end{enumerate}
+\end{frame}
+
+
+\section{Approaches For Dynamic Language Implementation}
+
+\begin{frame}
+ \frametitle{Common Approaches to Language Implementation}
+ \begin{itemize}
+ \item Using C/C++
+ \begin{itemize}
+ \item for an interpreter
+ \item for a static compiler
+ \item for a method-based JIT
+ \item for a tracing JIT
+ \end{itemize}
+ \item Building on top of a general-purpose OO VM
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Common Approaches to Language Implementation}
+ \begin{block}{
+ Using C/C++}
+ \begin{itemize}
+ \item CPython (interpreter)
+ \item Ruby (interpreter)
+ \item V8 (method-based JIT)
+ \item TraceMonkey (tracing JIT)
+ \item ...
+ %\item but also: Scheme48, Squeak (interpreters)
+ \end{itemize}
+ \end{block}
+ \pause
+ \begin{block}{Building on top of a general-purpose OO VM}
+ \begin{itemize}
+ \item Jython, IronPython
+ \item JRuby, IronRuby
+ \item various Prolog, Lisp, even Smalltalk implementations
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\subsection{Implementing VMs in C/C++}
+
+\begin{frame}
+ \frametitle{Implementing VMs in C}
+ When writing a VM in C it is hard to reconcile our goals
+ \begin{itemize}
+ \item flexibility, maintainability
+ \item simplicity
+ \item performance
+ \end{itemize}
+ \pause
+ \begin{block}{Python Case}
+ \begin{itemize}
+ \item \alert{CPython} is a very simple bytecode VM, performance not great
+ \item \alert{Psyco} is a just-in-time-specializer, very complex, hard to
+ maintain, but good performance
+ \item \alert{Stackless} is a fork of CPython adding microthreads. It was
+ never incorporated into CPython for complexity reasons
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Interpreters in C/C++}
+ \begin{itemize}
+ \item mostly very easy
+ \item well understood problem
+ \item portable, maintainable
+ \item slow
+ \end{itemize}
+
+\end{frame}
+
+
+%\begin{frame}
+% \frametitle{Fixing of Early Design Decisions}
+% \begin{itemize}
+% \item when starting a VM in C, many design decisions need to be made upfront
+% \item examples: memory management technique, threading model
+% \item such decisions are manifested throughout the VM source
+% \item very hard to change later
+% \item low level details mixed with language semantics
+% \end{itemize}
+% \pause
+% \begin{block}{Python Case}
+% \begin{itemize}
+% \item CPython uses reference counting, increfs and decrefs everywhere
+% \item CPython uses OS threads with one global lock, hard to change to
+% lightweight threads or finer locking
+% \end{itemize}
+% \end{block}
+%\end{frame}
+
+
+\begin{frame}
+ \frametitle{How do Interpreters Meet the Requirements?}
+ \begin{enumerate}
+ \item \alert{control flow slowed by bytecode or AST dispatch overhead}
+ \item \alert{late binding works but slow}
+ \item \alert{dispatching works but slow}
+ \item \alert{no special dependencies support}
+ \item \alert{everything is boxed}
+ \item \alert{reified frames are easy but slow}
+% \item \alert{access to native libraries easy but slow}
+ \end{enumerate}
+\end{frame}
+
+
+
+\begin{frame}
+ \frametitle{Static Compilers to C/C++}
+ \begin{itemize}
+ \item first reflex of many people is to blame it all on bytecode dispatch overhead
+ \item thus static compilers are implemented that reuse the object model of an interpreter
+ \item gets rid of interpretation overhead only
+ \item seems to give about 2x speedup
+ \pause
+ \item dispatch, late-binding and boxing only marginally improved
+ \item static analysis mostly never works
+% \item often easy access to native libraries
+ \end{itemize}
+ \pause
+ \begin{block}{Python Case}
+ \begin{itemize}
+ \item \alert{Cython}, \alert{Pyrex} are compilers\\
+ from large subsets of Python to C
+ \item lots of older experiments, most discontinued
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+
+\begin{frame}
+ \frametitle{How do Static Compilers Meet the Requirements?}
+ \begin{enumerate}
+ \item control flow works well
+ \item \alert{late binding not improved}
+ \item \alert{dispatching not improved}
+ \item \alert{dependencies not improved}
+ \item \alert{everything is boxed}
+ \item \alert{reified frames often not supported}
+% \item direct access to native libraries
+ \end{enumerate}
+\end{frame}
+
+
+\subsection{Method-Based JIT Compilers}
+
+\begin{frame}
+ \frametitle{Method-Based JIT Compilers}
+ \begin{itemize}
+ \item to fundamentally attack some of the problems, \\
+ a dynamic compiler is needed
+ \item a whole new can of worms
+ \pause
+ \begin{itemize}
+ \item type profiling
+ \item inlining based on that
+ \item general optimizations
+ \item complex backends
+ \end{itemize}
+ \item very hard to pull off for a volunteer team
+ \end{itemize}
+ \pause
+ \begin{block}{Examples}
+ \begin{itemize}
+ \item Smalltalk and SELF JITs
+ \item V8 and JägerMonkey
+ \item Psyco, sort of
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Compilers are a bad encoding of Semantics}
+ \begin{itemize}
+ \item to improve all complex corner cases of the language, \\
+ a huge effort is needed
+ \item often needs a big "bag of tricks"
+ \item the interactions between all tricks is hard to foresee
+ \item the encoding of language semantics in the compiler is thus often obscure and hard to change
+ \end{itemize}
+ \pause
+ \begin{block}{
+ Python Case}
+ \begin{itemize}
+ \item Psyco is a dynamic compiler for Python
+ \item synchronizing with CPython's development is a lot of effort
+ \item many of CPython's new features not supported well
+ \item not ported to 64-bit machines, and probably never will
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}[containsverbatim]
+ \frametitle{Method-Based JITs and Dispatching Dependencies}
+\begin{verbatim}
+ x = add(a, b)
+ r = add(x, c)
+\end{verbatim}
+\end{frame}
+
+\begin{frame}[containsverbatim, plain]
+ \frametitle{Method-Based JITs and Dispatching Dependencies}
+\begin{verbatim}
+ if isinstance(a, Integer):
+ if isinstance(b, Integer):
+ x = Integer(<perform int addition>)
+ else:
+ x = Float(<perform float addition>)
+ else:
+ x = Float(<perform float addition>)
+ # x can be Float or Integer here
+ if isinstance(x, Integer):
+ if isinstance(c, Integer):
+ r = Integer(<perform int addition>)
+ else:
+ r = Float(<perform float addition>)
+ else:
+ r = Float(<perform float addition>)
+\end{verbatim}
+\end{frame}
+
+\begin{frame}
+ \frametitle{How do Method-Based JIT Compilers Meet the Requirements?}
+ \begin{enumerate}
+ \item control flow works well
+ \item late binding can be handled
+ \item dispatching can be handled
+ \item \alert{dependencies not necessarily improved}
+ \item boxing hard to support
+ \item \alert{reified frames hard to support}
+% \item \alert{access to native libraries not improved}
+ \end{enumerate}
+\end{frame}
+
+
+
+\subsection{Tracing JIT Compilers}
+
+\begin{frame}
+ \frametitle{Tracing JIT Compilers}
+ \begin{itemize}
+ \item relatively recent approach to JIT compilers
+ \item pioneered by Michael Franz and Andreas Gal for Java
+ \item turned out to be well-suited for dynamic languages
+ \end{itemize}
+ \pause
+ \begin{block}{Examples}
+ \begin{itemize}
+ \item \alert{TraceMonkey}
+ \item \alert{LuaJIT}
+ \item \alert{SPUR}, sort of
+ \item \alert{PyPy}, sort of
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Tracing JIT Compilers}
+ \begin{itemize}
+ \item idea from Dynamo project: \\
+ dynamic rewriting of machine code
+ \item conceptually simpler than type profiling
+ \end{itemize}
+ \pause
+ \begin{block}{Basic Assumption of a Tracing JIT}
+ \begin{itemize}
+ \item programs spend most of their time executing loops
+ \item several iterations of a loop are likely to take similar code paths
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Tracing VMs}
+ \begin{itemize}
+ \item mixed-mode execution environment
+ \item at first, everything is interpreted
+ \item lightweight profiling to discover hot loops
+ \item code generation only for common paths of hot loops
+ \item when a hot loop is discovered, start to produce a trace
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Tracing}
+ \begin{itemize}
+ \item a \emph{trace} is a sequential list of operations
+ \item a trace is produced by recording every operation the interpreter executes
+ \item tracing ends when the tracer sees a position in the program it has seen before
+ \item a trace thus corresponds to exactly one loop
+ \item that means it ends with a jump to its beginning
+ \end{itemize}
+ \pause
+ \begin{block}{Guards}
+ \begin{itemize}
+ \item the trace is only one of the possible code paths through the loop
+ \item at places where the path \emph{could} diverge, a guard is placed
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Code Generation and Execution}
+ \begin{itemize}
+ \item being linear, the trace can easily be turned into machine code
+ \item execution stops when a guard fails
+ \item after a guard failure, go back to interpreting program
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Dealing With Control Flow}
+ \begin{itemize}
+ \item an if statement in a loop is turned into a guard
+ \item if that guard fails often, things are inefficient
+ \item solution: attach a new trace to a guard, if it fails often enough
+ \item new trace can lead back to same loop
+ \item or to some other loop
+ \end{itemize}
+\end{frame}
+
+%\begin{frame}
+% \frametitle{Stages of Execution}
+% \includegraphics[scale=0.5]{figures/tracing.pdf}
+%\end{frame}
+
+\begin{frame}
+ \frametitle{Dispatching in a Tracing JIT}
+ \begin{itemize}
+ \item trace contains bytecode operations
+ \item bytecodes often have complex semantics
+ \item optimizer often type-specializes the bytecodes
+ \item according to the concrete types seen during tracing
+ \item need to duplicate language semantics in optimizer for that
+ \end{itemize}
+\end{frame}
+
+
+\begin{frame}
+ \frametitle{Dispatching Dependencies in a Tracing JIT}
+ \begin{itemize}
+ \item one consequence of the tracing approach:
+ \item paths are split aggressively
+ \item control flow merging happens at beginning of loop only
+ \item after a type check, the rest of the trace can assume that type
+ \item only deal with paths that are actually seen
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Boxing Optimizations in a Tracing JIT}
+ \begin{itemize}
+ \item possibility to do escape analysis within the trace
+ \item only optimize common path
+ \item i.e. the one where the object doesn't escape
+ \end{itemize}
+\end{frame}
+
+
+\begin{frame}
+ \frametitle{Advantages of Tracing JITs}
+ \begin{itemize}
+ \item can be added to an existing interpreter unobtrusively
+ \item interpreter does most of the work
+ \item automatic inlining
+ \item deals well with finding the few common paths through the large space
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Bad Points of the Approach}
+ \begin{itemize}
+ \item switching between interpretation and machine code execution takes time
+ \item problems with really complex control flow
+ \item granularity issues: often interpreter bytecode is too coarse
+ \item if this is the case, the optimizer needs to carefully re-add the decision tree
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{How do Tracing JITs Meet the Requirements?}
+ \begin{enumerate}
+ \item control flow works rather well
+ \item late binding can be handled
+ \item dispatching can be handled
+ \item dependencies improved by path splitting
+ \item unboxing optimization much simpler
+ \item reified frames can be implemented \\
+ by falling back to the interpreter
+% \item \alert{access to native libraries not improved}
+ \end{enumerate}
+\end{frame}
+
+
+\subsection{Building on Top of an OO VM}
+
+\begin{frame}
+ \frametitle{Implementing Languages on Top of OO VMs}
+ \begin{itemize}
+ \item approach: implement on top of the JVM or the CLR
+ \item usually by compiling to the target bytecode
+ \item plus an object model implementation
+ \item brings its own set of benefits of problems
+ \end{itemize}
+ \pause
+ \begin{block}{
+ Python Case}
+ \begin{itemize}
+ \item \alert{Jython} is a Python-to-Java-bytecode compiler
+ \item \alert{IronPython} is a Python-to-CLR-bytecode compiler
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Benefits of Implementing on Top of OO VMs}
+ \begin{itemize}
+ \item higher level of implementation
+ \item the VM supplies a GC and a JIT
+ \item better interoperability than what the C level provides
+ \end{itemize}
+ \pause
+ \begin{block}{
+ Python Case}
+ \begin{itemize}
+ \item both Jython and IronPython integrate well with their host OO VM
+ \item both have proper threading
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{The Problems of OO VMs}
+ \begin{itemize}
+ \item often hard to map concepts of the dynamic language
+ \item performance not improved because of the semantic mismatch
+ \item untypical code in most object models
+ \item object model typically has many megamorphic call sites
+ \pause
+ \item escape analysis cannot help with boxing,\\
+ due to escaping paths
+ \item to improve, very careful manual tuning is needed
+ \item VM does not provide enough customization/feedback
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Examples of Problems}
+ \begin{itemize}
+ \item both Jython and IronPython are quite a bit slower than CPython
+ \item IronPython misses reified frames
+ \pause
+ \item for languages like Prolog it is even harder to map the concepts
+ \end{itemize}
+\end{frame}
+
+
+\begin{frame}
+ \frametitle{The Future of OO VMs?}
+ \begin{itemize}
+ \item the problems described might improve in the future
+ \item JVM will add extra support for more languages
+ \item i.e. tail calls, \texttt{InvokeDynamic}, ...
+ \item has not really landed yet
+ \item good performance needs a huge amount of tweaking
+ \item controlling the VM's behaviour is brittle:\\
+ VMs not meant for people who care about exact shape of assembler
+ \end{itemize}
+ \pause
+ \begin{block}{Ruby Case}
+ \begin{itemize}
+ \item JRuby tries really hard to be a very good implementations
+ \item took an enormous amount of effort
+ \item tweaking is essentially Hotspot-specific
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{How do OO VMs Meet the Requirements?}
+ \begin{enumerate}
+ \item control flow works well
+ \item late binding can be handled with a lot of effort
+ \item dispatching can be handled with a lot of effort
+ \item \alert{dependencies not improved}
+ \item \alert{boxing not necessarily improved}
+ \item \alert{reified frames are inefficient}
+% \item access to native libraries not improved, but VM libraries help
+ \end{enumerate}
+\end{frame}
+
+\section{PyPy's Approach to VM Construction}
+
+
+\begin{frame}
+ \frametitle{The PyPy Project}
+ \begin{itemize}
+ \item started in 2003, received funding from the EU, Google, Nokia and some smaller companies
+ \item goal: "The PyPy project aims at producing a flexible and fast Python implementation."
+ \item technology should be reusable for other dynamic languages
+ \end{itemize}
+ \pause
+ \begin{block}{Language Status}
+ \begin{itemize}
+ \item the fastest Python implementation, very complete
+ \item contains a reasonably good Prolog
+ \item full Squeak, but no JIT for that yet
+ \item various smaller experiments (JavaScript, Scheme, Haskell)
+ \end{itemize}
+
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{PyPy's Approach to VM Construction}
+ \emph{Goal: achieve flexibility, simplicity and performance together}
+
+ \begin{itemize}
+ \item Approach: auto-generate VMs from high-level descriptions of the language
+ \item ... using meta-programming techniques and \emph{aspects}
+ \item high-level description: an interpreter written in a high-level language
+ \item ... which we translate (i.e.\ compile) to a VM running in various target
+ environments, like C/Posix\pause, CLR, JVM
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{PyPy}
+ \begin{itemize}
+ \item PyPy = Python interpreter written in RPython + translation toolchain
+ for RPython
+ \end{itemize}
+ \pause
+ \begin{block}{What is RPython}
+ \begin{itemize}
+ \item RPython is a (large) subset of Python
+ \item subset chosen in such a way that type-inference can be performed
+ \item still a high-level language (unlike SLang or PreScheme)
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Auto-generating VMs}
+ \begin{itemize}
+ \item we need a custom \emph{translation toolchain} to compile the interpreter
+ to a full VM
+ \item many aspects of the final VM are orthogonal from the interpreter source:
+ they are inserted during translation
+ \end{itemize}
+ \pause
+ \begin{block}{
+ Examples}
+ \begin{itemize}
+ \item Garbage Collection strategy
+% \item Threading models (e.g.\ coroutines with CPS...)
+ \item non-trivial translation aspect: auto-generating a tracing JIT compiler from
+ the interpreter
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+
+\begin{frame}
+ \frametitle{Good Points of the Approach}
+ {\bf Simplicity:} separation of language semantics from low-level details
+ \pause
+
+ {\bf Flexibility} high-level implementation language eases things (meta-programming)
+ \pause
+
+ {\bf Performance:} ``reasonable'' baseline performance, can be very good with JIT
+\end{frame}
+
+
+\begin{frame}[plain]
+ \includegraphics[scale=0.3]{figures/all_numbers.png}
+\end{frame}
+
+\subsection{PyPy's Meta-Tracing JIT Compiler}
+
+%\begin{frame}
+% \frametitle{Example}
+%XXX the idea of this section is to have a running example of a small
+%interpreter that is extended with language constructs to provide examples for
+%all the technologies
+%
+%\end{frame}
+
+\begin{frame}
+ \frametitle{Meta-Tracing}
+ Problems of Tracing JITs:
+ \begin{itemize}
+ \item specific to one language's bytecode
+ \item bytecode has wrong granularity
+ \item internals of an operation not visible in trace
+ \end{itemize}
+ \pause
+ \begin{block}{PyPy's Idea:}
+ \begin{itemize}
+ \item write interpreters in RPython
+ \item trace the execution of the RPython code
+ \item using one generic RPython tracer
+ \item the process is customized via hints in the interpreter
+ \item no language-specific bugs
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Interpreter Overhead}
+ \begin{itemize}
+ \item most immediate problem with meta-tracing
+ \item interpreter typically has a bytecode dispatch loop
+ \item not a good idea to trace that
+ \pause
+ \item solved by a simple trick:
+ \item \emph{unroll} the bytecode dispatch loop
+ \item control flow then taken care of
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Optimizing Late Binding and Dispatching}
+ \begin{itemize}
+ \item late binding and dispatching code in the interpreter is traced
+ \item as in a normal tracing JIT, the meta-tracer is good at picking common paths
+ \item a number of hints to fine-tune the process
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Optimizing Boxing Overhead}
+ \begin{itemize}
+ \item boxing optimized by a powerful general optimization on traces
+ \item tries to defer allocations for as long as possible
+ \item allocations only happen in those (rare) paths where they are needed
+ \end{itemize}
+ \pause
+ \begin{block}{Use Cases}
+ \begin{itemize}
+ \item arithmetic
+ \item argument holder objects
+ \item frames of inlined functions
+ \end{itemize}
+ \end{block}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Dealing With Reified Frames}
+ \begin{itemize}
+ \item interpreter needs a frame object to store its data anyway
+ \item those frame objects are specially marked
+ \item JIT special-cases them
+ \item their attributes can live in CPU registers/stack
+ \item on reflective access, machine code is left, interpreter continues
+ \pause
+ \item nothing deep, but a lot of engineering
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Feedback from the VM}
+ \begin{itemize}
+ \item in the beginning the hints are often not optimal yet
+ \item to understand how to improve them, the traces must be read
+ \item traces are in a machine-level intermediate representation
+ \item not machine code
+ \item corresponds quite closely to RPython interpreter code
+ \item visualization and profiling tools
+ \end{itemize}
+\end{frame}
+
+
+%\subsection{Customizing the Tracing of the Interpreter}
+%
+%\begin{frame}
+% \frametitle{Customizing the Tracing of the Interpreter}
+% - general building blocks for implementing many different dynamic languages well
+%
+% - more expedient than minimal
+%
+% without hints, behaviour is correct but not very fast
+%\end{frame}
+%
+%
+%\begin{frame}
+% \frametitle{The Extent of Tracing}
+% \begin{itemize}
+% \item tracer always starts at bytecode dispatch loop
+% \item at any interpreter-level function call, the tracer can decide to trace into call, or not
+% \item tracing should not continue arbitrarily deep into implementation of operations
+% \pause
+% \item basic mechanism: stop at functions that contain loops
+% \item hint to stop tracing explicitly
+% \item hint to force tracing into function with loops, unrolling them
+% \end{itemize}
+%\end{frame}
+%
+%\begin{frame}
+% \frametitle{Example}
+% XXX
+%\end{frame}
+%
+%
+%\begin{frame}
+% \frametitle{Mechanisms for Dispatching and Late-Binding}
+% - tracing time computation
+%
+% - promotion, turn a variable into a constant to insert a guard
+%
+% - immutability, to fold away reads out of constants
+%
+% - pure function hints, remove function calls if the arguments are pure
+%
+%\end{frame}
+%
+%\begin{frame}
+% \frametitle{Example Promotion}
+% XXX maps
+%\end{frame}
+%
+%\begin{frame}
+% \frametitle{Example Immutability}
+% XXX code objects
+%\end{frame}
+%
+%
+%\begin{frame}
+% \frametitle{Example Pure Function}
+% XXX version tags
+%\end{frame}
+%
+%\begin{frame}
+% \frametitle{Out-of-Line Guards}
+% - soon: slow-changing fields into out-of-line guards
+%\end{frame}
+%
+%\begin{frame}
+% \frametitle{Example Out-of-Line Guards}
+% XXX version tags
+%\end{frame}
+
+\begin{frame}
+ \frametitle{Drawbacks / Open Issues / Further Work}
+ \begin{itemize}
+ \item writing the translation toolchain in the first place takes lots of effort
+ (but it can be reused)
+ \item writing a good GC was still necessary, not perfect yet
+ \item dynamic compiler generation seems to work now, but took \emph{very} long to get right
+ \item granularity of tracing is sometimes not optimal, very low level
+ \end{itemize}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Conclusion}
+ \begin{itemize}
+ \item PyPy solves many of the problems of dynamic language implementations
+ \item it uses a high-level language
+ \begin{itemize}
+ \item to ease implementation
+ \item for better analyzability
+ \end{itemize}
+ \item it gives good feedback to the language implementor
+ \item and provides various mechanisms to express deeply different language semantics
+ \pause
+ \item only one solution in this design space (SPUR is another)
+ \item more experiments needed
+ \end{itemize}
+\end{frame}
+
+
+\end{document}
+
+
More information about the Pypy-commit
mailing list