arigo at codespeak.net
Sun Jul 15 12:01:38 CEST 2007

Author: arigo
Date: Sun Jul 15 12:01:37 2007
New Revision: 45097

Modified:
Log:
More clarifications based on reviewers' feedback.

==============================================================================
+++ pypy/extradoc/talk/dyla2007/dyla.bib	Sun Jul 15 12:01:37 2007
@@ -20,6 +20,12 @@
url = "http://psyco.sourceforge.net/"
}

+@misc{ invokedynamic,
+    title = "Java Specification Request 292: Supporting Dynamically Typed Languages on the Java Platform",
+    note = "http://web1.jcp.org/en/jsr/detail?id=292",
+    url = "http://web1.jcp.org/en/jsr/detail?id=292"
+}
+
% P\#

@inproceedings{ psharp,

==============================================================================
+++ pypy/extradoc/talk/dyla2007/dyla.tex	Sun Jul 15 12:01:37 2007
@@ -2,6 +2,7 @@

\usepackage{makeidx}
\usepackage{graphicx}
+\sloppy

\begin{document}

@@ -63,10 +64,18 @@
object model supporting the high level dynamic language's objects.  It
typically provides features like automatic garbage collection.  Recent
languages like Python, Ruby, Perl and JavaScript have complicated
-semantics which are most easily mapped to a simple interpreter operating
-on syntax trees or bytecode; simpler languages like Lisp and Self
-typically have more efficient implementations based on code
-generation.
+semantics which are most easily mapped to a naive interpreter operating
+on syntax trees or bytecode; simpler languages\footnote
+{
+In the sense of the primitive semantics.  ``Simple'' here is
+as opposed to ``complicated'', not as opposed to ``complex'': Common
+Lisp for example is not a small language, but it can at least in theory
+be expressed from a smaller core of primitives.  In Python, all
+primitive operations have complicated semantics.  The argument developed
+in the present paper is more relevant to ``complicated'' dynamic languages.
+}
+like Lisp, Smalltalk and Self typically have more
+efficient implementations based on code generation.

The effort required to build a new virtual machine is relatively
large.  This is particularly true for languages which are complex
@@ -122,10 +131,10 @@
instead that VMs should not be \emph{written} in the first place -- they
should be generated from simple interpreters written in any suitable
high-level\footnote{``High-level'' is taken by opposition to languages
-like PreScheme \cite{kelsey-prescheme} or the subset of Smalltalk that the
-Squeak VM is written in \cite{Squeak} which use the syntax and
+like Scheme48's PreScheme \cite{kelsey-prescheme} or Squeak's \cite{Squeak}
+SLang which use the syntax and
metaprogramming facilities of a high-level language but encode
-low-level details like memory management.} language.
+low-level details like object layout and memory management.} language.

In section \ref{sect:approaches} we will explore the way VMs are typically
implemented in C and on top of OO VMs and some of the problems of these
@@ -237,9 +246,8 @@
\item
\emph{Better GCs:} While this is obvious in theory, OO VMs tend to have a
-Java VM which just loaded Jython consumes XXX MB of non-shared memory
-CPython process fits in XXX MB.
+Java VM which just loaded Jython consumes 34-42 MB of memory, while a
+CPython process fits in 3-4 MB.

\item
\emph{Cross-platform portability:} While this is true to some extent, the
@@ -268,8 +276,8 @@
that are quite unnatural both for the OO VM and for Prolog.

Another important point that makes implementation of languages on top of OO VMs
-harder is that typically OO VMs don't support meta-programming very well, or do so only
-at the bytecode level.
+harder is that typically general-purpose OO VMs don't support meta-programming
+very well, or do so only at the bytecode level.
\end{itemize}

Nevertheless, some of the benefits are real and very useful, the most
@@ -500,14 +508,22 @@
enough and write the dynamic language implementation accordingly so
that most of the bookkeeping work involved in running the dynamic
language can be removed -- dispatching, boxing, unboxing...  However
-this has not been demonstrated yet.
-
-By far the fastest Python implementation, Psyco \cite{psyco-software}, contains a
-hand-written language-specific dynamic compiler.  PyPy's translation
-tool-chain is able to extend the generated VMs with an automatically
-generated dynamic compiler that uses techniques similar to those of Psyco
-\cite{Psyco-paper}, derived from the
-interpreter.  This is achieved by a pragmatic application of partial
+this has not been demonstrated yet.\footnote
+{Still in the draft stage, a proposed
+extension to the Java bytecode \cite{invokedynamic} might help achieve
+better integration between the Java JITs and dynamic language
+implementations running on top of JVMs.}
+
+By far the fastest Python implementation, Psyco \cite{psyco-software}
+contains a hand-written language-specific dynamic compiler.  It works by
+specializing (parts of) Python functions by feeding runtime information
+back into the compiler (typically, but not exclusively, object types).
+The reader is referred to \cite{Psyco-paper} for more details.
+
+PyPy abstracts on this approach: its translation tool-chain is able to
+extend the generated VMs with an \emph{automatically generated} dynamic
+compiler that uses techniques similar to those of Psyco, derived from
+the interpreter.  This is achieved by a pragmatic application of partial
evaluation techniques guided by a few hints added to the source of the
interpreter.  In other words, it is possible to produce a reasonably
good language-specific JIT compiler and insert it into a VM, alongside
@@ -531,16 +547,17 @@

%XXX doesn't look entirely nice
\begin{itemize}
-\item \emph{``Do not write dynamic language implementations by hand''.}
-Writing them more abstractly, at a higher level, has primarily only
-advantages, among them the avoidance of a proliferation of diverging
-implementations.  Writing interpreters both flexibly and efficiently
-is difficult and meta-programming is a good way to achieve it.
+\item \emph{High-level languages are suitable to implement dynamic languages.}
+They allow an interpreter to be written more abstractly, which has many
+advantages -- among them the avoidance of a proliferation of diverging
+implementations, and better ways to combine flexibility with efficiency.
Moreover, this is not incompatible with targeting and benefiting from
-existing high-quality object-oriented virtual machines like those of the Java and .NET.
-
+existing high-quality object-oriented virtual machines like those of the
+Java and .NET.

\item \emph{``Do not write VMs by hand''.}
+In other words, write an \emph{interpreter} but not a
+\emph{virtual machine} for the language.
Writing language-specific virtual machines is a time-consuming task for
medium to large languages.  Unless large amounts of resources can be
invested, the resulting VMs are bound to have limitations which lead to
@@ -558,8 +575,8 @@
Aside from the advantages described in section
\ref{sect:metaprogramming}, a translation toolchain need not be
standardized for inter-operability but can be tailored to the needs of
-each project.  Diversity is good; there is no need to attempt to
-standardize on a single OO VM.
+each project.  Diversity is good; translation toolchains offset the need
+to attempt to standardize on a single OO VM.
\end{itemize}

The approach we outlined is actually just one in a very large, mostly