[pypy-svn] rev 1362 - pypy/trunk/doc/funding

arigo at codespeak.net arigo at codespeak.net
Fri Sep 19 12:55:14 CEST 2003

Author: arigo
Date: Fri Sep 19 12:55:13 2003
New Revision: 1362

First try at this one.

Modified: pypy/trunk/doc/funding/B1.0_objectives.txt
--- pypy/trunk/doc/funding/B1.0_objectives.txt	(original)
+++ pypy/trunk/doc/funding/B1.0_objectives.txt	Fri Sep 19 12:55:13 2003
@@ -1,9 +1,12 @@
-B.1  Scientific and technological objectives of the project and state of 
-     the art  In the event of inconsistency between information given in 
+B.1 Scientific and technological objectives of the project and state of the art
+     In the event of inconsistency between information given in 
      Part A and that  given in Part B, the Part A version will prevail
-Describe in detail the proposed project's S&T objectives. The
-objectives should be those achievable within the project, not through
+Describe in detail the proposed project's Scientific & Technical objectives.
+The objectives should be those achievable within the project, not through
 subsequent development, and should be stated in a measurable and
 verifiable form. The progress of the project work will be measured
 against these goals in later reviews and assessments. Describe the
@@ -11,4 +14,243 @@
 will enhance the state-of-the-art in that area. 
 (Recommended length ­ three pages)
+Interpretation is a common implementation technique for a wide range of
+programming languages whose primary objective is not efficiency.
+Interpreters are well-suited for small-to-medium languages, favouring a
+simple and clear engine.
+As a language becomes popular, however, the costs inherent to this
+approach are increasingly pointed out. These costs are twofold. Firstly,
+no matter how much effort is made to optimize the core components, the
+remaining interpretative overhead is large, both in term of processor
+cycles and memory consumption. Secondly, interpreters are notoriously
+difficult to factor into relatively independent components, which makes
+structural changes to a large interpreter cumbersome. While the
+efficiency costs are well-known, the costs inherent to the lack
+flexibility [REF] are often underestimated by the community of language
+users. In practice, what often occurs for newly popular languages is
+that the user base creates pressure about the efficiency issue and
+motivates a group of developers to write a native Just In Time (JIT)
+compiler, adding a significant amount of code and complexity and further
+impairing the flexibility.
+The flexibility goal
+The flexibility problem is pernicious. It affects almost all aspects of
+the language implementation:
+* user pressure tends to shape interpreters towards performance at the
+expense of other aspects, like small footprints and code simplicity.
+* there are a lot of other trade-offs; experimenting with different
+solutions can be close to impossible for aspects that affect the
+interpreter globally. These aspects are usually frozen in early design
+decisions (memory management, object model, multithreading,...).
+* optimization drives the interpreter code base, an error-prone and
+scope-limiting process. It is impossible to reuse the code base for
+implementation on new, non-C platforms; you need to start from scratch.
+Moreover, conflicting with performance and manageable complexity, there
+is a large number of aspects and features that users would like to see
+integrated in the interpreter (or would even do so themselves, if it
+were possible at all). These features are well-established as middleware
+tools that can be accessed from common languages, but would benefit a
+lot from direct support from the language implementation -- which they
+generally only have in a couple of experimental languages. Some
+* distributed execution (SMP or Networked)
+* persistency
+* FIXME list our application-level goals here
+The first major goal of the PyPy project is to produce an interpreter
+with flexibility in mind, and put this flexibility to good use. More
+specifically, our objectives are:
+1. to create a implementation of the Python language in Python.
+2. to implement a subset of the above-mentioned features (FIXME which
+ones?) as an immediately useful proof-of-concept.
+3. to allow third-parties to implement any other interpreter-level feature
+more easily than they would in a traditional interpreter.
+The main building piece behind the proposed flexibility is -- besides
+using a high-level language in the first place -- the concept of Object
+Space, which captures at the level of the interpreter the semantics of
+individual operations between objects. The interpreter's main dispatch
+loop handles control flow and supporting data structures (frame stack,
+bytecode objects...); each individual operation is dispatched to the
+object space. Object spaces are pluggable, which means that we can very
+easily replace an object space implementing the normal Python semantics
+with another one doing different things, like accessing remote objects
+over a network, keeping logs, measuring performance or doing persistence
+-- all things that are better implemented in the interpreter itself, so
+that they can be fully transparent to the user of the language.
+More theoretically, replacing an object space with another one
+performing abstract operations turns our interpreter into an abstract
+interpreter at no effort. Such a use will be shown below.
+The performance goal, phase 1
+Flexibility is often associated with poor performances, although
+theoretically, high-level languages are more expressive and thus are
+open to a wider range of optimization than low-level languages, and
+should then outperform them. Of course this trend is not exactly
+confirmed in practice so far; the only examples that one could give are
+derived from bad or buggy low-level algorithms being repeatedly
+reimplemented in C programs while higher-level languages natively
+provide the correct algorithms. Also note that languages can be static
+or dynamic independently of being high or low-level: high-level
+expressive languages can be static and efficient [OCAML] while
+inexpressive ones can be dynamic and much less efficient [PROLOG].
+The PyPy project -- phase one -- aims to write a Python interpreter
+using Python itself as the implementation language for flexibility, with
+the additional constraint that core parts should be written in a more
+static subset of the language, a restriction that opens the door to
+static analysis and translation to a low-level language like C. This
+approach has already been taken, e.g. for the Scheme language [REF], but
+we have an original approach to the translation process that we will
+describe later.
+The net result will be an interpreter whose performance is comparable to
+today's C-based implementation, but more flexible. The translation is
+not a one-shot: the maintained source is the Python one. Moreover,
+aspects that were design decisions in the current C Python are now
+merely customizable behaviour of the translator. In other words, the
+translator is not merely a restricted-Python-to-C translator; it is an
+essential piece towards the flexibility goals of the previous section.
+Aspects like memory layout of objects are not defined by the PyPy Python
+source, but derived from it by the translator with the support of a
+small run-time system written in C. It is thus expected that the
+translator should be a highly configurable tool, with pluggable
+components, that will allow us (and third-parties) to produce customized
+C versions of PyPy with tailored trade-offs. Moreover, the exact set of
+Python source that the translator will be applied to can be varied as
+well, producing C versions with different capabilities.
+Thus, our objectives are:
+4. to produce several different C versions for different identified usages
+of Python;
+5. to allow third-parties to produce customized C version easily;
+6. to adapt the translator to target other non-C platforms (Java and/or
+.NET and/or other).
+Moreover, we will also consider targetting a virtual machine that
+already has a high-performance JIT compiler, possibly Self (see below).
+JIT compilers
+JIT compilers have been reasonably well studied; an account of their
+history is given in [REF:jit-history.ps]. But actually writing a JIT for
+a given language is generally a major task. Different techniques to ease
+this path have been recently explored; let us cite:
+* to implement a dynamic language in a flexible way, it can be written
+on top of another one, e.g. as a dynamic translator that produces
+bytecodes for an existing virtual machine. If the virtual machine is
+already equipped with state-of-the-art JIT compilation, it is possible
+to leverage its benefits to the new language. This path is not only much
+sorter than designing a complete custom JIT, but it is also easier to
+maintain and evolve. This idea is explored for the Self virtual machine
+in [REF:http://research.sun.com/research/kanban/oopsla-vm-wkshp.pdf]. As
+pointed out in that paper, some source language features may not match
+any of the target virtual machine features. When this issue araises, we
+are left with the hard problem of refactoring an efficient JIT-based
+virtual machine.
+* a completely different approach: making it easier to derive a JIT from
+an existing C interpreter. DynamoRIO [REF] instrumentates the execution
+of compiled programs and optimizes them dynamically. It has been
+extended with specific support for interpreters. With minimal amounts of
+changes in the source of an interpreter, it can significantly reduce the
+processor-time interpretative overhead
+[REF:http://www.ai.mit.edu/~gregs/dynamorio.htm]. While this offers a
+highly competitive gain/effort ratio, performance does not reach the
+levels of a hand-crafted JIT.
+The performance goal, phase 2
+Building on phase 1, we aim to add a JIT compiler to PyPy. This will be
+done using a different technique than described in the previous
+paragraph: our goal is to *generate* the JIT compiler from the
+high-level Python source of PyPy.
+This will be accomplished by integrating the technology developed in
+Psyco [REF], a prototype JIT for the Python programming language. It is
+more precisely a specializing JIT compiler based on abstract
+Although Psyco was hand-written, large parts have a one-to-one
+correspondence to whole sections of the C Python interpreter. This is
+uncommon for JITs, but Psyco's good results give ample proof-of-concept.
+Moreover, this property can be explicitely related to the DynamoRIO
+project cited above, which instrumentates the interpreter itself.
+Indeed, the one-to-one correspondance between parts of C Python and
+Psyco is as follows: to each expression in C Python corresponds a more
+complex expression in Psyco, which does instrumentation and
+optimizations. The difference with DynamoRIO is that the latter analyses
+at run-time the machine code resulting from the compilation of the
+interpreter. In Psyco the one-to-one correspondance is with the C source
+instead. It could certainly have been done by automated analysis of the
+C source of C Python, but this is difficult and error-prone.
+In the PyPy project, on the other hand, the source code of the
+interpreter will be in Python instead of C. It is much easier to analyse
+safely. Thus, the plan is to raise the level further and rely on yet
+another customization of the translator, to let it emit the complex
+Psyco-style expressions instead of (or in addition to) the normal C
+expressions that would be the direct translation.
+The last objective is thus:
+7. to provide a flexible Python interpreter running at JIT speed.
+The translator
+Finally, if this were still needed, let us further demonstrate the
+benefits of writing an interpreter flexibly in a high-level language. So
+far, the translator sounds like a magic piece of difficult-to-write
+software. This is not so. Analysing pieces of Python source code would
+not be a complex task anyway. Some type inference is needed, but it can
+be kept simple by suitably restricting the amount of allowed dynamism.
+However, instead of analysing source code, we propose another approach:
+doing the type inference by abstract interpretation [REF]. This is
+essentially the way Psyco works, albeit it does it at run-time. [FIXME:
+has it already been done elsewhere?] [otherwise explain in a FOOTNOTE:
+the essential idea is to "interpret" the Python source code that we want
+to translate; all manipulated objects are abstracted into their type
+only. Operations between two such "abstract objects" return a new
+"abstract object", which represents the type of the result.] As we have
+seen above, abstract interpretation is easily acheived by plugging a
+custom object space into the PyPy code base. The translator can be built
+as an extension of this "type inference" object space, emitting
+translated code as a side-effect along the analysis. Thus the translator
+is nothing more than PyPy itself with a custom object space: when you
+use this special version of PyPy to "interpret" a Python function, it is
+in fact translated...
+In summary, to translate PyPy itself to C, we will be using PyPy!

More information about the Pypy-commit mailing list