[pypy-svn] rev 1659 - pypy/trunk/doc/funding

pedronis at codespeak.net pedronis at codespeak.net
Fri Oct 10 13:40:56 CEST 2003

Author: pedronis
Date: Fri Oct 10 13:40:56 2003
New Revision: 1659

extracted unused longer text for objectives.

Added: pypy/trunk/doc/funding/long_old_objectives.txt
--- (empty file)
+++ pypy/trunk/doc/funding/long_old_objectives.txt	Fri Oct 10 13:40:56 2003
@@ -0,0 +1,317 @@
+Interpretation is a common implementation technique for a wide range of
+programming languages whose primary objective is not efficiency.
+Interpreters are well-suited for small-to-medium languages, favouring a
+simple and clear engine. XXX aren't there "large" interpreted languages? 
+XXX what is a large language? example?
+XXX armin et.al..: isn't the key in the following para that today's language design is 
+"monolithic" and not configurable, i.e. all bigger programming languages are not 
+designed in a modular way and thus give the user no choice which components/aspects 
+they want to use for their target application?  But that today's applications
+are to be developed for an increasing variety of devices and environments (Big-Iron, cars, 
+watches, mobile phones, banks, whatever).  What about a security-goal btw? 
+XXX nico: modularity in langage is a very interesting feature. see below.
+As a language becomes popular, however, the costs inherent to this
+approach are increasingly pointed out. These costs are twofold. Firstly,
+no matter how much effort is made to optimize the core components, the
+remaining interpretative overhead is large, both in term of processor
+cycles and memory consumption. Secondly, interpreters are notoriously
+difficult to factor into relatively independent components, which makes
+structural changes to a large interpreter cumbersome. While the
+efficiency costs are well-known, the costs inherent to the lack
+flexibility [LHJ95]_ [H98]_ are often underestimated by the community of language
+users. In practice, what often occurs for newly popular languages is
+that the user base creates pressure about the efficiency issue and
+motivates a group of developers to write a native Just In Time (JIT)
+compiler, adding a significant amount of code and complexity and further
+impairing the flexibility. [S03]_
+The Flexibility Goal
+The flexibility problem is pernicious. It affects almost all aspects of
+the language implementation:
+* user pressure tends to shape interpreters towards performance at the
+  expense of other aspects, like small footprints and code simplicity.
+* there are a lot of other trade-offs; experimenting with different
+  solutions can be close to impossible for aspects that affect the
+  interpreter globally. These aspects are usually frozen in early design
+  decisions (memory management, object model, multithreading,...).
+* optimisation drives the interpreter code base, an error-prone and
+  scope-limiting process. It is impossible to reuse the code base for
+  implementation on new, non-C platforms; you need to start from scratch.
+Moreover, conflicting with performance and manageable complexity, there
+is a large number of aspects and features that users would like to see
+integrated in the interpreter (or would even do so themselves, if it
+were possible at all). These features are well-established as middleware
+tools that can be accessed from common languages, but would benefit a
+lot from direct support from the language implementation -- which they
+generally only have in a couple of experimental languages. Some
+* distributed execution (SMP or Networked)
+* persistency
+* FIXME list our application-level goals here
+* XXX nico: could talk about security, logic programming, aspect-oriented programming, etc.
+The first major goal of the PyPy project is to produce an interpreter
+with flexibility in mind, and put this flexibility to good use. More
+specifically, our objectives are:
+1. to create a implementation of the Python language in Python.
+2. to implement a subset of the above-mentioned features (FIXME which
+   ones?) as an immediately useful proof-of-concept.
+3. to allow third-parties to implement any other interpreter-level feature
+   more easily than they would in a traditional interpreter.
+The main building piece behind the proposed flexibility is -- besides
+using a high-level language in the first place -- the concept of Object
+Space, which captures at the level of the interpreter the semantics of
+individual operations between objects. The interpreter's main dispatch
+loop handles control flow and supporting data structures (frame stack,
+bytecode objects...); each individual operation is dispatched to the
+object space. Object spaces are pluggable, which means that we can very
+easily replace an object space implementing the normal Python semantics
+with another one doing different things, like accessing remote objects
+over a network, keeping logs, measuring performance or doing persistence
+-- all things that are better implemented in the interpreter itself, so
+that they can be fully transparent to the user of the language.
+More theoretically, replacing an object space with another one
+performing abstract operations turns our interpreter into an abstract
+interpreter at no effort. Such a use will be shown below.
+The Performance Goal, Phase 1
+Flexibility is often associated with poor performances, although
+theoretically, high-level languages are more expressive and thus are
+open to a wider range of optimization than low-level languages, and
+should then outperform them. Of course this trend is not exactly
+confirmed in practice so far; the only examples that one could give are
+derived from bad or buggy low-level algorithms being repeatedly
+reimplemented in C programs while higher-level languages natively
+provide the correct algorithms.
+Also note that languages can be static
+or dynamic independently of being high or low-level: high-level
+expressive languages can be static and efficient [OCAML] while
+inexpressive ones can be dynamic and much less efficient [PROLOG].
+The PyPy project -- phase one -- aims to write a Python interpreter
+using Python itself as the implementation language for flexibility, with
+the additional constraint that core parts should be written in a more
+static subset of the language, a restriction that opens the door to
+static analysis and translation to a low-level language like C. This
+approach has already been taken, e.g. for the Scheme [K97]_
+and Squeak [IKM97]_ language, but we have an original approach
+to the translation process that we will describe later.
+The net result will be an interpreter whose performance is comparable to
+today's C-based implementation, but more flexible. The translation is
+not a one-shot: the maintained source is the Python one. Moreover,
+aspects that were design decisions in the current C Python are now
+merely customizable behaviour of the translator. In other words, the
+translator is not merely a restricted-Python-to-C translator; it is an
+essential piece towards the flexibility goals of the previous section.
+Aspects like memory layout of objects are not defined by the PyPy Python
+source, but derived from it by the translator with the support of a
+small run-time system written in C. It is thus expected that the
+translator should be a highly configurable tool, with pluggable
+components, that will allow us (and third-parties) to produce customised
+C versions of PyPy with tailored trade-offs. Moreover, the exact set of
+Python source that the translator will be applied to can be varied as
+well, producing C versions with different capabilities.
+Thus, our objectives are:
+4. to produce several different C versions for different identified usages
+   of Python;
+5. to allow third-parties to produce customised C version easily;
+6. to adapt the translator to target other non-C platforms (Java and/or
+   .NET and/or other).
+Moreover, we will also consider targetting a virtual machine that
+already has a high-performance JIT compiler, possibly Self (see below).
+JIT compilers
+JIT compilers have been reasonably well studied; an account of their
+history is given in [A03]_ . But actually writing a JIT for
+a given language is generally a major task. Different techniques to ease
+this path have been recently explored; let us cite:
+* to implement a dynamic language in a flexible way, it can be written
+  on top of another one, e.g. as a dynamic translator that produces
+  bytecodes for an existing virtual machine. If the virtual machine is
+  already equipped with state-of-the-art JIT compilation, it is possible
+  to leverage its benefits to the new language. This path is not only much
+  shorter than designing a complete custom JIT, but it is also easier to
+  maintain and evolve. This idea is explored for the Self virtual machine
+  in [WAU99]_ . 
+  As pointed out in that paper, some source language features may not match
+  any of the target virtual machine features. When this issue araises, we
+  are left with the hard problem of refactoring an efficient JIT-based
+  virtual machine.
+  XXX state-of-the-art in Self: [HCU91]_
+  [HU94]_ .
+* a completely different approach: making it easier to derive a JIT from
+  an existing C interpreter. DynamoRIO instrumentates the execution
+  of compiled programs and optimizes them dynamically. It has been
+  extended with specific support for interpreters. With minimal amounts of
+  changes in the source of an interpreter, it can significantly reduce the
+  processor-time interpretative overhead [S03]_ . While this offers a
+  highly competitive gain/effort ratio, performance does not reach the
+  levels of a hand-crafted JIT.
+The Performance Goal, Phase 2
+Building upon phase 1, we aim to add a JIT compiler to PyPy. This will be
+done using a different technique than described in the previous
+paragraph: our goal is to *generate* the JIT compiler from the
+high-level Python source of PyPy.
+This will be accomplished by integrating the technology developed in
+Psyco [R03]_ , a prototype JIT for the Python programming language.
+XXX [APJ03]_
+It is
+more precisely a specializing JIT compiler based on abstract
+Although Psyco was hand-written, large parts have a one-to-one
+correspondence to whole sections of the C Python interpreter. This is
+uncommon for JITs, but Psyco's good results give ample proof-of-concept.
+Moreover, this property can be explicitely related to the DynamoRIO
+project cited above, which instrumentates the interpreter itself.
+Indeed, the one-to-one correspondence between parts of C Python and
+Psyco is as follows: to each expression in C Python corresponds a more
+complex expression in Psyco, which does instrumentation and
+optimisations. The difference with DynamoRIO is that the latter analyses
+the machine code resulting from the compilation of the interpreter
+at run-time. In Psyco, the one-to-one correspondance is with the C source
+instead. It could certainly have been done by automated analysis of the
+C source of C Python, but this is difficult and error-prone.
+In the PyPy project, on the other hand, the source code of the interpreter
+will be written in Python instead of C. This is much easier to analyse
+safely. Thus, the plan is to raise the level further and rely on yet
+another customisation of the translator, to let it emit the complex
+Psyco-style expressions instead of (or in addition to) the normal C
+expressions that would be the direct translation.
+The last objective is thus:
+7. to provide a flexible Python interpreter running at JIT speed.
+The Translator
+Finally, let us further demonstrate the
+benefits of writing an interpreter flexibly in a high-level language. So
+far, the translator sounds like a magic piece of difficult-to-write
+software. This is not so for analysing pieces of Python source code is
+not a very complex task. Some type inference is needed, but it can
+be kept simple by suitably restricting the amount of allowed dynamism.
+XXX rewrite all this stuff
+However, instead of analysing source code, we propose another approach:
+doing the type inference by abstract interpretation. This is
+essentially the way Psyco works, albeit it does it at run-time. [FIXME:
+has it already been done elsewhere?] [otherwise explain in a FOOTNOTE:
+the essential idea is to "interpret" the Python source code that we want
+to translate; all manipulated objects are abstracted into their type
+only. Operations between two such "abstract objects" return a new
+"abstract object", which represents the type of the result.] As we have
+seen above, abstract interpretation is easily achieved by plugging a
+custom object space into the PyPy code base. The translator can be built
+as an extension of this "type inference" object space, emitting
+translated code as a side-effect along the analysis. Thus the translator
+is nothing more than PyPy itself with a custom object space: when you
+use this special version of PyPy to "interpret" a Python function, it
+becomes in fact translated...
+In summary, to translate PyPy itself to C, we will be using PyPy!
+.. [S03] Gregory Sullivan et. al., "Dynamic Native Optimization of Native Interpreters". IVME 03. 2003.
+   http://www.ai.mit.edu/~gregs/dynamorio.html
+.. file = ivme03.pdf 
+.. [LHJ95] Sheng Liang, Paul Hudak and Mark Jones. "Monad Transformers
+   and Modular Interpreters". 22nd ACM Symposium on Principles of Programming
+   Languages (POPL'95). January 1995.
+   http://java.sun.com/people/sl/papers/popl95.ps.gz
+.. file = popl95.ps.gz
+.. [H98] Paul Hudak. "Modular Domain Specific Languages and Tools". ICSR 98. 1998.
+   http://haskell.org/frp/dsl.ps
+.. file = dsl.ps.gz
+.. [K97] Richard Kelsey, "Pre-Scheme: A Scheme Dialect for Systems Programming". 1997.
+   http://citeseer.nj.nec.com/kelsey97prescheme.html
+.. file = kelsey97prescheme.pdf
+.. [IKM97] Dan Ingalls, Ted Kaehler, John Maloney, Scott Wallace, and Alan Kay. "Back to the future: The story of Squeak, A practical Smalltalk written in itself." In Proceedings OOPSLA'97, pages 318--326, November 1997.
+   ftp://st.cs.uiuc.edu/Smalltalk/Squeak/docs/OOPSLA.Squeak.html
+.. file = OOPSLA.Squeak.htm
+.. [A03] John Aycock, "A Brief History of Just-In-Time", ACM Computing Surveys 35, 2 (June 2003), pp. 97-113.
+.. file = jit-history.pdf
+.. [WAU99] Mario Wolczko, Ole Agesen, David Ungar, 
+   "Towards a Universal Implementation Substrate for Object-Oriented Languages", OOPSLA 99 workshop
+   on Simplicity, Performance and Portability in Virtual Machine Design,
+   OOPSLA '99, Denver, CO, Nov 2, 1999.
+   http://research.sun.com/research/kanban/oopsla-vm-wkshp.pdf
+.. file = oopsla-vm-wkshp.pdf
+.. [R03] Armin Rigo, http://psyco.sourceforge.net
+.. file = psycoguide.ps.gz
+.. [APJ03] John Aycock and David Pereira and Georges Jodoin, 
+   "UCPy: Reverse-Engineering Python", PyCon DC 2003, 9pp.
+   http://pages.cpsc.ucalgary.ca/~aycock/papers/ucpy.pdf
+.. file = ucpy-reverse-engineering-python.pdf
+.. [HCU91] Urs Hölzle, Craig Chambers, and David Ungar,
+   "Optimizing Dynamically-Typed Object-Oriented Languages with Polymorphic
+   Inline Caches", ECOOP'91 Conference Proceedings, Geneva, 1991. Published
+   as Springer Verlag Lecture Notes in Computer Science 512, Springer Verlag,
+   Berlin, 1991.
+   http://self.sunlabs.com/papers/ecoop91.ps.Z
+.. file = hlzle91optimizing.ps.gz
+.. [HU94] Urs Hölzle, David Ungar,
+   "Reconciling Responsiveness with Performance in Pure Object-Oriented Languages",
+   PLDI '94 and OOPSLA '94
+   http://www.cs.ucsb.edu/oocsb/papers/toplas96.pdf/reconciling-responsiveness-with-performance.pdf
+.. file = reconciling-responsiveness-with-performance.pdf
+.. [KLM97] Gregor Kiczales and John Lamping and
+   Anurag Menhdhekar and Chris Maeda and Cristina Lopes and Jean-Marc
+   Loingtier and John Irwin, "Aspect-Oriented Programming", ECOOP'97
+   http://www2.parc.com/csl/groups/sda/publications/papers/Kiczales-ECOOP97/for-web.pdf
+.. file = kiczales97aspectoriented.pdf
\ No newline at end of file

More information about the Pypy-commit mailing list