[pypy-dev] needed 18 month plan of what we would do in developing PyPy if we got funding.

Armin Rigo arigo at tunes.org
Sun Sep 7 20:29:05 CEST 2003

Hello again,

Sorry, forgot to attach the document.

-------------- next part --------------
Draft of a PyPy work plan

1. The PyPy Interpreter

The goal is to make a complete Python interpreter that runs under any
existing Python implementation.

 a) develop and complete the PyPy interpreter itself, as a regular
Python program, until it contains all the parts of CPython that we don't
want to move to (b). Further investigate the unorthodox multimethod
concepts that the standard object space is based on, and how to hook in
the bytecode compiler.

 b) translate all other parts of CPython into regular Python libraries.
These should also work without PyPy, being just plain-Python
replacements for existing CPython functionality. This includes the
bytecode compiler.
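
To make the multimethod idea of (a) more concrete, here is a minimal
sketch of dispatch on the types of all arguments. It is only an
illustration, not the design used by the standard object space: the
names MultiMethod, register and add, and the left-to-right search order,
are assumptions made for this example.

```python
import itertools


class MultiMethod:
    """Toy multimethod: picks an implementation from the types of all arguments."""

    def __init__(self, name):
        self.name = name
        self.table = {}  # tuple of argument types -> implementation

    def register(self, *types):
        def decorator(func):
            self.table[types] = func
            return func
        return decorator

    def __call__(self, *args):
        # Walk the base classes of every argument, so that an
        # implementation registered for (int, int) also accepts bools.
        # Candidates are tried in a simple left-to-right order, which is
        # an arbitrary choice made for this sketch.
        mros = [type(arg).__mro__ for arg in args]
        for types in itertools.product(*mros):
            impl = self.table.get(types)
            if impl is not None:
                return impl(*args)
        names = ", ".join(type(arg).__name__ for arg in args)
        raise TypeError("%s: no implementation for (%s)" % (self.name, names))


add = MultiMethod("add")  # hypothetical multimethod, for illustration only


@add.register(int, int)
def add_int_int(a, b):
    return a + b


@add.register(str, str)
def add_str_str(a, b):
    return a + b
```

Calling add(1, 2) or add("a", "b") selects the matching implementation,
while add(1, "a") raises TypeError because no entry covers that pair of
types.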

2. Translation of RPython

The goal is to be able to translate arbitrary RPython source code (e.g. 
the one produced in 1a) into low-level code (C, Pyrex, Java, others). 
This includes making a stand-alone, not-PyPy-related tool for general
optimization of arbitrary but suitably restricted Python applications or
parts thereof.

 a) analyse code to produce the relevant typing information. Investigate
if we can use the annotation object space only or if additional
AST-based control flow analysis is needed.

 b) produce low-level code out of the data gathered in (a). Again
investigate how this is best done (AST-guided translation or
reverse-engineering of the low-level control flow gathered by the
annotation object space). Compare different low-level environments that
we could target (C, Pyrex, others?).
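
As a rough illustration of the idea behind (a), the sketch below runs
the same function once on concrete values and once on abstract values
that carry only a type. It is a toy under stated assumptions, not the
actual annotation object space: the names Some, ConcreteSpace,
AnnotationSpace and the small ADD_RESULTS table are hypothetical.

```python
class Some:
    """An abstract value: only its type is known, not its concrete value."""

    def __init__(self, knowntype):
        self.knowntype = knowntype

    def __repr__(self):
        return "Some(%s)" % self.knowntype.__name__


# Toy table: result type of "add" for each pair of operand types.
ADD_RESULTS = {
    (int, int): int,
    (int, float): float,
    (float, int): float,
    (float, float): float,
    (str, str): str,
}


class ConcreteSpace:
    """Performs the operations on real Python values."""

    def wrap(self, value):
        return value

    def add(self, a, b):
        return a + b


class AnnotationSpace:
    """Performs the same operations on types only."""

    def wrap(self, value):
        return Some(type(value))

    def add(self, a, b):
        key = (a.knowntype, b.knowntype)
        if key not in ADD_RESULTS:
            raise TypeError("cannot add %r and %r" % (a, b))
        return Some(ADD_RESULTS[key])


def double_plus_one(space, x):
    # Written against the space interface, so it can be either
    # executed or analysed depending on the space passed in.
    return space.add(space.add(x, x), space.wrap(1))
```

With ConcreteSpace, double_plus_one computes a number; with
AnnotationSpace and Some(float) as input, it yields Some(float), which
is the kind of typing information a translator could use.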

3. Bootstrapping PyPy

The goal is to put (1) and (2) together.

 a) investigate the particular problems specific to the global
translation of PyPy, as opposed to those general to any RPython program.
According to the requirements and insights of (2) we will probably have
to redesign specific parts of PyPy, e.g. make the various
app-level/interp-level interface designs converge.

 b) build the low-level-specific run-time components of PyPy, most
notably the object layout, the memory management, possibly threading
support, and multimethod dispatch. Here, if we target C code, important
parts can be directly re-used from CPython.

4. High-performance PyPy-Python

The goal is to optimize (3) in possibly various ways, building on its
flexibility to go beyond CPython.

 a) develop several object implementations for the same types, as
explicitly allowed by the standard object space, and develop heuristics
to switch between implementations during execution.

 b) identify which optimizations would benefit from support from the
translator (2). These are the optimizations not easily available to
CPython because they would require large-scale code rewrites.

 c) for each issue, work on several solutions when none is obviously
better than the others. The meta-programming underlying (b) --
namely the work on the translator instead of on the resulting code -- is
what gives us the possibility of actually implementing several very
different schemes.

 d) integrate existing technology that traditionally depended on closely
following CPython's code base, notably Psyco and Stackless. Rewrite each
one as a meta-component that hooks into the translator (2) plus a
dedicated run-time component (3b). Further develop these technologies
based on the results gathered in (c), e.g. identify when these
technologies would guide specific choices among the solutions developed 
in (a) and (b).

Annex to (4a)

Some major uses for several implementations of the built-in types:

 * dictionaries as hash-table vs. plain (key, value) lists vs. b-trees, 
or with string-only or integer-only keys. Dictionaries with specific 
support for "on-change" callbacks (useful for Psyco).

 * strings as plain immutable memory buffers vs. immutable but more 
complex data structures (see functional languages) vs. internally 
mutable data structures (e.g. Psyco's concatenated strings)

 * ints as machine words vs. two machine words vs. internal longs vs. 
external bignum library (investigate if completely unifying ints and
longs is possible in the Python language at this stage).

 * etc. (lists as range() or chained lists, ...)

The above are mostly independent from any particular low-level run-time.
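
One way to picture the switch between implementations described in (4a)
is a dictionary that starts as a plain (key, value) list and moves to a
hash table once it grows. This is only a sketch: the class names and the
threshold of 8 entries are arbitrary assumptions, and a real heuristic
would have to be tuned by measurement.

```python
class SmallListDict:
    """Plain list of (key, value) pairs; cheap when there are few keys."""

    def __init__(self):
        self.items = []

    def get(self, key):
        for k, v in self.items:
            if k == key:
                return v
        raise KeyError(key)

    def set(self, key, value):
        for i, (k, _) in enumerate(self.items):
            if k == key:
                self.items[i] = (key, value)
                return
        self.items.append((key, value))

    def __len__(self):
        return len(self.items)


class HashDict:
    """Hash-table implementation, here simply backed by a built-in dict."""

    def __init__(self, pairs=()):
        self.table = dict(pairs)

    def get(self, key):
        return self.table[key]

    def set(self, key, value):
        self.table[key] = value

    def __len__(self):
        return len(self.table)


class AdaptiveDict:
    """Same interface throughout; the implementation changes at run-time."""

    THRESHOLD = 8  # arbitrary value chosen for this sketch

    def __init__(self):
        self.impl = SmallListDict()

    def __getitem__(self, key):
        return self.impl.get(key)

    def __setitem__(self, key, value):
        self.impl.set(key, value)
        # Heuristic: switch to hashing once the list gets long.
        if isinstance(self.impl, SmallListDict) and len(self.impl) > self.THRESHOLD:
            self.impl = HashDict(self.impl.items)

    def __len__(self):
        return len(self.impl)
```

User code only ever sees AdaptiveDict, so the switch stays invisible at
the language level.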

Annex to (4b)

Here are some of the main issues and tricks. Note that compatibility
with legacy C extensions can be achieved by choosing, for each of the
following issues, the same one as CPython did.

 * object layout and memory management strategy or strategies, e.g.
reference counting vs. Boehm garbage collection vs. our own. Includes
speed vs. data size trade-offs.

 * code size vs. speed trade-offs (e.g. whether the final interpreter
should still include compact precompiled bytecode or be completely
translated into C).

 * the complex issue of threading (global interpreter lock vs.

 * multimethod dispatching

 * pointer tagging, e.g. encoding an integer object as a pointer with a 
special value instead of a real pointer to a data structure representing 
the integer.

The above are mostly specific to a particular low-level run-time.
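
The pointer tagging trick can be simulated in a few lines. The sketch
below models machine words as Python integers and memory as a dictionary
of even addresses; the one-bit tag and all names (tag_int, untag_int,
allocate, load) are assumptions made for this illustration.

```python
TAG_BITS = 1
INT_TAG = 1  # low bit set: the word holds an integer, not an address

heap = {}  # simulated memory: even "address" -> boxed object


def tag_int(value):
    """Encode an integer directly in the word, with the low bit set."""
    return (value << TAG_BITS) | INT_TAG


def is_tagged_int(word):
    return (word & INT_TAG) == INT_TAG


def untag_int(word):
    """Recover the integer; the shift drops the tag bit."""
    return word >> TAG_BITS


def allocate(obj):
    """Store a boxed object and return its address, which is always even."""
    address = 2 * (len(heap) + 1)
    heap[address] = obj
    return address


def load(word):
    """Return the value a word stands for, whichever encoding it uses."""
    if is_tagged_int(word):
        return untag_int(word)
    return heap[word]
```

Here load(tag_int(21)) gives back 21 without touching the simulated
heap, whereas a word returned by allocate() is looked up as a real
pointer would be.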

5. Low-level targets, tools and releases

The goal is to identify, among those low-level targets that are in
widespread use (e.g. workstation usage vs. web server vs.
high-performance computing vs. memory-starved hand-held device; C/Unix
vs. Java vs. .NET environment), which ones would benefit most from a
high-performance Python interpreter. For each of these, focus will be
given to:

 a) develop the translation process, run-time and those optimizations
that depend on low-level details.

 b) design interfaces for extension modules. Some can be very general
(e.g. a pure Python one that should allow generic third-party code to
hook into the PyPy interpreter source code without worrying about the
translation process). Others depend on the low-level environment and on
the choices made for the issues of (4).

 c) combine different solutions for the different issues discussed in
(4). Gather statistics with real-world Python applications. Compare the
results. This is where the flexibility of the whole project is vital.
Typically, very different trade-offs need to be made on different

 d) most importantly, develop tools to easily allow third-parties to
repeat (c) in their own domain and build their own tailored versions of

 e) release a few official versions pre-tailored for various common
environments. Develop in particular a version whose goal is to simulate
the existing CPython interpreter to support legacy extension modules. 
Investigate if the PyPy core can make internal choices that are very
different from CPython's without sacrificing legacy extension modules

6. Infrastructure

The goal is to address the development and maintenance issues.

 a) PyPy's own development needs an infrastructure that must
continuously be kept up-to-date and further developed.

 b) write tests. All parts of PyPy should be extensively covered by
stress tests. Investigate the use of test-coverage analysers.

 c) investigate means of keeping PyPy in sync with the future
developments of CPython, e.g. ways to relate pieces of PyPy source and
pieces of CPython source. Look for existing solutions.

7. Extension of PyPy

The goal is to add functionalities in PyPy that are not present in
existing Python implementations. This is an open goal. We only list a
few promising directions:

 a) build alternate object spaces that provide features that are
essentially language-transparent, e.g. distributed computing (via a
network proxy object space), compatibility layers (e.g. a
Python-1.5.2-compliant object space), persistence (via a persistent
object space).

 b) build language features that rely on translator support (2), i.e. 
which can be turned on or off during the production of individual 
versions of PyPy, e.g. Stackless and continuations.

 c) work on the interaction between the compiler and the main loop to 
allow custom opcodes to be defined, generated by the compiler, and 
interpreted by the main loop, thus allowing syntactic extension of the 
language by user code.

 d) conversely, develop interfaces to use object spaces without the main
loop to provide Python-like object semantics to other programming
languages, using their own syntax and execution environment, e.g. Java.
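
To give a feel for the alternate object spaces of (a), here is a hedged
sketch of a proxy space that forwards every operation to another space
and records it. A network proxy would send the operation over the wire
at the same point. The names PlainSpace, ProxySpace and square_plus are
hypothetical and do not reflect any existing PyPy interface.

```python
class PlainSpace:
    """Minimal stand-in for an object space with two operations."""

    def add(self, a, b):
        return a + b

    def mul(self, a, b):
        return a * b


class ProxySpace:
    """Forwards each operation to a target space and logs its name."""

    def __init__(self, target):
        self.target = target
        self.log = []

    def __getattr__(self, name):
        # Only called for attributes not found normally, i.e. for the
        # operations that live on the target space.
        operation = getattr(self.target, name)

        def forward(*args):
            self.log.append(name)
            return operation(*args)

        return forward


def square_plus(space, x, y):
    # Code written against the space interface is unaware of the proxy.
    return space.add(space.mul(x, x), y)
```

The function square_plus gives the same result with either space; with
ProxySpace the log additionally shows which operations were performed.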
