[pypy-svn] r20729 - pypy/dist/pypy/doc

mwh at codespeak.net mwh at codespeak.net
Tue Dec 6 01:12:53 CET 2005


Author: mwh
Date: Tue Dec  6 01:12:53 2005
New Revision: 20729

Modified:
   pypy/dist/pypy/doc/translation-aspects.txt
Log:
massaging of the language of about the first half of translation-aspects.


Modified: pypy/dist/pypy/doc/translation-aspects.txt
==============================================================================
--- pypy/dist/pypy/doc/translation-aspects.txt	(original)
+++ pypy/dist/pypy/doc/translation-aspects.txt	Tue Dec  6 01:12:53 2005
@@ -10,7 +10,7 @@
 
 One of the goals of the PyPy project is to have the memory and threading
 models flexible and changeable without having to manually reimplement the
-interpreter.  In fact, PyPy with the 0.7 and 0.8 releases contain code for
+interpreter.  In fact, PyPy by the time of the 0.8 release contains code for
 memory management and threading models which allows experimentation without
 requiring early design decisions.  This document describes many details of the
 current state of the implementation of the memory object model, automatic
@@ -21,15 +21,13 @@
 Introduction
 ============
 
-
-
-The main emphasis of the PyPy project is that of integration: we want to make
-changing memory management and threading techniques possible while at the same
-time influencing the interpreter as little as possible. It is not the current
-goal to optimize the current approaches in extreme ways rather to produce solid
-implementations and to provide an environment where experiments with
-fundamentally different ways to implement these things is possible and
-reasonably easy. 
+The main emphasis of the PyPy project is that of flexible integration: we want
+to make changing memory management and threading techniques possible while at
+the same time influencing the source code of the interpreter as little as
+possible. It is not the current goal to optimize these approaches in
+extreme ways but rather to produce solid implementations and to provide an
+environment where experiments with fundamentally different ways to implement
+these things are possible and reasonably easy.
 
 
 The low level object model
@@ -38,19 +36,20 @@
 XXX proper references to translation.txt and dynamic-language-translation
 
 One important part of the translation process is *rtyping*. Before that step
-all objects in our flow graphs are annotated with types on the level of the
+all objects in our flow graphs are annotated with types at the level of the
 RPython type system which is still quite high-level and target-independent.
 During rtyping they are transformed into objects that match the model of the
 specific target platform. For C or C-like targets this model consists of a set
 of C-like types like structures, arrays and functions in addition to primitive
 types (integers, characters, floating point numbers). This multi-stage approach
-gives a lot of flexibility how a particular object is represented on the 
+gives a lot of flexibility in how a given object is represented at the 
 target's level. The RPython process can decide what representation to use based
 on the type annotation and on the way the object is used.
 
 In the following the structures used to represent RPython classes are described.
-There is one "vtable" per RPython class, with the following structure: A root
-class "object" has::
+There is one "vtable" per RPython class, with the following structure: The root
+class "object" has a vtable of the following type (expressed in a C-like 
+syntax)::
 
     struct object_vtable {
         struct object_vtable* parenttypeptr;
@@ -62,7 +61,8 @@
     }
 
 The structure members ``subclassrange_min`` and ``subclassrange_max`` are used
-for subclass checking. Every other class X, with parent Y, has the structure::
+for subclass checking (see below). Every other class X, with parent Y, has the
+structure::
 
     struct vtable_X {
         struct vtable_Y super;   // inlined
@@ -70,12 +70,12 @@
     }
 
 The extra class attributes usually contain function pointers to the methods
-of that class. In addition the class attributes (which are
-supported by the RPython object model) are stored there.
+of that class; the data class attributes (which are supported by the
+RPython object model) are stored there as well.
 
 The type of the instances is::
 
-   struct object {       // for instance of the root class
+   struct object {       // for instances of the root class
        struct object_vtable* typeptr;
    }
 
@@ -94,47 +94,48 @@
 
 The way we do subclass checking is a good example of the flexibility provided
 by our approach: in the beginning we were using a naive linear lookup
-algorithm. Since subclass checking is quite common (it is also used to check
-whether an object is an instance of a certain class) we wanted to replace it
-with the more efficient relative numbering algorithm. This was a matter of just
-changing the appropriate code of the rtyping process to calculate the class-ids
-during rtyping and insert the necessary fields into the class structure. It
-would be similarly easy to switch to another implementation.
+algorithm. Since subclass checking is quite a common operation (it is also
+used to check whether an object is an instance of a certain class) we wanted
+to replace it with the more efficient relative numbering algorithm. This was a
+matter of just changing the appropriate code of the rtyping process to
+calculate the class-ids during rtyping and insert the necessary fields into
+the class structure. It would be similarly easy to switch to another
+implementation.
 
 XXX reference to the paper
 
-ID hashes
----------
+Identity hashes
+---------------
 
 In the RPython type system class instances can be used as dictionary keys using
 a default hash implementation based on identity which in practice is
-implemented using the memory address. This is similar to how standard Python
-behaves if no user-defined hash function is present. The annotator keeps track
-for which classes this hashing is ever used.
+implemented using the memory address. This is similar to how CPython behaves if
+no user-defined hash function is present. The annotator keeps track of the
+classes for which this hashing is ever used.
 
 One of the peculiarities of PyPy's approach is that live objects are analyzed
-by our translation toolchain. This leads to the presence of instances of user
+by our translation toolchain. This leads to the presence of instances of RPython
 classes that were built before the translation started. These are called
-prebuilt-constants (PBCs for short). During rtyping, these instances have to be
+"pre-built constants" (PBCs for short). During rtyping, these instances must be
 converted to the low level model. One of the problems with doing this is that
 the standard hash implementation of Python is to take the id of an object, which
 is just the memory address. If the RPython program explicitly stores the hashes
-of PBCS somewhere (for example in the implementation of a data structure) then
-the stored hash value would not match the value of the object's address after
-translation anymore.
+of a PBC somewhere (for example in the implementation of a data structure) then
+the stored hash value would be extremely unlikely to match the value of the object's
+address after translation.
 
-To prevent this the following strategy is used: for every class with instances
-that are hashed somewhere in the program (either when storing them in a
+To prevent this the following strategy is used: for every class whose instances
+are hashed somewhere in the program (either when storing them in a
 dictionary or by calling the hash function) an extra field is introduced in the
 structure used for the instances of that class. For PBCs of such a class this
-field is used to store the memory address of the original object, new objects
+field is used to store the memory address of the original object and new objects
 have this field initialized to zero. The hash function for instances of such a
 class stores the object's memory address in this field if it is zero. The
 return value of the hash function is the content of the field. This means that
 instances of such a class that were converted from PBCs retain the hash values they
 had before the conversion whereas new objects of the class have their memory
-address as hash values. A strategy along these lines will be required if we ever 
-switch to using a copying garbage collector.
+address as hash values. A strategy along these lines will in any case be
+required if we ever switch to using a copying garbage collector.
 
 Cached functions with PBC arguments
 ------------------------------------
@@ -153,7 +154,7 @@
 One example of the flexibility the RTyper provides is how we deal with lists.
 Based on information gathered by the annotator the RTyper chooses between two
 different list implementations. If a list never changes its size after creation
-a low-level array is used directly. For lists which get resized a
+a low-level array is used directly. For lists which might be resized a
 representation consisting of a structure with a pointer to an array is used and
 overallocation is performed.
 
@@ -187,23 +188,23 @@
 Using the Boehm garbage collector
 -----------------------------------
 
-At the moment there are two different garbage collectors implemented in the C
+Currently there are two different garbage collectors implemented in the C
 backend (which is the most complete backend right now). One of them uses the
 existing Boehm-Demers-Weiser garbage collector [BOEHM]_. For every memory
 allocating operation in a low level flow graph the C backend introduces a call
 to a function of the boehm collector which returns a suitable amount of memory.
-Since the C backends has a lot of information avaiable about the data structure
+Since the C backend has a lot of information available about the data structure
 being allocated it can choose the memory allocation function out of the Boehm
-API that fits best. For example for objects that do not contain references to
+API that fits best. For example, for objects that do not contain references to
 other objects (e.g. strings) there is a special allocation function that
 signals to the collector that it does not need to consider this memory when
 tracing pointers.
 
 Using the Boehm collector has disadvantages as well. The problems stem from the
 fact that the Boehm collector is conservative which means that it has to
-consider every word in memory to be a potential pointer. Since PyPy's toolchain
+consider every word in memory as a potential pointer. Since PyPy's toolchain
 has complete knowledge of the placement of data in memory we can generate an
-exact garbage collector that considers only pointers.
+exact garbage collector that considers only genuine pointers.
 
 Using a simple reference counting garbage collector
 -----------------------------------------------------
@@ -215,7 +216,7 @@
 every reference decrement operation a check is performed to see whether the reference
 count has dropped to zero. If this is the case the memory of the object will be
 reclaimed after the reference counts of the objects the original object
-references are decremented as well.
+refers to are decremented as well.
 
 The current placement of reference counter updates is far from optimal: The
 reference counts are updated much more often than theoretically necessary (e.g.



More information about the Pypy-commit mailing list