[pypy-svn] r29018 - pypy/dist/pypy/doc

mwh at codespeak.net mwh at codespeak.net
Tue Jun 20 18:20:15 CEST 2006


Author: mwh
Date: Tue Jun 20 18:20:13 2006
New Revision: 29018

Modified:
   pypy/dist/pypy/doc/dynamic-language-translation.txt
   pypy/dist/pypy/doc/garbage_collection.txt
   pypy/dist/pypy/doc/translation.txt
Log:
part of a rewrite of translation.txt.
not finished yet...


Modified: pypy/dist/pypy/doc/dynamic-language-translation.txt
==============================================================================
--- pypy/dist/pypy/doc/dynamic-language-translation.txt	(original)
+++ pypy/dist/pypy/doc/dynamic-language-translation.txt	Tue Jun 20 18:20:13 2006
@@ -624,7 +624,7 @@
 For more information see `The Interplevel Back-End`_ in the reference
 documentation.
 
-.. _`The Interplevel Back-End`: translation.html#the-interplevel-back-end
+.. _`The Interplevel Back-End`: geninterp.html
 
 
 
@@ -904,7 +904,7 @@
 `pypy/annotation/model.py`_ and described in the `annotator reference
 documentation`_ [TR]_.
 
-.. _`annotator reference documentation`: translation.html#annotator
+.. _`annotator reference documentation`: annotationref.html
 
 
 Rules
@@ -2354,12 +2354,12 @@
 .. _`Hindley-Milner`: http://en.wikipedia.org/wiki/Hindley-Milner_type_inference
 .. _SSA: http://en.wikipedia.org/wiki/Static_single_assignment_form
 .. _LLVM: http://llvm.org/
-.. _`RTyper reference`: translation.html#rpython-typer
+.. _`RTyper reference`: rtyper.html
 .. _`GenC back-end`: translation.html#genc
 .. _`LLVM back-end`: translation.html#llvm
 .. _JavaScript: http://www.ecma-international.org/publications/standards/Ecma-262.htm
 .. _Squeak: http://www.squeak.org/
-.. _lltype: translation.html#low-level-types
+.. _lltype: rtyper.html#low-level-types
 .. _`post on the Perl 6 compiler mailing list`: http://www.nntp.perl.org/group/perl.perl6.compiler/1107
 
 .. include:: _ref.txt

Modified: pypy/dist/pypy/doc/garbage_collection.txt
==============================================================================
--- pypy/dist/pypy/doc/garbage_collection.txt	(original)
+++ pypy/dist/pypy/doc/garbage_collection.txt	Tue Jun 20 18:20:13 2006
@@ -116,7 +116,7 @@
 id to find pointers that are contained in the object and to find out the size
 of the object.
 
-.. _`low-level type`: translation.html#low-level-type
+.. _`low-level type`: rtyper.html#low-level-type
 
 .. _above:
 
@@ -266,7 +266,7 @@
         of ``addr`` and decrement that of the object stored at the old address
         in ``addr_to``.
 
-.. _LLInterpreter: translation.html#llinterpreter
+.. _LLInterpreter: rtyper.html#llinterpreter
 
 Description of Implemented Garbage Collectors
 ---------------------------------------------

Modified: pypy/dist/pypy/doc/translation.txt
==============================================================================
--- pypy/dist/pypy/doc/translation.txt	(original)
+++ pypy/dist/pypy/doc/translation.txt	Tue Jun 20 18:20:13 2006
@@ -5,50 +5,88 @@
 .. contents::
 .. sectnum::
 
-This document describes the tool chain that we developed to analyze and
-"compile" RPython_ programs (like PyPy itself) to various lower-level
-languages.
+This document describes the tool chain that we have developed to analyze
+and "compile" RPython_ programs (like PyPy itself) to various target
+platforms.
 
 .. _RPython: coding-guide.html#restricted-python
 
+It consists of three broad sections: a slightly simplified overview, a
+brief introduction to each of the major components of our tool chain and
+then a more comprehensive section describing how the pieces fit together.
+If you are reading this document for the first time, the Overview_ is
+likely to be most useful; if you are trying to refresh your PyPy memory,
+then `How It Fits Together`_ is probably what you want.
 
 Overview
 ========
 
-XXX very preliminary documentation!
-
-The module `translator.py`_ is the common entry point to the various parts
-of the translation process.  It is available as an interactive utility to
-`play around`_.
-
-There is a graph_ that gives an overview over the different steps of the
-translation process. These are:
-
-1. The complete program is imported.  If needed, extra initialization is
-   performed.  Once this is done, the program must be present in memory as
-   a form that is "static enough" in the sense of RPython_.
-
-2. The `Flow Object Space`_ processes the input program, turning each
-   function independently into a `control flow graph`_ data structure
-   recording sequences of basic operations in
-   static single assignment form `SSA`_.
-
-3. Optionally, the Annotator_ performs global type inference on the
-   control flow graphs.  Each variable gets annotated with an inferred
-   type.
-
-4. The `RPython typer`_ can use the high-level types inferred by the
-   Annotator to turn the operations in the control flow graphs into
-   low-level operations over low-level types (close to the C types: struct,
-   array, pointer...).
-
-5. One of the Code Generators (back-ends) turns the
-   optionally annotated/typed flow graphs and produces a source file in
-   a lower-level language: C_, LLVM_, `Common Lisp`_, Pyrex_, JavaScript_,
-   Squeak_, CLI_ (.NET), or `Python again`_ (this is used in PyPy to turn
-   sufficiently RPythonic app-level code into interp-level code).
-
-6. This lower-level source file is compiled to produce an executable.
+The job of the translation tool-chain is to translate RPython_ programs
+into an efficient version of those programs for one of various target
+platforms, generally one that is considerably lower-level than Python.  It
+divides this task into several steps, and the purpose of this document is
+to introduce them.
+
+As of the 0.9 release, RPython_ programs can be translated into the
+following languages/platforms: C/POSIX, LLVM/POSIX, CLI/.NET, Squeak and
+Common Lisp (in addition, there's `a backend`_ that translates
+`application-level`_ code into `interpreter-level`_ code, but this is a
+special case in several ways).
+
+.. _`a backend`: geninterp.html
+.. _`application-level`: coding-guide.html#application-level
+.. _`interpreter-level`: coding-guide.html#interpreter-level
+
+
+The choice of the target platform affects the process somewhat, but to
+start with we describe the process of translating an RPython_ program into
+C (which is the default and original target).
+
+Most of the steps in the translation process operate on control flow
+graphs, which are produced from functions in the input program, analyzed,
+transformed and then translated to the target language.
+
+It is helpful to consider translation as being made up of the following
+steps:
+
+1. The complete program is imported, which means extra initialization can
+   be performed.  Once this is done, the program must be present in memory
+   as a form that is "static enough" in the sense of RPython_.
+
+2. The Annotator_ performs a global analysis starting from a specified
+   entry point to deduce type and other information about what each
+   variable can contain at run-time, building flow graphs using the `Flow
+   Object Space`_ as it encounters them.
+
+3. The `RPython Typer`_ (or RTyper) uses the high-level information
+   inferred by the Annotator to turn the operations in the control flow
+   graphs into low-level operations.
+
+4. After RTyping there are two, rather different, `optional
+   transformations`_ which can be applied -- the "backend optimizations"
+   which make the resulting program go faster, and the "stackless
+   transform" which transforms the program into a form of continuation
+   passing style which allows the implementation of coroutines and other
+   forms of non-standard control flow.
+
+5. The next step is `preparing the graphs for source generation`_, which
+   involves computing the names that the various functions and types in
+   the program will have in the final source and applying transformations
+   which insert explicit exception handling and memory management
+   operations.
+
+6. The `C backend`_ (colloquially known as "GenC") produces a number of C
+   source files (as noted above, we are ignoring the other backends for
+   now).
+
+7. These source files are compiled to produce an executable.
+
+(In practice, these steps are not quite as distinct as this presentation
+suggests.)
+
+There is an `interactive interface`_ to the translation process, called
+`translatorshell.py`_, which allows you to work through these stages
+interactively.
 
 .. _`SSA`: http://en.wikipedia.org/wiki/Static_single_assignment_form
 .. _`translator.py`: http://codespeak.net/pypy/dist/pypy/translator/translator.py
@@ -61,45 +99,65 @@
 .. _Squeak: http://codespeak.net/pypy/dist/pypy/translator/squeak/
 .. _CLI: http://codespeak.net/pypy/dist/pypy/translator/cli/
 .. _graph: image/translation.pdf
+.. _`interactive interface`: getting-started.html#try-out-the-translator
+.. _`translatorshell.py`: http://codespeak.net/pypy/dist/pypy/bin/translatorshell.py
 
+The Flow Model
+==============
 
-.. _Annotator:
+The `Flow Object Space`_ is described in detail in the `document
+describing object spaces`_, but as the data structures produced by the
+Flow Object Space are the basic data structures of the translation
+process, we quickly summarize them here.
 
-The annotation pass
-===================
+All these types are defined in `pypy.objspace.flow.model`_ (which is the
+most imported module in the PyPy source base, to reinforce the point).
+
+The flow graph of a function is represented by the class ``FunctionGraph``.
+It contains a reference to a collection of ``Block``\ s connected by ``Link``\ s.
+
+A ``Block`` contains a list of ``SpaceOperation``\ s.  Each ``SpaceOperation``
+has an ``opname``, a list of ``args`` and a ``result``, which are either
+``Variable``\ s or ``Constant``\ s.
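As a rough illustration, the structure described above can be sketched with
simplified stand-in classes (hypothetical, much-reduced versions of the real
ones in ``pypy.objspace.flow.model``), modelling ``def f(n): return n + 1``
as a one-block graph:

```python
# Hypothetical, simplified stand-ins for the real flow-model classes;
# the actual definitions live in pypy/objspace/flow/model.py.

class Variable:
    def __init__(self, name):
        self.name = name

class Constant:
    def __init__(self, value):
        self.value = value

class SpaceOperation:
    def __init__(self, opname, args, result):
        self.opname, self.args, self.result = opname, args, result

class Link:
    def __init__(self, args, target):
        self.args, self.target = args, target

class Block:
    def __init__(self, inputargs):
        self.inputargs = inputargs
        self.operations = []      # list of SpaceOperations
        self.exits = []           # Links to the following Blocks

class FunctionGraph:
    def __init__(self, name, startblock):
        self.name, self.startblock = name, startblock

# Model "def f(n): return n + 1" as a one-block graph ending in a
# return block.
v_n, v_result = Variable('v_n'), Variable('v_result')
block = Block([v_n])
block.operations.append(SpaceOperation('add', [v_n, Constant(1)], v_result))
returnblock = Block([Variable('v_ret')])
block.exits.append(Link([v_result], returnblock))
graph = FunctionGraph('f', block)
```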
 
-(INCOMPLETE DRAFT)
+We have an extremely useful PyGame viewer, which allows you to visually
+inspect the graphs at various stages of the translation process (handy
+when trying to work out why things are breaking).  It looks like this:
 
-We describe below how a control flow graph can be "annotated" to
-discover the types of the objects.  This annotation pass is a form of
-type inference.  It is done after control flow graphs are built by the
-FlowObjSpace, but before these graphs are translated into low-level code
-(e.g. C/Lisp/Pyrex).
+   .. image:: image/bpnn_update.png
 
+.. _`document describing object spaces`: objspace.html
+.. _`pypy.objspace.flow.model`: http://codespeak.net/pypy/dist/pypy/objspace/flow/model.py
 
-Model
-------------------------
+.. _Annotator:
+
+The Annotation Pass
+===================
+
+We describe briefly below how a control flow graph can be "annotated" to
+discover the types of the objects.  This annotation pass is a form of type
+inference.  It operates on the control flow graphs built by the Flow
+Object Space.
+
+For a more comprehensive description of the annotation process, see
+sections XXX of `Compiling Dynamic Language Implementations`_.
 
 The major goal of the annotator is to "annotate" each variable that
 appears in a flow graph.  An "annotation" describes all the possible
 Python objects that this variable could contain at run-time, based on a
 whole-program analysis of all the flow graphs --- one per function.
 
-An "annotation" is an instance of ``SomeObject``.  There are subclasses
-that are meant to represent specific families of objects.  Note that
-these classes are all meant to be instantiated; the classes ``SomeXxx``
-themselves are not the annotations.
+An "annotation" is an instance of a subclass of ``SomeObject``.  Each
+subclass represents a specific family of objects.
 
 Here is an overview (see ``pypy.annotation.model``):
 
-* ``SomeObject`` is the base class.  An instance ``SomeObject()``
-  represents any Python object.  It is used for the case where we don't
-  have enough information to be more precise.  In practice, the presence
-  of ``SomeObject()`` means that we have to make the annotated source code
-  simpler or the annotator smarter.
+* ``SomeObject`` is the base class.  An instance of ``SomeObject()``
+  represents any Python object, and as such usually means that the input
+  program was not fully RPython.
 
-* ``SomeInteger()`` represents any integer.
-  ``SomeInteger(nonneg=True)`` represent a non-negative integer (``>=0``).
+* ``SomeInteger()`` represents any integer.  ``SomeInteger(nonneg=True)``
+  represents a non-negative integer (``>=0``).
 
 * ``SomeString()`` represents any string; ``SomeChar()`` a string of
   length 1.
@@ -109,331 +167,19 @@
   annotations.  For example, ``SomeTuple([SomeInteger(), SomeString()])``
   represents a tuple with two items: an integer and a string.
 
+(Other ``SomeXxx`` classes are described in the `annotator reference`_.)
+
+.. _`annotator reference`: annotationref.html
+
 There are more complex subclasses of ``SomeObject`` that we describe in
 more details below.
 
-All the ``SomeXxx`` instances can optionally have a ``const`` attribute,
-which means that we know exactly which Python object the Variable will
-contain.
-
-All the ``SomeXxx`` instances are supposed to be immutable.  The
-annotator manages a dictionary mapping Variables (which appear in flow
-graphs) to ``SomeXxx`` instances; if it needs to revise its belief about
-what a Variable can contain, it does so by updating this dictionary, not
-the ``SomeXxx`` instance.
-
-
-
-Annotator
---------------------------
-
-The annotator itself (``pypy.translator.annrpython``) works by
-propagating the annotations forward in the flow graphs, starting at some
-entry point function, possibly with explicitly provided annotations
-about the entry point's input arguments.  It considers each operation in
-the flow graph in turn.  Each operation takes a few input arguments
-(Variables and Constants) and produce a single result (a Variable).
-Depending on the input argument's annotations, an annotation about the
-operation result is produced.  The exact rules to do this are provided
-by the whole ``pypy.annotation`` subdirectory, which defines all the
-cases in detail according to the R-Python semantics.  For example, if
-the operation is 'v3=add(v1,v2)' and the Variables v1 and v2 are
-annotated with ``SomeInteger()``, then v3 also receives the annotation
-``SomeInteger()``.  So for example the function::
-
-    def f(n):
-        return n+1
-
-corresponds to the flow graph::
-
-    start ----------.
-                    |
-                    V
-           +-------------------+
-           |  v2 = add(v1, 1)  |
-           +-------------------+
-                    |
-                    `---> return block
-
-If the annotator is told that v1 is ``SomeInteger()``, then it will
-deduce that v2 (and hence the function's return value) is
-``SomeInteger()``.
-
-.. _above:
-
-This step-by-step annotation phase proceeds through all the operations
-in a block, and then along the links between the blocks of the flow
-graph.  If there are loops in the flow graph, then the links will close
-back to already-seen blocks, as in::
-
-    def g(n):
-        i = 0
-        while n:
-            i = i + n
-            n = n - 1
-        return i
-
-whose flow graph is::
-
-    start -----.
-               | n1 0
-               V
-           +-------------------+
-           |   input: n2 i2    |
-           |  v2 = is_true(n2) | <-----------.
-           +-------------------+       m3 j3 |
-               |             |               |
-               |ifFalse      |ifTrue         |
-    return <---' i2          | n2 i2         |
-                             V               |
-                    +--------------------+   |
-                    |   input: n3 i3     |   |
-                    |  j3 = add(i3, n3)  |   |
-                    |  m3 = sub(n3, 1)   |---'
-                    +--------------------+
-
-Be sure to follow the variable renaming that occurs systematically
-across each link in a flow graph.  In the above example the Variables
-have been given names similar to the name of the original variables in
-the source code (the FlowObjSpace tries to do this too) but keep in mind
-that all Variables are different: n1, n2, i2, v2, n3, i3, j3, m3.
-
-Assume that we call the annotator with an input annotation of
-``SomeInteger()`` for n1.  Following the links from the start, the
-annotator will first believe that the Variable i2, whose value comes
-from the constant 0 of the first link, must always be zero.  It will
-thus use the annotation ``SomeInteger(const=0)`` for i2.  Then it will
-propagate the annotations through both blocks, and find that v2 is
-``SomeBool()`` and all other variables are ``SomeInteger()``.  In
-particular, the annotation of j3 is different from the annotation of the
-Variable i2 into which it is copied (via the back-link).  More
-precisely, j3 is ``SomeInteger()`` but i2 is the more specific
-``SomeInteger(const=0)``.  This means that the assumption that i2 must
-always be zero is found to be wrong.  At this point, the annotation of
-i2 is *generalized* to include both the existing and the new annotation.
-(This is the purpose of ``pypy.annotation.model.unionof()``).  Then
-these more general annotations must again be propagated forward.
-
-This process of successive generalizations continues until the
-annotations stabilize.  In the above example, it is sufficient to
-re-analyse the first block once, but in general it can take several
-iterations to reach a fixpoint.  Annotations may also be propagated from
-one flow graph to another and back repeatedly, across ``call``
-operations.  The overall model should ensure that this process
-eventually terminates under reasonable conditions.  Note that as long as
-the process is not finished, the annotations given to the Variables are
-wrong, in the sense that they are too specific; at run-time, the
-Variables will possibly contain Python objects outside the set defined
-by the annotation, and the annotator doesn't know it yet.
-
-
-Description of the available types
------------------------------------------------
-
-The reference and the details for the annotation model is found in the
-module ``pypy.annotation.model``.  We describe below the issues related
-to the various kinds of annotations.
-
-
-Simple Types
-++++++++++++
-
-``SomeInteger``, ``SomeBool``, ``SomeString``, ``SomeChar`` all stands
-for the obvious corresponding set of immutable Python objects.
-
-
-Tuples
-++++++
-
-``SomeTuple`` only considers tuples of known length.  We don't try to
-handle tuples of varying length (the program should use lists instead).
-
-
-Lists and Dictionaries
-++++++++++++++++++++++
-
-``SomeList`` stands for a list of homogeneous type (i.e. all the
-elements of the list are represented by a single common ``SomeXxx``
-annotation).
-
-``SomeDict`` stands for a homogeneous dictionary (i.e. all keys have
-the same ``SomeXxx`` annotation, and so have all values).
-
-These types are mutable, which requires special support for the
-annotator.  The problem is that in code like::
-
-   lst = [42]
-   update_list(lst)
-   value = lst[0]
-
-the annotation given to ``value`` depends on the order in which the
-annotator progresses.  As ``lst`` is originally considered as a list
-of ``SomeInteger(const=42)``, it is possible that ``value`` becomes
-``SomeInteger(const=42)`` as well if the analysis of ``update_list()``
-is not completed by the time the third operation is first considered.
-
-
-To solve this problem, each ``SomeList`` or ``SomeDict`` is linked to
-so-called *list-definition* or *dict-definition*
-(``pypy.annotation.listdef.ListDef``,
-``pypy.annotation.dictdef.DictDef``).  The list definitions and dict
-definitions contain information about the type of list items, dict
-keys and values respectively.
-
-At each creation point, i.e. each 'newlist' or 'newdict', a
-``SomeList`` or ``SomeDict`` is created with a fresh definition object
-if its the first time we annotate this operation, otherwise the
-definition setup (which has been cached) the first time is reused.
-While proceeding the annotator also records on the definition objects
-all flow graph positions where values are read from the list or dict.
-
-For example, in code like::
-
-   lst = [42]
-   f(lst[0])
-   lst.append(43)
-
-the definition associated with the list at creation time (first line)
-represents list whose items are all constants equal to 42; when a
-value is read from the list (second line) this position is recorded on
-the definition; when the ``append(43)`` call is then found, the item
-type information in the definition is generalized (in this case to
-general list of integers) and the annotator schedule all the so far
-recorded read positions for reflowing, in order to keep the
-annotations consistent.
-
-Our model is not sensitive to timing: it doesn't know that the same
-list object may contain different items at different times.  It only
-computes how general the items in the list must be to cover all cases.
-
-For initially empty lists, as created by ``lst = []``, we build a list
-whose items have the annotation ``SomeImpossibleValue``.  This is an
-annotation that denotes that no Python object at all can possibly
-appear here at run-time.  It is the least general annotation.  The
-rationale is that::
-
-   lst = []
-   oups = lst[0]
-
-will give the variable ``oups`` the annotation
-``SomeImpossibleValue``, which is reasonable given that no concrete
-Python object can ever be put in ``oups`` at run-time.  In a more
-usual example::
-
-   lst = []
-   lst.append(42)
-
-the list is first built with ``SomeImpossibleValue`` items, and then
-the factory is generalized to produce a list of
-``SomeInteger(const=42)``.  With this "impossible object" trick we
-don't have to do anything special about empty lists.
-
-When the annotator has to unify different list or dict annotations it
-effectively unifies the corresponding definitions, the item types are
-generalized as necessary and the union of the read position sets is
-used, also things are internally setup in such a way that the involved
-definitions now are considered interchangeably the same for the rest of
-the process (that means that union for lists and dicts is a not
-reversible operation).  If the new item type is more general reflowing
-from all the read positions is also scheduled.
-
-User-defined Classes and Instances
-++++++++++++++++++++++++++++++++++
-
-``SomeInstance`` stands for an instance of the given class or any
-subclass of it.  For each user-defined class seen by the annotator, we
-maintain a ClassDef (``pypy.annotation.classdef``) describing the
-attributes of the instances of the class; essentially, a ClassDef gives
-the set of all class-level and instance-level attributes, and for each
-one, a corresponding ``SomeXxx`` annotation.
-
-Instance-level attributes are discovered progressively as the annotation
-progresses.  Assignments like::
-
-   inst.attr = value
-
-update the ClassDef of the given instance to record that the given
-attribute exists and can be as general as the given value.
-
-For every attribute, the ClassDef also records all the positions where
-the attribute is *read*.  If, at some later time, we discover an
-assignment that forces the annotation about the attribute to be
-generalized, then all the places that read the attribute so far are
-marked as invalid and the annotator will have to restart its analysis
-from there.
-
-The distinction between instance-level and class-level attributes is
-thin; class-level attributes are essentially considered as initial
-values for instance-level attributes.  Methods are not special in this
-respect, except that they are bound to the instance (i.e. ``self =
-SomeInstance(cls)``) when considered as the initial value for the
-instance.
-
-The inheritance rules are as follows: the union of two ``SomeInstance``
-annotations is the ``SomeInstance`` of the most precise common base
-class.  If an attribute is considered (i.e. read or written) through a
-``SomeInstance`` of a parent class, then we assume that all subclasses
-also have the same attribute, and that the same annotation applies to
-them all (so code like ``return self.x`` in a method of a parent class
-forces the parent class and all its subclasses to have an attribute
-``x``, whose annotation is general enough to contain all the values that
-all the subclasses might want to store in ``x``).  However, distinct
-subclasses can have attributes of the same names with different,
-unrelated annotations if they are not used in a general way through the
-parent class.
-
-
-Prebuilt Constants and instance methods
-+++++++++++++++++++++++++++++++++++++++
-
-Constants in the flowgraph are annotated with a corresponding
-``SomeXxx`` instance with 'const' attribute set to their value.
-
-Constant instances of user-defined classes, callables (which include
-functions but also class types themself) and staticmethod are treated
-specially.  Constant user-defined class instances can declare themself
-immutable by having a '_freeze_' method returning true, otherwise they
-will be assumed mutable and be annotated with usual ``SomeInstance``
-annotation without 'const' set.
-
-For user-defined constant instances that declared themself immutable,
-staticmethods and other callables ``SomePBC`` is used (PBC = pre-built
-constant). Its instances contain a 'prebuiltinstances' dictionary. For
-the normal case and single value ``x`` this will be set to ``{x :
-True}``. For a single value the 'const' attribute will also be set.
-
-The union of ``SomePBC`` instances will result in an instance with the
-merge of the original dictionaries.  So for example a dictionary
-pointing to functions, will usually have as its value annotation such
-a ``SomePBC`` with a 'prebuiltinstances' dict having all the functions
-as keys.
-
-For a large part of operations when encountering ``SomeXxx`` with
-'const' set the annotator will do constant propagation and produce
-results with also 'const' set. This also means that based on 'const'
-truth values the annotator will not flow into code that is not
-reachable given global constant values. A later graph transformation
-will remove such dead code.
-
-XXX None, how methods annotation storage work
-
-
-XXX complete
-
-
-Built-in functions and methods
-++++++++++++++++++++++++++++++
-
-(to be completed)
-
-
-Others
-++++++
-
-(to be completed)
-
+All the ``SomeXxx`` instances are immutable.  If the annotator needs to
+revise its belief about what a Variable can contain, it does so by
+creating a new annotation, not by mutating the existing one.
 
+The result of the annotation pass is essentially a large dictionary
+mapping ``Variable``\ s to annotations.
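The way such a revision happens can be sketched with a toy model (the class
and the ``unionof`` helper below are simplified, hypothetical stand-ins, not
the real ``pypy.annotation.model`` code):

```python
# Toy illustration of annotation generalization; simplified stand-ins,
# not the real pypy.annotation.model code.

class SomeInteger:
    def __init__(self, nonneg=False, const=None):
        self.nonneg = nonneg
        self.const = const      # set when exactly one value is possible

    def __eq__(self, other):
        return (isinstance(other, SomeInteger) and
                self.nonneg == other.nonneg and self.const == other.const)

def unionof(s1, s2):
    # Build a *new* annotation general enough to cover both inputs;
    # neither input is mutated.
    if s1.const is not None and s1.const == s2.const:
        return SomeInteger(nonneg=s1.nonneg and s2.nonneg, const=s1.const)
    return SomeInteger(nonneg=s1.nonneg and s2.nonneg)

# A variable first seen holding the constant 0 ...
bindings = {'i': SomeInteger(nonneg=True, const=0)}
# ... later turns out to also receive a general integer, so its binding
# is replaced by a freshly built, more general annotation:
bindings['i'] = unionof(bindings['i'], SomeInteger())
assert bindings['i'] == SomeInteger()
```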
 
+.. _`Compiling Dynamic Language Implementations`: dynamic-language-translation.html
 
 .. _`RPython typer`:
 
@@ -442,21 +188,22 @@
 
 http://codespeak.net/pypy/dist/pypy/rpython/
 
-
-Overview
---------
-
-The RPython Typer is the bridge between the Annotator_ and the low-level code
-generators.  The annotator computes types (or "annotations") that are
-high-level, in the sense that they describe RPython types like lists or
-instances of user-defined classes.  In general, though, to emit code we need
-to represent these high-level annotations into the low-level model of the
-target language; for C, this means structures and pointers and arrays.  The
-Typer both determines the appropriate low-level type for each annotation, and
-tries to replace *all* operations in the control flow graphs with one or a few
-low-level operations.  Just like low-level types, there is only a fairly
-restricted set of low-level operations, along the lines of reading or writing
-from or to a field of a structure.
+The RPython Typer is the bridge between the Annotator_ and the code
+generator.  The information computed by the annotator is high-level, in
+the sense that it describes RPython types like lists or instances of
+user-defined classes.
+
+The RTyper is the first place where the choice of backend makes a
+difference; as outlined above we are assuming that ANSI C is the target.
+
+In general, though, to emit code we need to map these high-level
+annotations onto the low-level model of the target language; for C, this
+means structures and pointers and arrays.  The Typer both determines the
+appropriate low-level type for each annotation, and tries to replace *all*
+operations in the control flow graphs with one or a few low-level
+operations.  Just like low-level types, there is only a fairly restricted
+set of low-level operations, along the lines of reading or writing from or
+to a field of a structure.
 
 In theory, this step is optional; some code generators might be able to read
 directly the high-level types.  However, we expect that case to be the
@@ -466,6 +213,9 @@
 contain very few operations, which makes the job of the code generators much
 simpler.
 
+For more information, see the `documentation for the RTyper`_.
+
+.. _`documentation for the RTyper`: rtyper.html
 
 Example: Integer operations
 ---------------------------
@@ -492,501 +242,40 @@
 ``int_add`` is that code generators no longer have to worry about what kind
 of addition (or concatenation maybe?) it means.
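That replacement step can be sketched as follows (a toy dispatch function
with hypothetical names and string-valued annotations, not the real RTyper
machinery in ``pypy/rpython/``):

```python
# Toy sketch of the RTyper's job: replace a high-level operation with a
# type-specific low-level one, guided by the annotations of its
# arguments.  The names here are hypothetical.

def specialize_operation(opname, arg_annotations):
    # For 'add', the annotator has already determined what the arguments
    # are, so the generic operation can be replaced by a precise one.
    if opname == 'add':
        if all(a == 'SomeInteger' for a in arg_annotations):
            return 'int_add'       # plain machine-level integer addition
        if all(a == 'SomeString' for a in arg_annotations):
            return 'str_concat'    # hypothetical string-concatenation op
    raise NotImplementedError('no low-level form for %s on %r'
                              % (opname, arg_annotations))
```

Once this has run, a code generator seeing ``int_add`` no longer needs to
ask what kind of addition is meant.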
 
+.. _`optional transformations`:
 
-The process in more details
----------------------------
+The Optional Transformations
+============================
 
-The RPython Typer has a structure similar to that of the Annotator_: both
-consider each block of the flow graphs in turn, and perform some analysis on
-each operation.  In both cases the analysis of an operation depends on the
-annotations of its input arguments.  This is reflected in the usage of the same
-``__extend__`` syntax in the source files (compare e.g.
-`annotation/binaryop.py`_ and `rpython/rint.py`_).
-
-The analogy stops here, though: while it runs, the Annotator is in the middle
-of computing the annotations, so it might need to reflow and generalize until
-a fixpoint is reached.  The Typer, by contrast, works on the final annotations
-that the Annotator computed, without changing them, assuming that they are
-globally consistent.  There is no need to reflow: the Typer considers each
-block only once.  And unlike the Annotator, the Typer completely modifies the
-flow graph, by replacing each operation with some low-level operations.
-
-In addition to replacing operations, the RTyper creates a ``concretetype``
-attribute on all Variables and Constants in the flow graphs, which tells code
-generators which type to use for each of them.  This attribute is a
-`low-level type`_, as described below.
-
-
-Representations
----------------
-
-Representations -- the Repr classes -- are the most important internal classes
-used by the RTyper.  (They are internal in the sense that they are an
-"implementation detail" and their instances just go away after the RTyper is
-finished; the code generators should only use the ``concretetype`` attributes,
-which are not Repr instances but `low-level types`_.)
-
-A representation contains all the logic about mapping a specific SomeXxx()
-annotation to a specific low-level type.  For the time being, the RTyper
-assumes that each SomeXxx() instance needs only one "canonical" representation.
-For example, all variables annotated with SomeInteger() will correspond to the
-``Signed`` low-level type via the ``IntegerRepr`` representation.  More subtly,
-variables annotated SomeList() can correspond either to a structure holding an
-array of items of the correct type, or -- if the list in question is just a
-range() with a constant step -- a structure with just start and stop fields.
-
-This example shows that two representations may need very different low-level
-implementations for the same high-level operations.  This is the reason for
-turning representations into explicit objects.
-
-The base Repr class is defined in `rpython/rmodel.py`_.  Most of the
-``rpython/r*.py`` files define one or a few subclasses of Repr.  The method
-getrepr() of the RTyper will build and cache a single Repr instance per
-SomeXxx() instance; moreover, two SomeXxx() instances that are equal get the
-same Repr instance.
-
-The key attribute of a Repr instance is called ``lowleveltype``, which is what
-gets copied into the attribute ``concretetype`` of the Variables that have been
-given this representation.  The RTyper also computes a ``concretetype`` for
-Constants, to match the way they are used in the low-level operations (for
-example, ``int_add(x, 1)`` requires a ``Constant(1)`` with
-``concretetype=Signed``, but an untyped ``add(x, 1)`` works with a
-``Constant(1)`` that must actually be a PyObject at run-time).
-
-In addition to ``lowleveltype``, each Repr subclass provides a set of methods
-called ``rtype_op_xxx()`` which define how each high-level operation ``op_xxx``
-is turned into low-level operations.
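The Repr idea above can be sketched in a few lines of plain Python. Everything here is illustrative: the class and method names (``IntegerRepr``, ``rtype_add``, ``getrepr``) mirror the real RTyper, but the bodies are mock-ups, not the actual implementation.

```python
# A toy sketch of the Repr idea: each representation maps one kind of
# annotation to a concrete low-level type and knows how to lower the
# high-level operations on it.

class Repr:
    lowleveltype = None

class IntegerRepr(Repr):
    lowleveltype = "Signed"
    def rtype_add(self, args):
        # the high-level 'add' becomes the low-level 'int_add'
        return ("int_add", args, self.lowleveltype)

class RangeListRepr(Repr):
    # a list known to be range() with constant step needs only two fields
    lowleveltype = "GcStruct range { start: Signed, stop: Signed }"

def getrepr(annotation, _cache={}):
    # one canonical Repr per (equal) annotation, as in RTyper.getrepr()
    if annotation not in _cache:
        _cache[annotation] = {"SomeInteger": IntegerRepr,
                              "SomeRangeList": RangeListRepr}[annotation]()
    return _cache[annotation]

r1 = getrepr("SomeInteger")
r2 = getrepr("SomeInteger")
assert r1 is r2       # cached: equal annotations share one Repr instance
print(r1.rtype_add(["v0", "v1"]))
```

Note how the two list representations can share the same high-level operations while having entirely different ``lowleveltype`` attributes; this is the point of making representations explicit objects.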
-
-
-.. _`low-level type`:
-
-Low-Level Types
----------------
-
-The RPython Typer uses a standard low-level model which we believe can
-correspond rather directly to various target languages from C to LLVM_ to Java.
-This model is implemented in the first part of 
-`rpython/lltypesystem/lltype.py`_.
-
-The second part of `rpython/lltypesystem/lltype.py`_ is a runnable
-implementation of these types, for testing purposes.  It allows us to write
-and test plain Python code using a malloc() function to obtain and manipulate
-structures and arrays.  This is useful for example to implement and test
-RPython types like 'list' with its operations and methods.
-
-The basic assumption is that Variables (i.e. local variables and function
-arguments and return value) all contain "simple" values: basically, just
-integers or pointers.  All the "container" data structures (struct and array)
-are allocated in the heap, and they are always manipulated via pointers.
-(There is no equivalent to the C notion of local variable of a ``struct`` type.)
-
-Here is a quick tour:
-
-    >>> from pypy.rpython.lltypesystem.lltype import *
-
-Here are a few primitive low-level types, and the typeOf() function to figure
-them out:
-
-    >>> Signed
-    <Signed>
-    >>> typeOf(5)
-    <Signed>
-    >>> typeOf(r_uint(12))
-    <Unsigned>
-    >>> typeOf('x')
-    <Char>
-
-Let's say that we want to build a type "point", which is a structure with two
-integer fields "x" and "y":
-
-    >>> POINT = GcStruct('point', ('x', Signed), ('y', Signed))
-    >>> POINT
-    <GcStruct point { x: Signed, y: Signed }>
-
-The structure is a ``GcStruct``, which means a structure that can be allocated
-in the heap and eventually freed by some garbage collector.  (For platforms
-where we use reference counting, think about ``GcStruct`` as a struct with an
-additional reference counter field.)
-
-Giving a name ('point') to the GcStruct is only for clarity: it is used in the
-representation.
-
-    >>> p = malloc(POINT)
-    >>> p
-    <* struct point { x=0, y=0 }>
-    >>> p.x = 5
-    >>> p.x
-    5
-    >>> p
-    <* struct point { x=5, y=0 }>
-
-``malloc()`` allocates a structure from the heap, initializes it to 0
-(currently), and returns a pointer to it.  The point of all this is to work with
-a very limited, easily controllable set of types, and define implementations of
-types like list in this elementary world.  The ``malloc()`` function is a kind
-of placeholder, which must eventually be provided by the code generator for the
-target platform; but as we have just seen its Python implementation in
-`rpython/lltypesystem/lltype.py`_ works too, which is primarily useful for
-testing, interactive exploring, etc.
-
-The argument to ``malloc()`` is the structure type directly, but it returns a
-pointer to the structure, as ``typeOf()`` tells you:
-
-    >>> typeOf(p)
-    <* GcStruct point { x: Signed, y: Signed }>
-
-For the purpose of creating structures with pointers to other structures, we can
-declare pointer types explicitly:
-
-    >>> typeOf(p) == Ptr(POINT)
-    True
-    >>> BIZARRE = GcStruct('bizarre', ('p1', Ptr(POINT)), ('p2', Ptr(POINT)))
-    >>> b = malloc(BIZARRE)
-    >>> b.p1
-    <* None>
-    >>> b.p1 = b.p2 = p
-    >>> b.p1.y = 42
-    >>> b.p2.y
-    42
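The pointer-sharing behaviour of the tour can be reproduced with a minimal stand-in, using plain dicts as the "testing implementation" of heap structures. ``GcStruct``, ``malloc`` and the field layout here are simplified mock-ups, not the real ``pypy.rpython.lltypesystem.lltype`` API.

```python
# A minimal model of the lltype tour: structures are heap-allocated,
# zero-initialized, and only ever manipulated through references.

def GcStruct(name, *fields):
    return {"name": name, "fields": dict(fields)}

def malloc(TYPE):
    # allocate on the (Python) heap, zero-initialized, return a "pointer"
    return {field: 0 for field in TYPE["fields"]}

POINT = GcStruct('point', ('x', 'Signed'), ('y', 'Signed'))
p = malloc(POINT)
assert p == {'x': 0, 'y': 0}
p['x'] = 5

BIZARRE = GcStruct('bizarre', ('p1', 'Ptr(POINT)'), ('p2', 'Ptr(POINT)'))
b = malloc(BIZARRE)
b['p1'] = b['p2'] = p          # both fields point to the same structure
b['p1']['y'] = 42
assert b['p2']['y'] == 42      # aliasing through pointers, as above
```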
-
-The world of low-level types is more complicated than integers and GcStructs,
-though.  The next pages are a reference guide.
-
-
-Primitive Types
-+++++++++++++++
-
-Signed
-    a signed integer in one machine word (a ``long``, in C)
-
-Unsigned
-    an unsigned integer in one machine word (``unsigned long``)
-
-Float
-    a 64-bit float (``double``)
-
-Char
-    a single character (``char``)
-
-Bool
-    a boolean value
-
-Void
-    a constant.  Meant for variables, function arguments, structure fields, etc.
-    which should disappear from the generated code.
-
-
-Structure Types
-+++++++++++++++
-
-Structure types are built as instances of 
-``pypy.rpython.lltypesystem.lltype.Struct``::
-
-    MyStructType = Struct('somename',  ('field1', Type1), ('field2', Type2)...)
-    MyStructType = GcStruct('somename',  ('field1', Type1), ('field2', Type2)...)
-
-This declares a structure (or a Pascal ``record``) containing the specified
-named fields with the given types.  The field names cannot start with an
-underscore.  As noted above, you cannot directly manipulate structure objects,
-but only pointers to structures living in the heap.
-
-By contrast, the fields themselves can be of primitive, pointer or container
-type.  When a structure contains another structure as a field we say that the
-latter is "inlined" in the former: the bigger structure contains the smaller one
-as part of its memory layout.
-
-A structure can also contain an inlined array (see below), but only as its last
-field: in this case it is a "variable-sized" structure, whose memory layout
-starts with the non-variable fields and ends with a variable number of array
-items.  This number is determined when a structure is allocated in the heap.
-Variable-sized structures cannot be inlined in other structures.
-
-GcStructs have a platform-specific GC header (e.g. a reference counter); only
-these can be dynamically malloc()ed.  The non-GC version of Struct does not have
-any header, and is suitable for being embedded ("inlined") inside other
-structures.  As an exception, a GcStruct can be embedded as the first field of a
-GcStruct: the parent structure uses the same GC header as the substructure.
-
-
-Array Types
-+++++++++++
-
-An array type is built as an instance of 
-``pypy.rpython.lltypesystem.lltype.Array``::
-
-    MyIntArray = Array(Signed)
-    MyOtherArray = Array(MyItemType)
-    MyOtherArray = GcArray(MyItemType)
-
-Or, for arrays whose items are structures, as a shortcut::
-
-    MyArrayType = Array(('field1', Type1), ('field2', Type2)...)
-
-You can build arrays whose items are either primitive or pointer types, or
-(non-GC non-varsize) structures.
-
-GcArrays can be malloc()ed.  The length must be specified when malloc() is
-called, and arrays cannot be resized; this length is stored explicitly in a
-header.
-
-The non-GC version of Array can be used as the last field of a structure, to
-make a variable-sized structure.  The whole structure can then be malloc()ed,
-and the length of the array is specified at this time.
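The rule for variable-sized structures can be sketched in plain Python. The class below models the rule only (length fixed at allocation, stored in a header); it is not the real lltype machinery.

```python
# Sketch of a variable-sized structure: the array length is chosen when
# the structure is allocated, recorded in a header, and never changes.

class VarSizeStruct:
    def __init__(self, n_items):
        self.refcount = 1          # stand-in for the GC header
        self.length = n_items      # length fixed at malloc() time
        self.items = [0] * n_items

def malloc_varsize(n):
    return VarSizeStruct(n)

s = malloc_varsize(3)
s.items[2] = 7
assert s.length == 3
# no append/resize is offered: the length is part of the allocation
```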
-
-
-Pointer Types
-+++++++++++++
-
-As in C, pointers provide the indirection needed to make a reference modifiable
-or sharable.  Pointers can only point to a structure, an array, a function
-(see below) or a PyObject (see below).  Pointers to primitive types, if needed,
-must be simulated by pointing to a structure with a single field of the
-required type.  Pointer types are declared by::
-
-   Ptr(TYPE)
-
-At run-time, pointers to GC structures (GcStruct, GcArray and PyObject) hold a
-reference to what they are pointing to.  Pointers to non-GC structures that can
-go away when their container is deallocated (Struct, Array) must be handled
-with care: the bigger structure of which they are part could be freed while
-the Ptr to the substructure is still in use.  In general, it is a good idea to
-avoid passing around pointers to inlined substructures of malloc()ed structures.
-(The testing implementation of `rpython/lltypesystem/lltype.py`_ checks to some
-extent that you are not trying to use a pointer to a structure after its
-container has been freed, using weak references.  But pointers to non-GC
-structures are not officially meant to be weak references: using them after what
-they point to has been freed just crashes.)
-
-The malloc() operation allocates and returns a Ptr to a new GC structure or
-array.  In a refcounting implementation, malloc() would allocate enough space
-for a reference counter before the actual structure, and initialize it to 1.
-Note that the testing implementation also allows malloc() to allocate a non-GC
-structure or array with a keyword argument ``immortal=True``.  Its purpose is to
-declare and initialize prebuilt data structures which the code generators will
-turn into static immortal non-GC'ed data.
-
-
-Function Types
-++++++++++++++
-
-The declaration::
-
-    MyFuncType = FuncType([Type1, Type2, ...], ResultType)
-
-declares a function type taking arguments of the given types and returning a
-result of the given type.  All these types must be primitives or pointers.  The
-function type itself is considered to be a "container" type: if you wish, a
-function contains the bytes that make up its executable code.  As with
-structures and arrays, they can only be manipulated through pointers.
-
-The testing implementation allows you to "create" functions by calling
-``functionptr(TYPE, name, **attrs)``.  The extra attributes describe the
-function in a way that isn't fully specified now, but the following attributes
-*might* be present:
-
-    :_callable:  a Python callable, typically a function object.
-    :graph:      the flow graph of the function.
-
-
-The PyObject Type
-+++++++++++++++++
-
-This is a special type, for compatibility with CPython: it stands for a
-structure compatible with PyObject.  This is also a "container" type (thinking
-about C, this is ``PyObject``, not ``PyObject*``), so it is usually manipulated
-via a Ptr.  A typed graph can still contain generic space operations (add,
-getitem, etc.) provided they are applied to objects whose low-level type is
-``Ptr(PyObject)``.  In fact, code generators that support this should consider
-that the default type of a variable, if none is specified, is ``Ptr(PyObject)``.
-In this way, they can generate the correct code for fully-untyped flow graphs.
-
-The testing implementation allows you to "create" PyObjects by calling
-``pyobjectptr(obj)``.
-
-
-Opaque Types
-++++++++++++
-
-Opaque types represent data implemented in a back-end specific way.  This data cannot be inspected or manipulated.
-
-There is a predefined opaque type ``RuntimeTypeInfo``; at run-time, a value of type ``RuntimeTypeInfo`` represents a low-level type.  In practice it is probably enough to be able to represent GcStruct and GcArray types.  This is useful if we have a pointer of type ``Ptr(S)`` which can at run-time point either to a malloc'ed ``S`` alone, or to the ``S`` first field of a larger malloc'ed structure.  The information about the exact larger type that it points to can be computed or passed around as a ``Ptr(RuntimeTypeInfo)``.  Pointer equality on ``Ptr(RuntimeTypeInfo)`` can be used to check the type at run-time.
-
-At the moment, for memory management purposes, some back-ends actually require such information to be available at run-time in the following situation: when a GcStruct has another GcStruct as its first field.  A reference-counting back-end needs to be able to know when a pointer to the smaller structure actually points to the larger one, so that it can also decref the extra fields.  Depending on the situation, it is possible to reconstruct this information without having to store a flag in each and every instance of the smaller GcStruct.  For example, the instances of a class hierarchy can be implemented by nested GcStructs, with instances of subclasses extending instances of parent classes by embedding the parent part of the instance as the first field.  In this case, there is probably already a way to know the run-time class of the instance (e.g. a vtable pointer), but the back-end cannot guess this.  This is the reason for which ``RuntimeTypeInfo`` was originally introduced: just after the GcStruct is created, the function attachRuntimeTypeInfo() should be called to attach to the GcStruct a low-level function of signature ``Ptr(GcStruct) -> Ptr(RuntimeTypeInfo)``.  This function will be compiled by the back-end and automatically called at run-time.  In the above example, it would follow the vtable pointer and fetch the opaque ``Ptr(RuntimeTypeInfo)`` from the vtable itself.  (The reference-counting GenC back-end uses a pointer to the deallocation function as the opaque ``RuntimeTypeInfo``.)
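The vtable scenario above can be modelled in a few lines of plain Python. The names (``runtime_type_info``, the ``vtable`` field) are illustrative stand-ins, not the real lltype API; the point is only that an opaque per-type token, reached through the instance, identifies the exact run-time type by pointer equality.

```python
# Toy model of RuntimeTypeInfo: a pointer typed Ptr(PARENT) may actually
# point to the parent-part of a larger subclass structure; a query
# function following the "vtable" recovers the exact run-time type.

PARENT_RTTI = object()   # opaque tokens, compared by identity
CHILD_RTTI = object()

def make_parent():
    return {"vtable": {"rtti": PARENT_RTTI}, "x": 0}

def make_child():
    # the child embeds the parent part first and shares its header
    obj = make_parent()
    obj["vtable"] = {"rtti": CHILD_RTTI}
    obj["extra"] = 0
    return obj

def runtime_type_info(obj):
    # the attached query function: follow the vtable to the opaque token
    return obj["vtable"]["rtti"]

p = make_child()                  # statically we only "know" Ptr(PARENT)
assert runtime_type_info(p) is CHILD_RTTI   # pointer equality checks it
```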
-
-
-Implementing RPython types
---------------------------
-
-As hinted above, the RPython types (e.g. 'list') are implemented in some
-"restricted-restricted Python" format by manipulating only low-level types, as
-provided by the testing implementation of malloc() and friends.  What occurs
-then is that the same (tested!) very-low-level Python code -- which looks really
-just like C -- is then transformed into a flow graph and integrated with the
-rest of the user program.  In other words, we replace an operation like ``add``
-between two variables annotated as SomeList, with a ``direct_call`` operation
-invoking this very-low-level list concatenation.
-
-This list concatenation flow graph is then annotated as usual, with one
-difference: the annotator has to be taught about malloc() and the way the
-pointer thus obtained can be manipulated.  This generates a flow graph which is
-hopefully completely annotated with SomePtr() annotation.  Introduced just for
-this case, SomePtr maps directly to a low-level pointer type.  This is the only
-change needed to the Annotator to allow it to perform type inference of our
-very-low-level snippets of code.
-
-See for example `rpython/rlist.py`_.
-
-
-HighLevelOp interface
-+++++++++++++++++++++
-
-In the absence of more extensive documentation about how RPython types are
-implemented, here is the interface and intended usage of the 'hop'
-argument that appears everywhere.  A 'hop' is a HighLevelOp instance,
-which represents a single high-level operation that must be turned into
-one or several low-level operations.
-
-    ``hop.llops``
-        A list-like object that records the low-level operations that
-        correspond to the current block's high-level operations.
-
-    ``hop.genop(opname, list_of_variables, resulttype=resulttype)``
-        Append a low-level operation to ``hop.llops``.  The operation has
-        the given opname and arguments, and returns the given low-level
-        resulttype.  The arguments should come from the ``hop.input*()``
-        functions described below.
-
-    ``hop.gendirectcall(ll_function, var1, var2...)``
-        Like hop.genop(), but produces a ``direct_call`` operation that
-        invokes the given low-level function, which is automatically
-        annotated with low-level types based on the input arguments.
-
-    ``hop.inputargs(r1, r2...)``
-        Reads the high-level Variables and Constants that are the
-        arguments of the operation, and converts them if needed so that
-        they have the specified representations.  You must provide as many
-        representations as the operation has arguments.  Returns a list of
-        (possibly newly converted) Variables and Constants.
-
-    ``hop.inputarg(r, arg=i)``
-        Same as inputargs(), but only converts and returns the ith
-        argument.
-
-    ``hop.inputconst(lltype, value)``
-        Returns a Constant with a low-level type and value.
-
-Manipulation of HighLevelOp instances (this is used e.g. to insert a
-'self' implicit argument to translate method calls):
-
-    ``hop.copy()``
-        Returns a fresh copy that can be manipulated with the functions
-        below.
-
-    ``hop.r_s_popfirstarg()``
-        Removes the first argument of the high-level operation.  This
-        doesn't really change the source SpaceOperation, but modifies
-        'hop' in such a way that methods like inputargs() no longer see
-        the removed argument.
-
-    ``hop.v_s_insertfirstarg(v_newfirstarg, s_newfirstarg)``
-        Insert an argument in front of the hop.  It must be specified by
-        a Variable (as in calls to hop.genop()) and a corresponding
-        annotation.
-
-    ``hop.swap_fst_snd_args()``
-        Self-descriptive.
-
-Exception handling:
-
-    ``hop.has_implicit_exception(cls)``
-        Checks if hop is in the scope of a branch catching the exception 
-        'cls'.  This is useful for high-level operations like 'getitem'
-        that have several low-level equivalents depending on whether they
-        should check for an IndexError or not.  Calling
-        has_implicit_exception() also has a side-effect: the rtyper
-        records that this exception is being taken care of explicitly.
-
-    ``hop.exception_is_here()``
-        To be called with no argument just before a llop is generated.  It
-        means that the llop in question will be the one that should be
-        protected by the exception catching.  If has_implicit_exception()
-        was called before, then exception_is_here() verifies that *all*
-        except links in the graph have indeed been checked for with a
-        has_implicit_exception().  This is not verified if
-        has_implicit_exception() has never been called -- useful for
-        'direct_call' and other operations that can just raise any exception.
-
-    ``hop.exception_cannot_occur()``
-        The RTyper normally verifies that exception_is_here() was really
-        called once for each high-level operation that is in the scope of
-        exception-catching links.  By saying exception_cannot_occur(),
-        you say that after all this particular operation cannot raise
-        anything.  (It can be the case that unexpected exception links are
-        attached to flow graphs; e.g. any method call within a
-        ``try:finally:`` block will have an Exception branch to the finally
-        part, which only the RTyper can remove if exception_cannot_occur()
-        is called.)
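How a ``rtype_op_xxx()`` method might drive this interface can be sketched with a mock. Only the recording behaviour of ``genop()`` is mimicked here; the class and names are illustrative, not the real HighLevelOp.

```python
# A small sketch of the 'hop' usage: genop() appends a low-level
# operation to hop.llops and returns a fresh result Variable.

import itertools

class MockHighLevelOp:
    _counter = itertools.count()
    def __init__(self):
        self.llops = []            # low-level ops recorded for this block
    def genop(self, opname, args, resulttype=None):
        result = "v%d" % next(self._counter)
        self.llops.append((opname, args, resulttype, result))
        return result

def rtype_add(hop, v_x, v_y):
    # how an integer representation might lower a high-level 'add'
    return hop.genop("int_add", [v_x, v_y], resulttype="Signed")

hop = MockHighLevelOp()
v_res = rtype_add(hop, "v_x", "v_y")
assert hop.llops == [("int_add", ["v_x", "v_y"], "Signed", v_res)]
```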
+Between RTyping and C source generation there are two optional transforms:
+the "backend optimizations" and the "stackless transform".
 
+Backend Optimizations
+---------------------
 
-.. _LLInterpreter:
+Inlining, malloc removal, ...
 
-The LLInterpreter
------------------
+The Stackless Transform
+-----------------------
+
+XXX write this bit
 
-The LLInterpreter is a simple piece of code that is able to interpret flow
-graphs. This is very useful for testing purposes, especially if you work on
-the `RPython Typer`_. The most useful interface for it is the ``interpret``
-function in the file `pypy/rpython/test/test_llinterp.py`_. It takes as
-arguments a function and a list of arguments with which the function is
-supposed to be called. It then generates the flow graph, annotates it
-according to the types of the arguments you passed to it and runs the
-LLInterpreter on the result. Example::
-
-    def test_invert():
-        def f(x):
-            return ~x
-        res = interpret(f, [3])
-        assert res == ~3
-
-Furthermore there is a function ``interpret_raises`` which behaves much like
-``py.test.raises``. It takes an exception as a first argument, the function to
-be called as a second and the list of function arguments as a third. Example::
-
-    def test_raise():
-        def raise_exception(i):
-            if i == 42:
-                raise IndexError
-            elif i == 43:
-                raise ValueError
-            return i
-        res = interpret(raise_exception, [41])
-        assert res == 41
-        interpret_raises(IndexError, raise_exception, [42])
-        interpret_raises(ValueError, raise_exception, [43])
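The contract of ``interpret_raises`` can be sketched in plain Python: it succeeds only when calling the function raises exactly the expected exception. The real helper additionally builds, annotates and interprets a flow graph; this stand-in just calls the function directly, as a behavioural model only.

```python
# Behavioural sketch of interpret_raises(): expect a given exception.

def interpret_raises(exc_type, func, args):
    try:
        func(*args)
    except exc_type:
        return          # the expected exception was raised: success
    raise AssertionError("%s not raised" % exc_type.__name__)

def raise_exception(i):
    if i == 42:
        raise IndexError
    elif i == 43:
        raise ValueError
    return i

assert raise_exception(41) == 41
interpret_raises(IndexError, raise_exception, [42])
interpret_raises(ValueError, raise_exception, [43])
```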
+.. or steal it from Carl...
+
+.. _`preparing the graphs for source generation`:
+
+Preparation for Source Generation
+=================================
 
 .. _C:
 .. _GenC:
+.. _`c backend`:
 
 The C Back-End
 ==============
 
 http://codespeak.net/pypy/dist/pypy/translator/c/
 
-
-Overview
---------
-
-The task of GenC is to convert a flow graph into C code.  By itself, GenC does
-not use the annotations in the graph.  It can actually convert unannotated
-graphs to C.  However, to make use of the annotations if they are present, an
-extra pass is needed: the `RPython Typer`_, whose task is to modify the flow
-graph according to the annotations, replacing operations with lower-level C-ish
-equivalents.
-
-This version of GenC, using the RPython Typer, is recent but quite stable and
-already flexible (for example, it can produce code with two different memory
-management policies, reference-counting or using the Boehm garbage collector).
-
 GenC is not really documented at the moment.  The basic principle of
 creating code from flowgraphs is similar to the `Python back-end`_.
 See also `Generating C code`_ in another draft.
@@ -994,25 +283,41 @@
 .. _`Generating C code`: dynamic-language-translation.html#generating-c-code
 
 
+A Historical Note
+=================
+
+As this document has shown, the translation step is divided into more
+steps than one might at first expect.  It is certainly divided into more
+steps than we expected when the project started; the very first version of
+GenC operated on the high-level flow graphs and the output of the
+annotator, and even the concept of the RTyper didn't exist yet.  More
+recently, the fact that preparing the graphs for source generation
+("databasing") and actually generating the source are best considered
+separately has become clear.
+
+This process is reflected in the source somewhat; for example, the LLVM
+and C backends use different implementations of the graph preparation
+code, although there is no real reason for this.
+
+
+Other backends
+==============
+
 .. _LLVM:
 
 The LLVM Back-End
-=================
+-----------------
 
 http://codespeak.net/pypy/dist/pypy/translator/llvm/
 
 For information on getting started on the LLVM (`low level virtual machine`_)
 backend - please see `here`_. 
 
-Overview
---------
 Similar to the task of GenC, GenLLVM translates a flow graph to low level LLVM
 bytecode.  GenLLVM requires annotation and the `RPython Typer`_ pass, which
 modifies the annotated flow graph, replacing operations with lower-level
 equivalents that can be easily translated to LLVM bytecode.
 
-History
--------
 The LLVM backend would not have been possible without all the people
 contributing to PyPy. Carl Friedrich did an amazing amount of groundwork during
 the first half of 2005. Carl Friedrich and Holger then initiated a revamped
@@ -1029,217 +334,15 @@
 .. _`Python again`:
 .. _`Python back-end`:
 
-The Interplevel Back-End
-========================
+The Object-Oriented Backends
+----------------------------
 
-http://codespeak.net/pypy/dist/pypy/translator/geninterplevel.py
+The Interpreter-Level backend
+-----------------------------
 
-Motivation
-----------
-
-PyPy often makes use of `application-level`_ helper methods.
-The idea of the 'geninterplevel' backend is to automatically transform
-such application level implementations to their equivalent representation
-at interpreter level.  Then, the RPython to C translation hopefully can
-produce more efficient code than always re-interpreting these methods.
-
-One property of this translation from application-level Python to
-interpreter-level Python is that the produced code does the same thing as the
-corresponding interpreted code, but no interpreter is needed any longer to
-execute it.
-
-.. _`application-level`: coding-guide.html#app-preferable
-.. _exceptions: http://codespeak.net/pypy/dist/pypy/lib/_exceptions.py
-.. _oldstyle: http://codespeak.net/pypy/dist/pypy/lib/_classobj.py
-
-Examples are exceptions_ and oldstyle_ classes. They are
-needed in a very early phase of bootstrapping StdObjspace, but
-for simplicity, they are written as RPythonic application
-level code. This implies that the interpreter must be quite
-completely initialized to execute this code, which is
-impossible in the early phase, where we have neither
-exceptions implemented nor classes available.
-
-Solution
---------
-
-This bootstrap issue is solved by invoking a new bytecode interpreter which
-runs on FlowObjspace. FlowObjspace is complete without complicated
-initialization. It is able to do abstract interpretation of any
-RPythonic code, without actually implementing anything. It just
-records all the operations the bytecode interpreter would have done by
-building flowgraphs for all the code. What the Python backend does is
-just to produce correct Python code from these flowgraphs and return
-it as source code.
-
-Example
--------
-
-.. _implementation: http://codespeak.net/pypy/dist/pypy/translator/geninterplevel.py
-
-Let's try the little example from above_. You might want to look at the
-flowgraph that it produces. Here, we directly run the Python translation
-and look at the generated source. See also the header section of the implementation_
-for the interface::
-
-    >>> from pypy.translator.geninterplevel import translate_as_module
-    >>> entrypoint, source = translate_as_module("""
-    ...
-    ... def g(n):
-    ...     i = 0
-    ...     while n:
-    ...         i = i + n
-    ...         n = n - 1
-    ...     return i
-    ...
-    ... """)
-
-This call has invoked a PyPy bytecode interpreter running on FlowObjspace,
-recorded every possible codepath into a flowgraph, and then rendered the
-following source code:: 
-
-    >>> print source
-    #!/bin/env python
-    # -*- coding: LATIN-1 -*-
-
-    def initapp2interpexec(space):
-      """NOT_RPYTHON"""
-
-      def g(space, __args__):
-        funcname = "g"
-        signature = ['n'], None, None
-        defaults_w = []
-        w_n_2, = __args__.parse(funcname, signature, defaults_w)
-        return fastf_g(space, w_n_2)
-
-      f_g = g
-
-      def g(space, w_n_2):
-        goto = 3 # startblock
-        while True:
-
-            if goto == 1:
-                v0 = space.is_true(w_n)
-                if v0 == True:
-                    w_n_1, w_0 = w_n, w_i
-                    goto = 2
-                else:
-                    assert v0 == False
-                    w_1 = w_i
-                    goto = 4
-
-            if goto == 2:
-                w_2 = space.add(w_0, w_n_1)
-                w_3 = space.sub(w_n_1, space.w_True)
-                w_n, w_i = w_3, w_2
-                goto = 1
-                continue
-
-            if goto == 3:
-                w_n, w_i = w_n_2, space.w_False
-                goto = 1
-                continue
-
-            if goto == 4:
-                return w_1
-
-      fastf_g = g
-
-      g3dict = space.newdict([])
-      gs___name__ = space.wrap('__name__')
-      gs_app2interpexec = space.wrap('app2interpexec')
-      space.setitem(g3dict, gs___name__, gs_app2interpexec)
-      gs_g = space.wrap('g')
-      from pypy.interpreter import gateway
-      gfunc_g = space.wrap(gateway.interp2app(f_g, unwrap_spec=[gateway.ObjSpace, gateway.Arguments]))
-      space.setitem(g3dict, gs_g, gfunc_g)
-      return g3dict
-
-You see that actually a single function is produced: ``initapp2interpexec``. This is the
-function that you will call with a space as argument. It defines a few functions and then
-does a number of initialization steps, builds the global objects the functions need,
-and produces the interface function ``gfunc_g`` to be called from interpreter level.
-
-The return value is ``g3dict``, which contains a module name and the function we asked for.
-
-Let's have a look at the body of this code: The first definition of ``g`` is just
-for the argument parsing and is used as ``f_g`` in the ``gateway.interp2app``.
-We look at the second definition, ``fastf_g``, which does the actual
-computation. Comparing to the flowgraph from above_,
-you see a code block for every block in the graph.
-Since Python has no goto statement, the jumps between the blocks are implemented
-by a loop that switches over a ``goto`` variable.
-
-::
-
-    .       if goto == 1:
-                v0 = space.is_true(w_n)
-                if v0 == True:
-                    w_n_1, w_0 = w_n, w_i
-                    goto = 2
-                else:
-                    assert v0 == False
-                    w_1 = w_i
-                    goto = 4
-
-This is the implementation of the "``while n:``". There is no implicit state,
-everything is passed over to the next block by initializing its
-input variables. This directly resembles the nature of flowgraphs.
-They are completely stateless.
-
-
-::
-
-    .       if goto == 2:
-                w_2 = space.add(w_0, w_n_1)
-                w_3 = space.sub(w_n_1, space.w_True)
-                w_n, w_i = w_3, w_2
-                goto = 1
-                continue
-
-The "``i = i + n``" and "``n = n - 1``" instructions.
-You see how every instruction produces a new variable.
-The state is again shuffled around by assigning to the
-input variables ``w_n`` and ``w_i`` of the next target, block 1.
-
-Note that it is possible to rewrite this by re-using variables,
-trying to produce nested blocks instead of the goto construction
-and much more. The source would look much more like what we
-used to write by hand. For the C backend, this doesn't make much
-sense since the compiler optimizes it for us. For the Python interpreter it could
-give a bit more speed. But this is a temporary format and will
-get optimized anyway when we produce the executable.
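The goto-as-while-switch pattern of the generated source can be written out by hand for the ``g(n)`` example, using unwrapped integers instead of ``space.add()``/``space.sub()`` but keeping the same control flow. Variable names here are illustrative rather than the generator's exact output.

```python
# Plain-Python version of the generated goto loop for g(n) = n+(n-1)+...+1.

def g(n):
    goto = 3  # startblock
    while True:
        if goto == 1:
            if n_:                     # the "while n:" test
                n_1, v0 = n_, i_
                goto = 2
            else:
                result = i_
                goto = 4
        if goto == 2:
            v1 = v0 + n_1              # i = i + n
            v2 = n_1 - 1               # n = n - 1
            n_, i_ = v2, v1            # pass state to block 1
            goto = 1
            continue
        if goto == 3:
            n_, i_ = n, 0              # initialize block 1's inputs
            goto = 1
            continue
        if goto == 4:
            return result

assert g(5) == 15
assert g(0) == 0
```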
-
-Interplevel Snippets in the Sources
------------------------------------
-
-.. _`_exceptions.py`: http://codespeak.net/pypy/dist/pypy/lib/_exceptions.py
-.. _`_classobj.py`: http://codespeak.net/pypy/dist/pypy/lib/_classobj.py
-
-Code written in application space can consist of complete files
-to be translated (`_exceptions.py`_, `_classobj.py`_), or they
-can be tiny snippets scattered all over a source file, similar
-to our example from above.
-
-Translation of these snippets is done automatically and cached
-in pypy/_cache with the modulename and the md5 checksum appended
-to it as file name. If you have run your copy of pypy already,
-this folder should exist and have some generated files in it.
-These files consist of the generated code plus a little code
-that auto-destructs the cached file (plus .pyc/.pyo versions)
-if it is executed as __main__. On Windows this means you can wipe
-a cached code snippet clear by double-clicking it. Note also that
-the auto-generated __init__.py file wipes the whole directory
-when executed.
-
-XXX this should go into some interpreter.doc, where gateway should be explained
-
-
-How it works
-------------
-
-XXX to be added later
+http://codespeak.net/pypy/dist/pypy/translator/geninterplevel.py
 
+See `geninterp's documentation <geninterp.html>`__.
 
 .. _extfunccalls:
 
@@ -1292,6 +395,8 @@
 implemented by calling appropriate low level to high level conversion
 functions and then calling the original funtion again.
 
+.. _`low-level type`: rtyper.html#low-level-type
+
 If the function is supposed to really be implemented by the backend then the
 low level function should have an attribute ``.suggested_primitive = True``
 attached. If this is not the case the low level function itself will be
@@ -1410,4 +515,7 @@
 .. _`mixed posix module`: ../module/posix/
 
 
+How It Fits Together
+====================
+
 .. include:: _ref.txt


