[pypy-dev] planning the 0.6.2 PyPy release

Armin Rigo arigo at tunes.org
Mon Jun 13 15:08:40 CEST 2005

Hi Ben,

On Mon, Jun 13, 2005 at 10:06:15AM +0100, Ben.Young at risk.sungard.com wrote:
> >    (There are still some open questions there; for
> >example, it's not too difficult to detect list comprehensions by looking
> >at the flow graphs; then instead of heuristic over-allocation for the
> >list that is being built, it's often possible to determine in advance
> >the size of the list -- e.g. because it will be the same length as
> >another list that we iterate over.)
> Does this mean that when you eventually come to *dynamically* annotate 
> code that lists will be an annotated version of the std.listobject, or do 
> you expect it to use the rpython typer as well?

The dynamic annotation part is still unclear, but I suppose that it will
generally be done at a lower level than std.listobject.  The goal is to
keep it mostly independent of the language being interpreted/compiled,
and only dependent on RPython.

If we start with an approach based on Psyco, it might not even use the
annotator/rtyper directly: just like we obtain a CPython-like piece of C
code by translating the rtyped flow graphs of PyPy into C, we would
obtain a Psyco-like piece of code by translating the same rtyped flow
graphs of PyPy into something different -- i.e. the JIT would work at
the level of low-level operations (getfield, setfield...) which is more
concrete even than RPython.

Assuming for now, for simplicity, that we don't worry about the time the
JIT takes to compile code, we would obtain the following diagram (we're
working on the rtyper and genc parts now; anything above is basically
done, and the rest is a potential draft plan):

  PyPy interpreter ------------> flow graphs

            -----annotation----> annotated flow graphs

            ------rtyper-------> low-level flow graphs

                                  /        \
                           genc  /          \
                                /            \ genpickle-like
                              |/              \
                              `-               \
                        C/LLVM code             \|
     user app----------[interpreter]            -'
     (without JIT)                    file containing compact data
                                      describing the ll flow graph
                                                   | input file
     user app-----------------------[flow graph abstract interpreter]
     (with JIT)                            (manually written)

The key component in the JIT part is the flow graph abstract
interpreter.  Its role is to take a specific, frozen flow graph as input
file (which is the flow graph of PyPy) and interpret it.  Doing so
naively would just interpret PyPy interpreting the user program, so it
would be slow again -- double interpretation, even if the flow graph
interpreter itself can probably be quite fast because the flow graphs at
this point contain very simple low-level operations.  Instead, the flow
graph interpreter is "abstract", which means that it will compute and
propagate some annotations on its own.  These annotations are much
simpler than the ones in the current annotator, because they are about
low-level types; they would look like "the field 'ob_type' of this
GcStruct here is known to contain such-and-such constant value", or
"this GcStruct doesn't really have to be allocated on the heap so far
because we can keep its few non-constant fields in local variables".
This process is dynamic because it follows PyPy following the user
application.  For example, if we say that all the data structures
corresponding in PyPy to the PyCode objects are completely constant
(with values that come from the user program -- they are the constant
code objects present in the user program) then all the bytecode decoding
and dispatching logic done by PyPy (present in the flow graphs of PyPy)
is constant, computable in advance as annotations.
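
As a toy illustration of that last point (this is not PyPy code; the
bytecode format and the names interpret() and specialize_code() are
made up for the example), here is a tiny stack-machine interpreter and
a second function that follows the same dispatch logic over a constant
code object, keeping only the work that depends on the arguments:

```python
# Hypothetical toy bytecode: a tuple of (opcode, operand) pairs.
CODE = (("LOAD_ARG", 0), ("LOAD_CONST", 3), ("ADD", None), ("RETURN", None))


def interpret(code, args):
    """Plain interpreter: decodes and dispatches every opcode at run time."""
    stack = []
    for opcode, operand in code:
        if opcode == "LOAD_ARG":
            stack.append(args[operand])
        elif opcode == "LOAD_CONST":
            stack.append(operand)
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "RETURN":
            return stack.pop()
        else:
            raise ValueError("unknown opcode %r" % (opcode,))
    raise ValueError("code does not end with RETURN")


def specialize_code(code):
    """Follow the interpreter over a constant `code` object.

    The stack holds Python expressions as strings instead of values, so
    decoding and dispatching happen here, once, and only the arithmetic
    on the unknown arguments is left in the result.
    """
    stack = []
    for opcode, operand in code:
        if opcode == "LOAD_ARG":
            stack.append("args[%d]" % operand)
        elif opcode == "LOAD_CONST":
            stack.append(repr(operand))
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append("(%s + %s)" % (left, right))
        elif opcode == "RETURN":
            return "lambda args: " + stack.pop()
        else:
            raise ValueError("unknown opcode %r" % (opcode,))
    raise ValueError("code does not end with RETURN")


source = specialize_code(CODE)   # the residual program, as source text
fast = eval(source)              # no dispatch loop is left in `fast`
```

Both interpret(CODE, [4]) and fast([4]) compute the same result, but
the second one no longer looks at the bytecode at all.  A real JIT would
of course work on low-level flow graphs rather than on source strings.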

Internally using a process similar to annotation / rtyper / code
generator, but much simpler thanks to the low-level types, the flow
graph abstract interpreter would be able to generate dynamically new
flow graphs that are simplified versions of the input flow graphs, where
only the operations between things that couldn't be annotated as
constants are kept.
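
A minimal sketch of this idea, under loudly stated assumptions: the
operation format (result, opname, args), the names FOLDABLE and
specialize_ops(), and the operation names are all invented for the
example and are not the actual low-level flow graph model.  Operations
whose arguments are all known constants are computed immediately; only
the others are kept in the residual list:

```python
import operator

# Hypothetical mini-format: each operation is (result, opname, args).
# Arguments are either variable names (str) or literal ints.
FOLDABLE = {"int_add": operator.add, "int_mul": operator.mul}


def specialize_ops(ops, known):
    """Return (residual_ops, known) for a straight-line list of operations.

    `known` maps variable names to constant values; it plays the role of
    the simple low-level annotations.
    """
    known = dict(known)          # do not mutate the caller's mapping
    residual = []
    for result, opname, args in ops:
        # Substitute every variable whose value is already known.
        args = tuple(known.get(a, a) if isinstance(a, str) else a
                     for a in args)
        if opname in FOLDABLE and all(isinstance(a, int) for a in args):
            # Fully constant: compute it now and emit nothing.
            known[result] = FOLDABLE[opname](*args)
        else:
            # Depends on something unknown: keep it in the new graph.
            residual.append((result, opname, args))
    return residual, known


ops = [
    ("a", "int_add", ("x", 1)),    # x is known, so this is folded
    ("b", "int_mul", ("a", 2)),    # a is now known, folded as well
    ("c", "int_add", ("b", "y")),  # y is unknown, so this one is kept
]
residual, known = specialize_ops(ops, {"x": 4})
```

Here only the last operation survives, with the constant substituted
for `b`.  Real flow graphs also have branches and loops, which this
straight-line sketch ignores entirely.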

This is essentially how Psyco works.  I am not sure about the details or
how to integrate it better with the existing annotation framework
(which, with more effort, would give interesting hints about user
applications too, instead of just about PyPy).  I am not sure if any of
this will be done as I describe above, actually. :-)  What I am
reasonably sure of is only that it *could* be done like this.

A bientot,

