[pypy-dev] Questions

holger krekel hpk at trillke.net
Tue Dec 9 23:18:59 CET 2003


Hi Richard,

[Richard Emslie Tue, Dec 09, 2003 at 04:04:35PM +0000]
> I've been reading through the source code and the docs, and getting some
> gist of what is going on.  I guess I was expecting to see something
> more like the CPython code but in python (like why do we have different
> object spaces, although I see the errors of my ways now :-) ) and was
> failing to understand the big picture.

understandable.  Reverse engineering documentation from plain code
is not always easy :-) 

> So reading between the lines, does this sound anything quite like what we
> are trying to achieve...
> 
> The abstraction of the object spaces is so we can perform abstract
> interpretation with one set, a working interpreter with another, a minimal
> interpreter with another, and goodness knows what else ;-)

right. 

> So to create our initial interpreter, we take the interpreter code, multi-method
> dispatcher and the standard object space and we can abstractly interpret
> with the interpreter/flow object space/annotation. 

yes, more precisely the interpreter/flowobjspace combination should be
able to perform abstract interpretation on any RPython program. RPython
is our shorthand for "not quite as dynamic as python". But note that
we basically allow *full dynamism* including metaclasses and all the
fancy stuff during *initialization* of the interpreter and its object
spaces. Only when we actually interpret code objects from an
app-level program do we restrict the involved code to be RPythonic. 
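
To make the distinction a bit more concrete, here is a purely
illustrative sketch (made-up code, nothing from the actual pypy
source): at initialization time any python trickery goes, but the
interpreter code that later gets abstractly interpreted has to stay
static enough for type inference.

    # initialization time: full dynamism is fine, e.g. building a
    # class programmatically
    Frame = type('Frame', (object,), {'depth': 0})

    # code seen by the translator must stay "RPythonic": roughly,
    # each variable keeps one consistent type
    def rpython_ok(a, b):
        result = len(a) + len(b)     # 'result' is always an int
        return result

    def not_rpython(flag):
        if flag:
            x = 42                   # int here ...
        else:
            x = "forty-two"          # ... str here: type inference gives up
        return x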

The interpreter/flowobjspace combination will start abstract
interpretation on some initial function object, say e.g. frame.run().
The frame and the bytecode/opcode implementations it invokes will work
with e.g. the StdObjSpace. The flowobjspace doesn't care which
objspace the frame/opcodes execute on. The flowobjspace and its interpreter
instance don't care if they run on something other than pypy :-) 
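
A rough sketch of that indirection (simplified, made-up classes, not
the actual interpreter code): the opcode implementations only ever
talk to a "space", so the same frame code works whether the space
really computes results or merely records the operations to build a
flowgraph.

    class DemoStdSpace:
        def add(self, w_a, w_b):            # really compute the result
            return w_a + w_b

    class DemoFlowSpace:
        def __init__(self):
            self.operations = []
        def add(self, w_a, w_b):            # just record the operation
            w_result = object()             # fresh placeholder variable
            self.operations.append(('add', w_a, w_b, w_result))
            return w_result

    def BINARY_ADD(space, w_x, w_y):        # opcode code is space-agnostic
        return space.add(w_x, w_y)

    assert BINARY_ADD(DemoStdSpace(), 2, 3) == 5
    flow = DemoFlowSpace()
    BINARY_ADD(flow, 'v0', 'v1')
    assert flow.operations[0][0] == 'add'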

Actually, thinking in more detail about this will probably lead us into the 
still muddy waters of the whole bootstrapping process, but let's not get 
distracted here :-) 

> That stage involves
> building up a set of basic blocks, building a flow graph, type inference
> and then translating (sorry I get a bit lost here with what happens where,
> ie when does the flow object space stop and annotation start, but the
> answer to that one is to read more code ;-) ) to pyrex/CL/other low level
> code.

exactly. 

> Does that sound about right so far?   Then do either of these make sense
> (purely speculation... and most likely nonsense)
> 
> Also if we write the flow object space and annotation in RPython we can
> pipe that through itself, to generate low level code too.  Now my main
> question is - how do we combine the two object spaces such that we do
> abstract interpretation and annotation in a running interpreter (also I
> guess we would either need some very low level translation, ie machine
> code or some LLVM like architecture to do this?)

(first: see my reference to muddy waters above :-)) 

In theory, we can annotate/translate flowobjspace itself, thus producing
a low-level (pyrex/lisp/c/llvm) representation of our abstract
interpretation code. When we then run this lower-level representation
on our own source again, it should reproduce the representation we are
currently running.  I think this is similar to the 3-stage gcc building
process: first gcc uses some external compiler to build itself
(stage1). It uses stage1 to compile itself again to stage2. It then uses
stage2 to recompile itself again to stage3 and checks that stage2 and stage3 agree. 
Thus the whole program serves as a good testbed for whether everything works right.  
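
In toy form the fixpoint check looks something like this (a stub, of
course; "translate" here just stands in for the whole
annotation/translation pipeline):

    def translate(source, using):
        # stub standing in for annotation + translation to low-level code
        return "low-level code for: " + source

    source = "flowobjspace + annotator + translator"
    stage1 = translate(source, using="CPython")   # bootstrapped externally
    stage2 = translate(source, using=stage1)      # rebuilt with stage1
    stage3 = translate(source, using=stage2)      # rebuilt once more
    assert stage2 == stage3   # fixpoint reached: the self-translated
                              # translator reproduces itself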

> Once we have broken the interpreter - standard object space into a finite -
> into a set of blocks and graph, and translate those blocks into low level
> code - we could view any python bytecode operating on this as a traversal
> over the blocks.

Hmm, yes i think that's right although i would rephrase a bit: the flowgraph 
obtained from abstract interpretation is just another representation of a/our
python program.  Code objects (which contain the bytecodes) are
themselves a representation of python source text. 
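
For instance (plain CPython, nothing pypy-specific): compile() already
turns source text into a code object; the flowgraph is just one further
representation derived from it.

    # a code object is itself "just" another representation of source text
    co = compile("x = a + b", "<example>", "exec")
    assert set(co.co_names) == set(['a', 'b', 'x'])
    # abstractly interpreting this code object over the flowobjspace
    # would yield yet another representation: the flowgraph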

The flowgraph of course provides a lot of interesting information (like
all possible code paths and low-level identification of variable state)
and makes it explicitly available for annotation and translation.
Btw, at the moment annotation just *uses* the flowgraph but not the
other way round.  (In the future we might want to drive them more in
parallel in order to allow the flowobjspace code to consult the
annotation module. Then the flowgraph code could possibly avoid
producing representations for which annotation/type inference is no
longer able to produce exact types). 
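
To illustrate the division of labour with a toy (made-up data
structures; the real ones live in the translator code): the flowgraph
records operations on abstract variables, and a separate annotation
pass can then propagate type information over them.

    # toy flowgraph block: a list of (opname, args, result) tuples
    block = [
        ('len', ('v0',),      'v1'),
        ('len', ('v2',),      'v3'),
        ('add', ('v1', 'v3'), 'v4'),
    ]

    def annotate(block, input_types):
        # toy annotation pass: propagate known types over the operations
        types = dict(input_types)
        for opname, args, result in block:
            if opname == 'len':
                types[result] = int
            elif opname == 'add' and all(types.get(a) is int for a in args):
                types[result] = int
        return types

    assert annotate(block, {'v0': str, 'v2': str})['v4'] is int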

> Therefore we could create a new flow graph from this
> traversal, and feed it into some LLVM like architecture which does the low
> level translation and optimisation phase for us??

There is no need for this double indirection. We can produce LLVM 
bytecode directly from python-code with a specific translator (similar to
genpyrex/genclisp). We could translate ourselves to make this faster, of
course.  For merging Psyco techniques we will probably want to rely on something
like LLVM to do this dynamically. Generating C-code is usually a pretty
static thing and cannot easily be done at runtime. 
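
Very roughly, such a direct backend (hypothetical, in the spirit of
genpyrex/genclisp, not actual pypy code) would just walk the flowgraph
operations and emit LLVM-style text for each one:

    def gen_llvm_like(block):
        # map flowgraph operations straight onto LLVM-like text,
        # without building any second flowgraph in between
        lines = []
        for opname, args, result in block:
            if opname == 'add':
                lines.append("%%%s = add i32 %%%s, %%%s" % ((result,) + args))
        return "\n".join(lines)

    assert gen_llvm_like([('add', ('a', 'b'), 'c')]) == "%c = add i32 %a, %b"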
 
> Thanks for any feedback... :-)

you are welcome. Feel free to follow up ...

cheers,

    holger


P.S.: please note that everything in pypy/annotation/* is just evolving 
      code which is not used anywhere.  The in-use annotation stuff
      is currently in translator ...

