rxe at ukshells.co.uk
Fri Dec 12 00:11:30 CET 2003
On Tue, 9 Dec 2003, holger krekel wrote:
> Hi Richard,
> [Richard Emslie Tue, Dec 09, 2003 at 04:04:35PM +0000]
> > I've been reading through the source code and the docs, and getting the
> > gist of what is going on. I guess what I was expecting to see was
> > something more like the CPython code but in python (like why do we have
> > object spaces, although I see the errors of my ways now :-) ) and was
> > failing to understand the big picture.
> understandable. Reverse engineering documentation from plain code
> is not always easy :-)
Thanks Holger for great responses... it has certainly cleared up a few
things.
One thing that is really interesting in understanding PyPy thus far is that
the puzzle has two sides: how does it work, and why is it done in such a
way. For instance, we can count 10 different types of frame object in the
source.
What would be a really nice part of the architecture introduction (though I
imagine there are other, better ideas) is to step through a few simple
examples running in an initialised stdobjspace "interactive.py" session,
describing the various object creations/interactions on the way
(ExecutionContext, Code, Frame, objects) and how method dispatching to the
object spaces flows from Code/Frames.
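To give a flavour of what I mean, here is a tiny toy sketch (all names made
up, and much cruder than the real classes) of a frame pushing constants and
dispatching an operation to whatever object space it was handed:

    # Toy sketch (hypothetical names, not PyPy's actual classes): the
    # frame owns the value stack and the opcode implementations, but
    # every actual operation on objects goes through the space.

    class TrivialObjSpace:
        def add(self, w_a, w_b):        # "w_" marks wrapped objects
            return w_a + w_b            # here wrapping is the identity

    class ToyFrame:
        def __init__(self, space, consts):
            self.space = space
            self.stack = []
            self.consts = consts

        def LOAD_CONST(self, index):
            self.stack.append(self.consts[index])

        def BINARY_ADD(self):
            w_b = self.stack.pop()
            w_a = self.stack.pop()
            self.stack.append(self.space.add(w_a, w_b))

    # "2 + 3" as a hand-driven opcode sequence:
    frame = ToyFrame(TrivialObjSpace(), consts=[2, 3])
    frame.LOAD_CONST(0)
    frame.LOAD_CONST(1)
    frame.BINARY_ADD()
    print(frame.stack[-1])              # -> 5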
And then some idea of how the current bootstrapping works (see * below).
It might serve as a nice basis for documenting too (yup, I'm volunteering
;-)).
> > So reading between the lines, does this sound anything quite like what we
> > are trying to achieve...
> > The abstraction of the object spaces is so we can perform abstract
> > interpretation with one set, a working interpreter with another, some
> > other kind of interpreter with a third, and goodness knows what else ;-)
> > So to create our initial interpreter, we take the interpreter code, the
> > dispatcher and the standard object space, and we can abstractly interpret
> > with the interpreter/flow object space/annotation.
> yes, more precisely the interpreter/flowobjspace combination should be
> able to perform abstract interpretation on any RPython program. RPython
> is our acronym for "not quite as dynamic as python". But note that
> we basically allow *full dynamism* including metaclasses and all the
> fancy stuff during *initialization* of the interpreter and its object
> spaces. Only when we actually interpret code objects from an
> app-level program we restrict the involved code to be RPythonic.
That explains a lot; I was ironically starting to think RPython is really
dynamic, but after the dust settles I guess that's it. I am assuming that up
to the call to initialize() [do europeans generally follow american spelling
;-)] we are free to do all sorts of dynamic manipulation of our classes and
objects - however, during the course of building the sys module & builtins
we seem to start interpreting some bytecodes!! How is that possible if we
don't yet have any object spaces ready to act on?
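The kind of init-time freedom I have in mind is something like this toy
sketch (entirely made up, nothing like the real bootstrap code): classes get
mutated dynamically while the space is being built, but what falls out the
other end is plain, static, analysable code:

    # Hypothetical sketch: full Python dynamism at initialization time,
    # producing methods that are static and RPython-ish afterwards.
    import operator

    def make_binop(opname, op):
        # dynamic code generation during initialization ...
        def binop(self, w_a, w_b):
            return op(w_a, w_b)
        binop.__name__ = opname
        return binop

    class StdObjSpaceSketch:
        pass

    # classes are mutated freely while the space is built:
    for name, fn in [("add", operator.add), ("sub", operator.sub)]:
        setattr(StdObjSpaceSketch, name, make_binop(name, fn))

    # ... but after this point no new attributes appear and each method
    # has one fixed behaviour, which is what keeps analysis tractable.
    space = StdObjSpaceSketch()
    print(space.add(2, 3), space.sub(5, 1))   # -> 5 4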
> The interpreter/flowobjspace combination will start abstract
> interpretation on some initial function object, say e.g. frame.run().
> The frame and the bytecode/opcode implementations it invokes will work
> with e.g. the StdObjSpace. The flowobjspace doesn't care which
> objspace the frame/opcodes execute on. The flowobjspace and its interpreter
> instance don't care if they run on something other than pypy :-)
> Actually, thinking in more detail about this will probably lead us into
> the still muddy waters of the whole bootstrapping process, but let's not
> get distracted here :-)
Do you mean what was described above, with the bytecode being interpreted
before initialisation is complete - or are we talking about memory
management, representation of basic object types in the object space (lists,
ints, etc.), system calls (blocking/nonblocking), system resources (file
descriptors), garbage collection and whatnot? Ok, let's not get
distracted... :-)
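Coming back to the flowobjspace point, though: if I follow, the trick is
that the same interpreter code can run unchanged on a space that merely
records operations instead of computing them. A purely speculative toy
version (made-up names again):

    # Toy illustration (hypothetical, not PyPy code) of swapping object
    # spaces under an unchanged interpreter: a "flow-like" space records
    # each operation instead of computing it.

    class ConcreteSpace:
        def add(self, w_a, w_b):
            return w_a + w_b

    class RecordingSpace:
        def __init__(self):
            self.operations = []
        def add(self, w_a, w_b):
            var = "v%d" % len(self.operations)
            self.operations.append(("add", w_a, w_b, var))
            return var                  # a symbolic result, not a value

    def interpret(space, w_x, w_y):
        # the "interpreter": identical code for every space
        return space.add(space.add(w_x, w_y), w_y)

    print(interpret(ConcreteSpace(), 2, 3))   # -> 8
    rec = RecordingSpace()
    print(interpret(rec, "x", "y"))           # -> 'v1'
    print(rec.operations)
    # [('add', 'x', 'y', 'v0'), ('add', 'v0', 'y', 'v1')]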
> > That stage involves
> > building up a set of basic blocks, building a flow graph, type inference,
> > and then translating (sorry, I get a bit lost here with what happens when,
> > ie when does the flow object space stop and annotation start, but the
> > answer to that one is to read more code ;-) ) to pyrex/CL/other low level
> > code.
> > Does that sound about right so far? Then do either of these make sense
> > (purely speculation... and most likely nonsense)...
> > Also if we write the flow object space and annotation in RPython we can
> > pipe that through itself, to generate low level code too. Now my main
> > question is - how do we combine the two object spaces such that we do
> > abstract interpretation and annotation in a running interpreter (also I
> > guess we would either need some very low level translation, ie machine
> > code, or some LLVM-like architecture to do this?)
> (first: see my above reference to muddy waters :-)
> In theory, we can annotate/translate flowobjspace itself, thus producing
> a low-level (pyrex/lisp/c/llvm) representation of our abstract
> interpretation code. When executing this lower-level representation
> on ourself again we should produce the same representation we are
> currently running.
Yes, I see now. For some reason I thought they would be different.
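If it helps anyone else, the "produce the same representation we are
currently running" idea can be seen in miniature. The "translator" below is
just a stand-in normalisation pass, nothing like the real one, but it shows
the fixed-point test:

    # Toy fixed-point illustration (not real translation): translating
    # the translator's own output again yields the same thing, which is
    # the stability check described above.

    def translate(source):
        # stand-in for annotation + code generation: here, merely strip
        # comments and collapse whitespace
        lines = []
        for line in source.splitlines():
            line = line.split("#")[0].strip()
            if line:
                lines.append(" ".join(line.split()))
        return "\n".join(lines)

    stage1 = translate("x = 1   # initial form\ny   =  x + 1\n")
    stage2 = translate(stage1)
    assert stage1 == stage2    # fixed point: self-application is stable
    print(stage1)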
> I think this is similar to the 3-stage gcc building
> process: First it uses some external component to build itself
> (stage1). It uses stage1 to compile itself again to stage2. It then uses
> stage2 to recompile itself again to stage3 and sees if it still works.
> Thus the whole program serves as a good testbed for whether everything
> works.
Funny, I used to compile twice, doing stages 1 and 2 manually, back when it
was producing buggy versions - if I only knew! ;-)
> > Once we have broken the interpreter/standard object space down
> > into a set of blocks and a graph, and translated those blocks into low
> > level code - we could view any python bytecode operating on this as a
> > traversal over the blocks.
> Hmm, yes I think that's right, although I would rephrase a bit: the
> flowgraph obtained from abstract interpretation is just another
> representation of a python program. Code objects (which contain the
> bytecodes) are themselves a representation of python source text.
It does have other cool implications: if we have a low enough translation we
could do away with stacks and frames for execution... :-)
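The "representations" point can even be checked at the plain CPython prompt
(standard library only, nothing PyPy-specific): a code object really is
just another representation of the source text.

    # A code object is another representation of the source text, and
    # its bytecode can be inspected directly with the stdlib.
    import dis

    source = "def f(x):\n    return x + 1\n"
    module_code = compile(source, "<example>", "exec")
    func_code = module_code.co_consts[0]   # the code object for f
    dis.dis(func_code)                     # LOAD_FAST ... RETURN_VALUE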
> The flowgraph of course provides a lot of interesting information (like
> all possible code paths and low-level identification of variable state)
> and makes it explicitly available for annotation and translation.
> Btw, at the moment annotation just *uses* the flowgraph but not the
> other way round. (In the future we might want to drive them more in
> parallel in order to allow the flowobjspace code to consult the
> annotation module. Then the flowgraph code could possibly avoid
> producing representations where annotation/type inference is no longer
> able to produce exact types).
Can I ask the silly question of what annotation actually means? Is it
separate from type inference? I don't really follow the parallel part.
With RPython are we assuming that we can always produce exact types?
Is the idea for non-deterministic points (ie nodes where we cannot infer the
types) to be revealed and then propagated up the graph to the highest node
where they can first be determined, and to create a new snapshot of nodes
when any new type enters that point and translate, adding caching so we
don't have to redo the snapshot/translation each time (high chance it is
going to be the same type)?
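By "annotation" I'm picturing something like a forward pass over the
recorded operations, propagating types - again pure speculation on my part,
with made-up structures:

    # Toy annotation pass (my guess at the flavour, not the real thing):
    # walk a flowgraph-like operation list and propagate types forward.

    ops = [
        ("add", "x", "y", "v0"),
        ("add", "v0", "y", "v1"),
    ]
    types = {"x": int, "y": int}          # known input types

    for opname, a, b, result in ops:
        ta, tb = types.get(a), types.get(b)
        if ta is int and tb is int:
            types[result] = int           # int + int -> int
        else:
            types[result] = object        # give up: most general type

    print(types)   # all four names end up annotated as int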
> > Therefore we could create a new flow graph from this
> > traversal, and feed it into some LLVM-like architecture which does the low
> > level translation and optimisation phase for us??
> There is no need to take this double-indirection. We can produce LLVM
> bytecode directly from python-code with a specific translator (similar to
> genpyrex/genclisp). We could translate ourselves to make this faster, of
> course. For merging Psyco techniques we will probably want to rely on
> something like LLVM to do this dynamically. Generating C-code is usually a
> pretty static thing and cannot easily be done at runtime.
:-) Yes, not the best way with the double interpretation.
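To make sure I understand the direct-generation point: something in the
spirit of genpyrex/genclisp would walk the flowgraph and emit text directly,
roughly like this (completely made up, and far simpler than the real
backends):

    # Toy "gen" backend (entirely invented): walk the same operation
    # list as above and emit low-level-ish C text.

    ops = [
        ("add", "x", "y", "v0"),
        ("add", "v0", "y", "v1"),
    ]

    def gen_c(ops, args, result):
        lines = ["long f(long %s) {" % ", long ".join(args)]
        for opname, a, b, res in ops:
            assert opname == "add"      # the only op in this toy
            lines.append("    long %s = %s + %s;" % (res, a, b))
        lines.append("    return %s;" % result)
        lines.append("}")
        return "\n".join(lines)

    print(gen_c(ops, ["x", "y"], "v1"))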
> > Thanks for any feedback... :-)
> you are welcome. Feel free to followup ...
Yes thanks again! Looking forward to next week... :-)