[pypy-dev] Re: [pypy-sprint] vilnius sprint planning progress

Armin Rigo arigo at tunes.org
Mon Oct 18 11:19:30 CEST 2004

Hi Holger,

I don't know why, but your answer never reached pypy-dev (as far as I can 
tell) so I missed it.  Sorry for the delay.  I will answer with extensive 
quotes, in case nobody else received it either.

On Mon, Oct 11, 2004 at 10:26:30PM +0200, holger krekel wrote:
> > A. PyPy -> FlowGraph
> > ====================
> Because of the current way 'genc.py' is done we can mostly
> forget about "module completeness" for this goal, right? But
> for example, we need to properly get stdout/file interaction
> working otherwise we will never get any output from our first
> translated PyPy/C version. I am not quite clear on how far the
> current genc.py goes in letting us use the CPython runtime.

This is not a problem.  The extension module generated by genc.py manipulates
PyObject* pointers borrowed from CPython.  In this respect genc.py is exactly
like the old Python2C: if our quasi-RPython code currently uses 'file' or
other borrowed objects then what you get in the C code is a lot of calls like
PyObject_Call(&PyFile_Type) and PyObject_GetAttrString(f, "read").

> > The goal would be to get a complete graph, which the
> > existing genc.py can (mostly) already translate to C and run, for testing.
> Well, i consider the how-to-do-exceptions question still a major 
> problem.

It wasn't that complex.  I did it in a recent check-in :-)  Of course, it's
cheating -- it uses CPython's exception mechanisms and conventions like
returning NULL in case of error.  But it's just fine for now.

> > B. FlowGraph -> Optimized FlowGraph
> > ===================================
> Actually the latter sounds to me like there must already be some 
> algorithms out there that do it.  Maybe we should invite Donald
> Knuth to one of our sprints, anyway :-)   More seriously, though, 
> it would be helpful to get some gcc or other compiler people to
> one of our next sprints to give a lecture and help doing flowgraph 
> transformations. 

Yes, there are probably existing techniques out there.  However, I've compared
with what common compilers do for various languages and I can't really find
anything similar.  You have on the one hand languages like C++ or Java where
when the programmer declares a structures (or class), the compiler really puts
a structure in the heap, and cannot do things like inlining it automatically.
In C++ you can inline it yourself: e.g. to inline a structure B in a structure 
A you would declare A as

struct A {    // or class A
  B fieldname;

as opposed to 'B* fieldname' when you just want a pointer to the structure.  
In Java you cannot inline structures at all, all class instances are allocated 
in the heap.

You have another class of programming languages which have more lightweight
structures than Java: the functionnal languages.  But there, all data is
immutable, so it is (more or less) irrelevant if two pointers point to the
same structure or to two equal copies.  This leads the common compilers to
follow a very different memory model.

An exception is maybe the Mercury language.  Here, you have use type
declarations to constrain the usage of structures: for example, you can say
that a function's argument, whose type is "pointer to a struct of type A",
should receive the last pointer in existence to that structure.  This allows
the compiler to optimize the function.  Remember that you cannot modify
anything in these languages, so it is typical to have functions that take a
(big) structure as input argument, and return a copy of the same (big)  
structure with just a couple of fields modified.  In this case, if the input
structure is declared as "you have the last pointer to it", then the compiler
can actually modify the couple of fields of structure in-place and return
that, instead of having to copy it completely first, because we know that
nobody else can notice that we have actually modified an existing structure
(which is forbidden in the language!).

But again this is a bit different.  Moreover in Mercury the programmer must
provide the "uniqueness" annotations; they are not computed algorithmically.

Something else, I now know how to write down the graph transformation rules,
in particular the new flow graph that should replace an operation.  Forget a
new textual syntax for flow graphs, with a parser and everything.  We can use
the plain RPython syntax and build flow graphs with the FlowObjSpace.  For 

def inplace_add__List(lst, iterable):
    for x in iterable:
    return lst

is a clear way to express that the operation 'inplace_add', when the first
argument is a list, should be replaced by (the flow graph of) the body of this

> > C. Optimized FlowGraph -> C code
> > ================================
> We also need to think about ways to test this.  This is most often currently 
> done by transparently compiling the generated C file and calling functions
> in it to see if they produce the expected result.  However, for the
> optimizations we should write test in a more fine grained way, feeding
> it graphs and checking the resulting graphs.  Otherwise we will over time
> get subtle errors and segmentation faults :-) 

Indeed.  Testing that flow graphs themselves are "right" is difficult.  There
is now a checkgraph() function in pypy.objspace.flow.model which checks that
there is no structural error in the flow graph; for the semantics, maybe we
should use Richard's flow graph interpreter, feeding it sample inputs and
checking that we get the correct output.  That's better than compiling because
it doesn't involve so much code, and we don't get segfaults if something goes
wrong :-)


More information about the Pypy-dev mailing list