[pypy-dev] Thought experiments

Edward K. Ream edream at tds.net
Tue Jan 14 17:07:25 CET 2003


I'd like to throw out some thought experiments, just for fun.

As I understand it, the essence of psyco is that it can generate efficient
machine code for the cases handled by the C interpreter, based primarily on
knowledge of the _runtime types_ of operands (that is, of the _interpreter_
variables corresponding to Python operands and intermediate results).  Pyrex,
by contrast, generates C code based on the _declared_ types of objects.  Is
this summary basically correct?
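
To make the contrast concrete (psyco.bind is from psyco's public
interface, if I recall it correctly; the Pyrex form is only sketched in
a comment):

    import psyco

    def add(x, y):
        # The types of x and y are known only at run time; psyco
        # compiles a specialized version per observed type pair
        # (int/int, float/float, str/str, ...).
        return x + y

    psyco.bind(add)   # ask psyco to specialize this one function

    # The Pyrex version would instead rely on declared types,
    # roughly:  def add(int x, int y): return x + y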

Might we not combine these two approaches?  Are there any circumstances in
which we could generate _optimized_ C or assembly code based on the runtime
types of the objects a function actually receives?

This suggests a multi-layered approach: use psyco to determine the types of
objects, and _afterwards_ use something like a real C compiler to generate
code.  This can only work if the types of the objects passed to a function
remain constant, which is usually true, but certainly not always.  It would
be similar to profile-guided optimization techniques that gather statistics
about how often basic blocks are executed; the difference is that here the
first-stage optimizer (psyco) would gather statistics about the _types_
of variables.
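
Here is a crude pure-Python sketch of the statistics-gathering stage
(the names are invented; psyco itself works at a much lower level):

    type_stats = {}

    def profile_types(f):
        """Wrap f so each call records the tuple of argument types."""
        def wrapper(*args):
            sig = tuple(map(type, args))
            counts = type_stats.setdefault(f, {})
            counts[sig] = counts.get(sig, 0) + 1
            return f(*args)
        return wrapper

    # Usage: spam = profile_types(spam).  After a run, type_stats tells
    # us which functions saw only a single signature and are therefore
    # good candidates for ahead-of-time compilation.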

Yes, there are problems here, but they correspond, I think, to similar
problems in psyco.  Indeed, psyco must continually check (using
psyco_compatible, if I am not mistaken) to see that the types presented to
the function match the types used to create the assembly-language code.
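
In pure-Python terms I picture the check like this (psyco does it in C,
and no doubt more cleverly; this is only to fix ideas):

    def compatible(signature, args):
        """psyco_compatible in miniature: do the runtime types of the
        arguments match the types the code was compiled for?"""
        return tuple(map(type, args)) == signature

    # e.g. compatible((int, int), (3, 4))    -> true
    #      compatible((int, int), (3, "4"))  -> false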

In other words, might we not consider a new kind of "assembled byte code"?
My guess is that compiled functions/methods would have to start with a
"preamble" that selects the proper compiled code to execute based on the
types of the arguments passed to the function.  Actually, I wonder whether
psyco itself could generate such preambles as an alternative to using
psyco_compatible.  (Or maybe the two techniques are equivalent.)
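
Concretely, the preamble might be no more than a dictionary lookup keyed
on the argument types, with the ordinary interpreted function kept as
the fallback (again a sketch, with invented names):

    def make_preamble(variants, generic):
        """variants maps type tuples to compiled code; generic is the
        ordinary interpreted function, kept around as an escape."""
        def preamble(*args):
            sig = tuple(map(type, args))
            compiled = variants.get(sig)
            if compiled is not None:
                return compiled(*args)   # specialized fast path
            return generic(*args)        # unexpected types: fall back
        return preamble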

Sure, there could be code explosion in general.  But how likely is this in
practice?  And psyco must limit code bloat as well.  It would be interesting
to get some statistics.  In Leo I suspect the vast majority of functions and
methods use only a single type of each argument and return only a single
type of result.  The null object pattern could be used to handle arguments
whose value may be None.
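
By the null object pattern I mean something like the classic recipe
below: a "missing" argument then still has one stable type, instead of
sometimes being None:

    class Null:
        """A do-nothing stand-in for None.  Every attribute access and
        call returns the Null itself, so callers need no special case
        and the argument's type never varies."""
        def __call__(self, *args, **kw):
            return self
        def __getattr__(self, name):
            return self

    NULL = Null()   # pass NULL instead of None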

We might limit this kind of technique to routines explicitly selected by the
user, or the technique could restrict itself to functions whose argument
types turn out not to vary.  Just as psyco does, or will do, we would need
some kind of escape hatch if unexpected argument types were discovered later.
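
An opt-in wrapper with such an escape built in might look like this
(the threshold and all names are invented):

    MAX_SIGNATURES = 4   # invented threshold

    def specialize(f):
        """Applied only to routines the user selects.  If the argument
        types vary too much, give up on specialization for good."""
        seen = {}
        state = [1]   # 1 = still trying to specialize
        def wrapper(*args):
            if state[0]:
                seen[tuple(map(type, args))] = 1
                if len(seen) > MAX_SIGNATURES:
                    state[0] = 0   # too polymorphic: escape permanently
            # A real version would dispatch to compiled code here; the
            # sketch just calls the original function.
            return f(*args)
        return wrapper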

Any comments?

Edward
--------------------------------------------------------------------
Edward K. Ream   email:  edream at tds.net
Leo: Literate Editor with Outlines
Leo: http://personalpages.tds.net/~edream/front.html
--------------------------------------------------------------------