[pypy-dev] Re: Restricted language

Sun Jan 19 19:50:30 CET 2003

Hello Christian,

On Sat, Jan 18, 2003 at 11:03:34PM +0100, Christian Tismer wrote:
> Would you propose to "start dumb", literally like
> the C source, and add abstractions after the initial
> thing works, or does this have to happen in the first
> place?

I would suggest that wherever we feel that CPython is stuck with a "bad" way
to express things, we take a moment to consider whether there is a cleaner,
more Pythonic way to do it.  Only if no consensus is found do we stick with
the CPython way.

> One thing is the lack of a switch statement in
> Python, which either leads to zillions of elifs,
> or to the use of function tables and indexing.

For the main loop in eval_frame(), I would say use a list or a dict of
functions.  It is more flexible because it would let us experiment with adding
opcodes dynamically.  With some specific support from the "static compiler" it
can later be translated into a regular C switch.
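
A minimal sketch of such a table-driven main loop, assuming a toy instruction
set.  The opcode numbers, the Frame class and the op_* handler names are all
made up for illustration; they are not CPython's real opcodes or structures.

```python
# Hypothetical opcode numbers, for illustration only.
LOAD_CONST, BINARY_ADD, RETURN_VALUE = 0, 1, 2

class Frame:
    """Toy frame: a list of (opcode, argument) pairs plus a value stack."""
    def __init__(self, code, consts):
        self.code = code
        self.consts = consts
        self.stack = []
        self.result = None
        self.running = True

def op_load_const(frame, arg):
    frame.stack.append(frame.consts[arg])

def op_binary_add(frame, arg):
    right = frame.stack.pop()
    left = frame.stack.pop()
    frame.stack.append(left + right)

def op_return_value(frame, arg):
    frame.result = frame.stack.pop()
    frame.running = False

# The table of functions that replaces the big C switch.
DISPATCH = {
    LOAD_CONST: op_load_const,
    BINARY_ADD: op_binary_add,
    RETURN_VALUE: op_return_value,
}

def eval_frame(frame):
    pc = 0
    while frame.running:
        opcode, arg = frame.code[pc]
        pc += 1
        DISPATCH[opcode](frame, arg)
    return frame.result

add_program = [(LOAD_CONST, 0), (LOAD_CONST, 1),
               (BINARY_ADD, None), (RETURN_VALUE, None)]
add_result = eval_frame(Frame(add_program, [2, 3]))

# Adding an opcode dynamically is just one more table entry.
BINARY_MUL = 3

def op_binary_mul(frame, arg):
    right = frame.stack.pop()
    left = frame.stack.pop()
    frame.stack.append(left * right)

DISPATCH[BINARY_MUL] = op_binary_mul
mul_program = [(LOAD_CONST, 0), (LOAD_CONST, 1),
               (BINARY_MUL, None), (RETURN_VALUE, None)]
mul_result = eval_frame(Frame(mul_program, [2, 3]))
```

Because the table is an ordinary dict, experimenting with a new opcode does
not require touching eval_frame() itself.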

> Another thing is common for-loops in C.
> Almost all of them which I tried to translate
> into Python became while-loops. Is that ok?

Here again I would say use "for i in range(...)" whenever it is clearly what
the C code means.  Compared to "while i < ...", it has the advantage that if
"..." is a complex expression, the loop itself states that the expression
needs to be computed only once.  In C you would have to use workarounds to
help the compiler in this case.
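
To illustrate the difference, here is a small sketch.  The bound() function is
a made-up stand-in for a complex expression; it records each time it is
evaluated.

```python
calls = []

def bound():
    # Stand-in for a complex loop bound; remembers every evaluation.
    calls.append(1)
    return 3

# "for" form: the bound is evaluated once, before the loop starts.
for i in range(bound()):
    pass
for_evaluations = len(calls)

# "while" form: the bound is evaluated again before every iteration,
# including the final test that ends the loop.
del calls[:]
i = 0
while i < bound():
    i += 1
while_evaluations = len(calls)
```

With a bound of 3, the "for" form evaluates it once and the "while" form
evaluates it four times.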

> Data types.
> How do we model the data types which are used
> internally by Python?

We need a common abstraction for all objects, like a base PyObject class with
an ob_type property, maybe nothing more.  At first we can implement
objects straightforwardly with the corresponding basic Python objects.  Then we
can provide alternate object implementations which look like CPython's
implementations.  We must allow still other implementations to be added later.

The first phase would be done with a single class which maps attribute
manipulation and method calls to an internal, "real" Python object.  This may
only work for objects like lists and dicts which we have a great deal of
control over from Python; it may not be sufficient for frame objects, for
example.
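
A rough sketch of what this first phase could look like.  The class names
PyObject and W_Wrapped and the _real attribute are assumptions made for this
example only; nothing here is settled.

```python
class PyObject:
    """Hypothetical common base: every object exposes an ob_type."""
    @property
    def ob_type(self):
        raise NotImplementedError

class W_Wrapped(PyObject):
    """Hypothetical first-phase object backed by a "real" Python object."""
    def __init__(self, real):
        self._real = real

    @property
    def ob_type(self):
        return type(self._real)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so ob_type and _real
        # are found first.  Guard against recursion if _real is missing.
        if name == "_real":
            raise AttributeError(name)
        return getattr(self._real, name)

w_list = W_Wrapped([1, 2])
w_list.append(3)          # forwarded to the underlying list
w_dict = W_Wrapped({"a": 1})
```

An alternate implementation that mimics CPython's layout would subclass
PyObject directly instead of delegating.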

> I was thinking of some basic classes which describe
> primitive data types, like signed/unsigned integers,
> chars, pointers to primitives, and arrays of
> primitives. Then I would build everything upon these.
> Is that already too low-level?

Something along these lines.  Maybe a little bit higher-level: no explicit
pointers, but mutable/immutable flags and arrays that know their length
(although out-of-bounds checks are not guaranteed, e.g. the no-debug C
implementation would not have them).  Required pointer indirections can often
be deduced automatically from this info; e.g. tuples can store the array of
items in-place because it is of immutable length.  Lower-level hints may later
be added or experimented with (e.g. in the current CPython implementation,
dicts have a small cache area that is only used if the dict is small enough,
while lists don't).
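
As a sketch of how an indirection could be deduced from such flags, assuming a
made-up Array descriptor and storage_layout() helper (neither exists anywhere
yet):

```python
class Array:
    """Hypothetical description of an array field, with no explicit pointer."""
    def __init__(self, item_type, mutable_length):
        self.item_type = item_type
        self.mutable_length = mutable_length

def storage_layout(array):
    # A fixed-length array can live inside the object itself;
    # a resizable one needs a separately allocated block.
    if array.mutable_length:
        return "indirect"
    return "inline"

TUPLE_ITEMS = Array("object", mutable_length=False)
LIST_ITEMS = Array("object", mutable_length=True)

tuple_layout = storage_layout(TUPLE_ITEMS)
list_layout = storage_layout(LIST_ITEMS)
```

The description only says what the data is; the decision about pointers is
left to whatever emits the low-level code.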

I feel that a good way to find out which level of abstraction we should target
is to imagine that we may later emit not C code but OCaml code (for
example).

> This means clearly to me, that I should *not*
> repeat the Py_INCREF/Py_DECREF story from Python,
> but we need to do a more abstract formulation of
> that, which allows us to specify it in any desired
> way.

I believe that the above data representation (with maybe the help of more
flags) should be enough to deduce how to make a reference-counting
interpreter.  It may occasionally contain more Py_INCREF/Py_DECREF than the
hand-tuned CPython, but I say never mind.  If it is a real issue in a specific
case then we can always fix it manually with more hints.

> I know I have Python, I would most probably not use
> C string constants all around, but use Python strings.

When the interpreter uses Python strings internally just because handling C
strings is more complex, then of course don't repeat the PyString_FromString()
calls.  We will see later how internal string handling may be translated to C.  
Reserve PyString_FromString() for places where we need a real object visible
from the interpreted program.

> This is just the beginning of a whole can of worms
> to be considered. Just to start it somehow :-)

Sure :-)

By the way, I feel that if one routine deserves some special treatment (like
refactoring) it is the main loop in eval_frame().  For example, we could use
exceptions to signal "break" or "continue" statements.  We already mentioned
catching "EPython" exceptions raised by called functions to signal an
exception visible in the program we interpret (instead of the "if
result!=NULL" trick), and using a table of functions instead of a big switch.
If I think about Psyco it is also the place where extra code must be added,
like checking the code object for an already-compiled version or collecting
statistics.  I guess this is also where special treatment is required for a
stackless-style CPS interpreter.
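
A small sketch of the "EPython" idea, assuming a toy handler and a toy
run_one() driver.  Only the name EPython comes from the discussion; the
exc_type attribute, op_divide() and run_one() are invented for illustration.

```python
class EPython(Exception):
    """Signals an exception that is visible in the interpreted program."""
    def __init__(self, exc_type, message):
        Exception.__init__(self, message)
        self.exc_type = exc_type

def op_divide(stack):
    right = stack.pop()
    left = stack.pop()
    if right == 0:
        # No NULL return value to check: just raise.
        raise EPython("ZeroDivisionError", "division by zero")
    stack.append(left // right)

def run_one(stack):
    # The main loop catches EPython in one place instead of testing
    # the result of every call.
    try:
        op_divide(stack)
    except EPython as e:
        return ("raised", e.exc_type)
    return ("ok", stack[-1])

ok_outcome = run_one([6, 3])
error_outcome = run_one([1, 0])
```

The same pattern could carry "break" and "continue" as their own exception
classes, caught by the code that implements loops.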

So I would say, as a general rule, let's be as Pythonic as we like for the
main loop, but let's stay close overall to the original C code for everything
else.  This is also crucial for compatibility. (Yes, I think we will also be
able to emit C code almost fully binary compatible with CPython and its
extension modules.)

See you soon,

Armin.