[pypy-dev] From PyPy to Psyco

Sun Jan 19 16:36:22 CET 2003

> Bootstrapping issues were quite widely discussed.  There are several
> valid approaches in my opinion.  I would say that we should currently
> stick to "Python in Python"'s first goal:  to have a Python interpreter up
> and running, written entirely in Python.

[snip]

> There are so many cool things to experiment, I can't wait to have (1) and
> (2) ready --- but I guess it's the same for all of us :-)

Actually, I believe this _last_ paragraph is the heart of the matter, not
the traditional bootstrapping issues.

Cockpit warning sounds: Whoop, whoop: war story! Whoop, whoop: war story!
:-)

By far the most important moment in Leo's 7-year history was the moment that
I saw how to begin to use Leo without actually having it.  Leo is a
combination of an outliner and traditional programming techniques.  I had a
vague notion that the combination was going to be effective, but I was
stuck: building an outliner is a _big_ task, and I wasn't sure exactly what
kind of outliner would work well with the programming constructs.

I was talking to Rebecca on the way back from a one-day ski outing, vaguely
mulling over the problems (she's not a programmer, and she is a great
listener :-) when it suddenly struck me that I could use the MORE outliner
as an "instant prototype".  I would just embed my experimental code in the
MORE outline.  I would then copy the outline to the clipboard by hand using
MORE's copy command.  Finally, I would write a little program (M2C for More
to C) to take the stuff off the clipboard and create proper C source code
that I could then compile.

Naturally, the first "outline-oriented" program I wrote in MORE was M2C.
This took a few hours.  I then simulated by hand the output of M2C on M2C.
The result was the C code for M2C.  Once I debugged M2C I was in business.
It all took less than 2 days.

The point is this:  I could use MORE _immediately_, even without actually
having M2C, and certainly without writing something as complex as the MORE
outliner.  As soon as I shifted my point of view I was able, within seconds,
to experiment with the combination of outlines and literate programming.
Within minutes all my doubts about the combination of the two techniques
vanished.  Within an hour I evolved a new kind of programming style that has
remained remarkably constant for over 7 years.  Within a few days I had a
working prototyping system.

(end of war story: transcript of cabin voice recorder ends)

I believe something this good can be done with psyco.  My ideas:

1. We now have some "safety proofs" in place that show that there is
absolutely no need to worry about performance during the initial
experimentation/prototyping phase of this project.

2. We already have a superb language tool, namely Python.  We must exploit
Python to the fullest.

3. We want a bootstrapping scheme that gets us (or rather Armin :-) going
_now_: preferably within hours or days, and at most within a week.

Putting these ideas together, I suggest the following:

1. Ignore all issues relating to the ultimate target language. In other
words, use Python as the target language.

2. Ignore all issues relating to speed.  Focus instead on the algorithms
that psyco will use and all the nifty experiments that Armin wants to run
yesterday.  Many of these experiments will involve looking at the target
code that gets produced from particular programs/byte codes.

3. Modify Python's logic (it may be possible to do with a simple patch
written in Python) so that Python looks for .pyp files and loads them as
needed before looking for .pyc files.  I believe this can be done very
quickly.

4.  Put nothing but Python code and data into the .pyp files!  The
"bootstrap loader" is the code that loads .pyp files.  It does one of the
following:

a. an import of the .pyp file (changing its type temporarily to .py
presumably)
b. an exec on the entire contents of the .pyp file.

In either case, some cleverness will be needed so that the import or exec
will execute psyco with the proper data.  This cleverness is the province of
the code emitters...

5. Modify psyco so it outputs Python code, not C or machine code.  The "code
emitters" write _whatever is useful_ to the .pyp file.  The code emitters
might use str(x) to dump psyco's x data structure.  At worst (if str could
not be used), the code emitters would write the Python data structures
used by psyco to the .pyp file _as Python code and data_.  As I said before,
some cleverness may be needed so that the Python code in the .pyp file ends
up executing psyco again, but this is "routine cleverness".

Armin is free to dump whatever Python code he wants into the .pyp file.
There is no need for formal specifications and no need for the Python code
to have a consistent format.  Just blast away.  Presumably, Armin will
design the .pyp file so that it is easy to see the results of his
experiments.
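A code emitter in this spirit could be as simple as the sketch below, which
assumes (hypothetically) that psyco hands it a list of generated function
sources and a dict of named data structures.  It writes repr() rather than
str(): for literals such as numbers, strings, and lists, tuples or dicts of
these, repr() yields valid Python source, whereas str('abc') drops the quotes.

```python
import textwrap


def emit_pyp(path, functions, data):
    """Write generated code and psyco-side data to `path` as plain Python.

    functions: list of Python source strings (hypothetical emitter output)
    data: dict mapping variable names to values whose repr() is valid Python
    """
    lines = ["# Generated .pyp file: Python code and data only.\n"]
    for name, value in data.items():
        # repr() round-trips literals; str() would not for bare strings.
        lines.append(f"{name} = {value!r}\n")
    for source in functions:
        # Dedent so each function starts in column 0 of the .pyp file.
        lines.append("\n" + textwrap.dedent(source).strip() + "\n")
    with open(path, "w", encoding="utf-8") as out:
        out.writelines(lines)
```

Reading the file back is then just an exec (or an import through a .pyp
loader), and the file itself stays readable by eye.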

The advantages are these:

- This can all be done within hours--days at the most.
- There may be no need for further group design work.
- This ignores everything that should be ignored, namely all implementation
details.
- We get the highest-level, most flexible framework for experimentation,
namely the Python code and data in .pyp files.  This Python code is the
highest-level representation of the generated code, and it is the clearest
possible way to see the results of experimentation.
- It is an immediate path to psyco in python.
- There is little or no need to create an interp in Python.

HTH :-)

Edward

P.S.  Yes, the results of experimentation will be Python code.  Yes, the
experimental code will run slower (maybe much slower) than .pyc files given
to the C interp.  That doesn't matter.  What _does_ matter is that Armin
will be up and running quickly with an extremely clear, powerful and
flexible experimental environment.

For example, the code given in another thread:

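PyObject* my_function(PyObject* a, PyObject* b, PyObject* c)
{
  long r1, r2, r3;   /* PyIntObject.ob_ival is a C long */
  if (a->ob_type != &PyInt_Type) goto uncommon_case;
  if (b->ob_type != &PyInt_Type) goto uncommon_case;
  if (c->ob_type != &PyInt_Type) goto uncommon_case;
  r1 = ((PyIntObject*) a)->ob_ival;
  r2 = ((PyIntObject*) b)->ob_ival;
  r3 = ((PyIntObject*) c)->ob_ival;
  return PyInt_FromLong(r1+r2+r3);   /* overflow is not checked here */

 uncommon_case:
  /* generic fallback: compute a+b+c through the normal object protocol */
  {
    PyObject* tmp = PyNumber_Add(a, b);
    PyObject* result;
    if (tmp == NULL) return NULL;
    result = PyNumber_Add(tmp, c);
    Py_DECREF(tmp);
    return result;
  }
}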

will appear in the .pyp file as something like this:

def my_function__(a, b, c):
    if a.ob_type__ != PyInt_Type__: return do_uncommon_case__(a, b, c)
    if b.ob_type__ != PyInt_Type__: return do_uncommon_case__(a, b, c)
    if c.ob_type__ != PyInt_Type__: return do_uncommon_case__(a, b, c)
    r1 = a.ob_ival__
    r2 = b.ob_ival__
    r3 = c.ob_ival__
    return PyInt_FromLong__(r1+r2+r3)

I've added trailing double underscores throughout just to indicate that I
don't understand any of the implementation details of psyco in psyco.

Presumably the generated prototype Python code will gather lots of
statistics.  The statistics _themselves_ can be written to the .pyp file as
plain Python data structures.
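As a rough illustration of gathering statistics, here is the my_function__
example made runnable with hypothetical stand-ins: PyObject__, a string tag
for PyInt_Type__, and a do_uncommon_case__ that only counts how often it is
taken.  None of these names come from psyco.  The counts are written out as
a plain Python assignment and read back with ast.literal_eval.

```python
import ast
from collections import Counter

PyInt_Type__ = "int"  # hypothetical type tag standing in for &PyInt_Type


class PyObject__:
    """Hypothetical stand-in for a C-level object: type tag plus payload."""

    def __init__(self, ob_type__, ob_ival__=None):
        self.ob_type__ = ob_type__
        self.ob_ival__ = ob_ival__


def PyInt_FromLong__(value):
    return PyObject__(PyInt_Type__, value)


STATS = Counter()  # which path each call took


def do_uncommon_case__(a, b, c):
    # Hypothetical generic path: this sketch only records that it was taken.
    STATS["uncommon_case"] += 1
    return None


def my_function__(a, b, c):
    # Same guards as the specialized code: bail out unless all are ints.
    for arg in (a, b, c):
        if arg.ob_type__ != PyInt_Type__:
            return do_uncommon_case__(a, b, c)
    STATS["int_fast_path"] += 1
    return PyInt_FromLong__(a.ob_ival__ + b.ob_ival__ + c.ob_ival__)


def stats_as_python(stats):
    # A dict of str -> int has a repr that is valid Python source.
    return "STATS = " + repr(dict(sorted(stats.items()))) + "\n"


def stats_from_python(text):
    # literal_eval reads the data back without executing arbitrary code.
    _, _, literal = text.partition("=")
    return ast.literal_eval(literal.strip())
```

One fast-path call and one call with a non-int argument leave the line
STATS = {'int_fast_path': 1, 'uncommon_case': 1} ready to append to a .pyp
file.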

EKR
--------------------------------------------------------------------
Edward K. Ream   email:  edream at tds.net
Leo: Literate Editor with Outlines
Leo: http://personalpages.tds.net/~edream/front.html
--------------------------------------------------------------------