[pypy-dev] Towards Milestone 1

Wed Aug 10 22:47:22 CEST 2005

Hi all,

Here are some generally interesting comparisons between the current
status of translation in PyPy and what we set as our Milestone 1.  This
is getting closer, both in terms of work remaining to be done (good!) and
deadline promised to the EU (hurry!).  (For the latter, I am referring to
http://codespeak.net/svn/pypy/funding/negotiations/part_b_2004_12_11.pdf
pp 80-81.)

Near the end of the Hildesheim sprint we achieved the first
self-contained translated PyPy.  Shortly thereafter we could produce a
standalone executable instead of an extension module for CPython.  In
this respect we have reached an important part of our first milestone.
Note that this is still in heavy development; I'm not sure the current
trunk successfully translates.  It did, take my word :-)  (BTW I propose
that the pypy-translation-snapshot always contains the latest revision
that is known to translate; we should have a script to automate taking a
new snapshot.)

There are some pieces missing from PyPy itself, most notably some C
extension modules of CPython and a few specific features (zip-imports,
weakrefs, ...).  I will not discuss these here.  The subject of this
e-mail is to see what is missing translation-wise:

* first it would be cool if LLVM could also translate PyPy.  This seems
  to be very close as well!  (Eric should tell us more about it after
  tomorrow's pypy-sync meeting)

* memory management: the C back-end uses refcounting; LLVM uses the
  Boehm GC.  It would be quite easy to have a flag to the C back-end to
  use Boehm as well instead of refcounting.  Carl Friedrich is working
  on a more general GC framework; at the moment it is unclear (at least
  to me) when and how easily we will be able to use it to add custom GCs
  to our back-ends.  (Carl should tell us more about it after tomorrow's
  pypy-sync meeting)

* threading: implementing this requires a mixture of source-code-level
  changes and help from the translation process, depending on the
  approach taken.  At the moment we have no threads at all.  There are
  two threading models that are relatively easy to implement by now, and
  more to think about:

  1) the Global Interpreter Lock (GIL), as in CPython.  Only one thread
     interprets Python bytecodes at a time.  All other threads are
     either blocked waiting for the GIL, or doing I/O.  In CPython,
     around each I/O function call, there are hand-coded lines to
     release and re-acquire the GIL.  In PyPy we can insert these lines
     automatically at translation time, whenever we call one of the
     hand-coded C functions in pypy/translator/c/src/ll_*.h.  The source
     code of PyPy needs a minor extension so that every 10 or 100 bytecodes
     it calls a special function "now is a good time to release the GIL
     to give the other threads a chance to run".  Should be fairly
     straightforward.

  2) full Stackless.  As long as some rather inefficient solution is
     good enough, this is not so difficult.  We can modify the C
     back-end to generate functions differently, so that no C function
     calls any other C function directly.  Instead, there is a short
     "main loop", along the lines of

       while (1) {
           next_fn = state->continuation_fn;
           state = next_fn(state);
       }

     Each generated function returns a new 'state' structure whose
     'continuation_fn' member contains the function to call next.  The
     'state' structure also contains arbitrary data like the arguments
     we want to send to the next function.  The net result is that the C
     stack is no longer used.  Then we can have "tasklets" which each
     record a 'state' to run next, and switching to another tasklet is
     done by a special C function that returns the 'state' of the other
     tasklet.  Getting the basics done should not be too difficult -- 
     although it's an open door to endless involved optimization hacks,
     as Christian knows :-)

  3) there are also other, less well-thought-out ideas for threading,
     mostly along the lines of per-object locking.  Here too, it should be
     possible to get a not-too-bad result without having to insert locks
     everywhere by hand.  For example, we could reuse the proxy object
     space mechanism for a LockingObjSpace which first acquires the lock
     of each object involved in each space operation.

* finally, let's consider the translation process itself.  It is
  flexible in principle, but it's not really designed as a framework
  with a well-defined API or hooks to plug into.  However, it is
  possible to hack here and there to change various translation aspects.
  This is, to some extent, the whole idea of PyPy's flexibility: a
  framework with hooks and APIs allows only so much experimentation;
  sooner or later the ability to directly code things differently is
  more powerful.  Nevertheless I guess that some refactoring would help
  to make localized translation aspects easier to change, e.g. by providing
  more "policy" objects to control the process, or (for OOP enthusiasts)
  by designing the classes with subclassing in mind.
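
The GIL scheme from item 1) above can be sketched roughly as follows.
This is only an illustration, assuming POSIX threads; every name in it
(CHECK_INTERVAL, yield_gil, interpret, read_releasing_gil) is made up
for the example and is not what PyPy or the C back-end actually
contains.  It shows the two places where the lock is given up: the
periodic "now is a good time" call in the bytecode loop, and the
release/re-acquire pair that the translator could wrap around a
blocking external call.

```c
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical names throughout; none of this is PyPy's actual code. */
#define CHECK_INTERVAL 100      /* "every 10 or 100 bytecodes" */

static pthread_mutex_t gil = PTHREAD_MUTEX_INITIALIZER;
static long gil_releases = 0;   /* counted for demonstration only */

/* "Now is a good time to release the GIL."  Caller must hold the GIL. */
void yield_gil(void)
{
    pthread_mutex_unlock(&gil);
    sched_yield();              /* give a waiting thread a chance */
    pthread_mutex_lock(&gil);
    gil_releases++;
}

/* Stand-in for the bytecode dispatch loop: executes n_bytecodes dummy
   instructions while holding the GIL, yielding it periodically. */
long interpret(long n_bytecodes)
{
    long executed = 0;
    int ticks = 0;

    pthread_mutex_lock(&gil);
    while (executed < n_bytecodes) {
        executed++;             /* a real loop would dispatch here */
        if (++ticks >= CHECK_INTERVAL) {
            ticks = 0;
            yield_gil();
        }
    }
    pthread_mutex_unlock(&gil);
    return executed;
}

/* What could be emitted around a blocking external call.
   Caller must hold the GIL; it is held again on return. */
ssize_t read_releasing_gil(int fd, void *buf, size_t count)
{
    ssize_t result;

    pthread_mutex_unlock(&gil);
    result = read(fd, buf, count);
    pthread_mutex_lock(&gil);
    return result;
}
```

With these definitions, interpret(250) gives up the lock twice (after
the 100th and the 200th dummy bytecode).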

For the short-term future, we have to set priorities with this EU
Milestone 1 in mind.  We promised "hooks into the internals to alter
translation aspects"; in my opinion (debate welcome!) spending time on
this right now is not really a good idea.  Better to spend it on the more
concrete issues of GC and threading, which in some sense already prove
that we have enough hooks to do some variations.  In particular, the
Stackless version of PyPy should be an excellent and -- I believe --
quite reachable result.  It's more than a minor variation: it's a
completely different kind of generated code.  It would definitely show
that our translation process is capable of producing more than just a
dummy translation -- which is the whole point.
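
To give an idea of how different such generated code would look, here
is a minimal sketch of the main loop from item 2) above driving a
function that would otherwise be recursive.  All names (struct state's
extra fields, sum_step, run, sum_to) are invented for the example, and
two simplifications are my own assumptions: the loop stops when
continuation_fn is NULL, and the same 'state' structure is reused
instead of returning a fresh one each time.

```c
#include <stddef.h>

/* Illustrative names only, not what the C back-end generates. */
struct state;
typedef struct state *(*cont_fn)(struct state *);

struct state {
    cont_fn continuation_fn;    /* function to call next; NULL = stop */
    long long n;                /* "arguments" for the next function */
    long long acc;
};

/* Source-level equivalent:
       sum_to(n, acc) = (n == 0) ? acc : sum_to(n - 1, acc + n)
   Instead of calling itself, each step stores the new arguments in
   the state and returns to the main loop. */
struct state *sum_step(struct state *st)
{
    if (st->n == 0) {
        st->continuation_fn = NULL;     /* done: result is in st->acc */
    } else {
        st->acc += st->n;
        st->n -= 1;
        st->continuation_fn = sum_step;
    }
    return st;
}

/* The main loop, with a termination condition added. */
void run(struct state *state)
{
    while (state->continuation_fn != NULL) {
        cont_fn next_fn = state->continuation_fn;
        state = next_fn(state);
    }
}

long long sum_to(long long n)
{
    struct state st = { sum_step, n, 0 };
    run(&st);
    return st.acc;
}
```

Here sum_to(1000000) performs a million "calls" while the C stack
never gets deeper than run() plus one step function.  A tasklet switch
would then simply be a step function that returns some other tasklet's
saved 'state' instead of its own.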

See you soon,

Armin.