[pypy-dev] Towards Milestone 1
Armin Rigo
arigo at tunes.org
Wed Aug 10 22:47:22 CEST 2005
Hi all,
Here are some generally interesting comparisons between the current
status of translation in PyPy and what we set as our Milestone 1. This
is getting closer, both in terms of work remaining to be done (good!) and
the deadline promised to the EU (hurry!). (For the latter, I am referring to
http://codespeak.net/svn/pypy/funding/negotiations/part_b_2004_12_11.pdf
pp 80-81.)
Near the end of the Hildesheim sprint we achieved the first
self-contained translated PyPy. Shortly thereafter we could produce a
standalone executable instead of an extension module for CPython. In
this respect we have reached an important part of our first milestone.
Note that this is still in heavy development; I'm not sure the current
trunk successfully translates. It did, take my word :-) (BTW I propose
that the pypy-translation-snapshot always contains the latest revision
that is known to translate; we should have a script to automate taking a
new snapshot.)
There are some pieces missing from PyPy itself, most notably some C
extension modules of CPython and a few specific features (zip-imports,
weakrefs, ...). I will not discuss these here. The subject of this
e-mail is to see what is missing translation-wise:
* first it would be cool if LLVM could also translate PyPy. This seems
to be very close as well! (Eric should tell us more about it after
tomorrow's pypy-sync meeting)
* memory management: the C back-end uses refcounting; LLVM uses the
Boehm GC. It would be quite easy to have a flag to the C back-end to
use Boehm as well instead of refcounting. Carl Friedrich is working
on a more general GC framework; at the moment it is unclear (at least
to me) when and how easily we will be able to use it to add custom GCs
to our back-ends. (Carl should tell us more about it after tomorrow's
pypy-sync meeting)
* threading: implementing this requires a mixture of source-code-level
changes and help from the translation process, depending on the
approach taken. At the moment we have no threads at all. There are
two threading models that are relatively easy to implement by now, and
more to think about:
1) the Global Interpreter Lock (GIL), as in CPython. Only one thread
interprets Python bytecodes at a time. All other threads are
either blocked waiting for the GIL, or doing I/O. In CPython,
around each I/O function call, there are hand-coded lines to
release and re-acquire the GIL. In PyPy we can insert these lines
automatically at translation time, whenever we call one of the
hand-coded C functions in pypy/translator/c/src/ll_*.h. The source
code of PyPy needs a minor extension so that every 10 or 100 bytecodes
it calls a special function "now is a good time to release the GIL
to give the other threads a chance to run". Should be fairly
straightforward.
2) full Stackless. As long as some rather inefficient solution is
good enough, this is not so difficult. We can modify the C
back-end to generate functions differently, so that no C function
calls any other C function directly. Instead, there is a short
"main loop", along the lines of
    while (1) {
        next_fn = state->continuation_fn;
        state = next_fn(state);
    }
Each generated function returns a new 'state' structure whose
'continuation_fn' member contains the function to call next. The
'state' structure also contains arbitrary data like the arguments
we want to send to the next function. The net result is that the C
stack is no longer used. Then we can have "tasklets" which each
record a 'state' to run next, and switching to another tasklet is
done by a special C function that returns the 'state' of the other
tasklet. Getting the basics done should not be too difficult --
although it's an open door to endless involved optimization hacks,
as Christian knows :-)
3) there are also other, less well thought-out ideas for threading, mostly
along the lines of per-object locking. Here too, it should be
possible to get a not-too-bad result without having to insert locks
everywhere by hand. For example, we could reuse the proxy object
space mechanism for a LockingObjSpace which first acquires the lock
of each object involved in each space operation.
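
To make the Stackless idea in 2) a bit more concrete, here is a minimal sketch in Python rather than in generated C. It only illustrates the control flow: every step returns the next 'state' instead of calling the next function, and a tasklet switch is just a function that returns the 'state' of another tasklet. The names State, run, Scheduler, countdown and worker are hypothetical, invented for this illustration; none of them is existing PyPy code.

```python
from collections import deque


class State:
    """Hypothetical 'state' record: the function to call next, plus its data."""

    def __init__(self, continuation_fn, **data):
        self.continuation_fn = continuation_fn
        self.data = data


def run(state):
    """The short main loop: no step ever calls another step directly."""
    while state is not None:
        next_fn = state.continuation_fn
        state = next_fn(state)


def countdown(state):
    """One step of a countdown; returns the next state instead of recursing."""
    n = state.data["n"]
    log = state.data["log"]
    if n == 0:
        return None  # nothing left to run, the main loop stops
    log.append(n)
    return State(countdown, n=n - 1, log=log)


class Scheduler:
    """Hypothetical tasklet scheduler: a queue of saved states."""

    def __init__(self):
        self.ready = deque()

    def spawn(self, state):
        self.ready.append(state)

    def switch(self, current):
        """Park 'current' (if any) and return the state of the next tasklet."""
        if current is not None:
            self.ready.append(current)
        return self.ready.popleft() if self.ready else None


def worker(state):
    """A tasklet step: record one item, then hand control to another tasklet."""
    d = state.data
    sched = d["sched"]
    if d["count"] == 0:
        return sched.switch(None)  # this tasklet is finished
    d["log"].append((d["name"], d["count"]))
    nxt = State(worker, sched=sched, name=d["name"],
                count=d["count"] - 1, log=d["log"])
    return sched.switch(nxt)
```

Running run(State(countdown, n=50000, log=[])) completes without deep recursion, because the loop in run is the only caller. Spawning two worker states on one Scheduler and calling run(sched.switch(None)) interleaves them. This is the "rather inefficient" variant: a new State is allocated on every step.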
* finally, let's consider the translation process itself. It is
flexible in principle, but it's not really designed as a framework
with a well-defined API or hooks to plug into. However, it is
possible to hack here and there to change various translation aspects.
This is, to some extent, the whole idea of PyPy's flexibility: a
framework with hooks and APIs allows only so much experimentation;
sooner or later the ability to directly code things differently is
more powerful. Nevertheless I guess that some refactoring would help
to make localized translation aspects easier to change, e.g. by providing
more "policy" objects to control the process, or (for OOP enthusiasts)
by designing the classes with subclassing in mind.
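
As an illustration of what a "policy" object designed with subclassing in mind could look like, here is a toy sketch in Python. It reuses the GIL example from above: the default policy emits a call unchanged, and a subclass wraps calls to the hand-coded helpers with release and re-acquire lines. Everything here is an assumption made for the example: the class names TranslationPolicy and GilPolicy, the generate function, the emitted release_gil()/acquire_gil() lines, and the convention that helper functions carry an "ll_" prefix. It is not the actual structure of the translator.

```python
class TranslationPolicy:
    """Hypothetical base policy: emit an external call unchanged."""

    def emit_external_call(self, func_name, args):
        return ["%s(%s);" % (func_name, ", ".join(args))]


class GilPolicy(TranslationPolicy):
    """Hypothetical variation: release the GIL around hand-coded helpers."""

    def emit_external_call(self, func_name, args):
        lines = super().emit_external_call(func_name, args)
        # Assumption for this sketch: helpers are recognised by an "ll_" prefix.
        if func_name.startswith("ll_"):
            return ["release_gil();"] + lines + ["acquire_gil();"]
        return lines


def generate(calls, policy):
    """Toy 'back-end': turn (name, args) pairs into lines of C-like text."""
    out = []
    for name, args in calls:
        out.extend(policy.emit_external_call(name, args))
    return out
```

With calls = [("ll_os_read", ["fd", "n"]), ("compute", ["x"])], passing GilPolicy() instead of TranslationPolicy() changes only the code emitted around ll_os_read. The point is that one translation aspect can be varied in one place, without touching the rest of the generator.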
For the short-term future, we have to set priorities with this EU
Milestone 1 in mind. We promised "hooks into the internals to alter
translation aspects"; in my opinion (debate welcome!) spending time on
this right now is not really a good idea. Better to spend it on the more
concrete issues of GC and threading, which in some sense already prove
that we have enough hooks to do some variations. In particular, the
Stackless version of PyPy should be an excellent and -- I believe --
quite reachable result. It's more than a minor variation: it's a
completely different kind of generated code. It would definitely show
that our translation process is capable of producing more than just a
dummy translation -- which is the whole point.
A bientot,
Armin.