Please enlighten me about PyPy
Carl Friedrich Bolz
cfbolz at gmx.de
Thu Dec 22 15:07:04 CET 2005
Luis M. González wrote:
> Well, first and foremost, when I said that I leave the door open for
> further explanations, I meant explanations by other people more
> knowledgeable than me :-)
You did a very good job of describing what PyPy is in this and the
previous mail! I will try to explain why PyPy is built the way it
is.
>>Now I'm confused again--psyco translates Python into machine code--so
>>how does this tie in with the fact that the interpreter written in
>>Python is translated into another language (in this case C?)
> No, the psyco-like techniques come later, after the RPython interpreter
> is auto-translated to C. They are not used to translate the interpreter
> to C (this is done through a tool that uses type inference, flow-graph
> analysis, etc.).
> Getting the RPython interpreter auto-translated to C is the first goal
> of the project (already achieved).
> That means having a minimal core, written in a low-level language (C,
> for speed), that hasn't been written by hand but auto-translated to C
> from the Python source -> much easier to improve and maintain from now on.
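To make "type inference" a bit more concrete, here is a toy sketch, far simpler than PyPy's actual annotator: it infers one type per local variable in a straight-line function body and rejects the function if a variable would need two different types, which is the kind of static typability RPython asks for. The function `infer_local_types` is hypothetical and only understands literals, names and `+`, `-`, `*`.

```python
import ast


def infer_local_types(source):
    """Infer one type per local in a straight-line function body.

    Raises TypeError if some variable would need two different types.
    """
    func_def = ast.parse(source).body[0]
    types = {}

    def type_of(node):
        # Literal constants have an obvious type.
        if isinstance(node, ast.Constant):
            return type(node.value).__name__
        # A name keeps whatever type it was already given (or None).
        if isinstance(node, ast.Name):
            return types.get(node.id)
        # Arithmetic on two operands of the same type yields that type.
        if isinstance(node, ast.BinOp) and isinstance(
                node.op, (ast.Add, ast.Sub, ast.Mult)):
            left, right = type_of(node.left), type_of(node.right)
            if left is not None and left == right:
                return left
        return None  # this toy version cannot type the expression

    for stmt in func_def.body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            name = stmt.targets[0].id
            new = type_of(stmt.value)
            if new is None:
                continue
            old = types.get(name)
            if old is not None and old != new:
                raise TypeError(f"{name!r} would be both {old} and {new}")
            types[name] = new
    return types


ok = """
def f():
    x = 1
    y = x * 2
    s = "a" + "b"
"""
print(infer_local_types(ok))  # {'x': 'int', 'y': 'int', 's': 'str'}

bad = """
def g():
    x = 1
    x = "one"
"""
try:
    infer_local_types(bad)
except TypeError as err:
    print("rejected:", err)  # rejected: 'x' would be both int and str
```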
Indeed. The fact that the core is written in RPython has a number of
advantages.
The first point is indeed maintainability: Python is a lot more flexible
and more concise than C, so changes and enhancements become much easier.
Another point is that our interpreter can not only be translated but
can also run on top of CPython! This makes testing very fast, because
you don't need to translate the interpreter before testing it: just
run it on CPython.
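A minimal sketch of why this helps, using a hypothetical toy stack machine rather than PyPy's real bytecode interpreter: because the interpreter loop is ordinary Python, it can be tested directly on CPython with plain asserts, with no translation step in between.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a toy stack machine."""
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError("unknown opcode: " + opcode)
    return stack.pop()


def test_arithmetic():
    # (2 + 3) * 4; the unused argument 0 keeps every tuple the same shape.
    program = [("PUSH", 2), ("PUSH", 3), ("ADD", 0), ("PUSH", 4), ("MUL", 0)]
    assert run(program) == 20


test_arithmetic()
```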
The most important advantage of writing the interpreter in Python is
flexibility. In CPython a lot of implementation choices are made
rather early: the choice to use C as the platform the interpreter runs
on, the choice to use reference counting (which is reflected
everywhere), the choice to have a GIL, the choice not to be stackless.
All these choices are deeply embedded in the implementation and are
rather hard to change. Not so in PyPy. Since the interpreter is written
in Python and then translated, the translation process can change
different aspects of the interpreter while translating it. The
interpreter implementation does not need to concern itself with all
of these details.
One example of this is that we are not restricted to translating our
interpreter to C. There are currently also backends that translate
RPython to Smalltalk and to Java. That means that we could potentially
generate something similar to Jython. That is not entirely accurate,
because interfacing with Java libraries would not work, but pypy-java
would run on the JVM.
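To picture how one translated program can feed several backends, here is a toy sketch (not PyPy's actual flow-graph format or backend code): a hypothetical two-operation graph for `f(x) = x * x + 1` rendered once as a C function and once as a Smalltalk method. The names `GRAPH`, `emit_c` and `emit_smalltalk` are made up for illustration.

```python
# Toy flow graph for f(x) = x * x + 1: each op is (result, operator, left, right).
GRAPH = [("t0", "*", "x", "x"), ("t1", "+", "t0", "1")]


def emit_c(name, graph):
    """Render the graph as a C function over longs."""
    body = [f"    long {res} = {a} {op} {b};" for res, op, a, b in graph]
    return "\n".join([f"long {name}(long x) {{", *body,
                      f"    return {graph[-1][0]};", "}"])


def emit_smalltalk(name, graph):
    """Render the same graph as a one-argument Smalltalk method."""
    temps = " ".join(res for res, _, _, _ in graph)
    body = [f"    {res} := {a} {op} {b}." for res, op, a, b in graph]
    return "\n".join([f"{name}: x", f"    | {temps} |", *body,
                      f"    ^ {graph[-1][0]}"])


print(emit_c("f", GRAPH))
# long f(long x) {
#     long t0 = x * x;
#     long t1 = t0 + 1;
#     return t1;
# }
print(emit_smalltalk("f", GRAPH))
# f: x
#     | t0 t1 |
#     t0 := x * x.
#     t1 := t0 + 1.
#     ^ t1
```

The point of the design is that the graph is built once; only the final rendering step knows anything about the target platform.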
Another example is that we can choose at translation time which garbage
collection strategy to use. At the moment we even have two different
garbage collectors implemented: a simple reference counting one and
one that uses the Boehm garbage collector. We have also started (as part
of my Summer of Code project) an experimental garbage collection
framework which allows us to implement garbage collectors in Python.
This framework is not finished yet and still needs to be integrated
with the rest of PyPy.
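As an illustration of what "a garbage collector written in Python" can look like, here is a minimal mark-and-sweep sketch over a simulated heap. It is not code from the GC framework mentioned above; the `Heap` class and `mark_and_sweep` function are hypothetical. Unlike plain reference counting, it also reclaims unreachable cycles.

```python
class Heap:
    """A toy heap: each object is an int id with a list of referenced ids."""

    def __init__(self):
        self.objects = {}  # object id -> list of referenced object ids
        self.next_id = 0

    def allocate(self, refs=()):
        obj = self.next_id
        self.next_id += 1
        self.objects[obj] = list(refs)
        return obj


def mark_and_sweep(heap, roots):
    """Free every object not reachable from roots; return the freed ids."""
    marked = set()
    pending = list(roots)
    while pending:  # mark phase, iterative so deep graphs don't recurse
        obj = pending.pop()
        if obj not in marked:
            marked.add(obj)
            pending.extend(heap.objects[obj])
    freed = sorted(obj for obj in heap.objects if obj not in marked)
    for obj in freed:  # sweep phase
        del heap.objects[obj]
    return freed


heap = Heap()
a = heap.allocate()
b = heap.allocate([a])
c = heap.allocate()
d = heap.allocate([c])
heap.objects[c].append(d)  # c and d now form an unreachable cycle
print(mark_and_sweep(heap, roots=[b]))  # [2, 3]
```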
In a similar manner, we hope to make different threading models
selectable at translation time.
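The idea of choosing such an aspect at build time can be pictured with a minimal sketch, assuming a hypothetical `build_interpreter` step that weaves a global lock (a stand-in for a GIL) into interpreter-level functions only when asked to. PyPy's translator transforms flow graphs rather than wrapping Python functions, so this is only an analogy for the fact that the interpreter code itself never mentions the lock.

```python
import functools
import threading


def with_global_lock(func, lock):
    """Wrap func so that every call holds lock (a stand-in for a GIL)."""
    @functools.wraps(func)
    def wrapper(*args):
        with lock:
            return func(*args)
    return wrapper


def build_interpreter(functions, use_gil):
    """Hypothetical 'translation' step: weave the locking aspect in or not."""
    if not use_gil:
        return dict(functions)
    lock = threading.Lock()
    return {name: with_global_lock(func, lock)
            for name, func in functions.items()}


# Interpreter-level code: it knows nothing about locking.
def add(a, b):
    return a + b


interp_gil = build_interpreter({"add": add}, use_gil=True)
interp_nogil = build_interpreter({"add": add}, use_gil=False)
print(interp_gil["add"](1, 2), interp_nogil["add"](1, 2))  # 3 3
```

`threading.Lock` is not reentrant, so in this toy a wrapped function must not call another wrapped function; a real GIL also has to be released and reacquired periodically, which the sketch leaves out.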
> Now this is both, a conclusion and a question (because I also have many
> doubts about it :-):
> At this moment, the translated python-in-python version is, or intends
> to be, something more or less equivalent to CPython in terms of
> performance, because it is in essence almost the same thing: another C
> Python implementation. The only difference is that while CPython was
> written by hand, PyPy was written in Python and auto-translated to C.
Yes, at the moment pypy-c is rather similar to CPython, although slower
(currently a bit less than ten times slower than CPython), except
that we can already choose between different aspects (see above).
> What remains to be done now is implementing the psyco-like techniques
> for improving speed (amongst many other things, like stackless, etc).
Stackless is already implemented. In fact, it took around three days to
do this at the Paris sprint :-). It is another aspect that we can choose
at translation time (that means you can also choose not to be stackless
if you want). With stackless we can support arbitrarily deep
recursion (until the heap is full, that is). We don't export any
task-switching capabilities to the user yet.
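PyPy's stackless transformation works on the translated interpreter itself; the following is only a user-level sketch of the same effect using a different technique, a generator-based trampoline, in which "recursive" frames live in a heap-allocated list, so depth is bounded by memory rather than by the C stack. The names `trampoline` and `triangular` are hypothetical.

```python
def trampoline(gen):
    """Drive a generator-based recursive computation without the C stack.

    A generator makes a "recursive call" by yielding another generator and
    delivers its result with `return`. Suspended frames are kept in a Python
    list on the heap, so depth is limited by memory, not by the
    interpreter's recursion limit.
    """
    stack = [gen]
    value = None
    while stack:
        try:
            call = stack[-1].send(value)  # resume the innermost frame
        except StopIteration as done:
            stack.pop()          # that frame finished
            value = done.value   # hand its result to the calling frame
        else:
            stack.append(call)   # a new "recursive call" was requested
            value = None         # fresh generators must be started with None
    return value


def triangular(n):
    """0 + 1 + ... + n, written recursively as a generator."""
    if n == 0:
        return 0
    rest = yield triangular(n - 1)
    return n + rest


# Far deeper than CPython's default recursion limit of 1000 frames.
print(trampoline(triangular(100_000)))  # 5000050000
```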
About the psyco-like JIT techniques: we hope not to have to write the
JIT by hand, but to generate it as part of the translation process. But
this is at the moment still quite unclear, in heavy flux and nowhere
near finished yet.
Carl Friedrich Bolz