[pypy-svn] r19686 - pypy/dist/pypy/doc
arigo at codespeak.net
Wed Nov 9 17:35:01 CET 2005
Date: Wed Nov 9 17:35:00 2005
New Revision: 19686
Added an Experimental Results chapter.
--- pypy/dist/pypy/doc/draft-low-level-encapsulation.txt (original)
+++ pypy/dist/pypy/doc/draft-low-level-encapsulation.txt Wed Nov 9 17:35:00 2005
@@ -114,7 +114,7 @@
can fine-tune all these interactions freely, without having to rewrite
the whole code all the time but only modifying the C backend. So far,
this allowed us to find a style that does not hinder the compiler
-optimisations and so has only a minor impact on performance (10%) in the
+optimisations and so has only a minor impact on performance in the
XXX Start documenting it and link to it from here!
@@ -225,6 +225,70 @@
+All the aspects described in the previous chapter have been successfully
+implemented and are available since the release 0.7 or 0.8 of PyPy.
+We have conducted preliminary experimental measurements of the
+performance impact of enabling each of these features in the compiled
+PyPy interpreter. We present below the current results as of October 2005.
+Most figures appear to vary from machine to machine. Given that the
+generated code is large (it produces a binary of 5.6MB on a Linux
+Pentium), there might be locality and code ordering issues that cause
+important cache effects.
+We have not particularly optimised any of these aspects yet. Our goal
+is primarily to prove that the whole approach is worthwhile, relying on
+future work and external contributions to implement state-of-the-art
+techniques in each of these domains.
+ Producing Stackless-style C code currently means that all the
+ functions of the PyPy interpreter use the new style. The current
+ performance impact is to make PyPy slower by 10% to 20% depending on
+ the application program being interpreted. A couple of
 optimisations are possible that should reduce this figure a bit. In
+ particular, leaf functions of the call graph -- and more generally
+ all functions that never call any other function that may raise the
+ "unwind" exception -- do not have to be generated as Stackless-style
+ C code. We expect the rest of the performance impact to be mainly
 caused by the increased size of the generated executable (+XXX%).
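The leaf-function optimisation mentioned above (only functions that can transitively reach a call that may raise the "unwind" exception need the Stackless-style code) amounts to a fixpoint computation over the call graph. The following is a hypothetical sketch of that idea; the function and graph names are invented for illustration and are not part of PyPy's actual translation toolchain:

```python
# Hypothetical sketch: a function must be generated as Stackless-style C
# code only if it can (transitively) reach a function that may raise the
# "unwind" exception.  Leaf functions and their pure callers are exempt.

def needs_stackless(callgraph, may_unwind):
    """callgraph: dict mapping a function name to the set of its callees.
    may_unwind: set of functions that may raise "unwind" directly.
    Returns the set of functions needing Stackless-style generation."""
    needed = set(may_unwind)
    changed = True
    while changed:                      # fixpoint iteration
        changed = False
        for fn, callees in callgraph.items():
            if fn not in needed and callees & needed:
                needed.add(fn)          # calls something that may unwind
                changed = True
    return needed

graph = {
    "main":        {"interp_loop"},
    "interp_loop": {"add", "yield_point"},
    "add":         set(),       # leaf function: plain C style suffices
    "yield_point": set(),       # may raise "unwind" itself
}
print(sorted(needs_stackless(graph, {"yield_point"})))
# -> ['interp_loop', 'main', 'yield_point']
```

In this toy graph, `add` is a leaf that never reaches an unwinding call, so it keeps the ordinary (cheaper) calling style, which is exactly the saving the paragraph above anticipates.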
+ No experimental data available so far. We are working on removing a
+ minor technical restriction that prevents our translation toolchain
+ from handling this case.
+ The [Boehm] GC is well-optimised and produces excellent results. By
+ comparison, using reference counting instead makes the interpreter
+ twice as slow. This is almost certainly due to the naive approach
+ to reference counting used so far, which updates the counter far
+ more often than theoretically necessary; we also still have a lot of
+ objects that would theoretically not need a reference counter,
+ either because they are short-lived or because we can prove that
+ they are "owned" by another object and can share its lifetime. In
 the long run, it will be interesting to see how far this figure can
 be reduced, given past experience with CPython, which seems to show
 that reference counting is a viable idea for Python interpreters.
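The cost of updating the counter "far more often than theoretically necessary" can be illustrated with a toy reference-counted object. This is only a sketch of the general idea; the class and the borrowed-reference scheme shown for contrast are illustrative assumptions, not PyPy's actual implementation:

```python
# Illustrative sketch of naive reference counting overhead
# (names are invented; this is not PyPy's generated code).

class RcObject:
    """An object with an explicit reference count; a class-level counter
    tallies how many incref/decref operations were performed in total."""
    updates = 0

    def __init__(self):
        self.refcount = 1

    def incref(self):
        self.refcount += 1
        RcObject.updates += 1

    def decref(self):
        self.refcount -= 1
        RcObject.updates += 1
        assert self.refcount >= 0   # would deallocate at zero

def naive_call(obj):
    # Naive scheme: the callee increfs its argument on entry and
    # decrefs on exit, although the caller's reference already keeps
    # the object alive for the whole duration of the call.
    obj.incref()
    try:
        pass  # ... use obj ...
    finally:
        obj.decref()

def borrowed_call(obj):
    # "Owned by the caller" scheme: the object is known to outlive
    # the call, so no counter update is needed at all.
    pass  # ... use obj ...

o = RcObject()
for _ in range(1000):
    naive_call(o)
print(RcObject.updates)   # -> 2000 counter updates that borrowed_call avoids
```

Every one of those redundant updates is a memory write, which is one plausible source of the factor-of-two slowdown reported above.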
+ No experimental data available so far. Just enabling threads
+ currently creates an overhead that hides the real costs of locking.
+ When translated to C code, the Thunk object space has a global
+ performance impact of 5%. The executable is 12% bigger (probably
+ due to the arguably excessive inlining we perform).
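The idea behind the Thunk object space (delaying a computation until its result is first needed, then memoising it) can be sketched in ordinary Python. This is only an illustration of the concept; the real object space intercepts operations inside the interpreter rather than through an explicit wrapper class:

```python
# Conceptual sketch of a thunk: a delayed computation that is forced
# at most once.  (Illustrative only; not PyPy's object-space code.)

class Thunk:
    def __init__(self, compute):
        self._compute = compute     # zero-argument callable
        self._forced = False
        self._value = None

    def force(self):
        """Run the delayed computation on first use; return the
        memoised result on every later use."""
        if not self._forced:
            self._value = self._compute()
            self._compute = None    # release the closure
            self._forced = True
        return self._value

log = []
t = Thunk(lambda: log.append("computed") or 42)
print(log)                  # -> []  (nothing computed yet)
print(t.force(), t.force()) # -> 42 42  (computed once, then memoised)
print(log)                  # -> ['computed']
```

In the actual Thunk object space every operation on a wrapped object forces it implicitly, which is what makes the laziness transparent to the interpreted program.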