[BangPypers] PyPy outperforms C in microbenchmark

Dhananjay Nene dhananjay.nene at gmail.com
Wed Aug 3 19:10:10 CEST 2011


On Wed, Aug 3, 2011 at 9:58 PM, Gopalakrishnan Subramani
<gopalakrishnan.subramani at gmail.com> wrote:
> I could not understand the PyPy intention in having another run-time.
>
> Can we see having PyPy running Python programs,
>
> 1.  As a challenge? (A language can have its own runtime with little
> improved performance) or
> 2.  Potential for future to replace/co-exists CPython forever with strong
> community support?
>
> I like to understand, nowhere I am arguing. I don't code full time in
> Python, but I do code till I sleep like reading.

For starters, you might want to check out
http://stackoverflow.com/questions/2970108/pypy-what-is-all-the-buzz-about
(I thought I had seen a much more detailed articulation of the reasons
for PyPy, but couldn't find it quickly on Google). A virtual machine
based approach is a rather different one, and over the last decade it
has started to show strong benefits (especially thanks to the JVM
making strong advances in virtual machine engineering and in advanced
memory management / garbage collection algorithms).

There is often a chain of thought that manually handcrafted code is
superior to generated code (in many ways, code in a higher level
language is logically translated into a lower level one). However, as
the amount of code that is written starts getting substantial, the
sheer amount of logic we implement often overwhelms any ability to
micro-optimise by carefully handcrafting code. Moreover, as some
control is taken away from programmers (e.g. malloc/free), good
machine-optimised algorithms can eventually outperform manual code,
especially when applied on a large scale (e.g. generational garbage
collectors do a wonderful job on the JVM).
Most importantly, the ability of virtual machines to dynamically
monitor program execution and then generate native code for only the
most frequently used parts of the code allows a wonderful balance. E.g.
in the article that set off this thread, sprintf could not be inlined.
Assuming it were feasible to inline it, it would effectively get
inlined everywhere, resulting in much more code bloat. Dynamic VMs
can generate native inlined code only for the most frequently used
call sites of sprintf (places where sprintf is called); as an arbitrary
hypothetical case, say only 5 out of 200 locations in the code, where
these 5 effectively make up 99% of the runtime calls to sprintf.
Finally, because virtual machines can inspect the actual code
behaviour at runtime, they can make optimisations at runtime which are
virtually impossible at compile or link time (e.g. see
http://en.wikipedia.org/wiki/Inline_caching for optimising dispatch to
virtual methods).
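
To make the inline caching idea a bit more concrete, here is a rough
toy sketch in plain Python. It is not how PyPy or any real VM
implements it; the names CallSite and slow_lookup and the two shape
classes are made up for illustration. Each call site remembers the
last receiver type it saw and the method it resolved to, so repeated
calls with the same type skip the full lookup.

```python
def slow_lookup(cls, name):
    """Walk the MRO to find a method; stands in for a costly full dispatch."""
    for klass in cls.__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)


class CallSite:
    """One call site with a monomorphic inline cache (hypothetical helper)."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None    # receiver type seen on the last miss
        self.cached_method = None  # method resolved for that type
        self.hits = 0
        self.misses = 0

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        if receiver_type is self.cached_type:
            # Fast path: same type as last time, reuse the cached method.
            self.hits += 1
            method = self.cached_method
        else:
            # Slow path: do the full lookup and refill the cache.
            self.misses += 1
            method = slow_lookup(receiver_type, self.method_name)
            self.cached_type = receiver_type
            self.cached_method = method
        return method(receiver, *args)


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


# A single call site that mostly sees one receiver type.
site = CallSite("area")
shapes = [Square(2)] * 99 + [Rectangle(2, 3)]
total = sum(site.call(shape) for shape in shapes)
print(site.hits, site.misses, total)
```

The type seen at each call site is only known at runtime, which is why
this kind of cache is hard to set up at compile or link time. In this
toy run the call site misses only when the receiver type changes.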

All put together there are a number of advantages that this approach
has the potential to offer. In many ways PyPy is starting to show
promise of delivering on these advantages now. However, the design of a
virtual machine based approach cannot be an incremental bolt-on to
traditional runtimes; for all practical purposes, it needs a complete
rewrite.

