[BangPypers] PyPy outperforms C in microbenchmark

Wed Aug 3 17:50:58 CEST 2011

On Wed, Aug 3, 2011 at 1:22 PM, Noufal Ibrahim <noufal at gmail.com> wrote:
> Anand Balachandran Pillai <abpillai at gmail.com> writes:
>
>> On Wed, Aug 3, 2011 at 12:50 PM, Noufal Ibrahim <noufal at gmail.com> wrote:
>>
>>>
>>> PyPy outperforms C in a little benchmark capitalising on gcc's inability
>>> to optimise across files.
>>>
>>>
>>> http://morepypy.blogspot.com/2011/08/pypy-is-faster-than-c-again-string.html
>>>
>>
>>  Not sure if that is the only factor playing here. I am pretty sure
>>  the malloc/free every cycle is killing the C program.
>
> [...]
>
> Well, the original benchmark without dynamic allocation is also slower than
> the PyPy version. Your point, however, is sound.
>
In this case it's the difficulty with inlining rather than malloc
(though there could be other situations where a C program appears
slower than the alternatives because malloc/free take time). Java
HotSpot depends to a fair extent on inlining, and I presume the same
is true for PyPy. HotSpot can inline method calls, while gcc is
unable to inline calls into a function defined in a different file or
library (unless it is declared inline in a header), so that's clearly
an area where bytecode-based runtimes can optimise better. Yet some
of these advantages may not survive if the code is written a little
differently.
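
To make the inlining point concrete, here is a minimal Python sketch
(my own illustration, not the code from the blog post; the function
names are hypothetical). `with_call` pays for a function call on every
iteration, while `hand_inlined` does the same work with the call
removed by hand. Under CPython the call overhead is real; a tracing
JIT such as PyPy's follows the call while tracing the loop, so the two
versions should end up much closer in speed.

```python
def fmt_pair(i):
    # Small helper: formats two integers into a string.
    # A tracing JIT can follow this call and inline it into the loop.
    return "%d %d" % (i, i)


def with_call(n):
    # One Python-level function call per iteration.
    last = ""
    for i in range(n):
        last = fmt_pair(i)
    return last


def hand_inlined(n):
    # Same work, with the helper's body pasted into the loop by hand.
    last = ""
    for i in range(n):
        last = "%d %d" % (i, i)
    return last
```

Timing both with `timeit` under CPython and under PyPy is an easy way
to see how much of the gap the JIT closes on your machine.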

I sometimes find people wrongly assuming that because some of the
newer JVM-based languages compile to bytecode, they will be as fast
as Java, yet code sometimes slows down by orders of magnitude because
the generated bytecode is difficult to inline. For an interesting,
detailed account of the inlining problem, see
http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem
I would imagine similar questions will crop up in the context of the
PyPy VM: which Python constructs lend themselves to easy inlining
(plain functions?) and which don't (dynamic dispatch calls on a deep
object tree?).
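
A rough Python sketch of the two kinds of call site (hypothetical
classes, not taken from PyPy or the Azul post): in `total_area` the
single `s.area()` call site sees several receiver types, the case JITs
generally find harder to inline (a "megamorphic" call site, in HotSpot
terms), whereas `total_square_area` always calls the same plain
function. How well PyPy copes with each is an empirical question; the
sketch only shows the shapes of code being compared.

```python
class Shape:
    def area(self):
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rect(Shape):
    def __init__(self, w, h):
        self.w = w
        self.h = h

    def area(self):
        return self.w * self.h


def total_area(shapes):
    # Dynamic dispatch: one call site, many possible receiver types.
    return sum(s.area() for s in shapes)


def square_area(side):
    return side * side


def total_square_area(sides):
    # Plain function call: the target never changes, easy to inline.
    return sum(square_area(s) for s in sides)
```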

Good to see PyPy being able to inline well, at least in a specific
context. I am sure a well-designed garbage collector will also often
outperform naively written malloc/free code. So I do hope to see lots
of performance improvements from PyPy over time (I am actually even
more keen to see the GIL-free implementation, though that's likely to
be some time away). Definitely exciting developments (this is but one
of many benchmarks they've been publishing lately).

Dhananjay