Python Front-end to GCC
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Tue Oct 22 08:00:37 EDT 2013
On Tue, 22 Oct 2013 10:14:16 +0100, Oscar Benjamin wrote:
> On 22 October 2013 00:41, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Mon, 21 Oct 2013 10:55:10 +0100, Oscar Benjamin wrote:
>>
>>> On 21 October 2013 08:46, Steven D'Aprano <steve at pearwood.info> wrote:
>>
>>>> On the contrary, you have that backwards. An optimizing JIT compiler
>>>> can often produce much more efficient, heavily optimized code than a
>>>> static AOT compiler, and at the very least they can optimize
>>>> different things than a static compiler can. This is why very few
>>>> people think that, in the long run, Nuitka can be as fast as PyPy,
>>>> and why PyPy's ultimate aim to be "faster than C" is not moonbeams:
>>>
>>> That may be true but both the examples below are spurious at best. A
>>> decent AOT compiler would reduce both programs to the NULL program as
>>> noted by haypo:
>>> http://morepypy.blogspot.co.uk/2011/02/pypy-faster-than-c-on-carefully-crafted.html?showComment=1297205903746#c2530451800553246683
Keep in mind that the post's author, Maciej Fijalkowski, is not a native
English speaker (to the best of my knowledge). You or I would probably
have called the post a *contrived* example, not a "carefully crafted one"
-- the meaning is the same, but the connotations are different.
Micro-benchmarks are mostly of theoretical interest, and contrived ones
even more so, but still of interest. One needs to be careful not to read
too much into them, but also not to read too little into them.
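For anyone who hasn't followed the link, the flavour of benchmark under
discussion is roughly the following (my own reconstruction of the general
shape, not the blog post's actual code): a tight arithmetic loop whose
result is never observed, so a sufficiently aggressive optimizer may
legally delete it.

```python
import time

def contrived():
    # A tight arithmetic loop whose result is never used: "dead code"
    # to an aggressive optimizer, real work to a plain interpreter.
    x = 0.0
    for i in range(1_000_000):
        x += 0.1 * i
    # x is deliberately discarded -- nothing observes the result.

start = time.perf_counter()
contrived()
print(f"elapsed: {time.perf_counter() - start:.3f}s")
```

Whether timing this tells you anything useful is exactly the point in
dispute.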
>> Are you suggesting that gcc is not a decent compiler?
>
> No.
>
>> If "optimize away
>> to the null program" is such an obvious thing to do, why doesn't the
>> most popular C compiler in the [FOSS] world do it?
>
> It does if you pass the appropriate optimisation setting (as shown in
> haypo's comment). I should have been clearer.
"C can do nothing 10 times faster than Python!" -- well, okay, but what
does that tell you about my long-running web server app? Benchmarks are
only suggestive at the best of times, and benchmarks of null programs are
even less useful.
The very next comment after haypo's answers his observation:
[quote]
@haypo print the result so the loop don't get removed as dead
code. Besides, the problem is really the fact that's -flto is
unfair since python imports more resemble shared libraries
than statically-compiled files.
I'll be honest: I don't know enough C to really judge that claim, but I
have noticed that benchmarks rarely compare apples with apples, especially
when C is involved. You can't easily eliminate every difference in the
code being generated, since different languages have deep-seated semantic
differences. But you should at least make some effort to compare code
that does the same thing the same way.
Here's an example: responding to a benchmark showing a Haskell compiler
generating faster code than a C compiler, somebody re-wrote the C code
and got the opposite result:
http://jacquesmattheij.com/when-haskell-is-not-faster-than-c
Again, I can't judge the validity of all of the changes he made, but one
stood out like a sore thumb:
[quote]
C does not require you to set static global arrays to ‘0’, so the
for loop in the main function can go...
Wait a minute... Haskell, I'm pretty sure, zeroes memory. C doesn't. So
the C code is now doing less work. Yes, your C compiler will allow you to
avoid zeroing memory before using it, and you'll save some time
initially. But eventually[1] you will need to fix the security
vulnerability by adding code to zero the memory, exactly as Haskell and
other more secure languages already do. So *not* zeroing the memory is
cheating. It's not something you'd do in real code, not if you care about
security and correctness. Even if you don't care about security, you
should care about benchmarking both languages performing the same amount
of work.
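For comparison, Python simply never hands you uninitialized memory: any
freshly allocated buffer is guaranteed to arrive zero-filled, so the cost
that safer languages pay here is built in rather than optional.

```python
# Python never exposes uninitialized memory: new buffers arrive zeroed.
buf = bytearray(1024)            # one kilobyte, guaranteed all zeros
assert all(b == 0 for b in buf)

raw = bytes(16)                  # immutable buffer, likewise zero-filled
assert raw == b"\x00" * 16
```

So a "Python vs C" benchmark with C skipping initialization has the same
unfairness as the Haskell example above.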
Now, I may be completely off-base here. Some Haskell expert may chime in
to say that Haskell does not, in fact, zero memory. But I'm sure it does
*something* -- perhaps it tracks which memory is undefined and prevents
reads from it. Whatever it does, if it does it at runtime, the C benchmark
had better do the same thing, or it's an unfair comparison:
"Safely drive to the mall obeying all speed limits and traffic signals in
a Chevy Volt, versus speed down the road running red lights and stop
signs in a Ford Taurus" -- would it be any surprise that the Taurus is
faster?
[...]
> They are more than carefully crafted. They are useless and misleading.
> It's reasonable to contrive of a simple CPU-intensive programming
> problem for benchmarking. But the program should do *something* even if
> it is contrived. Both programs here consist *entirely* of dead code.
But since the dead code is *not* eliminated, it is actually executed. If
it's executed, it's not really dead, is it? Does it really matter that
you don't do anything with the result? I'm with Maciej on this one --
*executing* the code given is faster in PyPy than in C, at least for this
C compiler. Maybe C is faster to not execute it. Is that really an
interesting benchmark? "C does nothing ten times faster than PyPy does
something!"
Given a sufficiently advanced static analyser, PyPy could probably
special-case programs that do nothing. Then you're in a race to compare
the speed at which the PyPy runtime environment can start up and do
nothing, versus a stand-alone executable that has to start up and do
nothing. If this is a benchmark that people care about, I suggest they
need to get out more :-)
Ultimately, this is an argument as to what counts as a fair apples-to-apples
comparison, and what doesn't. Some people consider that for a fair test,
the code has to actually be executed. If you optimize away code and don't
execute it, that's not a good benchmark. I agree with them. You don't. I
can see both sides of the argument, and think that they both have
validity, but on balance agree with the PyPy guys here: a compiler that
optimizes away "for i = 1 to 1000: pass" to do-nothing is useful, but if
you wanted to find out the runtime cost of a for-loop, you would surely
prefer to disable that optimization and time how long it takes the for
loop to actually run.
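That, incidentally, is what the timeit module is for: it runs the
statement you give it and reports how long it actually took, and CPython
at least performs no dead-code elimination that would silently delete the
loop body.

```python
import timeit

# Time an empty for-loop body, repeated to smooth out noise.
# CPython does not eliminate the "dead" loop, so this measures the
# genuine per-iteration overhead of the loop machinery itself.
cost = timeit.timeit("for i in range(1000): pass", number=1_000)
print(f"1,000 runs of a 1000-iteration empty loop: {cost:.4f}s")
```

The absolute number is hardware-dependent, of course; the point is only
that the loop really executes.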
The actual point that the PyPy developers keep making is that a JIT
compiler can use runtime information to perform optimizations which a
static compiler like gcc cannot, and I haven't seen anyone dispute that
point. More in the comments here:
[quote]
The point here is not that the Python implementation of
formatting is better than the C standard library, but that
dynamic optimisation can make a big difference. The first
time the formatting operator is called its format string is
parsed and assembly code for assembling the output generated.
The next 999999 times that assembly code is used without
doing the parsing step. Even if sprintf were defined locally,
a static compiler can’t optimise away the parsing step, so
that work is done redundantly every time around the loop.
http://morepypy.blogspot.com/2011/08/pypy-is-faster-than-c-again-string.html?showComment=1312357475889#c6708170690935286644
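The benchmark being discussed there is, roughly, a loop of the following
shape (a sketch from memory, not the post's exact code). A static
implementation must re-parse the format string "%d %d" on every pass,
while a JIT can parse it once and reuse the specialized code for the
remaining iterations.

```python
def format_loop(n=1_000_000):
    # The same format string is applied n times. CPython re-parses
    # "%d %d" on every iteration; PyPy's JIT can specialize it once
    # and run the compiled formatting code thereafter.
    total = 0
    for i in range(n):
        total += len("%d %d" % (i, i))
    return total

print(format_loop(1000))
```

Summing the lengths keeps the result live, so the loop cannot be
dismissed as dead code.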
Also possibly of interest:
http://beza1e1.tuxen.de/articles/faster_than_C.html
[1] Probably not until after the Zero Day exploit is released.
--
Steven