[Tutor] Loop comparison

Stefan Behnel stefan_ml at behnel.de
Fri Apr 16 10:52:31 CEST 2010

Alan Gauld, 16.04.2010 10:29:
> "Stefan Behnel" wrote
>> import cython
>> @cython.locals(result=cython.longlong, i=cython.longlong)
>> def add():
>>     result = 0
>>     for i in xrange(1000000000):
>>         result += i
>>     return result
>> print add()
>> This runs in less than half a second on my machine, including the time
>> to launch the CPython interpreter. I doubt that the JVM can even start
>> up in that time.
> I'm astonished at these results. What kind of C are you using. Even in
> assembler I'd expect the loop/sum to take at least 3s
> on a quad core 3GHz box.
> Or is cython doing the precalculation optimisations you mentioned?

Nothing surprising in the C code:

   __pyx_v_result = 0;
   for (__pyx_t_1 = 0; __pyx_t_1 < 1000000000; __pyx_t_1+=1) {
     __pyx_v_i = __pyx_t_1;
     __pyx_v_result += __pyx_v_i;

> And if so when does it do them? Because surely, at some stage, it still
> has to crank the numbers?

Cython does a bit of constant folding (which it can benefit from on 
internal optimisation decisions), but apart from that, the mantra is to 
just show the C compiler explicit C code that it can understand well, and 
let it do its job.

> (We can of course do some fancy math to speed this particular sum up
> since the result for any power of ten has a common pattern, but I
> wouldn't expect the compiler optimiser to be that clever)

In this particular case, the C compiler really stores the end result in the 
binary module, so I assume that it simply applies Little Gauß as an 
optimisation in combination with loop variable aliasing.


More information about the Tutor mailing list