Microbenchmark: Summing over array of doubles
bulatov at engr.orst.edu
Tue Aug 3 01:51:46 CEST 2004
Christopher T King <squirrel at WPI.EDU> wrote in message news:<Pine.LNX.4.44.0408011847510.21160-100000 at ccc4.wpi.edu>...
> On 31 Jul 2004, Yaroslav Bulatov wrote:
> > I'm doing intensive computation on arrays in Python, so if you have
> > suggestions on Python/C solutions that could push the envelope, please
> > let me know.
> If you're doing mostly vector calculations as opposed to summing, I've
> been doing some work on adding SIMD support to numarray, with pleasing
> results (around 2x speedups). I've also done some work adding local
> parallel processing support to numarray, with not-so-pleasing results
> (mostly due to Python overhead).
> Regarding your results:
> numarray should be just as fast as the -O2 C version. I was puzzled at
> first as to where the speed discrepancy came from, but the culprit is in
> the -O2 flag: gcc -O2 notices that sum is never used, and thus removes
> the loop entirely. As a matter of fact, there isn't even any fadd
> instruction in the assembler output:
> call clock
> movl %eax, %esi
> movl $9999999, %ebx
> .L11:
> decl %ebx
> jns .L11
> subl $16, %esp
> call clock
> As you can see, the 21ms you're seeing is the time spent counting down
> from 9,999,999 to 0. To obtain correct results, add a line such as
> 'printf("%f\n",sum);' after the main loop in the C version. This will
> force gcc to leave the actual calculation in place and give you accurate
> timings.
> The above fix will likely render numarray faster than the C version.
> Using gcc -O3 rather than gcc -O2 will get fairer results, as this is what
> numarray uses.
You are right, how silly of me! Fixing the script now gives a mean of
130 ms (standard deviation 8.42 ms), which is slower than numarray
(104 ms and 2.6 ms respectively). I wonder why numarray gives faster
results on such a simple task?
> Is there any reason why in the Python/numarray version, you use
> Numeric's RandomArray rather than numarray.random_array? It shouldn't
> affect your results, but it would speed up initialization time a bit.
There isn't a good reason, I simply didn't know about
numarray.random_array.
> There are a few inefficiences in the pytime module (mostly involving
> range() and *args/**kwargs), but I don't think they'll have too big of an
> impact on your results. Instead, I'd suggest running the numarray/Numeric
> tests using Psyco to remove much of the Python overhead.
> For completeness, I'd also suggest both running the Java version using a
> JIT compiler such as Kaffe, and compiling it natively using gcj (the
> latter should approach the speed of C).