Microbenchmark: Summing over array of doubles
Christopher T King
squirrel at WPI.EDU
Sun Aug 1 19:54:58 EDT 2004
On 31 Jul 2004, Yaroslav Bulatov wrote:
> I'm doing intensive computation on arrays in Python, so if you have
> suggestions on Python/C solutions that could push the envelope, please
> let me know.
If you're doing mostly vector calculations as opposed to summing, I've
been doing some work on adding SIMD support to numarray, with pleasing
results (around 2x speedups). I've also done some work adding local
parallel processing support to numarray, with not-so-pleasing results
(mostly due to Python overhead).
Regarding your results:
numarray should be just as fast as the -O2 C version. I was puzzled at
first as to where the speed discrepancy came from, but the culprit is in
the -O2 flag: gcc -O2 notices that sum is never used, and thus removes
the additions, leaving only an empty loop. As a matter of fact, there
isn't even any fadd instruction in the assembler output:
        call    clock
        movl    %eax, %esi
        movl    $9999999, %ebx
.L11:
        decl    %ebx
        jns     .L11
        subl    $16, %esp
        call    clock
As you can see, the 21ms you're seeing is the time spent counting down
from 9,999,999 to 0. To obtain correct results, add a line such as
'printf("%f\n",sum);' after the main loop in the C version. This will
force gcc to leave the actual calculation in place and give you accurate
results.
The above fix will likely render numarray faster than the C version.
Using gcc -O3 rather than gcc -O2 will get fairer results, as this is what
numarray uses.
Is there any reason why in the Python/numarray version, you use
Numeric's RandomArray rather than numarray.random_array? It shouldn't
affect your results, but it would speed up initialization time a bit.
There are a few inefficiencies in the pytime module (mostly involving
range() and *args/**kwargs), but I don't think they'll have much of an
impact on your results. Rather than tuning those, I'd suggest running the
numarray/Numeric tests using Psyco to remove much of the Python overhead.
For completeness, I'd also suggest both running the Java version using a
JIT compiler such as Kaffe, and compiling it natively using gcj (the
latter should approach the speed of C).