[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

Eric Carlson ecarlson at eng.ua.edu
Thu Feb 17 20:59:24 EST 2011

For 4 cores, on your system, your conclusion makes some sense. That 
said, I played around with this on both a Core 2 Duo and the 12-core 
system. On the 12-core system, in my tests the 0 case ran extremely 
close to the 2-thread case for all my sizes.

The Core 2 Duo runs Windows 7, and after downloading pthreadGC2.dll 
from the pthreads project, I was able to use OpenMP under a year-old 
(32-bit) pythonxy distribution with mingw. The result: 0 threads came in 
slightly faster than one thread (0.00102 versus 0.00106), and 2 threads 
took 0.00060.
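
To put those numbers in perspective, here is a minimal sketch of the usual speedup and parallel-efficiency arithmetic. The helper names `speedup` and `efficiency` are mine, purely for illustration, and I am assuming all three timings are in the same unit. With the figures above, 0.00106 / 0.00060 comes to roughly 1.77x on two threads, or about 88% efficiency.

```c
#include <assert.h>

/* Hypothetical helpers, not part of any library. */

/* Ratio of the one-thread time to the n-thread time. */
double speedup(double t_one_thread, double t_n_threads)
{
    assert(t_one_thread > 0.0 && t_n_threads > 0.0);
    return t_one_thread / t_n_threads;
}

/* Speedup divided by thread count; 1.0 would be perfect scaling. */
double efficiency(double t_one_thread, double t_n_threads, int n_threads)
{
    assert(n_threads > 0);
    return speedup(t_one_thread, t_n_threads) / (double)n_threads;
}
```

Calling `speedup(0.00106, 0.00060)` and `efficiency(0.00106, 0.00060, 2)` reproduces the two figures quoted above.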

My current theory is that gcc under Linux uses some background trick to 
get two thread-like streams going. As I assess scale-up under Linux, I 
will need to take this behavior into account.
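
One way to test a theory like this is to ask the OpenMP runtime how many threads it actually hands out. The following is only a sketch: `threads_used` is a hypothetical helper of mine, and I am assuming here that a non-positive argument stands for "do not request a specific count", which may or may not match what the 0 case meant in the benchmark. It uses only the standard `omp_get_num_threads()` call, and falls back to reporting 1 when compiled without OpenMP support (for gcc, without `-fopenmp`).

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the team size the runtime really used.
   requested <= 0 means "let the runtime pick its default". */
int threads_used(int requested)
{
    int team = 1;  /* serial fallback when built without OpenMP */
#ifdef _OPENMP
    if (requested > 0) {
        /* Explicitly ask for a team of `requested` threads. */
        #pragma omp parallel num_threads(requested)
        {
            #pragma omp single
            team = omp_get_num_threads();
        }
    } else {
        /* No request: the default comes from OMP_NUM_THREADS
           or the runtime's own choice. */
        #pragma omp parallel
        {
            #pragma omp single
            team = omp_get_num_threads();
        }
    }
#endif
    return team;
}
```

If `threads_used(0)` reports more than one thread on the Linux box, the "background trick" may simply be the runtime's default team size rather than anything gcc does on its own.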

Creating optimal code with OpenMP certainly requires a considerable 
commitment. Given the problem-specific fine-tuning required, I would not 
expect much gain in general-purpose routines. In specific routines like 
cdist, it might make more sense. I talked to a Dell HPC rep today, and 
he said that squeezing out an extra 15% performance boost on an Intel 
CPU was a pleasant surprise, so the 30% improvement is maybe not so bad.

