
On Thursday 17 February 2011 02:24:33 Eric Carlson wrote:
Hello Francesc, The problem appears to be related to my lack of optimization in the compilation. If I use
gcc -O3 -c my_lib.c -fPIC -fopenmp -ffast-math
the C executable and ctypes/python versions behave almost identically.
Ahh, good to know.
Getting decent behavior takes some thought, though, far from the incredible almost-automatic behavior of numexpr.
numexpr uses a very simple method for distributing load among the threads, so I suppose this is why it is fast. The drawback is that numexpr can only be used for operations in which every operand is read at the same index (e.g. a+b**3, but not things like a[i+1]+b[i]**3). For other operations openmp is probably the best option (I should say the *easiest* option) right now.
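To make the same-index distinction concrete, here is a minimal sketch in Python. The function names `same_index` and `shifted_index` are made up for illustration, and the shifted case is written with plain NumPy slices rather than any numexpr feature. numexpr is treated as optional, so the sketch falls back to NumPy if it is not installed.

```python
import numpy as np

try:
    import numexpr as ne  # optional; plain NumPy is used if it is missing
except ImportError:
    ne = None


def same_index(a, b):
    """out[i] = a[i] + b[i]**3: every operand is read at the same index i."""
    if ne is not None:
        # Operands are passed explicitly instead of relying on frame lookup.
        return ne.evaluate("a + b**3", local_dict={"a": a, "b": b})
    return a + b**3


def shifted_index(a, b):
    """out[i] = a[i+1] + b[i]**3 for i = 0..n-2, expressed with NumPy slices."""
    return a[1:] + b[:-1] ** 3


a = np.arange(5, dtype=np.float64)
b = np.arange(5, dtype=np.float64)
print(same_index(a, b))
print(shifted_index(a, b))
```

Note that the shifted result is one element shorter than the inputs, since a[i+1] does not exist for the last i.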
Now I've got to figure out how to scale up a bunch of vector adds/multiplies. Neither numexpr nor openmp gets you very far with a bunch of "z=a*x+b*y"-type calcs.
For this sort of computation you are most probably hitting the memory bandwidth wall, so you are out of luck (at least until processors are fast enough for compression to actually reduce the time spent in computations).

Cheers,

--
Francesc Alted
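As a rough back-of-the-envelope illustration of the memory bandwidth point, the sketch below counts floating-point operations and bytes moved for z=a*x+b*y. It assumes that a and b are scalars, that x, y and z are double-precision vectors, and that nothing stays in cache between uses. The helper names and the 10 GB/s bandwidth figure are hypothetical, not measurements.

```python
def axpby_traffic(n, itemsize=8):
    """Return (flops, bytes_moved) for z = a*x + b*y over n elements.

    Assumes scalar a and b, so each element costs two multiplies and
    one add, plus one read of x, one read of y and one write of z.
    """
    flops = 3 * n
    bytes_moved = 3 * n * itemsize
    return flops, bytes_moved


def bandwidth_bound_seconds(n, bandwidth_bytes_per_s, itemsize=8):
    """Lower bound on run time if memory traffic is the only limit."""
    _, bytes_moved = axpby_traffic(n, itemsize)
    return bytes_moved / bandwidth_bytes_per_s


n = 1_000_000
flops, bytes_moved = axpby_traffic(n)
print("flops per byte:", flops / bytes_moved)
# 10e9 bytes/s is a made-up bandwidth, used only to show the arithmetic.
print("seconds at 10 GB/s:", bandwidth_bound_seconds(n, 10e9))
```

Under these assumptions the kernel performs only 0.125 floating-point operations per byte of memory traffic, so adding threads helps little once the memory bus is saturated.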