
On Friday 22 May 2009 13:59:17 Andrew Friedley wrote:
Using multiple cores is pretty easy for element-wise ufuncs; no communication needs to occur and the work partitioning is trivial. And actually I've found with some initial testing that multiple cores do still help when you are memory bound. I don't fully understand why yet, though I have some ideas. One reason is multiple memory controllers due to multiple sockets (i.e. Opteron).
Yeah. I think this is most likely the reason. If, as in your case, you have several independent paths from different processors to your data, then you can achieve speed-ups even if you are memory bound in the single-processor scenario.
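To make the "work partitioning is trivial" point concrete, here is a minimal sketch of how an element-wise operation might be split across threads. It is not NumPy's implementation or Andrew's code: the names chunk_bounds and parallel_elementwise are hypothetical, and it uses only the Python standard library. It only illustrates the partitioning; in CPython the GIL keeps a pure-Python loop like this from running in parallel, so it says nothing about the memory-bandwidth question.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def chunk_bounds(n, parts):
    """Split range(n) into `parts` contiguous, nearly equal [start, stop) slices."""
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_elementwise(func, src, nthreads=4):
    """Apply func to every element of src, one contiguous chunk per worker.

    Hypothetical helper: each worker writes to a disjoint slice of `out`,
    so the workers never need to communicate or lock.
    """
    out = [None] * len(src)

    def work(bounds):
        start, stop = bounds
        for i in range(start, stop):
            out[i] = func(src[i])

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(work, chunk_bounds(len(src), nthreads)))
    return out


if __name__ == "__main__":
    print(chunk_bounds(10, 3))
    print(parallel_elementwise(math.sqrt, [0.0, 1.0, 4.0, 9.0, 16.0], nthreads=2))
```

Each chunk is contiguous, so every worker streams through its own region of memory, which is the access pattern the discussion here is about.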
Another is that each thread is pulling memory from a different bank, utilizing more bandwidth than a single sequential thread could. However, if that's the case, we could possibly come up with code for a single thread that achieves (nearly) the same additional throughput.
Well, I don't think you can achieve significant speed-ups in this case, but experimenting never hurts :) Good luck!

-- 
Francesc Alted