[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?
Sturla Molden
sturla at molden.no
Sat Feb 19 18:01:59 EST 2011
On 19.02.2011 18:13, Sebastian Haase wrote:
> Can one assume that the cache line is always a few megabytes ?
Don't confuse the size of a cache with the size of a cache line.
A "cache line" (the unit that gets marked dirty) is typically
8-512 bytes; 64 bytes is common on current x86 CPUs.
Make sure your OpenMP threads stay off each other's cache lines, and it
will scale nicely.
For example, you can pass a chunk size to the "schedule" clause to
force them apart. The chunk size counts loop iterations, not bytes, so
you must work out the matching block size for the shared write buffer
yourself. If you use reduction(+:dist), each thread accumulates into its
own private copy, so nothing is shared during the loop, but the partial
sums are only combined after the loop. That is the limited amount of
control you get with OpenMP.
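
A minimal sketch of both options, not the code from this thread. The
function names, the arrays a, b and out, and the 64-byte line size are
assumptions made for illustration; only reduction(+:dist) comes from the
discussion above. With 8-byte doubles, a chunk of 8 iterations covers
one assumed cache line, provided out starts on a cache-line boundary.

```c
/* Assumption: 64-byte cache lines and 8-byte doubles, so 8 doubles
 * fill one line. Adjust CHUNK for other hardware. */
#define CHUNK 8

/* Shared write buffer: each thread is handed blocks of CHUNK
 * consecutive iterations, so it writes whole cache lines of out[]
 * (if out is aligned to the line size) and threads do not share
 * a line. */
void squared_diff(const double *a, const double *b, double *out, int n)
{
    int i;
    #pragma omp parallel for schedule(static, CHUNK)
    for (i = 0; i < n; i++) {
        double d = a[i] - b[i];
        out[i] = d * d;
    }
}

/* Reduction: every thread sums into a private copy of dist, and
 * OpenMP adds the private copies together after the loop. */
double sum_squared_diff(const double *a, const double *b, int n)
{
    double dist = 0.0;
    int i;
    #pragma omp parallel for reduction(+:dist)
    for (i = 0; i < n; i++) {
        double d = a[i] - b[i];
        dist += d * d;
    }
    return dist;
}
```

With GCC this would be built with -fopenmp; without that flag the
pragmas are ignored and the same code runs serially, which is handy for
checking results before timing the threaded version.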
pthreads will give you better control than OpenMP, but are messy and
painful to work with.
With MPI you have separate processes, so everything is completely
isolated. It's more difficult to program and debug than OpenMP code, but
will usually perform better.
Sturla