[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

Sturla Molden sturla at molden.no
Sat Feb 19 18:01:59 EST 2011


On 19.02.2011 18:13, Sebastian Haase wrote:
> Can one assume that the cache line is always a few megabytes?

Don't confuse the size of a cache with the size of a cache line.

A "cache line" (the unit that gets marked dirty) is typically
8-512 bytes; 64 bytes is common on current x86 CPUs.

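Make sure your OpenMP threads stay off each other's cache lines, and it
will scale nicely. When two threads write to the same line, that line
bounces between their caches even if they touch different variables
(false sharing).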
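
To make "staying off each other's cache lines" concrete, here is a
minimal sketch, not the code from this thread. It assumes 64-byte
cache lines, and the names CACHE_LINE, MAX_THREADS, padded_double and
sumsq_diff_padded are made up for illustration. Each thread gets its
own accumulator, padded so that consecutive accumulators sit 64 bytes
apart and never share a line.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Assumption: 64-byte cache lines. Check your CPU. */
#define CACHE_LINE 64
/* Hypothetical upper bound on the thread count for this sketch. */
#define MAX_THREADS 64

/* One accumulator per thread. The padding makes each element exactly
   CACHE_LINE bytes, so the value fields of two neighbouring elements
   are 64 bytes apart and cannot fall in the same cache line. */
typedef struct {
    double value;
    char pad[CACHE_LINE - sizeof(double)];
} padded_double;

/* Sum of squared differences between a and b (n elements each). */
double sumsq_diff_padded(const double *a, const double *b, long n)
{
    padded_double partial[MAX_THREADS];
    int nthreads = 1;
    int t;

#ifdef _OPENMP
    nthreads = omp_get_max_threads();
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
#endif
    assert(nthreads >= 1);

    for (t = 0; t < nthreads; t++)
        partial[t].value = 0.0;

    #pragma omp parallel num_threads(nthreads)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        #pragma omp for
        for (long i = 0; i < n; i++) {
            double d = a[i] - b[i];
            partial[tid].value += d * d;  /* private cache line */
        }
    }

    /* Combine the per-thread partial sums serially. */
    double total = 0.0;
    for (t = 0; t < nthreads; t++)
        total += partial[t].value;
    return total;
}
```

Compile with your compiler's OpenMP flag (for example -fopenmp with
gcc); without it the pragmas are ignored and the function runs
serially. Removing the pad member would pack eight accumulators into
one 64-byte line, which is exactly the situation to avoid.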

For example, you can specify a chunk size in the "schedule" clause to
force them apart. The chunk size counts loop iterations, not bytes, so
you must work out yourself how many iterations correspond to a block of
the shared write buffer. If you use reduction(+:dist), each thread
accumulates into a completely private copy of dist, and the partial
sums are added together after the loop. That is the limited amount of
control you get with OpenMP.
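
A rough sketch of both options follows. It assumes 64-byte cache
lines, float data, and an output buffer that starts on a cache-line
boundary; CACHE_LINE, CHUNK, sq_diff and sum_sq_diff are illustrative
names, not anything from the original code. The chunk size is chosen
so that one chunk of iterations writes one full cache line of output.

```c
#include <assert.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE 64
/* Loop iterations per cache line of float output (16 here). */
#define CHUNK ((int)(CACHE_LINE / sizeof(float)))

/* Option 1: shared write buffer. With schedule(static, CHUNK) each
   thread is handed runs of CHUNK consecutive indices, so it writes
   whole cache lines of out[], provided out is aligned to CACHE_LINE. */
void sq_diff(const float *a, const float *b, float *out, long n)
{
    assert(n >= 0);
    #pragma omp parallel for schedule(static, CHUNK)
    for (long i = 0; i < n; i++) {
        float d = a[i] - b[i];
        out[i] = d * d;
    }
}

/* Option 2: reduction. Every thread sums into its own private copy
   of dist; OpenMP adds the copies together after the loop. */
double sum_sq_diff(const float *a, const float *b, long n)
{
    double dist = 0.0;
    assert(n >= 0);
    #pragma omp parallel for reduction(+:dist)
    for (long i = 0; i < n; i++) {
        double d = (double)a[i] - (double)b[i];
        dist += d * d;
    }
    return dist;
}
```

If out is not aligned to a cache-line boundary, neighbouring chunks
can still share the line at each chunk border, so the chunk size alone
does not guarantee separation.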

pthreads will give you better control than OpenMP, but are messy and 
painful to work with.

With MPI you have separate processes, so everything is completely 
isolated. It's more difficult to program and debug than OpenMP code, but 
will usually perform better.

Sturla




