[Numpy-discussion] Improving Python+MPI import performance

Fri Jan 13 18:28:39 EST 2012

This paper also repeats a common mistake about the GIL:

"A future challenge is the increasing number of CPU cores per node, 
which is normally addressed by hybrid thread and message passing based 
parallelization. Whereas message passing can be used transparently by 
both on Python and C level, the global interpreter lock in CPython 
limits the thread based parallelization to the C-extensions only. We are 
currently investigating hybrid OpenMP/MPI implementation with the hope 
that limiting threading to only C-extension provides enough performance."

This is NOT true.

Python threads are native OS threads. They can be used for parallel 
computing on multi-core CPUs. The only requirement is that the Python 
code calls a C extension that releases the GIL. We can use threads in C 
or Python code: OpenMP and threading.Thread perform equally well, but if 
we use threading.Thread the GIL must be released for parallel execution. 
OpenMP is typically better for fine-grained parallelism in C code and 
threading.Thread is better for coarse-grained parallelism in Python 
code. The latter is also where mpi4py and multiprocessing can be used.
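
To make the point concrete, here is a minimal sketch of coarse-grained 
parallelism with threading.Thread. It assumes a standard CPython build 
in which hashlib releases the GIL while hashing large buffers. The 
function name digest_chunks, the chunk sizes and the thread count are 
made up for illustration; any C extension that releases the GIL (many 
NumPy operations, for example) could take the place of the hashing call.

```python
import hashlib
import threading


def digest_chunks(chunks, n_threads=4):
    """Hash every chunk using a few threading.Thread workers.

    Hypothetical helper for illustration only. The heavy work happens
    inside hashlib, which is C code that releases the GIL for large
    inputs, so the workers can run on several cores at once.
    """
    results = [None] * len(chunks)

    def worker(start):
        # Worker k handles indices k, k + n_threads, k + 2 * n_threads, ...
        # Each index is written by exactly one thread, so no lock is needed.
        for i in range(start, len(chunks), n_threads):
            results[i] = hashlib.sha256(chunks[i]).hexdigest()

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Eight 1 MiB buffers, each filled with a different byte value.
chunks = [bytes([i]) * (1024 * 1024) for i in range(8)]
parallel = digest_chunks(chunks)
serial = [hashlib.sha256(c).hexdigest() for c in chunks]
print(parallel == serial)
```

The threaded and serial versions must give identical digests. Whether 
the threaded one is actually faster depends on the number of cores and 
on how much of the time is spent inside the GIL-releasing C code, so 
measure on your own machine.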


