[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Wed Jun 15 13:38:45 EDT 2011

Perhaps it is time to write something in the SciPy Cookbook about 
parallel computing with NumPy? The same problems seem to be 
discussed again and again. These are some issues that come to mind (I'm 
sure there are more):

- The difference between I/O bound, memory bound, and CPU bound work.
- Why NumPy code is usually memory bound, and what that means.
- The problem of false sharing in cache lines (including Python refcounts)
- What the GIL is and what it's not (real information instead of FUD)
- Linear algebra with optimized BLAS and LAPACK libraries.
- Parallel FFTs (FFTW, MKL, ACML)
- Parallel PRNGs (and algorithmic pitfalls)
- Autovectorizing Fortran compilers
- OpenMP with C, C++ or Fortran (and using it from Python)
- Python threads and releasing the GIL
- Python threads in Cython
- Native threads in Cython
- multiprocessing with ordinary NumPy arrays
- multiprocessing with shared memory
- MPI with Python (mpi4py)
- os.fork and copy-on-write memory (including the problem with Python 
refcounts)
- Using GPUs with Python, including ACML-GPU, PyOpenCL and PyCUDA.
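A minimal sketch of the "multiprocessing with shared memory" item, applied to the array multiplication this thread is about. It assumes Python 3.8 or later (for multiprocessing.shared_memory, which postdates this post), NumPy, and a floating-point array. The helper names parallel_scale and _scale_chunk are hypothetical. Each worker attaches to the same shared block by name and scales a disjoint slice in place, so no array data is pickled between processes:

```python
import numpy as np
from multiprocessing import Pool, shared_memory


def _scale_chunk(task):
    # Hypothetical worker: attach to an existing shared block (do not create)
    # and multiply one slice along axis 0 in place.
    name, shape, dtype, start, stop, factor = task
    shm = shared_memory.SharedMemory(name=name)
    view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    view[start:stop] *= factor  # writes go straight to shared memory
    del view                    # release the buffer export before close()
    shm.close()


def parallel_scale(data, factor, nworkers=4):
    """Hypothetical helper: return data * factor, computed by nworkers processes."""
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    shared = None
    try:
        shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
        shared[...] = data  # one copy into the shared block
        # Split axis 0 into nworkers contiguous, non-overlapping ranges.
        bounds = np.linspace(0, data.shape[0], nworkers + 1, dtype=int)
        tasks = [(shm.name, data.shape, data.dtype.str, int(a), int(b), factor)
                 for a, b in zip(bounds[:-1], bounds[1:])]
        with Pool(nworkers) as pool:
            pool.map(_scale_chunk, tasks)
        result = shared.copy()  # private copy that outlives the shared block
    finally:
        shared = None  # drop the view so close() can unmap the buffer
        shm.close()
        shm.unlink()   # free the block; only the creator should do this
    return result


if __name__ == "__main__":  # required where workers are started with "spawn"
    x = np.arange(1_000_000, dtype=np.float64)
    y = parallel_scale(x, 2.0)
    print(np.allclose(y, 2.0 * x))  # True
```

Whether this beats a plain `x * 2.0` depends on the machine: elementwise multiplication is usually memory bound, so extra processes may add little beyond their startup cost.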

Sturla