[Numpy-discussion] IDL vs Python parallel computing

Sturla Molden sturla.molden at gmail.com
Wed May 7 14:11:13 EDT 2014

On 03/05/14 23:56, Siegfried Gonzi wrote:
 > I noticed IDL uses at least 400% (4 processors or cores) out of the box
 > for simple things like reading and processing files, calculating the
 > mean etc.

The DMA controller works at its own pace, regardless of what the 
CPU is doing. You cannot get data off the disk faster by burning the 
CPU. If you are seeing 100% CPU usage while doing file i/o, something 
very bad is going on. If you did this to an i/o intensive server 
it would go up in a ball of smoke... The purpose of high-performance 
asynchronous i/o systems such as epoll, kqueue and IOCP is actually to 
keep the CPU usage to a minimum while waiting for i/o to complete.
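As a small illustration of that point (a sketch using Python's standard selectors module, which wraps epoll/kqueue, with a socketpair standing in for an external data source):

```python
import selectors
import socket

# The process sleeps inside sel.select() -- the kernel (via
# epoll/kqueue, wrapped by the selectors module) wakes it only
# when data is ready, so waiting on i/o costs ~0% CPU.
sel = selectors.DefaultSelector()
r, w = socket.socketpair()
r.setblocking(False)
sel.register(r, selectors.EVENT_READ)

w.send(b"hello")                   # stand-in for data arriving from elsewhere
events = sel.select(timeout=1.0)   # blocks without burning CPU cycles
data = b""
for key, _mask in events:
    data = key.fileobj.recv(1024)

sel.close()
r.close()
w.close()
```

The CPU does nothing while the call blocks; all the waiting happens in the kernel.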

Also, there are computations where using multiple processors does not 
help. First, there is a certain overhead due to thread synchronization 
and scheduling of the workload. Thus you want to have a certain amount 
of work per thread before invoking multiple threads. Second, 
hierarchical memory makes it mandatory that the threads avoid sharing 
the same objects in cache. Otherwise the performance will degrade as 
more threads are added.
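Both constraints can be seen in a chunked reduction (a hypothetical sketch; the function name and thread count are mine): each thread reduces a disjoint slice into a private partial result, and the chunks must be large enough to amortize the scheduling cost.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(a, nthreads=4):
    # Each thread reduces its own disjoint chunk into a private
    # partial result, so no two threads write to the same cache
    # lines (no false sharing). For small arrays the scheduling
    # overhead would dominate, so this only pays off on large input.
    chunks = np.array_split(a, nthreads)
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        partials = list(pool.map(np.sum, chunks))
    return float(np.sum(partials))

a = np.arange(1_000_000, dtype=np.float64)
total = parallel_sum(a)
```

For an array of a few hundred elements the serial np.sum will beat this every time.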

A more technical answer is that NumPy's internals do not play very 
nicely with multithreading. For example, the array iterators used in 
ufuncs store internal state. Multithreading would imply excessive 
contention for this state, as well as induce false sharing of the 
iterator object. Therefore, a multithreaded NumPy would have performance 
problems due to synchronization as well as hierarchical memory 
collisions. Adding multithreading support to the current NumPy core 
would just degrade the performance. NumPy will not be able to use 
multithreading efficiently unless we redesign the iterators in the 
NumPy core. That is a massive undertaking which probably means 
rewriting most of NumPy's core C code. A better strategy would be to 
monkey-patch some of the more common ufuncs with multithreaded versions.

 > I have never seen this happening with numpy except for the linalgebra
 > stuff (e.g lapack).
 > Any comments?

The BLAS/LAPACK library can use multithreading internally, depending on 
which implementation NumPy is linked against (e.g. OpenBLAS, MKL or ATLAS).
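The thread count for those internal BLAS threads is typically controlled by environment variables set before NumPy is imported (a sketch; which variable is honored depends on your particular BLAS build):

```python
import os

# Must be set before NumPy (and the BLAS it links against) is first
# imported. OpenBLAS reads OPENBLAS_NUM_THREADS, MKL reads
# MKL_NUM_THREADS, and OpenMP-based builds read OMP_NUM_THREADS.
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["OPENBLAS_NUM_THREADS"] = "4"

import numpy as np

a = np.random.rand(500, 500)
b = np.random.rand(500, 500)
c = np.dot(a, b)   # dispatched to BLAS dgemm, which may thread internally
```

This is why a plain np.dot can light up all your cores while the rest of NumPy stays single-threaded.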

