On 03/12/2015 10:15 AM, Gregor Thalhammer wrote:
Another note, numpy makes it easy to provide new ufuncs, see
http://docs.scipy.org/doc/numpy-dev/user/c-info.ufunc-tutorial.html
from a C function that operates on 1D arrays, but this function needs to
support arbitrary spacing (stride) between the items. Unfortunately, to
achieve good performance, vector math libraries often expect that the
items are laid out contiguously in memory. MKL/VML is a notable
exception. So for non-contiguous input or output arrays you might need to
copy the data to a buffer, which likely eats up much of the
performance gain.
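
To make the copy-to-buffer point concrete, here is a minimal sketch in Python rather than C. The names exp_contiguous_only and exp_any_stride are hypothetical and only for illustration; the first one stands in for a vector-library kernel that insists on unit stride (it just calls np.exp internally), and the second shows the gather/compute/scatter workaround described above. It is not how numpy or any particular library does it.

```python
import numpy as np

a = np.arange(12, dtype=np.float64)
strided = a[::3]  # every third item: a view with a 24-byte stride, no copy


def exp_contiguous_only(x, out):
    """Hypothetical stand-in for a library kernel that requires unit stride."""
    if not (x.flags.c_contiguous and out.flags.c_contiguous):
        raise ValueError("kernel requires contiguous input and output")
    np.exp(x, out=out)


def exp_any_stride(x, out):
    """Gather into contiguous scratch buffers, run the kernel, scatter back."""
    # ascontiguousarray copies only when x is not already contiguous.
    x_buf = np.ascontiguousarray(x)
    # Use a scratch output buffer only when the real output is strided.
    out_buf = out if out.flags.c_contiguous else np.empty(out.shape, out.dtype)
    exp_contiguous_only(x_buf, out_buf)
    if out_buf is not out:
        out[...] = out_buf  # extra pass over memory: the cost of the workaround
    return out


# Strided output as well: the first column of a 4x2 array.
result = exp_any_stride(strided, np.empty((4, 2))[:, 0])
```

The two extra passes over memory (the gather in ascontiguousarray and the scatter in the final assignment) are the overhead under discussion; whether it matters depends on how expensive the kernel itself is per item.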
The elementary functions are very slow, even compared to memory access:
they take on the order of hundreds to tens of thousands of cycles to
complete (depending on range and required accuracy).
Even with strided access, that gives the hardware prefetchers plenty of
time to load the next items before the current computation is done.
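
A rough way to check this on a given machine is to time the same function over a contiguous and a strided array with the same number of items. The sketch below uses np.exp and the standard timeit module; the helper name time_per_element and the array sizes are arbitrary choices for illustration. No particular result is claimed here: the numbers depend on the numpy version, the math library behind it, and the hardware, and numpy may take a different internal code path for strided input.

```python
import timeit

import numpy as np


def time_per_element(x, repeat=3, number=10):
    """Best-of-`repeat` wall time of np.exp per item, in seconds."""
    out = np.empty(x.shape, dtype=x.dtype)
    timings = timeit.repeat(lambda: np.exp(x, out=out),
                            repeat=repeat, number=number)
    return min(timings) / (number * x.size)


base = np.linspace(0.0, 10.0, 400_000)
contiguous = base[:100_000].copy()  # unit stride (8 bytes)
strided = base[::4]                 # same item count, 32-byte stride

for name, arr in (("contiguous", contiguous), ("strided", strided)):
    ns = time_per_element(arr) * 1e9
    print(f"{name:>10}: {ns:.2f} ns per item")
```

If the per-item time is dominated by the function evaluation, the two figures should be close; a large gap would point to the strided path (or the memory system) being the bottleneck instead.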