On 11.03.2015 at 23:18, Dp Docs <sdpan21@gmail.com> wrote:



On Wed, Mar 11, 2015 at 10:34 PM, Gregor Thalhammer <gregor.thalhammer@gmail.com> wrote:
>
>
> On the scipy mailing list I also answered Amine, who is also interested in this proposal.

Can you provide a link to that discussion? I am having trouble finding it.

> A long time ago I wrote a package that provides fast math functions (ufuncs) for numpy, using Intel's MKL/VML library, see https://github.com/geggo/uvml and my comments there. This code could easily be ported to use other vector math libraries.

When MKL is not available on a system, will this integration fall back to the default numpy math functions?

> It would be interesting to evaluate other possibilities. Because MKL is non-free, there are concerns about using it with numpy, although numpy and scipy builds that use the MKL LAPACK routines are in frequent use (e.g. Anaconda or Christoph Gohlke's binaries).
>
> You can easily inject the fast math ufuncs into numpy, e.g. with set_numeric_ops() or np.sin = vml.sin.

Can you explain this in a bit more detail, or provide a link where I can see it?

My approach for https://github.com/geggo/uvml was to provide a separate Python extension that supplies faster numpy ufuncs for math operations like exp, sin, cos, … To replace the standard numpy ufuncs with the optimized ones, you don't need to change the numpy source code; instead, you monkey patch numpy at runtime and get faster math everywhere. Numpy even offers an interface (set_numeric_ops) to modify it at runtime.

Another note: numpy makes it easy to create new ufuncs from a C function that operates on 1D arrays, see
http://docs.scipy.org/doc/numpy-dev/user/c-info.ufunc-tutorial.html
However, this function needs to support arbitrary spacing (stride) between the items. Unfortunately, to achieve good performance, vector math libraries often expect the items to be laid out contiguously in memory. MKL/VML is a notable exception. So for non-contiguous input or output arrays you might need to copy the data to a buffer, which likely eliminates much of the performance gain. This does not completely rule out such libraries, since performance-critical data is likely to be stored in contiguous arrays.
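
The copy-to-buffer fallback could look roughly like the following sketch. contiguous_only_exp is a hypothetical stand-in for a vector math routine that accepts only contiguous 1D data (here it simply calls np.exp), and strided_exp is an illustrative wrapper name, not part of any library.

```python
import numpy as np

def contiguous_only_exp(src, dst):
    """Hypothetical stand-in for a library routine that requires
    contiguous 1D input and output."""
    assert src.flags.c_contiguous and dst.flags.c_contiguous
    np.exp(src, out=dst)

def strided_exp(x):
    """Apply the contiguous-only routine to an arbitrarily strided array."""
    x = np.asarray(x, dtype=np.float64)
    buf = np.ascontiguousarray(x)       # copies only if x is not contiguous
    out = np.empty_like(buf)
    # ravel() returns views here because both arrays are C-contiguous
    contiguous_only_exp(buf.ravel(), out.ravel())
    return out

a = np.arange(10.0)[::2]                # every second item: non-contiguous
result = strided_exp(a)
```

For an already contiguous input no copy is made, so the extra cost is paid only in the strided case.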

Using a library that supports only vector math for contiguous arrays is more difficult, but perhaps the numpy nditer provides everything needed. 
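
As a rough illustration of the nditer idea, here is a sketch at the Python level (a real implementation would use the C API of the iterator). It assumes that the "buffered" and "external_loop" flags together with the "contig" operand flag hand out contiguous 1D chunks; contiguous_only_sin is again a hypothetical stand-in for a library routine, and nditer_sin is an illustrative name.

```python
import numpy as np

def contiguous_only_sin(src, dst):
    """Hypothetical stand-in for a routine that needs contiguous 1D data."""
    assert src.flags.c_contiguous and dst.flags.c_contiguous
    np.sin(src, out=dst)

def nditer_sin(x):
    x = np.asarray(x, dtype=np.float64)
    out = np.empty(x.shape, dtype=np.float64)
    it = np.nditer(
        [x, out],
        flags=["external_loop", "buffered"],     # hand out 1D chunks, buffer when needed
        op_flags=[["readonly", "contig"],        # request contiguous chunks
                  ["writeonly", "contig"]],
        buffersize=4096,
    )
    with it:                                     # ensures buffered writes are flushed
        for src, dst in it:
            contiguous_only_sin(src, dst)
    return out

x = np.arange(20.0).reshape(4, 5).T[::2]         # transposed and strided view
y = nditer_sin(x)
```

The iterator copies data only for operands whose inner stride does not match, and it does so in chunks of at most buffersize items instead of copying the whole array at once.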

Gregor