# [Numpy-discussion] testing with amd libm/acml

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed Nov 7 09:52:28 EST 2012

On 11/07/2012 03:30 PM, Neal Becker wrote:
> David Cournapeau wrote:
>
>> On Wed, Nov 7, 2012 at 1:56 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
>>> David Cournapeau wrote:
>>>
>>>> On Wed, Nov 7, 2012 at 12:35 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
>>>>> I'm trying to do a bit of benchmarking to see if amd libm/acml will help
>>>>> me.
>>>>>
>>>>> I got an idea that instead of building all of numpy/scipy and all of my
>>>>> custom modules against these libraries, I could simply use:
>>>>>
>>>>>
>>>
>>>>> <my program here>
>>>>>
>>>>> I'm hoping that both numpy and my own dll's then will take advantage of
>>>>> these libraries.
>>>>>
>>>>> Do you think this will work?
>>>>
>>>> Quite unlikely depending on your configuration, because those
>>>> libraries are rarely if ever ABI compatible (that's why it is such a
>>>> pain to support).
>>>>
>>>> David
>>>
>>> When you say quite unlikely (to work), you mean
>>>
>>> a) unlikely that libm/acml will be used to resolve symbols in numpy/dlls at
>>> runtime (e.g., exp)?
>>>
>>> or
>>>
>>> b) program may produce wrong results and/or crash ?
>>
>> Both, actually. That's not something I would use myself. Did you try
>> openblas ? It is open source, simple to build, and is pretty fast,
>>
>> David
>
> In my current work, probably the largest bottleneck is the 'max*'
> operation, which is
>
> log(\sum_i e^{x_i})

numexpr with Intel VML is the solution I know of that doesn't require
you to dig into compiling C code yourself. Did you look into that or is
using Intel VML/MKL not an option?
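A sketch of the numexpr route, assuming numexpr is installed (it supports `sum`/`prod` reductions inside the compiled expression, and uses Intel VML for `exp()` when built against it). Note that this naive form, like any direct `sum(exp(x))`, can overflow for large x:

```python
import numpy as np
import numexpr as ne

x = np.random.rand(1_000_000)

# numexpr compiles the expression string and evaluates it multi-threaded
# in cache-sized blocks, so exp() runs over whole chunks at a time.
s = ne.evaluate("sum(exp(x))")

# The final log is on a scalar; no need to push it through numexpr.
result = np.log(s)
```

The payoff comes from evaluating `exp` over large blocks in one call rather than element by element, which is exactly the point made below about vectorization.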

Fast exps depend on the CPU evaluating many exp's at the same time (both
explicitly through vector registers and implicitly through pipelining).
Even if you get what you are trying to do to work (which I think is
unlikely), the approach is inherently slow: passing a single number at a
time through the "exp" function can't be efficient.
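The point about one-number-at-a-time calls can be seen directly by timing a vectorized `np.exp` against a scalar `math.exp` loop (a rough illustration; absolute numbers vary by machine, but the gap is consistently large):

```python
import math
import timeit
import numpy as np

x = np.random.rand(100_000)

# One vectorized call: the per-element loop runs in C, and the CPU can
# keep several exp evaluations in flight at once.
t_vec = timeit.timeit(lambda: np.exp(x), number=20)

# One libm call per element: each exp must complete before the next
# one starts, so neither vector registers nor pipelining help.
t_scalar = timeit.timeit(lambda: [math.exp(v) for v in x], number=20)

print(f"vectorized: {t_vec:.4f}s  scalar loop: {t_scalar:.4f}s")
```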

Dag Sverre