[Numpy-discussion] adding fused multiply and add to numpy

Julian Taylor jtaylor.debian at googlemail.com
Thu Jan 9 10:18:03 EST 2014


On Thu, Jan 9, 2014 at 3:54 PM, Daπid <davidmenhur at gmail.com> wrote:

>
> On 8 January 2014 22:39, Julian Taylor <jtaylor.debian at googlemail.com> wrote:
>
>> As you can see, even without real hardware support it is about 30% faster
>> than in-place unblocked numpy due to better use of memory bandwidth. It's
>> even more than two times faster than unoptimized numpy.
>>
>
> I have an i5, and AVX crashes, even though it is supported by my CPU.
>
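The bandwidth argument can be illustrated in plain NumPy. This is a minimal sketch, not the npfma code: `fma_inplace` and `fma_blocked` are hypothetical helper names, and `blocksize=4096` is an arbitrary assumption. Neither function is a true fused operation, since the product is rounded before the add (a hardware FMA rounds once); the sketch only shows why touching each array in cache-sized chunks moves less data through main memory than two full passes.

```python
import numpy as np

def fma_inplace(a, b, c):
    """Unblocked two-pass a * b + c: the full product is written to memory, then re-read for the add."""
    out = np.multiply(a, b)
    np.add(out, c, out=out)  # in place, so no second temporary array
    return out

def fma_blocked(a, b, c, blocksize=4096):
    """Blocked a * b + c for 1-D arrays (hypothetical helper, not npfma's API).

    Still two roundings per element; only the memory access pattern changes.
    """
    out = np.empty_like(a)
    for start in range(0, a.size, blocksize):
        s = slice(start, start + blocksize)
        # each chunk's product is still in cache when the add reads it back
        np.multiply(a[s], b[s], out=out[s])
        np.add(out[s], c[s], out=out[s])
    return out
```

To compare them, time both on arrays much larger than the last-level cache (for example with `timeit`); on small arrays everything fits in cache and the difference should mostly disappear.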

I forgot about the 32-byte alignment that AVX (as it is used in this code)
requires. I pushed a new version that takes care of it, so it should now
work with AVX.
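Aligned 256-bit AVX loads and stores (`vmovapd`, the `_mm256_load_pd` intrinsic) fault if the address is not a multiple of 32 bytes, while NumPy only guarantees alignment to the element size. How the new version handles this is not stated here; the sketch below just shows, under that assumption, how to check an array's alignment from Python, allocate a 32-byte-aligned buffer, and count how many leading elements a kernel would have to process with scalar code before switching to aligned vector loads. All function names are hypothetical.

```python
import numpy as np

AVX_ALIGNMENT = 32  # bytes required by aligned 256-bit AVX loads/stores

def is_aligned(arr, alignment=AVX_ALIGNMENT):
    """True if the array's data pointer is a multiple of `alignment` bytes."""
    return arr.ctypes.data % alignment == 0

def aligned_empty(n, dtype=np.float64, alignment=AVX_ALIGNMENT):
    """Uninitialized 1-D array whose data starts on an `alignment`-byte boundary."""
    dtype = np.dtype(dtype)
    nbytes = n * dtype.itemsize
    # over-allocate raw bytes, then skip ahead to the first aligned address
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment
    return buf[offset:offset + nbytes].view(dtype)

def peel_count(arr, alignment=AVX_ALIGNMENT):
    """Leading elements to handle with scalar code before the data is aligned.

    Assumes the data pointer is at least itemsize-aligned.
    """
    return ((-arr.ctypes.data) % alignment) // arr.itemsize
```

For example, slicing off the first element of an aligned float64 array shifts the pointer by 8 bytes, so three scalar iterations are needed before the remaining data is 32-byte aligned. The alternative is to use unaligned loads (`_mm256_loadu_pd`) throughout.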


> Following the instructions in the readme, there is only one compiled file,
> npfma.so, but no .o.
>

The .o files are in the build/ subfolder.


More information about the NumPy-Discussion mailing list