[Numpy-discussion] adding fused multiply and add to numpy
jtaylor.debian at googlemail.com
Thu Jan 9 10:18:03 EST 2014
On Thu, Jan 9, 2014 at 3:54 PM, Daπid <davidmenhur at gmail.com> wrote:
> On 8 January 2014 22:39, Julian Taylor <jtaylor.debian at googlemail.com> wrote:
>> As you can see, even without real hardware support it is about 30% faster
>> than inplace unblocked numpy due to better use of memory bandwidth. It's
>> even more than two times faster than unoptimized numpy.
> I have an i5, and AVX crashes, even though it is supported by my CPU.
I forgot about the 32-byte alignment that AVX (as it is used in this code)
requires. I pushed a new version that takes care of it.
It should now work with AVX.
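
For illustration only, here is a minimal sketch of the usual way to deal with such an alignment requirement: process a few leading elements with scalar code until the pointer reaches a 32-byte boundary, then run the aligned vector loop. This is not the code in npfma; the helper name peel_count and its parameters are hypothetical, and it only computes how many elements to peel.

```python
def peel_count(address, itemsize, alignment=32):
    """Return how many leading elements must be handled by scalar code
    before `address` reaches the next `alignment`-byte boundary.

    Hypothetical helper for illustration; not part of npfma or numpy.
    Returns None when the element grid never lands on the boundary, in
    which case the caller has to use unaligned loads or scalar code.
    """
    misalign = address % alignment
    if misalign == 0:
        # Already aligned, nothing to peel.
        return 0
    remaining = alignment - misalign  # bytes up to the next boundary
    if remaining % itemsize != 0:
        # E.g. a double array starting at an address that is not a
        # multiple of 8: no whole number of elements reaches the boundary.
        return None
    return remaining // itemsize


# A double (8 byte) array starting 8 bytes past a 32-byte boundary
# needs three scalar iterations before aligned loads are safe.
print(peel_count(72, 8))
```

For a numpy array `a`, the start address is available as `a.ctypes.data` and the element size as `a.itemsize`, so `peel_count(a.ctypes.data, a.itemsize)` would give the peel length under these assumptions.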
> Following the instructions in the readme, there is only one compiled file,
> npfma.so, but no .o.
The .o files are in the build/ subfolder.