[Numpy-discussion] adding fused multiply and add to numpy
freddie at witherden.org
Thu Jan 9 09:43:07 EST 2014
On 08/01/14 21:39, Julian Taylor wrote:
> An issue is software emulation of real fma. This can be enabled in the
> test ufunc with npfma.set_type("libc").
> This is unfortunately incredibly slow, about a factor of 300 on my
> machine without hardware fma.
> This means we either have a function that is fast on some platforms and
> slow on others but always gives the same result or we have a fast
> function that gives better results on some platforms.
> Given that we are not worse than what numpy currently provides, I favor
> the latter.
> Any opinions on whether this should go into numpy or maybe stay a third
> party ufunc?
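To make the difference between the two options concrete, here is a minimal Python sketch comparing a "real" fma (a*b + c rounded once) with an unfused a*b + c (rounded twice). The function names `fma_soft` and `madd_unfused` are hypothetical. `fma_soft` is not how libc implements fma: it simply does the arithmetic exactly with `fractions.Fraction` and rounds once at the end, which makes it far slower than a hardware instruction. It assumes finite inputs and does not reproduce IEEE 754 signed-zero corner cases.

```python
from fractions import Fraction

def fma_soft(a: float, b: float, c: float) -> float:
    """Hypothetical software FMA: a*b + c with a single rounding.

    Finite inputs only (Fraction rejects inf/nan); signed zeros are
    not handled exactly as IEEE 754 fusedMultiplyAdd would.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact, no rounding yet
    return float(exact)  # the one and only rounding step

def madd_unfused(a: float, b: float, c: float) -> float:
    # Two roundings: once after the multiply, once after the add.
    return a * b + c

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0
# Exact a*b is 1 - 2**-60, which rounds to 1.0 before the add.
print(madd_unfused(a, b, c))  # 0.0
print(fma_soft(a, b, c))      # equals -2.0**-60
```

Both results are "a*b + c"; only the single-rounding version keeps the low-order bits, which is exactly the behaviour that differs between platforms if fma is used only where hardware supports it.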
My preference would be to initially add an "madd" intrinsic. This can
be supported on all platforms and can be documented to permit the use of
FMA where available.
A 'true' FMA intrinsic function should only be provided when hardware
FMA support is available. Many of the more interesting applications of
FMA depend on there being only a single rounding step, and as such "FMA"
should probably mean "a*b + c with only a single rounding".
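One example of an application that relies on the single rounding is the error-free product transformation (often called TwoProduct): with p = fl(a*b), the expression fma(a, b, -p) yields the exact rounding error of the product. The sketch below reuses a hypothetical `Fraction`-based `fma_soft` so it runs on any Python 3.9+ (Python 3.13 and later also provide `math.fma`). It assumes no overflow or underflow occurs, since the exactness guarantee does not hold otherwise.

```python
from fractions import Fraction

def fma_soft(a: float, b: float, c: float) -> float:
    # Hypothetical software FMA: exact a*b + c, rounded once (finite inputs only).
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def two_product(a: float, b: float) -> tuple[float, float]:
    """Return (p, e) with p = fl(a*b) and a*b == p + e exactly.

    Requires a single-rounding fma and no overflow/underflow.
    """
    p = a * b
    e = fma_soft(a, b, -p)  # exact error term, thanks to the single rounding
    return p, e

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
p, e = two_product(a, b)
print(p, e)  # p is 1.0, e is -2.0**-60

# The pair (p, e) represents the true product exactly.
assert Fraction(p) + Fraction(e) == Fraction(a) * Fraction(b)

# An unfused a*b - p rounds the product first, so the error is lost.
print(a * b - p)  # 0.0
```

If a two-rounding "madd" were substituted for `fma_soft`, `e` would come out as 0.0 and the pair would no longer represent the product, which is why a madd that only sometimes fuses cannot stand in for a "true" FMA in algorithms like this.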