<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Jan 9, 2014 at 3:50 PM, Frédéric Bastien <span dir="ltr"><<a href="mailto:nouiz@nouiz.org" target="_blank">nouiz@nouiz.org</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>

<br>

It happens frequently that NumPy isn't compiled with all the instructions<br>

that are available on the machine where it runs, for example in distros. So if the<br>

decision is made to use the fast version when we don't use the newer<br>

instructions, the user needs a way to know that. So the library needs a<br>

function/attribute to tell that.<br></blockquote><div><br></div><div>As these instructions are very new, runtime CPU feature detection is required. That way distribution users also get the fast code if their CPU supports it.<br>
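<div>A minimal sketch of such a runtime query, assuming Linux on x86, where the kernel lists CPU features on the "flags" lines of /proc/cpuinfo ("fma" for FMA3, "fma4" for AMD's FMA4). The names parse_cpu_flags and has_hardware_fma are hypothetical, not an existing NumPy API; code inside NumPy would more likely query the cpuid instruction from C. Note that this only reports what the CPU can do, so the attribute requested above would also have to reflect which code path NumPy actually dispatches to.</div>

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags found on 'flags' lines of /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux kernels list CPU features on lines whose key is "flags".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_hardware_fma(cpuinfo_path="/proc/cpuinfo"):
    """Hypothetical query: does the CPU advertise FMA3 ('fma') or AMD FMA4 ('fma4')?

    Assumes Linux on x86; returns False when the file cannot be read, so
    other platforms conservatively report no hardware FMA.
    """
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:
        return False
    return not parse_cpu_flags(text).isdisjoint({"fma", "fma4"})
```

<div>Failing closed (returning False when the information is unavailable) means a library built on such a check would only pick the FMA path when the CPU explicitly advertises it.</div>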

</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

How hard would it be to provide the choice to the user? We could<br>

provide two functions like fma_fast() and fma_prec() (for precision). Or<br>

this could be a parameter or a user configuration option, like for the<br>

overflow/underflow errors.<br></blockquote><div><br><span name="Freddie Witherden" class="">I like Freddie Witherden</span>'s proposal to name the function madd, which does not guarantee a single rounding operation. This leaves the namespace open for a special fma function with that guarantee. That function can use the libc fma, which is sometimes very slow but platform independent. This assumes Apple did not take shortcuts again, as they did with their libc hypot implementation. Can someone disassemble Apple's libc to check what they are doing for C99 fma?<br>

It also leaves users the option of using the faster madd function if they do not need the precision guarantee.<br><br></div><div>Another option would be a precision context manager that tells NumPy which variant to use. This would also be useful for other functions (like abs/hypot/abs2/sum/reciprocal sqrt), but it probably involves more work.<br>
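<div>The difference between the two semantics, and a precision context manager that chooses between them, can be sketched in pure Python. Everything here is hypothetical: madd, fma_exact, muladd and precision_mode are illustrative names, not existing NumPy functions. The single-rounding fma is emulated with exact fractions.Fraction arithmetic (finite inputs only), which, like a libc software fma, is correct but slow.</div>

```python
import contextlib
import contextvars
from fractions import Fraction

# Hypothetical setting; contextvars keeps changes local to the current
# thread or async task. Not an existing NumPy API.
_MODE = contextvars.ContextVar("precision_mode", default="fast")


def madd(a, b, c):
    """a*b + c with two roundings: the product is rounded, then the sum."""
    return a * b + c


def fma_exact(a, b, c):
    """a*b + c with a single rounding, emulated in software (finite inputs only).

    Fraction arithmetic is exact; float() then rounds once to the nearest
    double. Correct, but far slower than a hardware FMA instruction.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


@contextlib.contextmanager
def precision_mode(mode):
    """Select 'fast' (madd semantics) or 'precise' (single rounding) for muladd."""
    if mode not in ("fast", "precise"):
        raise ValueError(f"unknown precision mode: {mode!r}")
    token = _MODE.set(mode)
    try:
        yield
    finally:
        _MODE.reset(token)


def muladd(a, b, c):
    """Dispatch on the active mode.

    A real 'fast' path could use hardware FMA when the CPU has it; this
    sketch always takes the two-rounding path so the result is predictable.
    """
    if _MODE.get() == "precise":
        return fma_exact(a, b, c)
    return madd(a, b, c)
```

<div>With a = 1.0 + 2.0**-52, b = 1.0 - 2.0**-52 and c = -1.0, the exact product is 1 - 2**-104, which rounds to 1.0, so madd returns 0.0 while fma_exact returns -2**-104 (about -4.93e-32). Applications that depend on a single rounding break in exactly this kind of case if "fma" silently means madd. Scoping the choice with a context manager is similar in spirit to how numpy.errstate scopes floating-point error handling.</div>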

</div><div><br>with numpy.precision_mode('fast'):<br></div><div>  ... # fma not required, use fast hypot and fast sum, ignore overflow/invalid errors<br><br></div><div>with numpy.precision_mode('precise'):<br></div>

<div>  ... # require fma, use precise hypot, exact summation (math.fsum) or at least Kahan summation, full overflow/invalid checks, etc.<br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


<br>

<div class=""><div class="h5"><br>

On Thu, Jan 9, 2014 at 9:43 AM, Freddie Witherden <<a href="mailto:freddie@witherden.org">freddie@witherden.org</a>> wrote:<br>

> On 08/01/14 21:39, Julian Taylor wrote:<br>

>> An issue is software emulation of real fma. This can be enabled in the<br>

>> test ufunc with npfma.set_type("libc").<br>

>> This is unfortunately incredibly slow, about a factor of 300 on my machine<br>

>> without hardware fma.<br>

>> This means we either have a function that is fast on some platforms and<br>

>> slow on others but always gives the same result, or we have a fast<br>

>> function that gives better results on some platforms.<br>

>> Given that we are not worse than what numpy currently provides, I favor<br>

>> the latter.<br>

>><br>

>> Any opinions on whether this should go into numpy or maybe stay a<br>

>> third-party ufunc?<br>

><br>

> My preference would be to initially add an "madd" intrinsic.  This can<br>

> be supported on all platforms and can be documented to permit the use of<br>

> FMA where available.<br>

><br>

> A 'true' FMA intrinsic function should only be provided when hardware<br>

> FMA support is available.  Many of the more interesting applications of<br>

> FMA depend on there only being a single rounding step and as such "FMA"<br>

> should probably mean "a*b + c with only a single rounding".<br>

><br>

> Regards, Freddie.<br>

><br>

><br>

</div></div><div class=""><div class="h5">> _______________________________________________<br>

> NumPy-Discussion mailing list<br>

> <a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a><br>

> <a href="http://mail.scipy.org/mailman/listinfo/numpy-discussion" target="_blank">http://mail.scipy.org/mailman/listinfo/numpy-discussion</a><br>

><br>


</div></div></blockquote></div><br></div></div>