Hi Gregory,

2017-01-16 20:28 GMT+01:00 Gregory P. Smith <greg@krypto.org>:

Is there a good reason not to detect single expression multiply adds and just emit a new FMA bytecode?

Yes ;-) See below.
 

Is our goal for floats to strictly match the result of the same operations coded in unoptimized C using doubles?


I think it should be. This determinism is a feature, i.e. it is of value to some,
although not to everybody.
The cost of this determinism if a possible loss of performance, but as I already mentioned
in an earlier mail, I do not believe this cost would be observable in pure Python code.
And anyway, people who care about numerical performance to that extent are all using Numpy.
 

Or can we be more precise on occasion?


Being more precise on occasion is only valuable if the occasion can be predicted/controlled by the programmer.
(In this I assume you are not proposing that x*y+z would be guaranteed to produce an FMA on *all* platforms,
even those lacking a hardware FMA. That would be very expensive.)
 
Generally speaking, there are two reasons why people may *not* want an FMA operation.
1. They need their results to be reproducible across compilers/platforms. (the most common reason)
2. The correctness of their algorithm depends on the intermediate rounding step being done.

As an example of the second, take for example the cross product of two 2D vectors:

def cross(a, b):
    return a[0]*b[1] - b[0] * a[1]

In exact mathematics, this operation has the property that cross(a, b) == -cross(b,a).
In the current Python implementation, this property is preserved. 
Synthesising an FMA would destroy it.

I guess a similar question may be asked of all C compilers as they too could emit an FMA instruction on such expressions... If they don't do it by default, that suggests we match them and not do it either.


C99 has defined #pragma's to let the programmer indicate if they care about the strict FP model or not.
So in C99 I can express the following three options:

1. I need an FMA, give it to me even if it needs to be emulated expensively in software:
     fma(x, y, z)

2. I do NOT want an FMA, please do intermediate rounding:
  #pragma STDC FP_CONTRACT OFF
   x*y + z

3. I don't care if you do intermediate rounding or not, just give me what is fastest:
  #pragma STDC FP_CONTRACT ON
   x*y + z

Note that a conforming compiler can simply ignore FP_CONTRACT as long as it never
generates an FMA for "x*y+z". This is what GCC does in -std mode.
It's what I would recommend for Python.

Regardless +1 on adding math.fma() either way as it is an expression of precise intent.


Yep.

Stephan
 

-gps


On Mon, Jan 16, 2017, 10:44 AM David Mertz <mertz@gnosis.cx> wrote:
My understanding is that NumPy does NOT currently support a direct FMA operation "natively."  However, higher-level routines like `numpy.linalg.solve` that are linked to MKL or BLAS DO take advantage of FMA within the underlying libraries.

On Mon, Jan 16, 2017 at 10:06 AM, Guido van Rossum <gvanrossum@gmail.com> wrote:
Does numpy support this?

--Guido (mobile)

On Jan 16, 2017 7:27 AM, "Stephan Houben" <stephanh42@gmail.com> wrote:
Hi Steve,

Very good!
Here is a version which also handles the nan's, infinities,
negative zeros properly.

===============
import math
from fractions import Fraction

def fma2(x, y, z):
    if math.isfinite(x) and math.isfinite(y) and math.isfinite(z):
        result = float(Fraction(x)*Fraction(y) + Fraction(z))
        if not result and not z:
            result = math.copysign(result, x*y+z)
    else:
        result = x * y + z
        assert not math.isfinite(result)
    return result
===========================

Stephan


2017-01-16 12:04 GMT+01:00 Steven D'Aprano <steve@pearwood.info>:
On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote:

[...]
> So the following would not be a valid FMA fallback
>
> double bad_fma(double x, double y, double z) {
>   return x*y + z;
> }
[...]
> Upshot: if we want to provide a software fallback in the Python code, we
> need to do something slow and complicated like musl does.

I don't know about complicated. I think this is pretty simple:

from fractions import Fraction

def fma(x, y, z):
    # Return x*y + z with only a single rounding.
    return float(Fraction(x)*Fraction(y) + Fraction(z))


When speed is not the number one priority and accuracy is important,
its hard to beat the fractions module.


--
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/



--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/