Hello!

Fused multiply-add (henceforth FMA) is an operation which calculates the product of two numbers and then the sum of the product and a third number with just one floating-point rounding. More concretely:

    r = x*y + z

The value of `r` is the same as if the RHS were calculated with infinite precision and then rounded to a 32-bit single-precision or 64-bit double-precision floating-point number [1].

Even though one FMA CPU instruction might be calculated faster than the two separate instructions for multiply and add, its main advantage comes from the increased precision of numerical computations that involve the accumulation of products. Examples which benefit from using FMA are: dot product [2], compensated arithmetic [3], polynomial evaluation [4], matrix multiplication, Newton's method and many more [5].

C99 adds an `fma` function to `math.h` [6] and emulates the calculation if the FMA instruction is not present on the host CPU [7]. PEP 7 states that "Python versions greater than or equal to 3.6 use C89 with several select C99 features" and that "Future C99 features may be added to this list in the future depending on compiler support" [8].

This proposal is then about adding a new `fma` function with the following signature to the `math` module:

    math.fma(x, y, z)
        '''Return a float representing the result of the operation `x*y + z`
        with a single rounding error, as defined by the platform C library.
        The result is the same as if the operation were carried out with
        infinite precision and rounded to a floating-point number.'''

There is a simple module for Python 3 demonstrating the fused multiply-add operation, which can be built with a simple `python3 setup.py build` under Linux [9].

Any feedback is greatly appreciated!

Juraj Sukop

[1] https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
[2] S. Graillat, P. Langlois, N. Louvet. Accurate dot products with FMA. 2006.
[3] S. Graillat. Accurate Floating Point Product and Exponentiation. 2007.
[4] S. Graillat, P. Langlois, N. Louvet. Improving the compensated Horner scheme with a Fused Multiply and Add. 2006.
[5] J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V. Lefèvre, G. Melquiond, N. Revol, D. Stehlé, S. Torres. Handbook of Floating-Point Arithmetic. 2010. Chapter 5.
[6] ISO/IEC 9899:TC3, "7.12.13.1 The fma functions", Committee Draft, September 7, 2007.
[7] https://git.musl-libc.org/cgit/musl/tree/src/math/fma.c
[8] https://www.python.org/dev/peps/pep-0007/
[9] https://github.com/sukop/fma
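To make the single-rounding semantics concrete, here is a small sketch (illustrative only; `fma_emulated` is a made-up name, not part of the proposal) that emulates the operation exactly with the `fractions` module and shows an input where one rounding and two roundings disagree:

    from fractions import Fraction

    def fma_emulated(x, y, z):
        # Compute x*y + z exactly, then round once via the float() conversion.
        return float(Fraction(x) * Fraction(y) + Fraction(z))

    x = 2.0**27 + 1.0             # exactly representable double
    y = 2.0**27 - 1.0             # exactly representable double
    z = -2.0**54
    print(x*y + z)                # 0.0:  x*y rounds to 2.0**54 before the add
    print(fma_emulated(x, y, z))  # -1.0: the exact product is 2**54 - 1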
Hi Juraj,

I think this would be a very useful addition to the `math` module. The gating issue is probably C compiler support. The most important non-C99 C compiler for Python is probably MS Visual Studio, and that one appears to support it: https://msdn.microsoft.com/en-us/library/mt720715.aspx

So +1 on the proposal.

Stephan
On Mon, Jan 16, 2017 at 4:25 AM, Juraj Sukop <juraj.sukop@gmail.com> wrote:
There is a simple module for Python 3 demonstrating the fused multiply-add operation, which can be built with a simple `python3 setup.py build` under Linux [9].
Any feedback is greatly appreciated!
+1. Just tried it out, and apart from dropping a pretty little SystemError when I fat-finger the args wrong (a trivial matter of adding more argument checking), it looks good.

Are there any possible consequences (not counting performance) of the fall-back? I don't understand all the code in what you linked to, but I think what's happening is that it goes to great lengths to avoid intermediate rounding, so the end result is always going to be the same. If that's the case, yeah, definite +1 on the proposal.

ChrisA
On Sun, Jan 15, 2017 at 5:25 PM, Juraj Sukop <juraj.sukop@gmail.com> wrote:
This proposal is then about adding new `fma` function with the following signature to `math` module:
math.fma(x, y, z)
Sounds good to me. Please could you open an issue on the bug tracker (http://bugs.python.org)?

Thanks,
Mark
2017-01-15 18:25 GMT+01:00 Juraj Sukop <juraj.sukop@gmail.com>:
C99 adds an `fma` function to `math.h` [6] and emulates the calculation if the FMA instruction is not present on the host CPU [7].
If even the libc function has a fallback of x*y followed by +z, it's fine to add such a function to the Python stdlib. It means that Python can do the same if the libc lacks an fma() function. In the math module, the trend is more to implement missing functions, or to add special code to work around bugs or limitations of libc functions.

Victor
Hi Victor,

The fallback implementations in the various libcs take care to preserve the correct rounding behaviour. Let me stress that *fused* multiply-add means the specific rounding behaviour defined in the IEEE 754-2008 standard, i.e. with guaranteed *no* intermediate rounding. So the following would not be a valid FMA fallback:

    double bad_fma(double x, double y, double z) {
        return x*y + z;
    }

Now in practice, people want FMA for two reasons.

1. They need the additional precision.
2. They want the performance of a hardware FMA instruction.

Admittedly, the second category would be satisfied with the bad_fma fallback. However, I don't think 2. is a very compelling reason for fma *in pure Python code*, since the performance advantage would probably be dwarfed by interpreter overhead. So I would estimate that approximately 100% of the target audience of math.fma would want to use it for the increased accuracy. Providing a fallback which does not, in fact, give that accuracy would not make people happy.

Upshot: if we want to provide a software fallback in the Python code, we need to do something slow and complicated like musl does, possibly by actually using the musl code. Either that, or we always rely on the Python-external libc implementation.

Stephan
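One standard illustration of why the single rounding matters (a sketch appended here, not from the thread; `fma_emulated` is a hypothetical stand-in for the proposed math.fma, built on exact rational arithmetic) is the error-free product transformation used in compensated arithmetic [3]: with a correctly rounded FMA, fma(x, y, -x*y) is exactly the rounding error of the product, something bad_fma can never recover, since round(x*y) - round(x*y) is always zero:

    from fractions import Fraction

    def fma_emulated(x, y, z):
        # Exact x*y + z, rounded only once by the float() conversion.
        return float(Fraction(x) * Fraction(y) + Fraction(z))

    x = y = 1.0 + 2.0**-30
    p = x * y                    # rounded product
    e = fma_emulated(x, y, -p)   # exact rounding error of the multiply: 2**-60
    assert Fraction(p) + Fraction(e) == Fraction(x) * Fraction(y)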
On Mon, Jan 16, 2017 at 11:01:23AM +0100, Stephan Houben wrote: [...]
So the following would not be a valid FMA fallback:

    double bad_fma(double x, double y, double z) {
        return x*y + z;
    }

[...]

Upshot: if we want to provide a software fallback in the Python code, we need to do something slow and complicated like musl does.
I don't know about complicated. I think this is pretty simple:

    from fractions import Fraction

    def fma(x, y, z):
        # Return x*y + z with only a single rounding.
        return float(Fraction(x)*Fraction(y) + Fraction(z))

When speed is not the number one priority and accuracy is important, it's hard to beat the fractions module.

-- Steve
Hi Steve,

Very good! Here is a version which also handles NaNs, infinities and negative zeros properly.

===============
import math
from fractions import Fraction

def fma2(x, y, z):
    if math.isfinite(x) and math.isfinite(y) and math.isfinite(z):
        result = float(Fraction(x)*Fraction(y) + Fraction(z))
        if not result and not z:
            result = math.copysign(result, x*y+z)
    else:
        result = x*y + z
        assert not math.isfinite(result)
    return result
===========================

Stephan
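A few spot checks of this sketch (appended for illustration; the expected values follow the IEEE 754-2008 fma semantics discussed above):

    print(fma2(2.0**27 + 1.0, 2.0**27 - 1.0, -2.0**54))  # -1.0, where plain x*y + z gives 0.0
    print(fma2(-0.0, 5.0, -0.0))   # -0.0: Fraction loses the sign of zero, copysign restores it
    print(fma2(float('inf'), 1.0, 1.0))  # inf, via the non-finite branch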
Does numpy support this?

--Guido (mobile)
My understanding is that NumPy does NOT currently support a direct FMA operation "natively." However, higher-level routines like `numpy.linalg.solve` that are linked to MKL or BLAS DO take advantage of FMA within the underlying libraries.

On Mon, Jan 16, 2017 at 10:06 AM, Guido van Rossum <gvanrossum@gmail.com> wrote:
Does numpy support this?
Is there a good reason not to detect single-expression multiply-adds and just emit a new FMA bytecode? Is our goal for floats to strictly match the result of the same operations coded in unoptimized C using doubles? Or can we be more precise on occasion?

I guess a similar question may be asked of all C compilers, as they too could emit an FMA instruction on such expressions... If they don't do it by default, that suggests we match them and not do it either.

Regardless, +1 on adding math.fma() either way, as it is an expression of precise intent.

-gps
Hi Gregory,

2017-01-16 20:28 GMT+01:00 Gregory P. Smith <greg@krypto.org>:
Is there a good reason not to detect single expression multiply adds and just emit a new FMA bytecode?
Yes ;-) See below.
Is our goal for floats to strictly match the result of the same operations coded in unoptimized C using doubles?
I think it should be. This determinism is a feature, i.e. it is of value to some, although not to everybody. The cost of this determinism is a possible loss of performance, but as I already mentioned in an earlier mail, I do not believe this cost would be observable in pure Python code. And anyway, people who care about numerical performance to that extent are all using Numpy.
Or can we be more precise on occasion?
Being more precise on occasion is only valuable if the occasion can be predicted/controlled by the programmer. (In this I assume you are not proposing that x*y + z would be guaranteed to produce an FMA on *all* platforms, even those lacking a hardware FMA. That would be very expensive.)

Generally speaking, there are two reasons why people may *not* want an FMA operation.

1. They need their results to be reproducible across compilers/platforms (the most common reason).
2. The correctness of their algorithm depends on the intermediate rounding step being done.

As an example of the second, take the cross product of two 2D vectors:

    def cross(a, b):
        return a[0]*b[1] - b[0]*a[1]

In exact mathematics, this operation has the property that cross(a, b) == -cross(b, a). In the current Python implementation, this property is preserved. Synthesising an FMA would destroy it.
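As a quick property check (an illustrative sketch appended here, not from the original message), the antisymmetry can be verified to hold bit-for-bit under the current twice-rounded evaluation:

    import random

    def cross(a, b):  # same definition as above
        return a[0]*b[1] - b[0]*a[1]

    # Correctly rounded subtraction satisfies round(p - q) == -round(q - p),
    # so the identity holds exactly for any float inputs:
    for _ in range(100000):
        a = (random.random(), random.random())
        b = (random.random(), random.random())
        assert cross(a, b) == -cross(b, a)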
I guess a similar question may be asked of all C compilers, as they too could emit an FMA instruction on such expressions... If they don't do it by default, that suggests we match them and not do it either.
C99 has defined #pragmas to let the programmer indicate whether they care about the strict FP model or not. So in C99 I can express the following three options:

1. I need an FMA, give it to me even if it needs to be emulated expensively in software:

       fma(x, y, z)

2. I do NOT want an FMA, please do intermediate rounding:

       #pragma STDC FP_CONTRACT OFF
       x*y + z

3. I don't care if you do intermediate rounding or not, just give me what is fastest:

       #pragma STDC FP_CONTRACT ON
       x*y + z

Note that a conforming compiler can simply ignore FP_CONTRACT as long as it never generates an FMA for "x*y + z". This is what GCC does in -std mode. It's what I would recommend for Python.
Regardless, +1 on adding math.fma() either way, as it is an expression of precise intent.
Yep. Stephan
Generally speaking, there are two reasons why people may *not* want an FMA operation. 1. They need their results to be reproducible across compilers/platforms (the most common reason).
The reproducibility of floating-point calculations is very hard to achieve. A good survey of the problem is https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/ . It mentions the FMA problem, but that is only part of a bigger picture.
Hi Xavier,

In this bright age of IEEE 754-compatible CPUs, it is certainly possible to achieve reproducible FP. I worked for a company whose software produced bit-identical results on various CPUs (x86, Sparc, Itanium) and OSes (Linux, Solaris, Windows).

The trick is to closely RTFM for your CPU and compiler, in particular all those nice appendices related to "FPU control words" and "FP consistency models". For example, if the author of that article had done so, he might have learned about the "precision control" field of the x87 control word, which you can set so that all intermediate operations are always represented as 64-bit doubles. So no double roundings from double-extended precision.

(Incidentally, the x87-internal double-extended precision is another fine example where being "more precise on occasion" usually does not help.)

Frankly, I am not very impressed with that article. I could go into detail, but that's off-topic, and I will try to fight the "somebody is *wrong* on the Internet" urge.

Stephan
Makes sense, thanks! math.fma() it is. :)
I never said it was impossible, just very hard.
participants (11)
- Chris Angelico
- David Mertz
- Gregory P. Smith
- Guido van Rossum
- Juraj Sukop
- Mark Dickinson
- Stephan Houben
- Steven D'Aprano
- Sven R. Kunze
- Victor Stinner
- Xavier Combelle