Mailman 3 Floating point precision expectations in NumPy - NumPy-Discussion

Floating point precision expectations in NumPy

Sebastian Berg

27 Jul 2021 27 Jul '21

11:50 p.m.

Hi all, there is a proposal to add some Intel specific fast math routine to NumPy: https://github.com/numpy/numpy/pull/19478 part of numerical algorithms is that there is always a speed vs. precision trade-off, giving a more precise result is slower. So there is a question what the general precision expectation should be in NumPy. And how much is it acceptable to diverge in the precision/speed trade-off depending on CPU/system? I doubt we can formulate very clear rules here, but any input on what precision you would expect or trade-offs seem acceptable would be appreciated! Some more details ----------------- This is mainly interesting e.g. for functions like logarithms, trigonometric functions, or cubic roots. Some basic functions (multiplication, addition) are correct as per IEEE standard and give the best possible result, but these are typically only correct within very small numerical errors. This is typically measured as "ULP": https://en.wikipedia.org/wiki/Unit_in_the_last_place where 0.5 ULP would be the best possible result. Merging the PR may mean relaxing the current precision slightly in some places. In general Intel advertises 4 ULP of precision (although the actual precision for most functions seems better). Here are two tables, one from glibc and one for the Intel functions: https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.... (Mainly the LA column) https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vm... Different implementation give different accuracy, but formulating some guidelines/expectation (or referencing them) would be useful guidance. For basic

Show replies by date

Gregor Thalhammer

28 Jul 28 Jul

10:13 a.m.

...

Am 28.07.2021 um 01:50 schrieb Sebastian Berg :

Hi all,

there is a proposal to add some Intel specific fast math routine to NumPy:

https://github.com/numpy/numpy/pull/19478

Many years ago I wrote a package https://github.com/geggo/uvml that makes the VML, a fast implementation of transcendetal math functions, available for numpy. Don’t know if it still compiles. It uses Intel VML, designed for processing arrays, not the SVML intrinsics. By this it is less machine dependent (optimized implementations are selected automatically depending on the availability of, e.g., SSE, AVX, or AVX512), just link to a library. It compiles as an external module, can be activated at runtime. Different precision models can be selected at runtime (globally). I thinks Intel advocates to use the LA (low accuracy) mode as a good compromise between performance and accuracy. Different people have strongly diverging opinions about what to expect. The speedups possibly gained by these approaches often vaporize in non-benchmark applications, as for those functions performance is often limited by memory bandwidth, unless all your data stays in CPU cache. By default I would go for high accuracy mode, with option to switch to low accuracy if one urgently needs the better performance. But then one should use different approaches for speeding up numpy. Gregor

...

part of numerical algorithms is that there is always a speed vs. precision trade-off, giving a more precise result is slower.

So there is a question what the general precision expectation should be in NumPy. And how much is it acceptable to diverge in the precision/speed trade-off depending on CPU/system?

I doubt we can formulate very clear rules here, but any input on what precision you would expect or trade-offs seem acceptable would be appreciated!

Some more details -----------------

This is mainly interesting e.g. for functions like logarithms, trigonometric functions, or cubic roots.

Some basic functions (multiplication, addition) are correct as per IEEE standard and give the best possible result, but these are typically only correct within very small numerical errors.

This is typically measured as "ULP":

https://en.wikipedia.org/wiki/Unit_in_the_last_place

where 0.5 ULP would be the best possible result.

Merging the PR may mean relaxing the current precision slightly in some places. In general Intel advertises 4 ULP of precision (although the actual precision for most functions seems better).

Here are two tables, one from glibc and one for the Intel functions:

https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.... (Mainly the LA column) https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vm...

Different implementation give different accuracy, but formulating some guidelines/expectation (or referencing them) would be useful guidance.

For basic

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

Jerome Kieffer

18 Aug 18 Aug

1:14 p.m.

I strongly agree with you Gregor: * Best precision should remain the default. I lost months in finding the compiler option (in ICC) which switched to LA mode and broke all my calculations. * I wonder how those SVML behaves on non-intel plateform ? Sleef provides the same approach but it works also on Power and ARM platforms (and is designed to be extended...). Cheers, Jerome On Wed, 28 Jul 2021 12:13:44 +0200 Gregor Thalhammer wrote:

...

...
Am 28.07.2021 um 01:50 schrieb Sebastian Berg :

Hi all,

there is a proposal to add some Intel specific fast math routine to NumPy:

https://github.com/numpy/numpy/pull/19478

Many years ago I wrote a package https://github.com/geggo/uvml that makes the VML, a fast implementation of transcendetal math functions, available for numpy. Don’t know if it still compiles. It uses Intel VML, designed for processing arrays, not the SVML intrinsics. By this it is less machine dependent (optimized implementations are selected automatically depending on the availability of, e.g., SSE, AVX, or AVX512), just link to a library. It compiles as an external module, can be activated at runtime.

Different precision models can be selected at runtime (globally). I thinks Intel advocates to use the LA (low accuracy) mode as a good compromise between performance and accuracy. Different people have strongly diverging opinions about what to expect.

The speedups possibly gained by these approaches often vaporize in non-benchmark applications, as for those functions performance is often limited by memory bandwidth, unless all your data stays in CPU cache. By default I would go for high accuracy mode, with option to switch to low accuracy if one urgently needs the better performance. But then one should use different approaches for speeding up numpy.

Gregor

...
part of numerical algorithms is that there is always a speed vs. precision trade-off, giving a more precise result is slower.

So there is a question what the general precision expectation should be in NumPy. And how much is it acceptable to diverge in the precision/speed trade-off depending on CPU/system?

I doubt we can formulate very clear rules here, but any input on what precision you would expect or trade-offs seem acceptable would be appreciated!

Some more details -----------------

This is mainly interesting e.g. for functions like logarithms, trigonometric functions, or cubic roots.

Some basic functions (multiplication, addition) are correct as per IEEE standard and give the best possible result, but these are typically only correct within very small numerical errors.

This is typically measured as "ULP":

https://en.wikipedia.org/wiki/Unit_in_the_last_place

where 0.5 ULP would be the best possible result.

Merging the PR may mean relaxing the current precision slightly in some places. In general Intel advertises 4 ULP of precision (although the actual precision for most functions seems better).

Here are two tables, one from glibc and one for the Intel functions:

https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.... (Mainly the LA column) https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vm...

Different implementation give different accuracy, but formulating some guidelines/expectation (or referencing them) would be useful guidance.

For basic

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

-- Jérôme Kieffer tel +33 476 882 445

Jerry Morrison

30 Jul 30 Jul

6:04 p.m.

On Tue, Jul 27, 2021 at 4:55 PM Sebastian Berg wrote:

...

Hi all,

there is a proposal to add some Intel specific fast math routine to NumPy:

https://github.com/numpy/numpy/pull/19478

part of numerical algorithms is that there is always a speed vs. precision trade-off, giving a more precise result is slower.

So there is a question what the general precision expectation should be in NumPy. And how much is it acceptable to diverge in the precision/speed trade-off depending on CPU/system?

I doubt we can formulate very clear rules here, but any input on what precision you would expect or trade-offs seem acceptable would be appreciated!

Some more details -----------------

This is mainly interesting e.g. for functions like logarithms, trigonometric functions, or cubic roots.

Some basic functions (multiplication, addition) are correct as per IEEE standard and give the best possible result, but these are typically only correct within very small numerical errors.

This is typically measured as "ULP":

https://en.wikipedia.org/wiki/Unit_in_the_last_place

where 0.5 ULP would be the best possible result.

Merging the PR may mean relaxing the current precision slightly in some places. In general Intel advertises 4 ULP of precision (although the actual precision for most functions seems better).

Here are two tables, one from glibc and one for the Intel functions:

https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.... (Mainly the LA column) https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vm...

Different implementation give different accuracy, but formulating some guidelines/expectation (or referencing them) would be useful guidance.

"Close enough" depends on the application but non-linear models can get the "butterfly effect" where the results diverge if they aren't identical. For a certain class of scientific programming applications, reproducibility is paramount. Development teams may use a variety of development laptops, workstations, scientific computing clusters, and cloud computing platforms. If the tests pass on your machine but fail in CI, you have a debugging problem. If your published scientific article links to source code that replicates your computation, scientists will expect to be able to run that code, now or in a couple decades, and replicate the same outputs. They'll be using different OS releases and maybe different CPU + accelerator architectures. Reproducible Science is good. Replicated Science is better. http://rescience.github.io/ Clearly there are other applications where it's easy to trade reproducibility and some precision for speed.

Sebastian Berg

7:17 p.m.

On Fri, 2021-07-30 at 11:04 -0700, Jerry Morrison wrote:

...

On Tue, Jul 27, 2021 at 4:55 PM Sebastian Berg < sebastian@sipsolutions.net> wrote:

...
Hi all,

there is a proposal to add some Intel specific fast math routine to NumPy:

https://github.com/numpy/numpy/pull/19478

part of numerical algorithms is that there is always a speed vs. precision trade-off, giving a more precise result is slower.

<snip> I have to make a correction: I linked the SVML, which is distinct from VML (which the PR proposes), the actual table for precision is here: https://github.com/numpy/numpy/pull/19485#issuecomment-887995864

...

"Close enough" depends on the application but non-linear models can get the "butterfly effect" where the results diverge if they aren't identical.

Right, so my hope was to gauge what the general expectation is. I take it you expect a high accuracy. The error for the computations itself is seems low on first sight, but of course they can explode quickly in non-linear settings... (In the chaotic systems I worked with, the shadowing theorem would usually alleviate such worries. And testing the integration would be more important. But I am sure for certain questions things may be far more tricky.)

...

For a certain class of scientific programming applications, reproducibility is paramount.

Development teams may use a variety of development laptops, workstations, scientific computing clusters, and cloud computing platforms. If the tests pass on your machine but fail in CI, you have a debugging problem.

If your published scientific article links to source code that replicates your computation, scientists will expect to be able to run that code, now or in a couple decades, and replicate the same outputs. They'll be using different OS releases and maybe different CPU + accelerator architectures.

Reproducible Science is good. Replicated Science is better. http://rescience.github.io/

Clearly there are other applications where it's easy to trade reproducibility and some precision for speed.

Agreed, although there are so many factors, often out of our control, that I am not sure that true replicability is achievable without containers :(. It would be amazing if NumPy could have a "replicable" mode, but I am not sure how that could be done, or if the "ground work" in the math and linear algebra libraries even exists. However, even if it is practically impossible to make things replicable, there is an argument for improving reproducibility and replicability, e.g. by choosing the high-accuracy version here. Even if it is impossible to actually ensure. Cheers, Sebastian

...

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

Jerry Morrison

19 Aug 19 Aug

7:12 a.m.

On Fri, Jul 30, 2021 at 12:22 PM Sebastian Berg wrote:

...

On Fri, 2021-07-30 at 11:04 -0700, Jerry Morrison wrote:

...
On Tue, Jul 27, 2021 at 4:55 PM Sebastian Berg < sebastian@sipsolutions.net> wrote:

...
Hi all,

there is a proposal to add some Intel specific fast math routine to NumPy:

https://github.com/numpy/numpy/pull/19478

part of numerical algorithms is that there is always a speed vs. precision trade-off, giving a more precise result is slower.

<snip>

...

...
"Close enough" depends on the application but non-linear models can get the "butterfly effect" where the results diverge if they aren't identical.

Right, so my hope was to gauge what the general expectation is. I take it you expect a high accuracy.

The error for the computations itself is seems low on first sight, but of course they can explode quickly in non-linear settings... (In the chaotic systems I worked with, the shadowing theorem would usually alleviate such worries. And testing the integration would be more important. But I am sure for certain questions things may be far more tricky.)

I'll put forth an expectation that after installing a specific set of libraries, the floating point results would be identical across platforms and into the future. Ideally developers could install library updates (for hardware compatibility, security fixes, or other reasons) and still get identical results. That expectation is for reproducibility, not high accuracy. So it'd be fine to install different libraries [or maybe use those pip package options in brackets, whatever they do?] to trade accuracy for speed. Could any particular choice of accuracy still provide reproducible results across platforms and time?

...

...
For a certain class of scientific programming applications, reproducibility is paramount.

Development teams may use a variety of development laptops, workstations, scientific computing clusters, and cloud computing platforms. If the tests pass on your machine but fail in CI, you have a debugging problem.

If your published scientific article links to source code that replicates your computation, scientists will expect to be able to run that code, now or in a couple decades, and replicate the same outputs. They'll be using different OS releases and maybe different CPU + accelerator architectures.

Reproducible Science is good. Replicated Science is better. http://rescience.github.io/

Clearly there are other applications where it's easy to trade reproducibility and some precision for speed.

Agreed, although there are so many factors, often out of our control, that I am not sure that true replicability is achievable without containers :(.

It would be amazing if NumPy could have a "replicable" mode, but I am not sure how that could be done, or if the "ground work" in the math and linear algebra libraries even exists.

However, even if it is practically impossible to make things replicable, there is an argument for improving reproducibility and replicability, e.g. by choosing the high-accuracy version here. Even if it is impossible to actually ensure.

Yes! Let's at least have reproducibility in mind and work on improving it, e.g. by removing failure modes. (Ditto for security :-)

Stanley Seibert

6:18 p.m.

On Thu, Aug 19, 2021 at 2:13 AM Jerry Morrison < jerry.morrison+numpy@gmail.com> wrote:

...

I'll put forth an expectation that after installing a specific set of libraries, the floating point results would be identical across platforms and into the future. Ideally developers could install library updates (for hardware compatibility, security fixes, or other reasons) and still get identical results.

That expectation is for reproducibility, not high accuracy. So it'd be fine to install different libraries [or maybe use those pip package options in brackets, whatever they do?] to trade accuracy for speed. Could any particular choice of accuracy still provide reproducible results across platforms and time?

While this would be nice, in practice bit-identical results for floating point NumPy functions across different operating systems and future time is going to be impractical to achieve. IEEE-754 helps by specifying the result of basic floating point operations, but once you move into special math functions (like cos()) or other algorithms that can be implemented in several "mathematically equivalent" ways, bit-level stability basically becomes impossible without snapshotting your entire software stack. Many of these special math functions are provided by the operating system, which generally do not make such guarantees. Quick example: Suppose you want to implement sum() on a floating point array. If you start at the beginning of the array and iterate to the end, adding each element to an accumulator, you will get one answer. If you do mathematically equivalent pairwise summations (using a temporary array for storage), you will get a different, and probably more accurate answer. Neither answer will (in general) be the same as summing those numbers together with infinite precision, then rounding to the closest floating point number at the end. We could decide to make the specification for sum() also specify the algorithm for computing sum() to ensure we make the same round-off errors every time. However, this kind of detailed specification might be harder to write for other functions, or might even lock the library into accuracy bugs that can't be fixed in the future. I think the most pragmatic thing you can hope for is: - Bit-identical results with containers that snapshot everything, including the system math library. - Libraries that specify their accuracy levels when possible, and disclose when algorithm changes will affect the bit-identicalness of results. On a meta-level, if analysis conclusions depend on getting bit-identical results from floating point operations, then you really want to use a higher precision float and/or an algorithm less sensitive to round-off error. Floating point numbers are not real numbers. :)

980

Age (days ago)

1003

Last active (days ago)

List overview

Download

6 comments

5 participants

participants (5)

Gregor Thalhammer
Jerome Kieffer
Jerry Morrison
Sebastian Berg
Stanley Seibert

Floating point precision expectations in NumPy

Sebastian Berg

Gregor Thalhammer

Jerome Kieffer

Jerry Morrison

Sebastian Berg

Jerry Morrison

Stanley Seibert

tags

participants (5)