Floating point precision expectations in NumPy
Hi all,

There is a proposal to add some Intel-specific fast math routines to NumPy: https://github.com/numpy/numpy/pull/19478

Part of designing numerical algorithms is that there is always a speed vs. precision trade-off: giving a more precise result is slower. So there is a question of what the general precision expectation in NumPy should be, and how much it is acceptable for the precision/speed trade-off to diverge depending on CPU/system. I doubt we can formulate very clear rules here, but any input on what precision you would expect, or which trade-offs seem acceptable, would be appreciated!

Some more details
-----------------

This is mainly interesting for functions like logarithms, trigonometric functions, or cube roots. Basic operations (multiplication, addition) are correctly rounded as per the IEEE standard and give the best possible result; the transcendental functions, by contrast, are typically only correct to within small numerical errors. This is usually measured in "ULP": https://en.wikipedia.org/wiki/Unit_in_the_last_place where 0.5 ULP would be the best possible result.

Merging the PR may mean relaxing the current precision slightly in some places. In general, Intel advertises 4 ULP of precision (although the actual precision for most functions seems better). Here are two tables, one from glibc and one for the Intel functions (mainly the LA column):

https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions....
https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vm...

Different implementations give different accuracy, but formulating some guidelines/expectations (or referencing existing ones) would be useful guidance.
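For concreteness, here is a minimal sketch (an illustration, not part of the original message) of what measuring an error in ULPs looks like, assuming only NumPy: a float32 ufunc is compared against the same computation in float64, which serves as a near-exact reference.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 10.0, 100_000).astype(np.float32)

computed = np.log(x)                      # float32 result under test
reference = np.log(x.astype(np.float64))  # float64 reference

# np.spacing gives the gap to the next representable float32, i.e. the
# size of 1 ULP at the reference value, in float32 resolution.
one_ulp = np.spacing(reference.astype(np.float32)).astype(np.float64)
error = np.abs(computed.astype(np.float64) - reference) / one_ulp
print("max error: %.2f ULP" % error.max())
```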
Many years ago I wrote a package, https://github.com/geggo/uvml, that makes VML, a fast implementation of transcendental math functions, available for NumPy. I don't know if it still compiles. It uses Intel VML, which is designed for processing arrays, not the SVML intrinsics. This makes it less machine dependent (optimized implementations are selected automatically depending on the availability of, e.g., SSE, AVX, or AVX512); you just link to a library. It compiles as an external module and can be activated at runtime. Different precision modes can be selected at runtime (globally); I think Intel advocates the LA (low accuracy) mode as a good compromise between performance and accuracy.

Different people have strongly diverging opinions about what to expect. The speedups possibly gained by these approaches often vaporize in non-benchmark applications, since for these functions performance is often limited by memory bandwidth, unless all your data stays in the CPU cache. By default I would go for high accuracy mode, with the option to switch to low accuracy if one urgently needs the better performance. But then one should use different approaches for speeding up NumPy anyway.

Gregor
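As a rough sketch of these runtime-selectable accuracy modes (an illustration, not from the original message): numexpr, when built against Intel MKL/VML, exposes the same HA/LA switch via set_vml_accuracy_mode; a plain numexpr build without VML support lacks it.

```python
import numpy as np
import numexpr as ne  # assumes a numexpr build linked against Intel MKL/VML

x = np.linspace(0.1, 10.0, 1_000_000)

ne.set_vml_accuracy_mode('high')  # VML_HA: high accuracy mode
hi = ne.evaluate('log(x)')

ne.set_vml_accuracy_mode('low')   # VML_LA: faster, ~4 ULP class errors
lo = ne.evaluate('log(x)')

print("max |HA - LA| difference:", np.abs(hi - lo).max())
```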
I strongly agree with you, Gregor:

* Best precision should remain the default. I once lost months tracking down the compiler option (in ICC) that switched to LA mode and broke all my calculations.
* I wonder how SVML behaves on non-Intel platforms? SLEEF provides the same approach, but it also works on Power and ARM platforms (and is designed to be extended...).

Cheers, Jerome

On Wed, 28 Jul 2021 12:13:44 +0200 Gregor Thalhammer <gregor.thalhammer@gmail.com> wrote:
--
Jérôme Kieffer
tel +33 476 882 445
On Tue, Jul 27, 2021 at 4:55 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
"Close enough" depends on the application but non-linear models can get the "butterfly effect" where the results diverge if they aren't identical. For a certain class of scientific programming applications, reproducibility is paramount. Development teams may use a variety of development laptops, workstations, scientific computing clusters, and cloud computing platforms. If the tests pass on your machine but fail in CI, you have a debugging problem. If your published scientific article links to source code that replicates your computation, scientists will expect to be able to run that code, now or in a couple decades, and replicate the same outputs. They'll be using different OS releases and maybe different CPU + accelerator architectures. Reproducible Science is good. Replicated Science is better. <http://rescience.github.io/> Clearly there are other applications where it's easy to trade reproducibility and some precision for speed.
On Fri, 2021-07-30 at 11:04 -0700, Jerry Morrison wrote:
<snip> I have to make a correction: the table I linked is for VML, which is distinct from SVML (which the PR proposes). The actual precision table is here: https://github.com/numpy/numpy/pull/19485#issuecomment-887995864
Right, so my hope was to gauge what the general expectation is. I take it you expect high accuracy. The error of the computations themselves seems low at first sight, but of course errors can explode quickly in non-linear settings... (In the chaotic systems I worked with, the shadowing theorem would usually alleviate such worries, and testing the integration would be more important. But I am sure for certain questions things may be far trickier.)
Agreed, although there are so many factors, often out of our control, that I am not sure true replicability is achievable without containers :(. It would be amazing if NumPy could have a "replicable" mode, but I am not sure how that could be done, or whether the "ground work" in the math and linear algebra libraries even exists. However, even if true replicability is practically impossible to guarantee, there is an argument for improving reproducibility and replicability where we can, e.g. by choosing the high-accuracy version here.

Cheers, Sebastian
On Fri, Jul 30, 2021 at 12:22 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
I'll put forth an expectation that after installing a specific set of libraries, the floating point results would be identical across platforms and into the future. Ideally developers could install library updates (for hardware compatibility, security fixes, or other reasons) and still get identical results. That expectation is for reproducibility, not high accuracy. So it'd be fine to install different libraries [or maybe use those pip package options in brackets, whatever they do?] to trade accuracy for speed. Could any particular choice of accuracy still provide reproducible results across platforms and time?
Yes! Let's at least have reproducibility in mind and work on improving it, e.g. by removing failure modes. (Ditto for security :-)
On Thu, Aug 19, 2021 at 2:13 AM Jerry Morrison <jerry.morrison+numpy@gmail.com> wrote:
While this would be nice, in practice bit-identical results for floating point NumPy functions across different operating systems and future time is going to be impractical to achieve. IEEE-754 helps by specifying the result of basic floating point operations, but once you move into special math functions (like cos()) or other algorithms that can be implemented in several "mathematically equivalent" ways, bit-level stability basically becomes impossible without snapshotting your entire software stack. Many of these special math functions are provided by the operating system, which generally does not make such guarantees.

Quick example: suppose you want to implement sum() on a floating point array. If you start at the beginning of the array and iterate to the end, adding each element to an accumulator, you will get one answer. If you do mathematically equivalent pairwise summations (using a temporary array for storage), you will get a different, and probably more accurate, answer. Neither answer will (in general) be the same as summing those numbers together with infinite precision, then rounding to the closest floating point number at the end. (A small sketch of all three is below.) We could decide to make the specification for sum() also specify the algorithm for computing sum(), to ensure we make the same round-off errors every time. However, this kind of detailed specification might be harder to write for other functions, or might even lock the library into accuracy bugs that can't be fixed in the future.

I think the most pragmatic thing you can hope for is:

- Bit-identical results with containers that snapshot everything, including the system math library.
- Libraries that specify their accuracy levels when possible, and disclose when algorithm changes will affect the bit-identicalness of results.

On a meta-level, if analysis conclusions depend on getting bit-identical results from floating point operations, then you really want to use a higher precision float and/or an algorithm less sensitive to round-off error. Floating point numbers are not real numbers. :)
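A minimal sketch of the sum() example above (my own illustration, not from the original message): three "mathematically equivalent" ways of summing the same data, typically producing three different answers.

```python
import math
import numpy as np

rng = np.random.default_rng(42)
data = rng.standard_normal(100_000).astype(np.float32)

# 1. Left-to-right accumulation in float32, one element at a time.
naive = np.float32(0.0)
for v in data:
    naive = np.float32(naive + v)

# 2. NumPy's sum, which uses pairwise summation internally for float arrays.
pairwise = data.sum()

# 3. math.fsum tracks exact partial sums and rounds only once at the end,
#    approximating the "infinite precision, round at the end" ideal.
exact = math.fsum(data.astype(np.float64))

print(naive, pairwise, exact)  # usually three distinct values
```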
participants (5)

- Gregor Thalhammer
- Jerome Kieffer
- Jerry Morrison
- Sebastian Berg
- Stanley Seibert