On Tue, Jul 27, 2021 at 4:55 PM Sebastian Berg
Hi all,
there is a proposal to add some Intel specific fast math routine to NumPy:
https://github.com/numpy/numpy/pull/19478
part of numerical algorithms is that there is always a speed vs. precision trade-off, giving a more precise result is slower.
So there is a question what the general precision expectation should be in NumPy. And how much is it acceptable to diverge in the precision/speed trade-off depending on CPU/system?
I doubt we can formulate very clear rules here, but any input on what precision you would expect or trade-offs seem acceptable would be appreciated!
Some more details -----------------
This is mainly interesting e.g. for functions like logarithms, trigonometric functions, or cubic roots.
Some basic functions (multiplication, addition) are correct as per IEEE standard and give the best possible result, but these are typically only correct within very small numerical errors.
This is typically measured as "ULP":
https://en.wikipedia.org/wiki/Unit_in_the_last_place
where 0.5 ULP would be the best possible result.
Merging the PR may mean relaxing the current precision slightly in some places. In general Intel advertises 4 ULP of precision (although the actual precision for most functions seems better).
Here are two tables, one from glibc and one for the Intel functions:
https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.... (Mainly the LA column) https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vm...
Different implementation give different accuracy, but formulating some guidelines/expectation (or referencing them) would be useful guidance.
"Close enough" depends on the application but non-linear models can get the "butterfly effect" where the results diverge if they aren't identical. For a certain class of scientific programming applications, reproducibility is paramount. Development teams may use a variety of development laptops, workstations, scientific computing clusters, and cloud computing platforms. If the tests pass on your machine but fail in CI, you have a debugging problem. If your published scientific article links to source code that replicates your computation, scientists will expect to be able to run that code, now or in a couple decades, and replicate the same outputs. They'll be using different OS releases and maybe different CPU + accelerator architectures. Reproducible Science is good. Replicated Science is better. http://rescience.github.io/ Clearly there are other applications where it's easy to trade reproducibility and some precision for speed.