Intel's Short Vector Math Library has been merged as a git submodule
Yesterday we merged PR 19478 [0] to add an open source version of Intel's Short Vector Math Library (SVML) to NumPy. The original PR was by Raghuveer Devulapalli. There was a lot of discussion on the PR, but looking back, it seems the mailing list may not have been involved.

The code lives in https://github.com/numpy/SVML, and in order to pull it in you must now run `git submodule update --init` before building NumPy.

The library provides AVX-512 implementations of the following math functions: exp, exp2, log, log2, log10, expm1, log1p, cbrt, pow, sin, cos, tan, asin, acos, atan, atan2, sinh, cosh, tanh, asinh, acosh and atanh, speeding them up significantly where AVX-512 is available.

Before this PR was merged, Raghuveer added a set of accuracy and compliance tests for these functions [1]. The accuracy is now defined as up to 4 ULP in the worst case for tan, cos, sin, asin, atan and expm1, but the error is typically no more than 2 ULP.

As SciPy already found out, some downstream libraries may need to tweak their tolerances for some functions as a result of this PR. We wanted to put it in early enough in the release cycle so that we can back it out fully or partially if the accuracy degradation is too large, so please speak up if you notice anything strange.

Matti

[0] https://github.com/numpy/numpy/pull/19478
[1] https://github.com/numpy/numpy/pull/19485
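For downstream projects that want to check whether their tolerances are affected, one possible approach is to measure the error in ULPs directly. The following is a minimal sketch, not the test suite from [1]: the helper names `ulp_error` and `max_ulp_error` are made up for illustration, and it assumes that evaluating the same function in float64 is an adequate reference for float32 results.

```python
import numpy as np


def ulp_error(got, ref):
    """Error of `got` against `ref`, in ULPs of the dtype of `got`.

    `ulp_error` is a hypothetical helper, not part of NumPy.
    """
    got = np.asarray(got)
    ref = np.asarray(ref, dtype=np.float64)
    # np.spacing gives the gap to the next representable number, i.e. the
    # size of one ULP at the reference value in the precision under test.
    one_ulp = np.spacing(np.abs(ref).astype(got.dtype)).astype(np.float64)
    return np.abs(got.astype(np.float64) - ref) / one_ulp


def max_ulp_error(func, x):
    """Largest ULP error of a float32 ufunc call over the inputs `x`.

    Assumption: func evaluated in float64 on the same (float32-rounded)
    inputs is treated as the reference.
    """
    x32 = np.asarray(x, dtype=np.float32)
    return float(np.max(ulp_error(func(x32), func(x32.astype(np.float64)))))
```

A call such as `max_ulp_error(np.sin, np.linspace(-10, 10, 100001))` then gives a number that can be compared against the 4 ULP worst case mentioned above. NumPy also ships `numpy.testing.assert_array_max_ulp(a, b, maxulp=...)`, which may be a simpler fit for an existing test suite.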
On Mon, 11 Oct 2021 18:04:58 +0300 Matti Picus <matti.picus@gmail.com> wrote:
As SciPy already found out, some downstream libraries may need to tweak their tolerances for some functions as a result of this PR. We wanted to put it in early enough in the release cycle so that we can back it out fully or partially if the accuracy degradation is too large, so please speak up if you notice anything strange.
Thanks for warning in advance... now we need to find some computers to test those versions. Do you know if it works "the same" with AVX2? Most computers have AVX2, and for now you need the latest servers to test AVX512.
Cheers,
Jerome
On 11/10/21 11:05 pm, Jerome Kieffer wrote:
Thanks for warning in advance... now we need to find some computers to test those versions. Do you know if it works "the same" with AVX2? Most computers have AVX2, and for now you need the latest servers to test AVX512.
Cheers,
Jerome
Short answer: the code path should be exactly the same on machines without AVX512 before and after this PR.

Long answer: The use of intrinsics for ufunc loops is mostly described in the docs [0]. When calling a ufunc loop, a dispatch mechanism chooses the appropriate compiled loop for the available intrinsics on the system. You can see which intrinsics are supported on your installation of NumPy (new for 1.22) by using numpy.show_config(). The last few rows show which intrinsics are built into numpy and can possibly be used, and which subset is detected and will be used.

This means we ship multiple variants of loops, and only one set will be used on each machine. So a machine without AVX512 will continue to use whatever loop it used before this PR.

Matti

[0] https://numpy.org/devdocs/reference/simd/simd-optimizations.html
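As a rough illustration of the answer above, the snippet below first calls numpy.show_config() and then walks through a toy model of the dispatch idea. The toy part is only a sketch: `BEFORE`, `AFTER`, `pick_loop` and the loop labels are invented names, not NumPy internals, and the real dispatcher works on compiled C loops rather than Python objects.

```python
import numpy as np

# Prints build information. Per the message above, from NumPy 1.22 on the
# last rows list the SIMD extensions built in and those detected at runtime.
np.show_config()

# Toy model (hypothetical names): each entry maps a CPU feature to the loop
# variant compiled for it, ordered from most to least specialized.
BEFORE = [("AVX2", "existing AVX2 loop")]
AFTER = [("AVX512_SKX", "SVML AVX512 loop")] + BEFORE
BASELINE = "existing baseline loop"


def pick_loop(variants, detected):
    """Return the first variant whose required feature was detected."""
    for feature, loop in variants:
        if feature in detected:
            return loop
    return BASELINE


# A machine with AVX2 only gets the same loop before and after the PR.
avx2_only = {"AVX2"}
print(pick_loop(BEFORE, avx2_only), "|", pick_loop(AFTER, avx2_only))

# A machine that also reports AVX512 picks up the new variant.
avx512 = {"AVX2", "AVX512_SKX"}
print(pick_loop(BEFORE, avx512), "|", pick_loop(AFTER, avx512))
```

The point of the toy model is only that adding a new, more specialized variant does not change which loop is selected on machines that lack the corresponding feature.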