Linking other libm implementations
Hi all,

I wanted to know if there is any sane way to build numpy while linking to a different implementation of libm. A drop-in replacement for libm (e.g. openlibm) should in principle work, I guess, but I did not manage to actually make it work. As far as I understand the build code, setting MATHLIB=openlibm should suffice, but it did not. The build works fine, but in the end, when running numpy, apparently the functions of the system libm.so are used. I could not verify this directly (as I do not know how), but I noticed that there is no performance difference between the builds, while there is one between pure C programs linked against libm and openlibm. Using amdlibm would require some work, as the functions are prefixed with "amd_", I guess? Using Intel's libimf should work when using Intel's compiler, but I did not try this. With gcc I did not get it to work.

A quite general question: at the moment the performance and the accuracy of the base mathematical functions depend on the platform and the libm implementation of the system. Although there are functions defined in npy_math, they are only used as fall-backs if they are not provided by a library (correct me if I am wrong here). Is there some plan to change this in the future and provide defined behaviour (specified accuracy and/or speed) across platforms? As I understood it, Julia started openlibm for this reason (which is based on fdlibm/msun, same as npy_math).

Cheers
Nils
On Sun, Feb 7, 2016 at 4:39 PM, Nils Becker wrote:
[...]
The npy_math functions are used if otherwise unavailable, OR if someone has at some point noticed that, say, glibc 2.4-2.10 has a bad-quality tan (or whatever) and added a special-case hack that checks for those particular library versions and uses our built-in version instead. It's not the most convenient setup to maintain, so there's been some discussion of trying openlibm instead [1], but AFAIK you're the first person to find the time to actually sit down and try doing it :-).
You should be able to tell what math library you're linked to by running ldd (on Linux) or otool (on OS X) against the .so / .dylib files inside your built copy of numpy -- e.g.
ldd numpy/core/umath.cpython-34m.so
(exact filename and command will vary depending on python version and platform).
-n
[1] https://github.com/numpy/numpy/search?q=openlibm&type=Issues&utf8=%E2%9C%93
OK, with a little help from someone, I at least got it to work somehow. Apparently linking to openlibm is not a problem; MATHLIB=openlibm does the job. The resulting .so files are linked to openlibm AND libm. I do not know why; maybe you would have to call gcc with -nostdlib and explicitly include everything you need. When running such a build of numpy, however, only the functions in libm are called.

What did the job was to export LD_PRELOAD=/usr/lib/libopenlibm.so. In that case the functions from openlibm are used. This works with any build of numpy and needs no rebuilding. Of course it's hacky and not a solution, but at the moment it seems by far the easiest way to use a different libm implementation. This does also work with Intel's libimf. It does not work with amdlibm, as they use the prefix amd_ in function names, which would require real changes to the build system.

Very superficial benchmarks (see below) seem devastating for GNU libm. It seems that openlibm (compiled with gcc -mtune=native -O3) performs really well, and Intel's libm implementation is the best (on my Intel CPU). I did not check the accuracy of the functions, though.

My own code uses a lot of trigonometric and complex functions (optics calculations). I'd guess it could go 25% faster by just using a better libm implementation. Therefore, I have an interest in getting sane linking to a defined libm implementation to work. Apparently openlibm seems quite a good choice for numpy, at least performance-wise. However, I did not find any documentation or tests of the accuracy of its functions. A benchmarking and testing (for accuracy) code for libms would probably be a good starting point for a discussion. I could maybe help with that - but apparently not with any linking/building stuff (I just don't get it).

Benchmark:

gnu libm.so
3000 x sin(double[100000]): 6.68215647800389 s
3000 x log(double[100000]): 8.86350397899514 s
3000 x exp(double[100000]): 6.560557693999726 s

openlibm.so
3000 x sin(double[100000]): 4.5058218560006935 s
3000 x log(double[100000]): 4.106520485998772 s
3000 x exp(double[100000]): 4.597905882001214 s

Intel libimf.so
3000 x sin(double[100000]): 4.282402812998043 s
3000 x log(double[100000]): 4.008453270995233 s
3000 x exp(double[100000]): 3.301279639999848 s
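(A minimal sketch of how such timings could be produced - this is a reconstruction, not the original benchmark script; only the repeat count and array size are taken from the numbers above, and saving it as bench.py is assumed. The LD_PRELOAD trick described above selects the libm.)

import timeit
import numpy as np

x = np.random.uniform(1.0, 10.0, 100000)  # positive values, so log() is defined

for fn in (np.sin, np.log, np.exp):
    t = timeit.timeit(lambda: fn(x), number=3000)
    print("3000 x %s(double[100000]): %s s" % (fn.__name__, t))

# To time another libm without rebuilding numpy (the hack described above):
#   LD_PRELOAD=/usr/lib/libopenlibm.so python bench.py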
On Feb 8, 2016 3:04 AM, "Nils Becker" wrote:
[...]
On further thought: I guess that to do this we actually will need to change the names of the functions in openlibm and then use those names when calling from numpy. So long as we're using the regular libm symbol names, it doesn't matter what library the python extensions themselves are linked to; the way ELF symbol lookup works, the libm that the python interpreter is linked to will be checked *before* checking the libm that numpy is linked to, so the symbols will all get shadowed.

I guess statically linking openlibm would also work, but not sure that's a great idea since we'd need it in multiple places.
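(A sketch of how one could verify directly which shared object provides a libm symbol in a running process - my own illustration, assuming Linux/glibc; dladdr maps a resolved symbol address back to the file it came from, which shows exactly the lookup order described above.)

import ctypes
import ctypes.util

class Dl_info(ctypes.Structure):
    _fields_ = [("dli_fname", ctypes.c_char_p),   # path of the providing .so
                ("dli_fbase", ctypes.c_void_p),
                ("dli_sname", ctypes.c_char_p),
                ("dli_saddr", ctypes.c_void_p)]

# Resolve 'sin' in the global namespace of the running process, i.e. with
# the same lookup order the dynamic linker uses for unprefixed libm calls.
whole_process = ctypes.CDLL(None)
addr = ctypes.cast(whole_process.sin, ctypes.c_void_p)

libdl = ctypes.CDLL(ctypes.util.find_library("dl") or None)
info = Dl_info()
if libdl.dladdr(addr, ctypes.byref(info)):
    print(info.dli_fname)  # e.g. b'/usr/lib/libm.so.6' - or libopenlibm if preloaded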
I would be highly suspicious that this speed comes at the expense of accuracy... My impression is that there's a lot of room to make speed/accuracy tradeoffs in these functions, and modern glibc's libm has seen a fair amount of scrutiny by people who have access to the same code that openlibm is based off of. But then again, maybe not :-).

If these are the operations that you care about optimizing, an even better approach might be to figure out how to integrate a vector math library here like yeppp (BSD licensed) or MKL. Libm tries to optimize log(scalar); these are libraries that specifically try to optimize log(vector). Adding this would require changing numpy's code to use these new APIs, though. (Very new gcc can also try to do this in some cases, but I don't know how good it is at it... Julian might.)

-n
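(To make the scalar-vs-vector distinction concrete - an illustration, not from the original mail: a libm exposes log(x) one double at a time, which is what numpy's ufunc inner loop calls per element, while a vector math library exposes one call over a whole buffer, e.g. MKL's vdLn or yeppp's yepMath_Log_V64f_V64f in C. np.log below only stands in for that one-call-per-buffer convention; internally it still loops over scalar libm calls, which is exactly the point.)

import math
import numpy as np

x = np.random.uniform(1.0, 10.0, 100000)

# What the ufunc inner loop effectively does with a scalar libm:
y_scalar = np.array([math.log(v) for v in x])  # one libm call per element

# What a vector math API amounts to: one call over the whole buffer.
y_vector = np.log(x)

assert np.allclose(y_scalar, y_vector)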
On 02/08/2016 06:36 PM, Nathaniel Smith wrote:
[...]
Which version of glibm was used here? There are significant differences in performance between versions. Also, the input ranges are very important for these functions; depending on input, the speed of these functions can vary by factors of 1000.

glibm now includes vectorized versions of most math functions; does openlibm have vectorized math? That's where most speed can be gained, a lot more than 25%.
On 08.02.2016 at 18:36, Nathaniel Smith wrote:
[...]
Years ago I made the vectorized math functions from Intel's Vector Math Library (VML), part of MKL, available for numpy; see https://github.com/geggo/uvml. It is not particularly difficult, and you do not even have to change numpy. For some cases (e.g., exp) I have seen speedups of up to 5x-10x. Unfortunately MKL is not free, and free vector math libraries like yeppp implement far fewer functions or do not support the required strided memory layout. But to improve performance, numexpr, numba or theano are much better.

Gregor
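(For illustration - a sketch, not from the original mail: numexpr evaluates whole expressions over arrays in chunks and across threads, and uses MKL's VML transcendental functions if it was built with VML support.)

import numpy as np
import numexpr as ne

x = np.random.uniform(-1.0, 1.0, 1000000)

y_np = np.exp(x)               # numpy: one scalar libm exp call per element
y_ne = ne.evaluate("exp(x)")   # numexpr: chunked, threaded, VML if available

assert np.allclose(y_np, y_ne)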
2016-02-08 18:54 GMT+01:00 Julian Taylor:
Which version of glibm was used here? [...]
glibc 2.22 was used, running on Arch Linux. As far as I know, openlibm does not include special vectorized functions. (For reference, vectorized operations in glibc: https://sourceware.org/glibc/wiki/libmvec.)
2016-02-08 23:32 GMT+01:00 Gregor Thalhammer:
[...]
Thank you very much for the link! I did not know about numpy.set_numeric_ops.

You are right, vectorized operations can push calculation time per element down by factors. The benchmarks done for the yeppp project also indicate that (however far you would trust them: http://www.yeppp.info/benchmarks.html). But I would agree that this domain should be left to specialized tools like numexpr, as fully exploiting the speedup depends on the expression that is to be calculated. It is not suitable as a standard for numpy.

Still, I think it would be good to give the possibility to choose the libm numpy links against, if only to allow choosing or guaranteeing a specific accuracy/performance on different platforms and systems. Maybe maintaining a de-facto libm in npy_math could be replaced with a dependency on e.g. openlibm. But such a decision would require a thorough benchmark/testing of the available solutions, especially with respect to the accuracy-performance trade-off that was mentioned.

Cheers
Nils
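(A minimal sketch of the numpy.set_numeric_ops plumbing mentioned above - fast_multiply is a hypothetical stand-in built with frompyfunc, which is of course slow; the uvml project linked earlier does the real thing with VML-backed ufuncs.)

import numpy as np

# Hypothetical stand-in for a VML- or yeppp-backed ufunc:
fast_multiply = np.frompyfunc(lambda a, b: a * b, 2, 1)

old_ops = np.set_numeric_ops(multiply=fast_multiply)  # returns the previous ops
x = np.arange(4.0)
print(x * x)                   # '*' is now routed through fast_multiply
np.set_numeric_ops(**old_ops)  # restore numpy's defaults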
On 09.02.2016 at 11:21, Nils Becker wrote:
[...] It is not suitable as a standard for numpy.
Why should numpy not provide fast transcendental math functions? For linear algebra it supports fast implementations, even non-free ones (MKL). Wouldn't it be nice if numpy outperformed C?
[...] But such a decision would require a thorough benchmark/testing of the available solutions.
Intel publishes accuracy/performance charts for VML/MKL:
https://software.intel.com/sites/products/documentation/doclib/mkl/vm/functi...

For GNU libc it is more difficult to find similarly precise data; I could only find:
http://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.h...

Gregor
On 8 February 2016 at 18:36, Nathaniel Smith wrote:
[...]
I did some digging, and I found this:
http://julia-programming-language.2336112.n4.nabble.com/Is-the-accuracy-of-J...

In short: according to their devs, most openlibm functions are accurate to less than 1 ulp, while GNU libm is rounded to the closest float.

/David.
On Tue, Feb 9, 2016 at 7:06 AM, Daπid wrote:
[...]
So GNU libm has max error <= 0.5 ULP, openlibm has <= 1 ULP, and OSX is (almost always) somewhere in-between. So, is <= 1 ULP good enough?

Cheers,
Matthew
2016-02-09 18:02 GMT+01:00 Gregor Thalhammer:
[...]
Thank you for looking that up! I did not know about the stuff published by Intel yet.
2016-02-09 20:13 GMT+01:00 Matthew Brett:
[...] So, is <= 1 ULP good enough?
Calculating transcendental functions correctly rounded is very, very hard, and to my knowledge there is no complete libm implementation that guarantees the necessary accuracy for all possible inputs. One effort was/is the Correctly Rounded LibM (crlibm [1]), which tried to prove the accuracy of its algorithms. However, the performance impact of achieving that last ulp in all rounding modes can be severe.

Assessing the accuracy of a function implementation is hard. Testing all possible inputs is not feasible (2^32 inputs for single precision, 2^64 for double), and proving accuracy bounds may be even harder. Most of the time one samples accuracy with random numbers from a certain range. This generates tables like the ones for GNU libm or Intel. This is a kind of "faithful" accuracy, as you believe that the accuracy you tested on a sample extends to the whole argument range. The error in the worst case may be (much) bigger.

That being said, I believe the values given by GNU libm, for example, are very trustworthy. libm is not always correctly rounded (which would be <= 0.5 ulp in round-to-nearest); however, the error bounds given in the table seem to cover all worst cases. Common single-argument functions (sin, cos) are correctly rounded, and even complex two-argument functions (cpow) are at most 5 ulp off. I do not think that other implementations are more accurate. So libm is definitely good enough, accuracy-wise.

In any case I would like to build a testing framework to compare some libms and check accuracy/performance (at least Intel has a history of underestimating their error bounds in transcendental functions [2]). crlibm offers worst-case arguments for some functions, which could be used to complement randomized sampling. Maybe I have some time in the next weeks...

[1] http://lipforge.ens-lyon.fr/www/crlibm/
[2] https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-boun...
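(A rough sketch of the randomized sampling described above - my own illustration, not an existing framework: estimate the error of numpy's libm-backed functions in ulps against mpmath as a high-precision reference.)

import numpy as np
import mpmath

mpmath.mp.prec = 100  # reference precision far beyond double

def max_ulp_error(func, ref, xs):
    """Sampled worst-case error of func, in ulps, against reference ref."""
    worst = 0.0
    for x in xs:
        got = float(func(x))
        want = ref(mpmath.mpf(float(x)))
        ulp = np.spacing(abs(float(want)))  # one ulp at this magnitude
        worst = max(worst, float(abs(mpmath.mpf(got) - want) / ulp))
    return worst

xs = np.random.uniform(-1e3, 1e3, 10000)
print("sin: %.3f ulp" % max_ulp_error(np.sin, mpmath.sin, xs))
print("exp: %.3f ulp" % max_ulp_error(np.exp, mpmath.exp, xs % 700.0))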
participants (6)

- Daπid
- Gregor Thalhammer
- Julian Taylor
- Matthew Brett
- Nathaniel Smith
- Nils Becker