A Strange Phenomenon with respect to Multi-threading

Hello, I'm here again. Recently I came across a really weird and confusing phenomenon. I previously dug into NumPy's C code to find out how np.add(a, b) is implemented, and I thought I had figured it out. As far as I know, NumPy does not use multi-threading for addition; it uses SIMD to speed it up instead. However, I found that in certain situations, with a certain version of NumPy, there is multi-threading. Here is the screenshot (where we can see multi-threading clearly):

[image: image.png]

The environment is:

Python 3.6.5 :: Anaconda, Inc.
NumPy 1.14.3

np.show_config():

    mkl_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/home/SIXIE/tliu/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/home/SIXIE/tliu/anaconda3/include']
    blas_mkl_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/home/SIXIE/tliu/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/home/SIXIE/tliu/anaconda3/include']
    blas_opt_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/home/SIXIE/tliu/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/home/SIXIE/tliu/anaconda3/include']
    lapack_mkl_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/home/SIXIE/tliu/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/home/SIXIE/tliu/anaconda3/include']
    lapack_opt_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/home/SIXIE/tliu/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/home/SIXIE/tliu/anaconda3/include']

However, when I try to reproduce the multi-threading with my built-from-source NumPy, it fails (the CPU usage is at most 100%):

[image: image.png]

From the screenshot we can see that this build also has mkl_rt and pthread, since I added them to site.cfg.
The complete config info is:

    blas_armpl_info:
        NOT AVAILABLE
    blas_mkl_info:
        libraries = ['mkl_rt', 'pthread', 'mkl_rt']
        library_dirs = ['/home/tliu/anaconda3/envs/numpy-dev/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/home/tliu/anaconda3/envs/numpy-dev/include']
    blas_opt_info:
        libraries = ['mkl_rt', 'pthread', 'mkl_rt']
        library_dirs = ['/home/tliu/anaconda3/envs/numpy-dev/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/home/tliu/anaconda3/envs/numpy-dev/include']
    lapack_armpl_info:
        NOT AVAILABLE
    lapack_mkl_info:
        libraries = ['mkl_rt', 'pthread', 'mkl_rt']
        library_dirs = ['/home/tliu/anaconda3/envs/numpy-dev/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/home/tliu/anaconda3/envs/numpy-dev/include']
    lapack_opt_info:
        libraries = ['mkl_rt', 'pthread', 'mkl_rt']
        library_dirs = ['/home/tliu/anaconda3/envs/numpy-dev/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/home/tliu/anaconda3/envs/numpy-dev/include']
    Supported SIMD extensions in this NumPy install:
        baseline = SSE,SSE2,SSE3
        found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
        not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_KNM,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL

I want to reproduce the multi-threading because I really want to know where in the C code NumPy dispatches this computation to an external library that uses multi-threading. (NumPy itself doesn't have multi-threading; if we see multiple threads, they must be created by an external library such as BLAS, MKL, etc. But we know BLAS doesn't have functions for adding two vectors, so it must be MKL. BLAS is for matrix multiplication.)

But I can't debug the pip-installed NumPy; I can only use gdb with my own built-from-source NumPy. I must be missing something, since the multi-threading doesn't recur.

Any instructions or suggestions? Thanks to all of you in advance.

The specific version of numpy that demonstrates the usage of threads is a patched build of numpy from Anaconda. That old version happened to include modifications to the umath implementations of `np.add`, among others, to use implementations from the Vector Math Library component of the Intel MKL. Any builds from unpatched upstream sources won't have this behavior. I believe it got removed here:

https://github.com/AnacondaRecipes/numpy-feedstock/commit/a0637efc6908f1152d...

(In reply to the message from 腾刘 <27rabbitlt@gmail.com> of Tue, Nov 15, 2022 at 12:40 AM.)
-- Robert Kern
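
One way to tell the two kinds of build apart without reading screenshots is to compare the CPU time the process consumes with the wall-clock time that passes while np.add runs. The following is only a rough sketch and not something from the thread: the helper name cpu_to_wall_ratio, the array size, and the repeat count are made up for illustration, and the NumPy part is skipped if NumPy is not importable.

```python
import time


def cpu_to_wall_ratio(work, repeats=5):
    """Run work() `repeats` times and return process CPU time / wall time.

    time.process_time() sums CPU time over all threads of the process, so a
    ratio clearly above 1 suggests several cores were busy at the same time.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(repeats):
        work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


try:
    import numpy as np
except ImportError:
    np = None  # the helper above still works for any callable

if np is not None:
    # Illustrative size only: large enough that one call takes measurable time.
    a = np.random.rand(2_000_000)
    b = np.random.rand(2_000_000)
    ratio = cpu_to_wall_ratio(lambda: np.add(a, b), repeats=20)
    print("numpy", np.__version__, "CPU/wall ratio for np.add:", round(ratio, 2))
```

With a build from unpatched upstream sources the ratio would be expected to stay around 1 or below, while a value well above 1 would point to a threaded backend such as the patched Anaconda build described above. If the threads do come from MKL, starting Python with the MKL environment variable MKL_NUM_THREADS=1 should bring the ratio back down, which is a quick way to confirm the source.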
