Hello everyone, I'm interested in the NumPy project and have been experimenting a lot with numpy arrays. I'm wondering what actually happens under the hood that causes so much overhead when I call a function in numpy. What is the reason? Thanks in advance. Regards, Sebastian Kaster
You are going to need to provide much more context than that. Overhead compared to what? And where (I/O, CPU, etc.)? What are the sizes of your arrays, and what sort of operations are you doing? Finally, how much overhead are you seeing? There can be all sorts of reasons for overhead; some can easily be mitigated, and others not so much. Cheers! Ben Root

On Tue, Feb 28, 2017 at 4:47 PM, Sebastian K <sebastiankaster@googlemail.com> wrote:
Thank you for your answer. For example, a very simple algorithm is a matrix multiplication. I can see that the heap peak is much higher for the numpy version in comparison to a pure Python 3 implementation. The heap is measured with libmemusage from libc:

> *heap peak*: Maximum of all *size* arguments of [malloc(3)](http://man7.org/linux/man-pages/man3/malloc.3.html), all products of *nmemb* × *size* of [calloc(3)](http://man7.org/linux/man-pages/man3/calloc.3.html), all *size* arguments of [realloc(3)](http://man7.org/linux/man-pages/man3/realloc.3.html), *length* arguments of [mmap(2)](http://man7.org/linux/man-pages/man2/mmap.2.html), and *new_size* arguments of [mremap(2)](http://man7.org/linux/man-pages/man2/mremap.2.html).

Regards, Sebastian

On 28 Feb 2017 11:03 p.m., "Benjamin Root" <ben.v.root@gmail.com> wrote:
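As a cross-check of the libmemusage numbers, Python's built-in `tracemalloc` module (Python 3.4+) can report peak allocations from inside the interpreter. A minimal sketch, with the caveat that tracemalloc only sees allocations reported to Python's allocator (recent numpy versions report their array buffers; older ones may not), so its figures won't match libmemusage exactly:

```python
import tracemalloc

import numpy as np

tracemalloc.start()

n = 100
A = np.ones((n, n), dtype='float64')
B = np.ones((n, n), dtype='float64')
C = np.dot(A, B)

# get_traced_memory() returns (current, peak), both in bytes.
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current} bytes, peak: {peak} bytes")
tracemalloc.stop()
```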
Hi,

On Tue, Feb 28, 2017 at 2:12 PM, Sebastian K <sebastiankaster@googlemail.com> wrote:
Could you post the exact code you're comparing? I think you'll find that a naive Python 3 matrix multiplication method is much, much slower than the same thing with Numpy, with arrays of any reasonable size. Cheers, Matthew
Yes, it is true that the execution time is much faster with the numpy function. The code for the numpy version:

```python
import numpy as np

def createMatrix(n):
    Matrix = np.empty(shape=(n, n), dtype='float64')
    for x in range(n):
        for y in range(n):
            Matrix[x, y] = 0.1 + ((x * y) % 1000) / 1000.0
    return Matrix

if __name__ == '__main__':
    n = getDimension()  # getDimension() is defined elsewhere in the author's script
    if n > 0:
        A = createMatrix(n)
        B = createMatrix(n)
        C = np.empty(shape=(n, n), dtype='float64')
        C = np.dot(A, B)
        # print(C)
```

In the pure Python version I am just implementing the multiplication with three for-loops.

Measured data with libmemusage:

- dimension of matrix: 100x100
- heap peak pure python3: 1060565
- heap peak numpy function: 4917180

2017-02-28 23:17 GMT+01:00 Matthew Brett <matthew.brett@gmail.com>:
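The pure Python version is not shown in the thread; a minimal sketch of what a three-for-loop implementation presumably looks like (the names `create_matrix` and `matmul` are invented for illustration, using nested lists instead of numpy arrays):

```python
def create_matrix(n):
    # Same fill pattern as the numpy version, but with nested lists.
    return [[0.1 + ((x * y) % 1000) / 1000.0 for y in range(n)]
            for x in range(n)]

def matmul(A, B):
    # Naive O(n^3) triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

if __name__ == '__main__':
    n = 100
    A = create_matrix(n)
    B = create_matrix(n)
    C = matmul(A, B)
```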
Hi,

On Tue, Feb 28, 2017 at 3:04 PM, Sebastian K <sebastiankaster@googlemail.com> wrote:
Yes, you are right. There is no need to add that line, so I deleted it. But the measured heap peak is still the same.
You're applying the naive matrix multiplication algorithm, which is ideal for minimizing memory use during the computation, but terrible for speed, because it makes poor use of the CPU cache: https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm

The numpy version is likely calling into a highly optimized compiled routine for matrix multiplication, which can load chunks of the matrices at a time to speed up computation. If you really need minimum heap usage and don't care about the order(s)-of-magnitude slowdown, then you might need to use the naive method, maybe implemented in Cython / C.

Cheers, Matthew
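To put rough numbers on that gap, here is a small timing sketch (not from the thread; `naive_dot` is a hypothetical re-implementation of the triple loop) comparing it against `np.dot`. Exact timings depend heavily on the machine and on which BLAS numpy was built against:

```python
import time

import numpy as np

def naive_dot(A, B):
    # Triple loop reading one element at a time: cache-unfriendly and
    # pays Python interpreter overhead on every multiply-add.
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i, k] * B[k, j]
            C[i, j] = s
    return C

n = 200
A = np.random.rand(n, n)
B = np.random.rand(n, n)

t0 = time.perf_counter()
C1 = naive_dot(A, B)
t1 = time.perf_counter()
C2 = np.dot(A, B)  # dispatches to an optimized, blocked BLAS routine
t2 = time.perf_counter()

print(f"naive loop: {t1 - t0:.3f} s, np.dot: {t2 - t1:.6f} s")
print("results match:", np.allclose(C1, C2))
```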
On Feb 28, 2017 2:57 PM, "Sebastian K" <sebastiankaster@googlemail.com> wrote:

> Yes, it is true that the execution time is much faster with the numpy function. The code for the numpy version:
>
> ```python
> def createMatrix(n):
>     Matrix = np.empty(shape=(n, n), dtype='float64')
>     for x in range(n):
>         for y in range(n):
>             Matrix[x, y] = 0.1 + ((x * y) % 1000) / 1000.0
>     return Matrix
>
> if __name__ == '__main__':
>     n = getDimension()
>     if n > 0:
>         A = createMatrix(n)
>         B = createMatrix(n)
>         C = np.empty(shape=(n, n), dtype='float64')
>         C = np.dot(A, B)
>         # print(C)
> ```
>
> In the pure Python version I am just implementing the multiplication with three for-loops.
>
> Measured data with libmemusage:
>
> - dimension of matrix: 100x100
> - heap peak pure python3: 1060565
> - heap peak numpy function: 4917180

4 megabytes is less than the memory needed just to load numpy :-). Try a 1000x1000 array (or even bigger), and I think you'll see more reasonable results.

-n
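A back-of-envelope check of that point: three n×n float64 arrays occupy 3·n²·8 bytes, so at n=100 the arrays account for only ~0.24 MB of the ~4.9 MB heap peak (the remainder is mostly interpreter and import overhead), while at n=1000 the arrays themselves grow to ~24 MB and dominate. A sketch of that arithmetic:

```python
import numpy as np

for n in (100, 1000):
    a = np.empty((n, n), dtype='float64')
    # Three arrays (A, B, and the result of np.dot) at 8 bytes per element.
    total = 3 * a.nbytes
    print(f"n={n}: one array = {a.nbytes} bytes, "
          f"three arrays = {total / 1e6:.2f} MB")
```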
participants (5)

- Benjamin Root
- Joseph Fox-Rabinovitz
- Matthew Brett
- Nathaniel Smith
- Sebastian K