Improving performance of sparse matrix multiplication
Dear SciPy developers,

We are developing a program that performs a large number of sparse matrix multiplications. We recently wrote a Python version of this program for several reasons (the original code is in Fortran). We are now trying to improve the performance of the Python version, and we noticed that one of the bottlenecks is the sparse matrix multiplication, for example:

import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6], dtype=np.float32)

csr = csr_matrix((data, (row, col)), shape=(3, 3))
print(csr.toarray())

A = np.array([1, 2, 3], dtype=np.float32)

print(csr*A)

I started to look at the SciPy code to see how these functions are implemented, and realized that there is no OpenMP parallelization over the for loops, for example in the function csr_matvec in sparse/sparsetools/csr.h (line 1120). Is it possible to parallelize these loops with OpenMP? Do you perhaps have better ideas for improving the performance of this kind of operation?

Best regards,
Marc Barbry
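For readers who have not looked at the kernel mentioned above, here is a minimal pure-Python sketch of a CSR matrix-vector product. It is not SciPy's code, only an illustration of the same kind of row loop; the function name and argument order are chosen for this example, and the CSR arrays are written out by hand for the 3x3 matrix from the message.

```python
def csr_matvec(n_row, indptr, indices, data, x):
    """Return y = A @ x for a CSR matrix given by (indptr, indices, data).

    Illustrative sketch only; names and signature are assumptions.
    """
    y = [0.0] * n_row
    for i in range(n_row):
        # Nonzeros of row i are stored at positions indptr[i] .. indptr[i+1]-1.
        total = 0.0
        for jj in range(indptr[i], indptr[i + 1]):
            total += data[jj] * x[indices[jj]]
        y[i] = total
    return y


# CSR arrays for the example matrix [[1, 0, 2], [0, 0, 3], [4, 5, 6]].
indptr = [0, 2, 3, 6]
indices = [0, 2, 2, 0, 1, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0]

y = csr_matvec(3, indptr, indices, data, x)
print(y)  # [7.0, 9.0, 32.0]
```

Each row's sum depends only on that row's nonzeros and on x, so the outer loop over rows is the natural candidate for the parallelization asked about here.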
On Tue, Aug 29, 2017 at 4:14 AM, marc <marc.barbry@mailoo.org> wrote:
I started to look at the SciPy code to see how these functions are implemented, and realized that there is no OpenMP parallelization over the for loops, for example in the function csr_matvec in sparse/sparsetools/csr.h (line 1120). Is it possible to parallelize these loops with OpenMP?
Short answer: no OpenMP in SciPy. It has been discussed a number of times before, see for example http://numpy-discussion.10968.n7.nabble.com/Cython-based-OpenMP-accelerated-...

Cheers,
Ralf
Do you perhaps have better ideas for improving the performance of this kind of operation?
Best regards,
Marc Barbry
_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@python.org
https://mail.python.org/mailman/listinfo/scipy-dev
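On the question of other ways to speed up this kind of operation: one option that needs no OpenMP, when many vectors are multiplied by the same matrix, is to stack them as columns of a dense array and do a single sparse-times-dense product. The sketch below only checks that the two approaches give the same result on the small example matrix; the random vectors and the variable names are assumptions made for illustration, and whether batching is actually faster for a given workload would have to be measured.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Same 3x3 example matrix as in the original message.
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6], dtype=np.float32)
csr = csr_matrix((data, (row, col)), shape=(3, 3))

# Hypothetical workload: 4 right-hand-side vectors stored as columns.
rng = np.random.default_rng(0)
vectors = rng.random((3, 4)).astype(np.float32)

# One sparse matrix-vector product per column.
one_by_one = np.column_stack(
    [csr @ vectors[:, k] for k in range(vectors.shape[1])]
)

# A single sparse-times-dense product over all columns at once.
batched = csr @ vectors

print(np.allclose(one_by_one, batched))
```

This only helps when the vectors are all available before the multiplication; in an iterative method where each vector depends on the previous product, the calls cannot be batched this way.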
Dear Marc,

did you try the PyRSB prototype?

"librsb is a high performance sparse matrix library implementing the Recursive Sparse Blocks format, which is especially well suited for multiplications in iterative methods on huge sparse matrices. PyRSB is a Cython-based Python interface to librsb."
https://github.com/michelemartone/pyrsb

How large are your matrices? Are they symmetric? If your matrices are large you might get quite a speedup; if symmetric, even better.

Best regards,
Michele

p.s.: PyRSB (a thin interface) is a prototype, but librsb itself (http://librsb.sourceforge.net/) is in a mature state, is OpenMP based, and is also usable from Fortran.
participants (3)
- marc
- Michele Martone
- Ralf Gommers