Re: [SciPy-Dev] Improving performance of sparse matrix multiplication
It is possible to write your own openmp implimentation, performing the row multiplications in parallel, using Cython or whatever low language you like. Since sparse matrix dense vector multiplication is memory bandwidth limited, you will see good performance if the problem fits into CPU cache. Otherwise, you will see only marginal improvements. If your problem has structure, then that should be exploited to make things faster. Also permutations like reverse Cuthill-Mckee, in SciPy, can give a modest boost. Best, Paul On Aug 29, 2017 04:26, "Michele Martone" < michelemartone@users.sourceforge.net> wrote: Dear Marc, did you try the PyRSB prototype: "librsb is a high performance sparse matrix library implementing the Recursive Sparse Blocks format, which is especially well suited for multiplications in iterative methods on huge sparse matrices. PyRSB is a Cython-based Python interface to librsb." https://github.com/michelemartone/pyrsb ? How large are your matrices ? Are they symmetric ? If your matrices are large you might get quite of a speedup; if symmetric, even better. Best regards, Michele p.s.: PyRSB (a thin interface) is a prototype, but librsb itself http://librsb.sourceforge.net/ is in a mature state and usable also from Fortran, and OpenMP based. On 20170828@18:14, marc wrote:
Dear Scipy developers,
We are developing a program that perform a large number of sparse matrix multiplications. We recently wrote a Python version of this program for several reasons (the original code is in Fortran).
We are trying now to improve the performance of the Python version and we noticed that one of the bottlenecks are the sparse matrix multiplications, as example,
import numpy as np from scipy.sparse import csr_matrix
row = np.array([0, 0, 1, 2, 2, 2]) col = np.array([0, 2, 2, 0, 1, 2]) data = np.array([1, 2, 3, 4, 5, 6], dtype=np.float32)
csr = csr_matrix((data, (row, col)), shape=(3, 3)) print(csr.toarray())
A = np.array([1, 2, 3], dtype=np.float32)
print(csr*A)
I started to look at the Scipy code to see how this functions were implemented, and realized that there is no openmp parallelization over the for loops. Like in function csr_matvec in sparse/sparsetools/csr.h (line 1120). Is it possible to parallelize this loops with openmp? Do you have maybe better ideas to improve the performances for this kind of operations?
Best regards, Marc Barbry _______________________________________________ SciPy-Dev mailing list SciPy-Dev@python.org https://mail.python.org/mailman/listinfo/scipy-dev
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@python.org https://mail.python.org/mailman/listinfo/scipy-dev
participants (1)
-
Paul Nation