[SciPy-Dev] Improving performance of sparse matrix multiplication

Paul Nation nonhermitian at gmail.com
Tue Aug 29 10:33:00 EDT 2017


It is possible to write your own OpenMP implementation, performing the row
multiplications in parallel, using Cython or whatever low-level language you
like.  Since sparse matrix-dense vector multiplication is memory-bandwidth
limited, you will see good performance if the problem fits into CPU cache.
Otherwise, you will see only marginal improvements.  If your problem has
structure, then that should be exploited to make things faster.  Also,
permutations like reverse Cuthill-McKee, available in SciPy, can give a modest boost.
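To make the two suggestions above concrete, here is a minimal sketch (the test matrix and its size are arbitrary choices): the explicit row loop below is exactly the independent per-row work an OpenMP/Cython version would distribute across threads, and the last lines show SciPy's own reverse Cuthill-McKee reordering.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Random sparse test matrix in CSR format (size/density are arbitrary).
A = sparse_random(200, 200, density=0.05, format="csr", random_state=0)
x = np.ones(200)

def csr_matvec(A, x):
    """Serial CSR matrix-vector product. Each row's result is
    independent of the others, so the outer loop is what an
    OpenMP version would parallelize."""
    y = np.zeros(A.shape[0])
    for i in range(A.shape[0]):
        start, end = A.indptr[i], A.indptr[i + 1]
        y[i] = np.dot(A.data[start:end], x[A.indices[start:end]])
    return y

y = csr_matvec(A, x)
assert np.allclose(y, A.dot(x))  # matches SciPy's built-in product

# Reverse Cuthill-McKee reordering (in scipy.sparse.csgraph) can reduce
# matrix bandwidth and improve cache locality for structured problems.
perm = reverse_cuthill_mckee(A, symmetric_mode=False)
A_rcm = A[perm, :][:, perm]
```

Whether the reordering pays off depends on the structure: it helps when the nonzeros can be clustered near the diagonal, and does little for matrices that are already banded or essentially random.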

Best,

Paul

On Aug 29, 2017 04:26, "Michele Martone" <
michelemartone at users.sourceforge.net> wrote:

Dear Marc,

did you try the PyRSB prototype:

 "librsb is a high performance sparse matrix library implementing the
  Recursive Sparse Blocks format, which is especially well suited for
  multiplications in iterative methods on huge sparse matrices.
  PyRSB is a Cython-based Python interface to librsb."
 https://github.com/michelemartone/pyrsb

?

How large are your matrices?

Are they symmetric?

If your matrices are large you might get quite a speedup;
if symmetric, even better.

Best regards,
Michele

p.s.: PyRSB (a thin interface) is a prototype, but librsb itself
 http://librsb.sourceforge.net/
is in a mature state, is OpenMP based, and is usable also from Fortran.
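Before trying an alternative format, it is worth establishing a baseline for SciPy's own CSR matvec at your problem sizes; a minimal timing sketch (sizes and density here are arbitrary, not recommendations):

```python
import time
import numpy as np
from scipy.sparse import random as sparse_random

# Arbitrary sizes; substitute your own matrices for a meaningful baseline.
for n in (1000, 10000):
    A = sparse_random(n, n, density=1e-3, format="csr", random_state=0)
    x = np.ones(n)
    t0 = time.perf_counter()
    for _ in range(100):
        y = A.dot(x)
    dt = (time.perf_counter() - t0) / 100
    print("n=%d: %.1f us per matvec" % (n, dt * 1e6))
```

Comparing these numbers against a librsb/PyRSB run on the same matrices shows directly whether the format change is worthwhile for your workload.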

On 20170828 at 18:14, marc wrote:
> Dear Scipy developers,
>
> We are developing a program that performs a large number of sparse matrix
> multiplications. We recently wrote a Python version of this program for
> several reasons (the original code is in Fortran).
>
> We are now trying to improve the performance of the Python version, and we
> noticed that one of the bottlenecks is the sparse matrix multiplication,
> for example:
>
> import numpy as np
> from scipy.sparse import csr_matrix
>
> row = np.array([0, 0, 1, 2, 2, 2])
> col = np.array([0, 2, 2, 0, 1, 2])
> data = np.array([1, 2, 3, 4, 5, 6], dtype=np.float32)
>
> csr = csr_matrix((data, (row, col)), shape=(3, 3))
> print(csr.toarray())
>
> A = np.array([1, 2, 3], dtype=np.float32)
>
> print(csr*A)
>
>
> I started to look at the SciPy code to see how these functions are
> implemented, and realized that there is no OpenMP parallelization over the
> for loops, e.g. in the function csr_matvec in sparse/sparsetools/csr.h (line
> 1120). Is it possible to parallelize these loops with OpenMP? Do you maybe
> have better ideas to improve the performance for this kind of operation?
>
> Best regards,
> Marc Barbry
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev


