Hello,

I was wondering what the fastest format is for multiplying a sparse matrix by a dense numpy array. Intuitively, a CSR matrix multiplied by a Fortran-contiguous numpy array seems like it should be the fastest, but I have run a few benchmarks and they suggest otherwise. It is also mentioned here
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csc_matrix.html that using CSR matrices "may" be faster.

In [5]: X 
Out[5]: 
<11314x130107 sparse matrix of type '<type 'numpy.float64'>'
    with 1787565 stored elements in Compressed Sparse Row format>
In [6]: _, n_features = X.shape
In [9]: w_c = np.random.rand(n_features, 10)
In [10]: w_f = np.asarray(w_c, order='f')
In [13]: csc = sparse.csc_matrix(X)
In [30]: %timeit X * w_f
10 loops, best of 3: 40.5 ms per loop

In [31]: %timeit X * w_c
10 loops, best of 3: 37.3 ms per loop

In [32]: %timeit csc *  w_c
10 loops, best of 3: 24.3 ms per loop

In [33]: %timeit csc * w_f
10 loops, best of 3: 27.3 ms per loop

It seems that here the CSC matrix is the fastest, and with a C-contiguous numpy array at that, which is completely non-intuitive to me. Are there any hard rules for this, or is it data dependent?
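
In case anyone wants to try this without my data, here is a minimal sketch of the same four-way comparison. It is only an approximation of what I ran: the random sparse matrix, its shape and density, and the helper name bench are all assumptions standing in for my real X, so the timings will not match the numbers above and the ordering may well differ.

```python
import timeit

import numpy as np
from scipy import sparse


def bench(n_samples=2000, n_features=5000, n_outputs=10,
          density=0.001, number=10, repeat=3, seed=0):
    """Time sparse-by-dense products for CSR/CSC and C/Fortran layouts.

    The random matrix is a stand-in (assumption), not the original X.
    Returns a dict mapping a case name to the best seconds per product.
    """
    rng = np.random.RandomState(seed)
    X = sparse.random(n_samples, n_features, density=density,
                      format="csr", random_state=rng, dtype=np.float64)
    csc = sparse.csc_matrix(X)

    w_c = rng.rand(n_features, n_outputs)   # C-contiguous (row-major)
    w_f = np.asfortranarray(w_c)            # same values, column-major

    cases = {
        "csr * w_c": (X, w_c),
        "csr * w_f": (X, w_f),
        "csc * w_c": (csc, w_c),
        "csc * w_f": (csc, w_f),
    }
    results = {}
    for name, (A, w) in cases.items():
        # A.dot(w) is the matrix product, like `A * w` in the session above.
        timings = timeit.repeat(lambda: A.dot(w), number=number, repeat=repeat)
        results[name] = min(timings) / number
    return results


if __name__ == "__main__":
    for name, seconds in bench().items():
        print("%s: %.3f ms per product" % (name, seconds * 1e3))
```

Taking the minimum over several repeats mirrors what %timeit reports as "best of 3", so the output should be comparable in spirit to the session above.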

Sorry for my noobish questions!

Regards,
Manoj Kumar,
GSoC 2014, Scikit-learn
Mech Undergrad
http://manojbits.wordpress.com