Mailman 3 fastest way to populate sparse matrix? - SciPy-User

fastest way to populate sparse matrix?

Peter Skomoroch

10 Dec 2008 10 Dec '08

11:54 a.m.

What is the fastest way to replace non-zero elements of a sparse matrix with corresponding elements from a product of dense matrices, without the memory overhead of computing the entire dense matrix product? The code below demonstrates the way I am doing it now: looping through the nonzero elements in the sparse matrix, and forming the corresponding row - column product from the dense matrices. It uses the sparse module from the latest scipy trunk. -Pete #!/usr/bin/env python # encoding: utf-8 """ sparse_fill.py uses latest sparse package from scipy svn """ import sys import os import time from numpy import * from numpy.random import random import scipy.sparse as sparse from scipy import io import urllib import tarfile def main(): # number of rows 1,916 # number of columns 1,916 # nonzeros 195,985 print "loading matrix..." f = urllib.urlopen("http://www.cise.ufl.edu/research/sparse/MM/JGD_CAG/CAG_mat1916.tar.gz") tar_download = open('CAG_mat1916.tar.gz','w') print >> tar_download, f.read() tar_download.close() tar = tarfile.open("CAG_mat1916.tar.gz", "r:gz") tar.extractall() # V is a sparse matrix, with around 5 % of entries populated V = io.mmread("CAG_mat1916/CAG_mat1916.mtx").tocsr() n,m = V.shape r = 200 eps=1e-9 nonzeros = [tuple(ij) for ij in transpose(V.nonzero())] W = (random([n,r]) ).astype(float32) H = random([r,m]).astype(float32) ################################################# # This block needs to be executed many times... # Fill non-zero elements of a sparse matrix with # corresponding elements from dense matrix product. # We don't want to compute the full matrix product so # we can save memory print "filling..." t0=time.time() V_approx = sparse.lil_matrix((n,m), dtype=float32) for i,j in nonzeros: V_approx[i,j] = dot(W[i,:],H[:,j]) print "time to fill:", time.time() -t0 W_factor = (V*H.T + eps ) / ( V_approx*H.T + eps) W = W_factor*W #################################################### if __name__ == '__main__': main() -- Peter N. Skomoroch peter.skomoroch@gmail.com http://www.datawrangling.com http://del.icio.us/pskomoroch

Show replies by date

Nathan Bell

10 Dec 10 Dec

12:46 p.m.

New subject: [SciPy-user] fastest way to populate sparse matrix?

On Wed, Dec 10, 2008 at 12:54 PM, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:

...

What is the fastest way to replace non-zero elements of a sparse matrix with corresponding elements from a product of dense matrices, without the memory overhead of computing the entire dense matrix product?

The code below demonstrates the way I am doing it now: looping through the nonzero elements in the sparse matrix, and forming the corresponding row - column product from the dense matrices. It uses the sparse module from the latest scipy trunk.

The fastest way to construct a sparse matrix is using the COO format as discussed here: http://www.scipy.org/SciPyPackages/Sparse#head-be8a0be5d0e44c4d59550d64fb017... Using COO instead of LIL should be considerably faster. -- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/

Peter Skomoroch

3:18 p.m.

New subject: [SciPy-user] fastest way to populate sparse matrix?

Nathan, Thanks for the pointer, I had missed that wiki page. Using coo_matrix as you suggest gives over a 5x speedup: time to fill lil_matrix: 10.4162230492 total time to fill coo_matrix: 1.82639312744 The bottleneck now seems to be this for-loop, which takes the majority of the remaining time (1.82258105278 seconds): for index, (i,j) in enumerate(nonzero_indices): data[index] = dot(W[i,:],H[:,j]) Is there a better approach for this assignment block? -Pete New code: #!/usr/bin/env python # encoding: utf-8 """ sparse_fill.py uses latest sparse package from scipy svn """ import sys import os import time from numpy import * from numpy.random import random import scipy.sparse as sparse from scipy import io import urllib import tarfile def lil_fill(V): print "Filling lil_matrix" n,m = V.shape r = 200 eps=1e-9 nonzeros = [tuple(ij) for ij in transpose(V.nonzero())] W = (random([n,r]) ).astype(float32) H = random([r,m]).astype(float32) print "filling..." t0=time.time() V_approx = sparse.lil_matrix((n,m), dtype=float32) for i,j in nonzeros: V_approx[i,j] = dot(W[i,:],H[:,j]) print "time to fill:", time.time() -t0 W_factor = (V*H.T + eps ) / ( V_approx*H.T + eps) W = W_factor*W #################################################### print "done...\n\n" def coo_fill(V): print "Filling coo_matrix" n,m = V.shape r = 200 eps=1e-9 nonzeros = V.nonzero() nonzero_indices = [tuple(ij) for ij in transpose(nonzeros)] L = len(nonzeros[0]) W = (random([n,r]) ).astype(float32) H = random([r,m]).astype(float32) print "filling..." t0=time.time() row = nonzeros[0] # row indices go here col = nonzeros[1] # column indices go here data = zeros(L) for index, (i,j) in enumerate(nonzero_indices): data[index] = dot(W[i,:],H[:,j]) print "data assignment done, filling matrix", time.time() -t0 #data = ones(L) # data values go here V_approx = sparse.coo_matrix((data,(row,col)), shape=(n,m)) print "total time to fill coo_matrix:", time.time() -t0 W_factor = (V*H.T + eps ) / ( V_approx*H.T + eps) W = W_factor*W #################################################### print "done...\n\n" def main(): # number of rows 1,916 # number of columns 1,916 # nonzeros 195,985 print "loading matrix..." f = urllib.urlopen("http://www.cise.ufl.edu/research/sparse/MM/JGD_CAG/CAG_mat1916.tar.gz") tar_download = open('CAG_mat1916.tar.gz','w') print >> tar_download, f.read() tar_download.close() tar = tarfile.open("CAG_mat1916.tar.gz", "r:gz") tar.extractall() # V is a sparse matrix, with around 5 % of entries populated V = io.mmread("CAG_mat1916/CAG_mat1916.mtx").tocsr() lil_fill(V) coo_fill(V) if __name__ == '__main__': main() On Wed, Dec 10, 2008 at 1:46 PM, Nathan Bell <wnbell@gmail.com> wrote:

...

On Wed, Dec 10, 2008 at 12:54 PM, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:

...
What is the fastest way to replace non-zero elements of a sparse matrix with corresponding elements from a product of dense matrices, without the memory overhead of computing the entire dense matrix product?

The code below demonstrates the way I am doing it now: looping through the nonzero elements in the sparse matrix, and forming the corresponding row - column product from the dense matrices. It uses the sparse module from the latest scipy trunk.

The fastest way to construct a sparse matrix is using the COO format as discussed here: http://www.scipy.org/SciPyPackages/Sparse#head-be8a0be5d0e44c4d59550d64fb017...

Using COO instead of LIL should be considerably faster.

-- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/ _______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user

-- Peter N. Skomoroch peter.skomoroch@gmail.com http://www.datawrangling.com http://del.icio.us/pskomoroch

Nathan Bell

3:46 p.m.

New subject: [SciPy-user] fastest way to populate sparse matrix?

On Wed, Dec 10, 2008 at 4:18 PM, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:

...

Nathan,

Thanks for the pointer, I had missed that wiki page.

It's fairly recent, so don't feel bad :)

...

The bottleneck now seems to be this for-loop, which takes the majority of the remaining time (1.82258105278 seconds):

for index, (i,j) in enumerate(nonzero_indices): data[index] = dot(W[i,:],H[:,j])

Is there a better approach for this assignment block?

You could vectorize the loop: W = random([n,r]).astype(float32) H = random([m,r]).astype(float32) # note, shape is (m,r) I,J = V.nonzero() X = (W[I,:] * H[J,:]).sum(axis=1) V_approx = sparse.coo_matrix((X,(I,J)), shape=(n,m)) If memory usage of the above is too costly, you could use the same approach, but on fixed-sized chunks of the arrays. -- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/

Peter Skomoroch

5:56 p.m.

New subject: [SciPy-user] fastest way to populate sparse matrix?

Hmmm, surprisingly the vectorized version seems to take longer: Original method: Filling coo_matrix filling... data assignment done, filling matrix 1.84783697128 total time to fill coo_matrix: 1.85190200806 done... Vectorized: Filling coo_matrix filling... data assignment done, filling matrix 3.22157812119 total time to fill coo_matrix: 3.2216091156 done... On Wed, Dec 10, 2008 at 4:46 PM, Nathan Bell <wnbell@gmail.com> wrote:

...

On Wed, Dec 10, 2008 at 4:18 PM, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:

...
Nathan,

Thanks for the pointer, I had missed that wiki page.

It's fairly recent, so don't feel bad :)

...
The bottleneck now seems to be this for-loop, which takes the majority of the remaining time (1.82258105278 seconds):

for index, (i,j) in enumerate(nonzero_indices): data[index] = dot(W[i,:],H[:,j])

Is there a better approach for this assignment block?

You could vectorize the loop:

W = random([n,r]).astype(float32) H = random([m,r]).astype(float32) # note, shape is (m,r)

I,J = V.nonzero() X = (W[I,:] * H[J,:]).sum(axis=1) V_approx = sparse.coo_matrix((X,(I,J)), shape=(n,m))

If memory usage of the above is too costly, you could use the same approach, but on fixed-sized chunks of the arrays.

-- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/ _______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user

-- Peter N. Skomoroch peter.skomoroch@gmail.com http://www.datawrangling.com http://del.icio.us/pskomoroch

Gary Strangman

6:51 p.m.

New subject: [SciPy-user] Rotate volume data and regrid?

Hi scipy-experts, I have a 3D array (180x200x200 elements) that represents a 3D volume in space. I want to rotate the data in this array by an arbitrary amount around an arbitrary point inside the volume, then re-grid the result back into the original voxels. Does anyone know of a scipythonic or numpythonic function, package, or even an algorithm to achieve this? -best Gary

Gael Varoquaux

11:59 p.m.

New subject: [SciPy-user] Rotate volume data and regrid?

On Wed, Dec 10, 2008 at 07:51:27PM -0500, Gary Strangman wrote:

...

Hi scipy-experts,

...

I have a 3D array (180x200x200 elements) that represents a 3D volume in space. I want to rotate the data in this array by an arbitrary amount around an arbitrary point inside the volume, then re-grid the result back into the original voxels. Does anyone know of a scipythonic or numpythonic function, package, or even an algorithm to achieve this?

You will probably find what you need in one of the functions listed on http://docs.scipy.org/scipy/docs/scipy.ndimage.interpolation/#scipy-ndimage-... Gaël

Gary Strangman

11 Dec 11 Dec

6:30 a.m.

New subject: [SciPy-user] Rotate volume data and regrid?

...

...
Hi scipy-experts,

...
I have a 3D array (180x200x200 elements) that represents a 3D volume in space. I want to rotate the data in this array by an arbitrary amount around an arbitrary point inside the volume, then re-grid the result back into the original voxels. Does anyone know of a scipythonic or numpythonic function, package, or even an algorithm to achieve this?

You will probably find what you need in one of the functions listed on http://docs.scipy.org/scipy/docs/scipy.ndimage.interpolation/#scipy-ndimage-...

Not only had I forgotten about ndimage, but had no recollection of all the nifty things in there. Thanks for saving me a week of wheel-reinvention! G

5683

Age (days ago)

5684

Last active (days ago)

List overview

Download

7 comments

4 participants

participants (4)

Gael Varoquaux
Gary Strangman
Nathan Bell
Peter Skomoroch

fastest way to populate sparse matrix?

tags

participants (4)