![](https://secure.gravatar.com/avatar/742736902afc4e0926b96932ada8cbc1.jpg?s=120&d=mm&r=g)
What is the fastest way to replace non-zero elements of a sparse matrix with corresponding elements from a product of dense matrices, without the memory overhead of computing the entire dense matrix product? The code below demonstrates the way I am doing it now: looping through the nonzero elements in the sparse matrix, and forming the corresponding row - column product from the dense matrices. It uses the sparse module from the latest scipy trunk. -Pete #!/usr/bin/env python # encoding: utf-8 """ sparse_fill.py uses latest sparse package from scipy svn """ import sys import os import time from numpy import * from numpy.random import random import scipy.sparse as sparse from scipy import io import urllib import tarfile def main(): # number of rows 1,916 # number of columns 1,916 # nonzeros 195,985 print "loading matrix..." f = urllib.urlopen("http://www.cise.ufl.edu/research/sparse/MM/JGD_CAG/CAG_mat1916.tar.gz") tar_download = open('CAG_mat1916.tar.gz','w') print >> tar_download, f.read() tar_download.close() tar = tarfile.open("CAG_mat1916.tar.gz", "r:gz") tar.extractall() # V is a sparse matrix, with around 5 % of entries populated V = io.mmread("CAG_mat1916/CAG_mat1916.mtx").tocsr() n,m = V.shape r = 200 eps=1e-9 nonzeros = [tuple(ij) for ij in transpose(V.nonzero())] W = (random([n,r]) ).astype(float32) H = random([r,m]).astype(float32) ################################################# # This block needs to be executed many times... # Fill non-zero elements of a sparse matrix with # corresponding elements from dense matrix product. # We don't want to compute the full matrix product so # we can save memory print "filling..." t0=time.time() V_approx = sparse.lil_matrix((n,m), dtype=float32) for i,j in nonzeros: V_approx[i,j] = dot(W[i,:],H[:,j]) print "time to fill:", time.time() -t0 W_factor = (V*H.T + eps ) / ( V_approx*H.T + eps) W = W_factor*W #################################################### if __name__ == '__main__': main() -- Peter N. Skomoroch peter.skomoroch@gmail.com http://www.datawrangling.com http://del.icio.us/pskomoroch
![](https://secure.gravatar.com/avatar/3e3b4dff04d6c945d0f990ea8c362abb.jpg?s=120&d=mm&r=g)
On Wed, Dec 10, 2008 at 12:54 PM, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:
What is the fastest way to replace non-zero elements of a sparse matrix with corresponding elements from a product of dense matrices, without the memory overhead of computing the entire dense matrix product?
The code below demonstrates the way I am doing it now: looping through the nonzero elements in the sparse matrix, and forming the corresponding row - column product from the dense matrices. It uses the sparse module from the latest scipy trunk.
The fastest way to construct a sparse matrix is using the COO format as discussed here: http://www.scipy.org/SciPyPackages/Sparse#head-be8a0be5d0e44c4d59550d64fb017... Using COO instead of LIL should be considerably faster. -- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/
![](https://secure.gravatar.com/avatar/742736902afc4e0926b96932ada8cbc1.jpg?s=120&d=mm&r=g)
Nathan, Thanks for the pointer, I had missed that wiki page. Using coo_matrix as you suggest gives over a 5x speedup: time to fill lil_matrix: 10.4162230492 total time to fill coo_matrix: 1.82639312744 The bottleneck now seems to be this for-loop, which takes the majority of the remaining time (1.82258105278 seconds): for index, (i,j) in enumerate(nonzero_indices): data[index] = dot(W[i,:],H[:,j]) Is there a better approach for this assignment block? -Pete New code: #!/usr/bin/env python # encoding: utf-8 """ sparse_fill.py uses latest sparse package from scipy svn """ import sys import os import time from numpy import * from numpy.random import random import scipy.sparse as sparse from scipy import io import urllib import tarfile def lil_fill(V): print "Filling lil_matrix" n,m = V.shape r = 200 eps=1e-9 nonzeros = [tuple(ij) for ij in transpose(V.nonzero())] W = (random([n,r]) ).astype(float32) H = random([r,m]).astype(float32) print "filling..." t0=time.time() V_approx = sparse.lil_matrix((n,m), dtype=float32) for i,j in nonzeros: V_approx[i,j] = dot(W[i,:],H[:,j]) print "time to fill:", time.time() -t0 W_factor = (V*H.T + eps ) / ( V_approx*H.T + eps) W = W_factor*W #################################################### print "done...\n\n" def coo_fill(V): print "Filling coo_matrix" n,m = V.shape r = 200 eps=1e-9 nonzeros = V.nonzero() nonzero_indices = [tuple(ij) for ij in transpose(nonzeros)] L = len(nonzeros[0]) W = (random([n,r]) ).astype(float32) H = random([r,m]).astype(float32) print "filling..." t0=time.time() row = nonzeros[0] # row indices go here col = nonzeros[1] # column indices go here data = zeros(L) for index, (i,j) in enumerate(nonzero_indices): data[index] = dot(W[i,:],H[:,j]) print "data assignment done, filling matrix", time.time() -t0 #data = ones(L) # data values go here V_approx = sparse.coo_matrix((data,(row,col)), shape=(n,m)) print "total time to fill coo_matrix:", time.time() -t0 W_factor = (V*H.T + eps ) / ( V_approx*H.T + eps) W = W_factor*W #################################################### print "done...\n\n" def main(): # number of rows 1,916 # number of columns 1,916 # nonzeros 195,985 print "loading matrix..." f = urllib.urlopen("http://www.cise.ufl.edu/research/sparse/MM/JGD_CAG/CAG_mat1916.tar.gz") tar_download = open('CAG_mat1916.tar.gz','w') print >> tar_download, f.read() tar_download.close() tar = tarfile.open("CAG_mat1916.tar.gz", "r:gz") tar.extractall() # V is a sparse matrix, with around 5 % of entries populated V = io.mmread("CAG_mat1916/CAG_mat1916.mtx").tocsr() lil_fill(V) coo_fill(V) if __name__ == '__main__': main() On Wed, Dec 10, 2008 at 1:46 PM, Nathan Bell <wnbell@gmail.com> wrote:
On Wed, Dec 10, 2008 at 12:54 PM, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:
What is the fastest way to replace non-zero elements of a sparse matrix with corresponding elements from a product of dense matrices, without the memory overhead of computing the entire dense matrix product?
The code below demonstrates the way I am doing it now: looping through the nonzero elements in the sparse matrix, and forming the corresponding row - column product from the dense matrices. It uses the sparse module from the latest scipy trunk.
The fastest way to construct a sparse matrix is using the COO format as discussed here: http://www.scipy.org/SciPyPackages/Sparse#head-be8a0be5d0e44c4d59550d64fb017...
Using COO instead of LIL should be considerably faster.
-- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/ _______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
-- Peter N. Skomoroch peter.skomoroch@gmail.com http://www.datawrangling.com http://del.icio.us/pskomoroch
![](https://secure.gravatar.com/avatar/3e3b4dff04d6c945d0f990ea8c362abb.jpg?s=120&d=mm&r=g)
On Wed, Dec 10, 2008 at 4:18 PM, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:
Nathan,
Thanks for the pointer, I had missed that wiki page.
It's fairly recent, so don't feel bad :)
The bottleneck now seems to be this for-loop, which takes the majority of the remaining time (1.82258105278 seconds):
for index, (i,j) in enumerate(nonzero_indices): data[index] = dot(W[i,:],H[:,j])
Is there a better approach for this assignment block?
You could vectorize the loop: W = random([n,r]).astype(float32) H = random([m,r]).astype(float32) # note, shape is (m,r) I,J = V.nonzero() X = (W[I,:] * H[J,:]).sum(axis=1) V_approx = sparse.coo_matrix((X,(I,J)), shape=(n,m)) If memory usage of the above is too costly, you could use the same approach, but on fixed-sized chunks of the arrays. -- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/
![](https://secure.gravatar.com/avatar/742736902afc4e0926b96932ada8cbc1.jpg?s=120&d=mm&r=g)
Hmmm, surprisingly the vectorized version seems to take longer: Original method: Filling coo_matrix filling... data assignment done, filling matrix 1.84783697128 total time to fill coo_matrix: 1.85190200806 done... Vectorized: Filling coo_matrix filling... data assignment done, filling matrix 3.22157812119 total time to fill coo_matrix: 3.2216091156 done... On Wed, Dec 10, 2008 at 4:46 PM, Nathan Bell <wnbell@gmail.com> wrote:
On Wed, Dec 10, 2008 at 4:18 PM, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:
Nathan,
Thanks for the pointer, I had missed that wiki page.
It's fairly recent, so don't feel bad :)
The bottleneck now seems to be this for-loop, which takes the majority of the remaining time (1.82258105278 seconds):
for index, (i,j) in enumerate(nonzero_indices): data[index] = dot(W[i,:],H[:,j])
Is there a better approach for this assignment block?
You could vectorize the loop:
W = random([n,r]).astype(float32) H = random([m,r]).astype(float32) # note, shape is (m,r)
I,J = V.nonzero() X = (W[I,:] * H[J,:]).sum(axis=1) V_approx = sparse.coo_matrix((X,(I,J)), shape=(n,m))
If memory usage of the above is too costly, you could use the same approach, but on fixed-sized chunks of the arrays.
-- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/ _______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
-- Peter N. Skomoroch peter.skomoroch@gmail.com http://www.datawrangling.com http://del.icio.us/pskomoroch
![](https://secure.gravatar.com/avatar/7d25e66cab04d869b99bf41281f11d07.jpg?s=120&d=mm&r=g)
Hi scipy-experts, I have a 3D array (180x200x200 elements) that represents a 3D volume in space. I want to rotate the data in this array by an arbitrary amount around an arbitrary point inside the volume, then re-grid the result back into the original voxels. Does anyone know of a scipythonic or numpythonic function, package, or even an algorithm to achieve this? -best Gary
![](https://secure.gravatar.com/avatar/5c9fb379c4e97b58960d74dcbfc5dee5.jpg?s=120&d=mm&r=g)
On Wed, Dec 10, 2008 at 07:51:27PM -0500, Gary Strangman wrote:
Hi scipy-experts,
I have a 3D array (180x200x200 elements) that represents a 3D volume in space. I want to rotate the data in this array by an arbitrary amount around an arbitrary point inside the volume, then re-grid the result back into the original voxels. Does anyone know of a scipythonic or numpythonic function, package, or even an algorithm to achieve this?
You will probably find what you need in one of the functions listed on http://docs.scipy.org/scipy/docs/scipy.ndimage.interpolation/#scipy-ndimage-... Gaƫl
![](https://secure.gravatar.com/avatar/7d25e66cab04d869b99bf41281f11d07.jpg?s=120&d=mm&r=g)
Hi scipy-experts,
I have a 3D array (180x200x200 elements) that represents a 3D volume in space. I want to rotate the data in this array by an arbitrary amount around an arbitrary point inside the volume, then re-grid the result back into the original voxels. Does anyone know of a scipythonic or numpythonic function, package, or even an algorithm to achieve this?
You will probably find what you need in one of the functions listed on http://docs.scipy.org/scipy/docs/scipy.ndimage.interpolation/#scipy-ndimage-...
Not only had I forgotten about ndimage, but had no recollection of all the nifty things in there. Thanks for saving me a week of wheel-reinvention! G
participants (4)
-
Gael Varoquaux
-
Gary Strangman
-
Nathan Bell
-
Peter Skomoroch