Hmmm, surprisingly the vectorized version seems to take longer:

Original method:

    Filling coo_matrix
    filling...
    data assignment done, filling matrix 1.84783697128
    total time to fill coo_matrix: 1.85190200806
    done...

Vectorized:

    Filling coo_matrix
    filling...
    data assignment done, filling matrix 3.22157812119
    total time to fill coo_matrix: 3.2216091156
    done...

On Wed, Dec 10, 2008 at 4:46 PM, Nathan Bell <wnbell@gmail.com> wrote:
On Wed, Dec 10, 2008 at 4:18 PM, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:
Nathan,
Thanks for the pointer, I had missed that wiki page.
It's fairly recent, so don't feel bad :)
The bottleneck now seems to be this for-loop, which takes the majority of the remaining time (1.82258105278 seconds):
    for index, (i,j) in enumerate(nonzero_indices):
        data[index] = dot(W[i,:],H[:,j])
Is there a better approach for this assignment block?
You could vectorize the loop:
    W = random([n,r]).astype(float32)
    H = random([m,r]).astype(float32)  # note, shape is (m,r)

    I,J = V.nonzero()
    X = (W[I,:] * H[J,:]).sum(axis=1)
    V_approx = sparse.coo_matrix((X,(I,J)), shape=(n,m))
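As a sanity check, the vectorized expression can be compared against the original loop on a small random problem. This is only a sketch: the sizes, the seed, and the sparsity pattern of V are made up for illustration, and H is stored with shape (m, r) as in the vectorized snippet, so the loop indexes H[j, :] rather than H[:, j].

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n, m, r = 50, 40, 5  # hypothetical sizes, chosen only for this check

W = rng.random((n, r)).astype(np.float32)
H = rng.random((m, r)).astype(np.float32)  # note, shape is (m, r)

# Hypothetical sparse V with roughly 10% of its entries stored.
V = sparse.random(n, m, density=0.1, format="coo",
                  random_state=0, dtype=np.float32)
I, J = V.nonzero()

# Loop version: one dot product per nonzero position of V.
data_loop = np.empty(len(I), dtype=np.float32)
for index, (i, j) in enumerate(zip(I, J)):
    data_loop[index] = np.dot(W[i, :], H[j, :])

# Vectorized version: gather the rows, multiply elementwise, sum over r.
X = (W[I, :] * H[J, :]).sum(axis=1)
V_approx = sparse.coo_matrix((X, (I, J)), shape=(n, m))
```

Both versions compute the same values up to float32 rounding; which one is faster depends on the array sizes and available memory, so it is worth timing both on the real data.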
If memory usage of the above is too costly, you could use the same approach, but on fixed-sized chunks of the arrays.
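A minimal sketch of the chunked variant, assuming the same layout as above (W is (n, r), H is (m, r), and I, J are the index arrays from V.nonzero()). The function name chunked_rowwise_dot and the default chunk_size are hypothetical, not part of SciPy. The point is that the temporaries W[I,:] and H[J,:] each hold len(I) * r values, and chunking bounds them at chunk_size * r.

```python
import numpy as np

def chunked_rowwise_dot(W, H, I, J, chunk_size=100_000):
    """Compute sum(W[I] * H[J], axis=1) one fixed-size chunk at a time.

    Hypothetical helper: only chunk_size rows of W and H are gathered
    at once, which limits the size of the temporary arrays.
    """
    out = np.empty(len(I), dtype=np.result_type(W, H))
    for start in range(0, len(I), chunk_size):
        stop = start + chunk_size  # slicing past the end is safe in NumPy
        rows_w = W[I[start:stop], :]
        rows_h = H[J[start:stop], :]
        out[start:stop] = (rows_w * rows_h).sum(axis=1)
    return out
```

The result can be passed to sparse.coo_matrix((X, (I, J)), shape=(n, m)) exactly as in the unchunked version.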
-- 
Nathan Bell wnbell@gmail.com
http://graphics.cs.uiuc.edu/~wnbell/
-- 
Peter N. Skomoroch
peter.skomoroch@gmail.com
http://www.datawrangling.com
http://del.icio.us/pskomoroch