[SciPy-user] Sparse csr_matrix and column sum

Dinesh B Vadhia dineshbvadhia at hotmail.com
Sun Apr 27 22:32:29 EDT 2008


Thanks Nathan.

Sorry for being imprecise: A.sum(0) didn't work and still doesn't - I've just tried again.  However, A.todense().sum(0) does work, but takes a performance hit.  My import statements are:

> import numpy
> import scipy
> from scipy import sparse

This means that I have to qualify each function/operation with numpy. or scipy.  Is the problem that I haven't qualified the statement:

> colSum = A.sum(0) 

correctly?
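
On the qualification question, a minimal sketch (assuming a reasonably recent NumPy and SciPy; the small matrix is made up for illustration and is not the matrix from this thread). The numpy. and scipy. prefixes apply only to names looked up in a module, whereas sum is a method of the matrix object itself, so the import style should not change how A.sum(0) is written:

```python
import numpy
from scipy import sparse

# Hypothetical 3x4 example matrix, not the poster's data.
dense = numpy.array([[1.0, 0.0, 2.0, 0.0],
                     [0.0, 3.0, 0.0, 0.0],
                     [4.0, 0.0, 0.0, 5.0]])
A = sparse.csr_matrix(dense)

# sum is a method of A, so no numpy. or scipy. prefix is involved.
colSum = A.sum(0)                               # 1 x J result, one entry per column
col_sums_flat = numpy.asarray(colSum).ravel()   # flatten to a plain 1-D array
print(col_sums_flat)
```

If a call like this fails, the exact error message and the SciPy version are likely more informative than the import statements.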

Anyway, A.todense().sum(0) works for small matrices, but for the large matrices I am using it results in a memory error.  For I = 20000 and J = 66000, here is the traceback:

Traceback (most recent call last):
  File "C:\... sparseTest.py", line 42
    colSum = A.todense().sum(0)     # sum of each column of A
  File "C:\Python25\Lib\site-packages\scipy\sparse\base.py", line 416, in todense
    return asmatrix(self.toarray())
  File "C:\Python25\Lib\site-packages\scipy\sparse\compressed.py", line 627, in toarray
    M = zeros(self.shape, dtype=self.dtype)
MemoryError
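
For scale, a rough back-of-the-envelope calculation of what the zeros(self.shape, ...) call in the traceback has to allocate. The 8 bytes per entry is an assumption (float64); the post does not state the dtype:

```python
I, J = 20000, 66000          # dimensions given in the post
bytes_per_entry = 8          # assumption: float64 entries

# Size of the dense I x J array that todense() tries to build.
dense_bytes = I * J * bytes_per_entry
dense_gib = dense_bytes / 2.0**30
print(dense_bytes, round(dense_gib, 2))
```

Under that assumption the dense copy alone needs on the order of 10 GB, regardless of how few nonzeros A actually stores.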

Is there a way around this?
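
One possible way around it, sketched under the assumption that A is a csr_matrix in a reasonably recent SciPy (the random test matrix, its shape, and the variable names are illustrative, not taken from this thread). Column sums can be computed from the stored nonzeros alone, either by multiplying by a vector of ones or by accumulating A.data by column index, so the dense array is never built:

```python
import numpy as np
from scipy import sparse

# Illustrative sparse matrix; shape and density are made up.
I, J = 200, 660
A = sparse.random(I, J, density=0.01, format="csr", random_state=0)

# Option 1: product with a ones vector; only stored nonzeros are touched.
col_sums_matvec = A.T.dot(np.ones(I))

# Option 2: for CSR, A.indices holds the column index of each stored value,
# so a weighted bincount accumulates the values column by column.
col_sums_bincount = np.bincount(A.indices, weights=A.data, minlength=J)

# Dense reference, feasible only because this example is small.
reference = np.asarray(A.todense()).sum(axis=0)
print(np.allclose(col_sums_matvec, reference),
      np.allclose(col_sums_bincount, reference))
```

Both options return a 1-D array of length J. The dense reference is there only to check the small example and should be left out for matrices of the size described above.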

Cheers

Dinesh


--------------------------------------------------------------------------------
From: Nathan Bell <wnbell <at> gmail.com>
Subject: Re: Sparse csr_matrix and column sum
Newsgroups: gmane.comp.python.scientific.user
Date: 2008-04-28 00:08:23 GMT

On Sun, Apr 27, 2008 at 6:41 PM, Dinesh B Vadhia
<dineshbvadhia <at> hotmail.com> wrote:
>
>
> If A is a sparse csr_matrix and you want to calculate the sum of each column
> then the 'normal' method is:
>
> import numpy
> import scipy
> from scipy import sparse
>
> colSum = scipy.asmatrix(scipy.zeros((1,J), dtype=numpy.float))
> colSum = A.mean(0)
>
> This isn't working.  Do we have to do something else (eg. a todense()) for a
> sparse matrix?  If so, how?

What do you mean by "isn't working"?

In [1]: from scipy import *
In [2]: from scipy.sparse import *
In [3]: A = csr_matrix(rand(3,3))
In [4]: A.todense()
Out[4]:
matrix([[ 0.95297535,  0.81029421,  0.79146232],
        [ 0.88477059,  0.9025494 ,  0.80259054],
        [ 0.06691343,  0.76691617,  0.68518027]])
In [5]: A.mean(0)
Out[5]: matrix([[ 0.63488646,  0.82658659,  0.75974438]])
In [6]: A.mean(1)
Out[6]:
matrix([[ 0.85157729],
        [ 0.86330351],
        [ 0.50633662]])
In [7]: A.todense().mean(0)
Out[7]: matrix([[ 0.63488646,  0.82658659,  0.75974438]])
In [8]: A.todense().mean(1)
Out[8]:
matrix([[ 0.85157729],
        [ 0.86330351],
        [ 0.50633662]])

Dinesh, as a courtesy, would you provide specific details when
reporting your problems with SciPy?  I'd rather not have to speculate
on the precise nature of each issue raised.

-- 
Nathan Bell wnbell <at> gmail.com
http://graphics.cs.uiuc.edu/~wnbell/