On 24/03/2008, Christopher Mutel <cmutel@gmail.com> wrote:
I am working with medium-sized sparse matrices (4000 by 4000, ~1% fill) and need to do uncertainty analysis using a Monte Carlo approach. For each nonzero element in the matrix, I have an average value, the distribution type (mostly lognormal, some normal and uniform), and the relevant uncertainty parameters. I am solving the classic Ax = B matrix problem, where the large matrix is A and B is a constant array. I anticipate needing on the order of 1000 iterations.
SciPy is a fantastic library, but I am not creative enough to come up with an efficient way to store the relevant uncertainty information so that I am not iteratively generating the large matrix for each Monte Carlo run. Is there a clever way to construct or subclass a sparse matrix as a generator, so that each time it is referenced a new matrix is generated? Or is there a better approach? (I am sure there is a better approach that I haven't thought of.) I know that this is a rather general question, but I have been thinking about this off and on for quite a while, and have had great luck in the past getting help on this list.
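For what it's worth, the "matrix as a generator" idea described above can be sketched in plain Python; `matrix_samples` and `samplers` below are made-up names for illustration, not an existing SciPy facility:

```python
import numpy as np
from scipy import sparse

def matrix_samples(ij, samplers, shape):
    """Yield a freshly sampled sparse matrix on each iteration.

    ij       -- fixed (2, nnz) array of row/column indices
    samplers -- one callable per nonzero entry, each returning
                a newly drawn value for that entry
    """
    while True:
        data = np.array([draw() for draw in samplers])
        yield sparse.csc_matrix((data, ij), shape=shape)

# Tiny example: a 3x3 diagonal matrix with lognormal entries
rng = np.random.default_rng(0)
ij = np.vstack((np.arange(3), np.arange(3)))
samplers = [rng.lognormal for _ in range(3)]

gen = matrix_samples(ij, samplers, (3, 3))
A = next(gen)  # each next() call produces a newly sampled matrix
```

The key point is that the sparsity pattern (`ij`) is stored once, and only the `data` vector is regenerated per draw.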
I assume the sparsity pattern never changes? That is, it's not just sparse because many of the random values are often small... The way I would address the problem would be to construct A each time, but as a csc_matrix:

```
In [13]: ij = N.vstack((N.arange(1000), N.random.randint(1000, size=1000)))
In [14]: data = N.zeros(1000)
In [15]: data[:500] = N.random.randn(500)
In [16]: data[500:] = N.exp(N.random.randn(500))
In [19]: A = sparse.csc_matrix((data, ij), (1000, 1000))
In [20]: A
Out[20]: <1000x1000 sparse matrix of type '<type 'numpy.float64'>'
         with 1000 stored elements (space for 1000)
         in Compressed Sparse Column format>
```

This process is quite efficient; if you need to rerun the simulation, you'd just redo steps 15, 16, and 19. If your sparsity pattern changes, you'd change ij. The order of ij probably affects the speed at which this runs; it should probably be sorted lexicographically, but if you have several kinds of entry it will probably be easier to group each kind of entry in data[] and ij[].

Good luck,
Anne
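This approach can be sketched end-to-end as a Monte Carlo loop. The sketch below uses made-up distribution parameters and the modern `numpy.random.Generator` API rather than the older `N.random` calls, and pads the diagonal to keep the demo matrix solvable; only the `data` vector is regenerated each iteration, while `ij` stays fixed:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

n = 1000
rng = np.random.default_rng(0)

# Fixed sparsity pattern: the full diagonal plus one random
# off-diagonal entry per row (illustrative, not the real 4000x4000 data)
rows = np.concatenate((np.arange(n), np.arange(n)))
cols = np.concatenate((np.arange(n), rng.integers(n, size=n)))
ij = np.vstack((rows, cols))
b = np.ones(n)  # the constant right-hand side

solutions = []
for _ in range(10):  # ~1000 iterations in the real problem
    # Resample only the nonzero values; the pattern ij never changes
    diag = 3.0 + rng.lognormal(size=n)  # lognormal entries, shifted so the
                                        # matrix stays diagonally dominant
    off = 0.1 * rng.normal(size=n)      # small normal entries
    data = np.concatenate((diag, off))
    A = sparse.csc_matrix((data, ij), shape=(n, n))
    solutions.append(spsolve(A, b))
```

Duplicate (row, col) pairs are summed by the csc_matrix constructor, so an off-diagonal index that happens to land on the diagonal is handled automatically.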