On 24/03/2008, Christopher Mutel <cmutel@gmail.com> wrote:
I am working with medium-sized sparse matrices (4000 by 4000, ~1% fill) and need to do uncertainty analysis using a Monte Carlo approach. For each nonzero element in the matrix, I have an average value, the distribution type (mostly lognormal, some normal and uniform), and the relevant uncertainty parameters. I am solving the classic Ax = B matrix problem, where the large matrix is A and B is a constant array. I anticipate needing on the order of 1000 iterations.
SciPy is a fantastic library, but I am not creative enough to come up with an efficient way to store the relevant uncertainty information so that I am not iteratively generating the large matrix for each Monte Carlo run. Is there a clever way to construct or subclass a sparse matrix as a generator, so that each time it is referenced a new matrix is generated? Or is there a better approach? (I am sure there is a better approach that I haven't thought of.) I know that this is a rather general question, but I have been thinking about this off and on for quite a while, and have had great luck in the past getting help on this list.
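For what it's worth, the "matrix as a generator" idea described above can be sketched in plain Python; `matrix_samples` and `samplers` below are made-up names for illustration, not an existing SciPy facility:

```python
import numpy as np
from scipy import sparse

def matrix_samples(ij, samplers, shape):
    """Yield a freshly sampled sparse matrix on each iteration.

    ij       -- fixed (2, nnz) array of row/column indices
    samplers -- one callable per nonzero entry, each returning
                a newly drawn value for that entry
    """
    while True:
        data = np.array([draw() for draw in samplers])
        yield sparse.csc_matrix((data, ij), shape=shape)

# Tiny example: a 3x3 diagonal matrix with lognormal entries
rng = np.random.default_rng(0)
ij = np.vstack((np.arange(3), np.arange(3)))
samplers = [rng.lognormal for _ in range(3)]

gen = matrix_samples(ij, samplers, (3, 3))
A = next(gen)  # each next() call produces a newly sampled matrix
```

The key point is that the sparsity pattern (`ij`) is stored once, and only the `data` vector is regenerated per draw.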
I assume the sparsity pattern never changes? That is, it's not just sparse because many of the random values are often small... The way I would address the problem would be to construct A each time, but as a csc_matrix:

```
In [13]: ij = N.vstack((N.arange(1000), N.random.randint(1000, size=1000)))
In [14]: data = N.zeros(1000)
In [15]: data[:500] = N.random.randn(500)
In [16]: data[500:] = N.exp(N.random.randn(500))
In [19]: A = sparse.csc_matrix((data, ij), (1000, 1000))
In [20]: A
Out[20]: <1000x1000 sparse matrix of type '<type 'numpy.float64'>'
         with 1000 stored elements (space for 1000)
         in Compressed Sparse Column format>
```

This process is quite efficient; if you need to rerun the simulation, you'd just redo steps 15, 16, and 19. If your sparsity pattern changes, you'd change ij. The order of ij probably affects the speed at which this runs; it should probably be sorted lexicographically, but if you have several kinds of entry it will probably be easier to group each kind of entry in data[] and ij[].

Good luck,
Anne
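This approach can be sketched end-to-end as a Monte Carlo loop. The sketch below uses made-up distribution parameters and the modern `numpy.random.Generator` API rather than the older `N.random` calls, and pads the diagonal to keep the demo matrix solvable; only the `data` vector is regenerated each iteration, while `ij` stays fixed:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

n = 1000
rng = np.random.default_rng(0)

# Fixed sparsity pattern: the full diagonal plus one random
# off-diagonal entry per row (illustrative, not the real 4000x4000 data)
rows = np.concatenate((np.arange(n), np.arange(n)))
cols = np.concatenate((np.arange(n), rng.integers(n, size=n)))
ij = np.vstack((rows, cols))
b = np.ones(n)  # the constant right-hand side

solutions = []
for _ in range(10):  # ~1000 iterations in the real problem
    # Resample only the nonzero values; the pattern ij never changes
    diag = 3.0 + rng.lognormal(size=n)  # lognormal entries, shifted so the
                                        # matrix stays diagonally dominant
    off = 0.1 * rng.normal(size=n)      # small normal entries
    data = np.concatenate((diag, off))
    A = sparse.csc_matrix((data, ij), shape=(n, n))
    solutions.append(spsolve(A, b))
```

Duplicate (row, col) pairs are summed by the csc_matrix constructor, so an off-diagonal index that happens to land on the diagonal is handled automatically.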