Jesper Larsen wrote:
Here is my solution for calculating the correlation coefficients for masked arrays. Comments are appreciated:
def macorrcoef(data1, data2):
    """
    Calculates correlation coefficients taking masked out values into account.

    It is assumed (but not checked) that data1.shape == data2.shape.
    """
    nv, no = data1.shape
    cc = ma.array(0., mask=ones((nv, nv)))
    if no > 1:
        for i in range(nv):
            for j in range(nv):
                m = ma.getmaskarray(data1[i,:]) | ma.getmaskarray(data2[j,:])
                d1 = ma.array(data1[i,:], copy=False, mask=m).compressed()
                d2 = ma.array(data2[j,:], copy=False, mask=m).compressed()
                if ma.count(d1) > 1:
                    c = corrcoef(d1, d2)
                    cc[i,j] = c[0,1]
    return cc
I'm afraid this doesn't work, either. Correlation matrices are constrained to be positive semidefinite; that is, all of their eigenvalues must be >= 0. Calculating each of the correlation coefficients in a pairwise fashion, with each pair using a different subset of the observations, doesn't incorporate this constraint, so the assembled matrix can end up with negative eigenvalues.
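To make the problem concrete, here is a small sketch with made-up data (three variables, nine observations, each pair sharing only three unmasked observations). The helper name pairwise_corr and the data are purely illustrative and are not taken from the code quoted above; the helper just repeats the same pairwise idea with current numpy.ma calls.

```python
import numpy as np
import numpy.ma as ma

def pairwise_corr(data):
    """Pairwise correlations between rows, using only jointly unmasked columns."""
    nv = data.shape[0]
    cc = np.eye(nv)
    for i in range(nv):
        for j in range(i + 1, nv):
            # A column is usable only if it is unmasked in both rows.
            m = ma.getmaskarray(data[i]) | ma.getmaskarray(data[j])
            d1 = ma.array(data[i], mask=m).compressed()
            d2 = ma.array(data[j], mask=m).compressed()
            cc[i, j] = cc[j, i] = np.corrcoef(d1, d2)[0, 1]
    return cc

nan = np.nan
# Made-up data: every pair of rows overlaps on a different block of columns.
data = ma.masked_invalid(np.array([
    [1.0, 2.0, 3.0, nan, nan, nan, 1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, nan, nan, nan],
    [nan, nan, nan, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0],
]))

cc = pairwise_corr(data)
eigvals = np.linalg.eigvalsh(cc)
print(cc)
print(eigvals)  # the smallest eigenvalue is negative
```

Every entry is a perfectly valid correlation on its own (rows 0 and 1 agree, rows 1 and 2 agree, rows 0 and 2 disagree), yet no single set of three variables can have all three correlations at once, which is exactly what the negative eigenvalue signals.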
But you're on the right track. My preferred approach to this problem is to find the pairwise correlation matrix as you did and then find the closest positive semidefinite matrix to it using the method of alternating projections. I can't give you the code I wrote for this since it belongs to a customer, but here is the reference I used:
http://eprints.ma.man.ac.uk/232/
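
For illustration only, here is a minimal sketch of alternating projections with Dykstra's correction. It is not the customer code mentioned above and should not be read as a faithful transcription of the reference: it assumes the plain (unweighted) Frobenius norm, and the function name nearest_correlation, its parameters, and the stopping rule are all choices made for this example.

```python
import numpy as np

def nearest_correlation(a, max_iter=100, tol=1e-8):
    """Approximate the nearest correlation matrix to symmetric `a`.

    Alternates between projecting onto the positive semidefinite cone
    and onto the set of symmetric matrices with unit diagonal.
    """
    y = np.array(a, dtype=float)
    ds = np.zeros_like(y)          # Dykstra correction term
    for _ in range(max_iter):
        r = y - ds
        # Projection onto the PSD cone: clip negative eigenvalues to zero.
        w, v = np.linalg.eigh(r)
        x = (v * np.maximum(w, 0.0)) @ v.T
        ds = x - r
        # Projection onto unit-diagonal matrices: reset the diagonal to 1.
        y_new = x.copy()
        np.fill_diagonal(y_new, 1.0)
        converged = (np.linalg.norm(y_new - y, "fro")
                     <= tol * max(1.0, np.linalg.norm(y_new, "fro")))
        y = y_new
        if converged:
            break
    return y

# A symmetric, unit-diagonal matrix that is not positive semidefinite.
bad = np.array([[ 1.0, 1.0, -1.0],
                [ 1.0, 1.0,  1.0],
                [-1.0, 1.0,  1.0]])
fixed = nearest_correlation(bad)
print(fixed)
print(np.linalg.eigvalsh(fixed))
```

The result has an exact unit diagonal but is positive semidefinite only up to the tolerance, so the smallest eigenvalue may come out as a tiny negative number; tighten tol or clip once more if a strictly nonnegative spectrum is needed downstream.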