Hi Jesper

On Fri, May 25, 2007 at 10:37:44AM +0200, Jesper Larsen wrote:
> I have a masked array of dimension (nvariables, nobservations) that
> contains missing values at arbitrary points. Is it safe to rely on
> numpy.corrcoef to calculate the correlation coefficients of a masked
> array (it seems to give reasonable results)?
I don't think it is. If my thinking is correct, you would expect the
following two calls to give different results, but they come out
identical, which suggests that the mask is being ignored:

In [38]: x = N.random.random(100)

In [39]: y = N.random.random(100)

In [40]: N.corrcoef(x,y)
Out[40]:
array([[ 1.        , -0.07291798],
       [-0.07291798,  1.        ]])

In [41]: x_ = N.ma.masked_array(x,mask=(N.random.random(100)>0.5).astype(bool))

In [42]: y_ = N.ma.masked_array(y,mask=(N.random.random(100)>0.5).astype(bool))

In [43]: N.corrcoef(x_,y_)
Out[43]:
array([[ 1.        , -0.07291798],
       [-0.07291798,  1.        ]])

Regards
Stéfan
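
P.S. One way to get a correlation that respects the mask is to drop every observation that is masked in either variable before calling the plain routine. The snippet below is only a sketch of that idea, not a tested recipe for the (nvariables, nobservations) case. It assumes a current NumPy release, where `numpy.ma.corrcoef` and `numpy.random.default_rng` are available. The names `valid`, `r_full`, `r_manual` and `r_ma` are made up for illustration.

```python
import numpy as np

# Seeded generator so the example is repeatable (the seed is arbitrary).
rng = np.random.default_rng(0)
x = rng.random(100)
y = rng.random(100)

# Mask roughly half of each variable at independent random positions.
x_ = np.ma.masked_array(x, mask=rng.random(100) > 0.5)
y_ = np.ma.masked_array(y, mask=rng.random(100) > 0.5)

# Correlation of the full, unmasked data, for comparison.
r_full = np.corrcoef(x, y)

# Keep only the observations that are unmasked in BOTH variables.
valid = ~(np.ma.getmaskarray(x_) | np.ma.getmaskarray(y_))
r_manual = np.corrcoef(x[valid], y[valid])

# The masked-array module also provides its own correlation routine.
r_ma = np.ma.corrcoef(x_, y_)

print(r_full)
print(r_manual)
print(r_ma)
```

With this approach `r_manual` is computed from a subset of the observations only, so it should generally differ from `r_full`. For more than two variables the same idea applies pair by pair, at the cost of each pair possibly using a different set of observations.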