[Numpy-discussion] corrcoef of masked array

Stefan van der Walt stefan at sun.ac.za
Sat May 26 03:16:32 EDT 2007


Hi Jesper

On Fri, May 25, 2007 at 10:37:44AM +0200, Jesper Larsen wrote:
> I have a masked array of dimension (nvariables, nobservations) that contains 
> missing values at arbitrary points. Is it safe to rely on numpy.corrcoef to 
> calculate the correlation coefficients of a masked array (it seems to give 
> reasonable results)?

I don't think it is.  If corrcoef took the mask into account, you would
expect the two calls below to give different results, yet they come out
identical:

In [38]: x = N.random.random(100)

In [39]: y = N.random.random(100)

In [40]: N.corrcoef(x,y)
Out[40]: 
array([[ 1.        , -0.07291798],
       [-0.07291798,  1.        ]])

In [41]: x_ = N.ma.masked_array(x,mask=(N.random.random(100)>0.5).astype(bool))

In [42]: y_ = N.ma.masked_array(y,mask=(N.random.random(100)>0.5).astype(bool))

In [43]: N.corrcoef(x_,y_)
Out[43]: 
array([[ 1.        , -0.07291798],
       [-0.07291798,  1.        ]])
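
One possible workaround, as a rough sketch only: drop every observation
that is masked in either variable and run the plain corrcoef on what is
left.  The helper name masked_pair_corr is made up for this example (it is
not a NumPy function), and I am assuming that pairwise deletion is the
treatment of missing values you want.

```python
import numpy as np

def masked_pair_corr(a, b):
    """Pearson correlation over observations that are unmasked in both a and b.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # getmaskarray always returns a full boolean array, even if nothing is masked
    valid = ~(np.ma.getmaskarray(a) | np.ma.getmaskarray(b))
    # getdata returns the underlying values without the mask
    a_valid = np.ma.getdata(a)[valid]
    b_valid = np.ma.getdata(b)[valid]
    return np.corrcoef(a_valid, b_valid)[0, 1]

rng = np.random.default_rng(0)
x = rng.random(100)
y = rng.random(100)
x_ = np.ma.masked_array(x, mask=rng.random(100) > 0.5)
y_ = np.ma.masked_array(y, mask=rng.random(100) > 0.5)

r_all = np.corrcoef(x, y)[0, 1]      # uses all 100 observations
r_valid = masked_pair_corr(x_, y_)   # uses only jointly unmasked observations
print(r_all, r_valid)
```

With independent masks like these, only about a quarter of the
observations survive in both variables, so the two numbers should
generally differ.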

Regards
Stéfan


