[SciPy-User] np.corrcoef ddof is redundant?

Tue Mar 10 23:58:11 EDT 2015

On 11/03/15 03:56, Matthew Brett wrote:

>> np.corrcoef should not be computed with np.cov because it just adds
>> additional rounding error to the result.
>
> What algorithm do you think we should use to minimize rounding error?

I was not actually thinking about that. I just thought we could reuse 
some of the code from np.cov to avoid the redundant divisions and 
multiplications.

But since you asked, to minimize rounding error there is a two-pass 
method which can be used for both cov and corrcoef. Cf. this Matlab code:

http://home.online.no/~pjacklam/matlab/software/util/statutil/covmat.m

This method would be very easy to implement in NumPy.
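
As a rough illustration, a corrected two-pass computation might look 
something like the sketch below. This is the textbook variant (mean in 
the first pass, centered cross-products plus a small correction term in 
the second), not a transcription of the Matlab file linked above. The 
function names are made up for the example, and it assumes real values 
and rowvar=False.

```python
import numpy as np

def scatter_two_pass(X):
    # Hypothetical helper: centered cross-product (scatter) matrix.
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mean = X.mean(axis=0)            # pass 1: column means
    D = X - mean                     # pass 2: deviations from the mean
    resid = D.sum(axis=0)            # exactly zero without rounding error
    # Subtracting outer(resid, resid)/n compensates for error in the mean.
    return np.dot(D.T, D) - np.outer(resid, resid) / n

def cov_two_pass(X, ddof=1):
    n = np.asarray(X).shape[0]
    return scatter_two_pass(X) / (n - ddof)

def corrcoef_two_pass(X):
    # The 1/(n - ddof) factor cancels, so normalize the scatter matrix
    # directly and skip the extra division.
    S = scatter_two_pass(X)
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)
```

Computing the correlation from the scatter matrix rather than from the 
covariance is one way to avoid the redundant scaling, and it also shows 
why ddof has no effect on the result.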

Another, less well-known method is to use the SVD. It can also be 
used to compute the corrcoef. Here it is for real values and rowvar=False:

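import numpy as np

def cov(X, ddof):
     # X has shape (observations, variables) and is real-valued
     nx,p = X.shape
     mean = X.mean(axis=0)
     CX = X - mean[None,:]
     # rows of pc are the principal axes, s**2 the variances along them
     u,s,pc = np.linalg.svd(CX/np.sqrt(nx-ddof), full_matrices=False)
     s2 = s**2
     # pc.T * s2 scales column k of pc.T by s2[k], i.e. pc.T times diag(s2)
     # (this form is also valid when nx < p)
     return np.dot(pc.T * s2, pc)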
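
For the corrcoef, one possible variant of the same idea is to scale each 
centered column to unit length before taking the SVD, so that the 
reconstructed matrix is the correlation matrix directly. This is only a 
sketch under the same assumptions (real values, rowvar=False, no 
constant columns); the name corrcoef_svd is made up for the example.

```python
import numpy as np

def corrcoef_svd(X):
    # Hypothetical name; X has shape (observations, variables).
    X = np.asarray(X, dtype=float)
    CX = X - X.mean(axis=0)
    # Scale each centered column to unit Euclidean norm. Any 1/(n - ddof)
    # factor would cancel here, so ddof never enters.
    norms = np.sqrt((CX ** 2).sum(axis=0))
    Z = CX / norms
    u, s, pc = np.linalg.svd(Z, full_matrices=False)
    # Z.T Z = pc.T diag(s**2) pc, which is the correlation matrix.
    return np.dot(pc.T * s**2, pc)
```

A quick check is to compare the result against np.corrcoef(X, 
rowvar=False) with np.allclose on some random data.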

Sturla