[SciPy-User] np.corrcoef ddof is redundant?
Sturla Molden
sturla.molden at gmail.com
Tue Mar 10 23:58:11 EDT 2015
On 11/03/15 03:56, Matthew Brett wrote:
>> np.corrcoef should not be computed with np.cov because it just adds
>> additional rounding error to the result.
>
> What algorithm do you think we should use to minimize rounding error?
I was not actually thinking about that. I just thought we could reuse
some of the code from np.cov to avoid the redundant division and
multiplications.
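
To make the point about the redundant division concrete, here is a minimal sketch (not the actual np.corrcoef code) for real values and rowvar=False. The name corrcoef_direct is made up for illustration. It normalizes the centered cross-products directly, so the factor 1/(n - ddof) never appears:

```python
import numpy as np

def corrcoef_direct(X):
    """Correlation matrix of the columns of X (real values, rowvar=False).

    Hypothetical helper for illustration; not part of NumPy.
    """
    X = np.asarray(X, dtype=float)
    CX = X - X.mean(axis=0)      # center each column
    c = np.dot(CX.T, CX)         # unnormalized cross-products, no 1/(n - ddof)
    d = np.sqrt(np.diag(c))      # norms of the centered columns
    return c / np.outer(d, d)    # a common scale factor would cancel here anyway
```

Since any common scale factor cancels in the last line, the result does not depend on ddof at all.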
But since you asked, to minimize rounding error there is a two-pass
method which can be used for both cov and corrcoef. Cf. this Matlab code:
http://home.online.no/~pjacklam/matlab/software/util/statutil/covmat.m
This would be very easy to use in NumPy.
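
A rough sketch of what a two-pass covariance could look like in NumPy, for real values and rowvar=False. This is not a port of the Matlab file linked above. The name cov_two_pass and the correction term (the textbook "corrected two-pass" variant) are assumptions made for illustration:

```python
import numpy as np

def cov_two_pass(X, ddof=1):
    """Two-pass covariance of the columns of X (real values, rowvar=False).

    Hypothetical helper for illustration; not part of NumPy.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mean = X.sum(axis=0) / n     # pass 1: column means
    CX = X - mean                # pass 2: centered data
    resid = CX.sum(axis=0)       # exactly zero in exact arithmetic
    # Subtract the rounding error left over in the computed means.
    c = np.dot(CX.T, CX) - np.outer(resid, resid) / n
    return c / (n - ddof)
```

For corrcoef the final division by (n - ddof) can be skipped and c normalized by the square roots of its diagonal instead.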
Another, less well-known method is to use the SVD. It can also be
used to compute the corrcoef. Here it is for real values and rowvar=False:
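import numpy as np

def cov(X, ddof):
    nx, p = X.shape
    mean = X.mean(axis=0)
    CX = X - mean[None,:]    # center each column
    u, s, pc = np.linalg.svd(CX/np.sqrt(nx-ddof), full_matrices=False)
    s2 = s**2                # eigenvalues of the covariance matrix
    tmp = np.diag(s2)        # diagonal matrix, also valid when nx < p
    return np.dot(pc.T, np.dot(tmp, pc))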
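
The corrcoef variant could be sketched along the same lines, again for real values and rowvar=False. The name corrcoef_svd is made up for illustration. The scaling by sqrt(nx-ddof) is left out, because it would cancel in the normalization:

```python
import numpy as np

def corrcoef_svd(X):
    """Correlation matrix of the columns of X via the SVD (real values, rowvar=False).

    Hypothetical helper for illustration; not part of NumPy.
    """
    X = np.asarray(X, dtype=float)
    CX = X - X.mean(axis=0)                      # center each column
    u, s, pc = np.linalg.svd(CX, full_matrices=False)
    c = np.dot(pc.T * s**2, pc)                  # pc.T @ diag(s**2) @ pc
    d = np.sqrt(np.diag(c))                      # norms of the centered columns
    return c / np.outer(d, d)
```

A quick check is to compare the result with np.corrcoef(X, rowvar=False) using np.allclose.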
Sturla