numpy/scipy: correlation

sturlamolden sturlamolden at yahoo.no
Sun Nov 12 10:32:05 EST 2006


Robert Kern wrote:

> The difference between the two models is that the first places no restrictions
> on the distribution of x. The second does; both the x and y marginal
> distributions need to be normal. Under the first model, the correlation
> coefficient has no meaning.

That is not correct. The correlation coefficient is meaningful in both
models, but must be interpreted differently. However, in both cases a
correlation coefficient of 1 or -1 indicates an exact linear
relationship between x and y.
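
A quick numerical check of that last point, as a minimal sketch: the data
values and the slopes/offsets below are made up for illustration, and
np.corrcoef is used for the sample correlation coefficient.

```python
import numpy as np

# Arbitrary, deliberately non-normal x values (illustrative only).
x = np.array([0.0, 1.0, 2.5, 4.0, 7.0])

# Exact linear relationships: one with positive slope, one with negative slope.
r_pos = np.corrcoef(x, 3.0 * x + 2.0)[0, 1]
r_neg = np.corrcoef(x, -0.5 * x + 1.0)[0, 1]

print(r_pos, r_neg)  # 1 and -1, up to floating-point rounding
```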

Under the first model ("linear regression"), the squared correlation
coefficient is the "explained variance", i.e. the proportion of y's
variance accounted for by the model y = m*x + o.
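
To illustrate, here is a minimal sketch with simulated data. The slope,
offset, noise level and seed are assumptions chosen for the example, and
np.polyfit stands in for whatever least-squares fit one prefers. Note that x
is drawn from a uniform distribution, so nothing here assumes normality of x.

```python
import numpy as np

rng = np.random.default_rng(0)

# x is uniform, not normal; the regression model places no restriction on it.
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.5, size=200)

# Least-squares fit of y = m*x + o.
m, o = np.polyfit(x, y, 1)
residuals = y - (m * x + o)

# Squared correlation coefficient versus proportion of variance explained.
r = np.corrcoef(x, y)[0, 1]
explained = 1.0 - residuals.var() / y.var()

print(r**2, explained)  # the two numbers agree up to rounding
```

The agreement is exact (up to rounding) for an ordinary least-squares fit
that includes the offset o.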

Under the second model ("multivariate normal distribution"), the
correlation coefficient is the covariance of y and x divided by the
product of the standard deviations, cov(x,y)/(std(x)*std(y)).
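
The same formula can be checked against np.corrcoef. This is only a sketch:
the mean vector, the covariance matrix (true correlation 0.6) and the seed
are hypothetical values picked for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bivariate normal with unit variances and covariance 0.6.
mean = [0.0, 0.0]
cov = [[1.0, 0.6], [0.6, 1.0]]
x, y = rng.multivariate_normal(mean, cov, size=1000).T

# cov(x,y)/(std(x)*std(y)), using ddof=1 throughout to match np.cov.
c = np.cov(x, y)
r_manual = c[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)  # identical up to rounding
```

The ddof choice cancels in the ratio as long as the covariance and the
standard deviations use the same one.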




More information about the Python-list mailing list