[SciPy-User] Strange behaviour from corrcoef when calculating correlation-matrix in SciPy/NumPy.

Thu Mar 3 15:18:31 EST 2011

Hi,

On Thu, Mar 3, 2011 at 11:44 AM, Pauli Virtanen <pav at iki.fi> wrote:

> Hi,
>
> Wed, 02 Mar 2011 14:36:18 -0500, josef.pktd wrote:
> [clip]
> >> The Matlab convention
> >>
> >>        corrcoef(x, y) == corrcoef(c_[x.ravel(), y.ravel()])
> >
> > I don't remember matlab exactly, but I don't think there is a ravel, and
> > I think R also does
> >
> > cov(x, y) = np.dot((x-x.mean()).T, y-y.mean())
> >
> > and normalized for corrcoef.
>
> There's a ravel, according to their docs:
>
>        http://www.mathworks.com/help/techdoc/ref/cov.html
>
> """cov(X,Y), where X and Y are matrices with the same number of elements,
> is equivalent to cov([X(:) Y(:)])."""
>
> X(:) is the matlab notation for raveling.
>
FWIW, please note the following matlab/octave behavior:
> X= [1 2 7 3; 2 1 1 2]'
X =
   1   2
   2   1
   7   1
   3   2
> Y= [4 2 7 1; 9 1 7 3]'
Y =
   4   9
   2   1
   7   7
   1   3

> corrcoef([X(:) Y(:)]) %(1
ans =
   1.00000   0.26328
   0.26328   1.00000
> corrcoef([X Y]) %(2
ans =
   1.00000  -0.54882   0.69462   0.13884
  -0.54882   1.00000  -0.43644   0.31623
   0.69462  -0.43644   1.00000   0.69007
   0.13884   0.31623   0.69007   1.00000
> corrcoef(X, Y) %(3
ans =
   0.69462   0.13884
  -0.43644   0.31623

and then the equivalent in numpy:
In []: X= array([[1, 2, 7, 3], [2, 1, 1, 2]])
In []: X
Out[]:
array([[1, 2, 7, 3],
       [2, 1, 1, 2]])
In []: Y= array([[4, 2, 7, 1], [9, 1, 7, 3]])
In []: Y
Out[]:
array([[4, 2, 7, 1],
       [9, 1, 7, 3]])

In []: corrcoef(X.ravel(), Y.ravel()) #(1
Out[]:
array([[ 1.        ,  0.26328398],
       [ 0.26328398,  1.        ]])
In []: corrcoef(X, Y) #(2
Out[]:
array([[ 1.        , -0.5488213 ,  0.69462323,  0.13884203],
       [-0.5488213 ,  1.        , -0.43643578,  0.31622777],
       [ 0.69462323, -0.43643578,  1.        ,  0.69006556],
       [ 0.13884203,  0.31622777,  0.69006556,  1.        ]])
> corrcoef(X, Y) %(3
In []: corrcoef(?) #(3
Out[]:
array([[ 0.69462,  0.13884],
       [-0.43644,  0.31623]])

So perhaps there is no really simple and straightforward translation
(of corrcoef) from matlab to numpy? Just as an example: how would you
implement case %(3 properly with numpy?
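
One possible answer, as a rough sketch only: the numbers in case (3) are
exactly the upper-right block of the 4x4 matrix from case (2), so slicing
that block out of corrcoef(X, Y) reproduces them. The helper name
cross_corrcoef below is made up for illustration (it is not a numpy
function), and it assumes rows are variables, as in the numpy session above.

```python
import numpy as np


def cross_corrcoef(x, y):
    """Hypothetical helper: correlate each row of x with each row of y.

    Assumes rows are variables and columns are observations, which is
    numpy's default (rowvar=True) convention for corrcoef.
    """
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    n = x.shape[0]
    # corrcoef(x, y) stacks the rows of x and y into one set of variables;
    # the upper-right block holds the x-versus-y correlations.
    return np.corrcoef(x, y)[:n, n:]


X = np.array([[1, 2, 7, 3], [2, 1, 1, 2]])
Y = np.array([[4, 2, 7, 1], [9, 1, 7, 3]])

C3 = cross_corrcoef(X, Y)
print(C3)
```

This computes the full stacked matrix and throws most of it away, which is
fine for small inputs but wasteful when X and Y have many rows.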

Regards,
eat

>
> --
> Pauli Virtanen
>