[Numpy-discussion] Identifying Colinear Columns of a Matrix

Fri Aug 26 14:38:28 EDT 2011

Charles! That looks like it could be a winner! It looks like you always take the last column of the U matrix and identify the columns that have the same values? It works when I add extra columns as well! BTW, sorry for my lack of knowledge... but what was the point of the dot product at the end? To show that the results are essentially zero, indicating singularity? Thanks so much!
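Here is a short sketch of what the final dot product checks. It assumes NumPy imported as np and uses Chuck's zt, where each row is one variable. The last column of U pairs with the smallest singular value. When that value is essentially zero, the column is a set of weights w with w @ zt ≈ 0, which is a linear relation among the rows. The variables involved are the entries of w that are nonzero. Here they share one magnitude only because the relation row 0 = row 1 + row 2 has coefficients of ±1. The 1e-8 cut-off for "nonzero" is an arbitrary choice for illustration.

```python
import numpy as np

zt = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
               [0.25, 0.1, 0.2, 0.25, 0.5],
               [0.75, 0.9, 0.8, 0.75, 0.5],
               [3.0, 8.0, 0.0, 5.0, 0.0]])

u, s, vh = np.linalg.svd(zt)
w = u[:, -1]  # left singular vector paired with the smallest singular value

# The dot product is the weighted sum of the rows of zt; a result of ~0
# means the weights describe an exact linear relation among the variables.
residual = w @ zt

# Entries that are not (numerically) zero mark the variables in the relation.
involved = np.flatnonzero(np.abs(w) > 1e-8)  # 1e-8: illustrative cut-off

# Rescale so the first involved coefficient is 1 (removes the SVD sign ambiguity).
coeffs = w / w[involved[0]]

print(s[-1])      # tiny compared with s[0]
print(residual)   # every entry is at round-off level
print(involved)   # [0 1 2]
print(coeffs)     # approximately [1, -1, -1, 0], i.e. row0 - row1 - row2 = 0
```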

MJ

From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Charles R Harris
Sent: Friday, August 26, 2011 11:04 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix

On Fri, Aug 26, 2011 at 11:41 AM, Mark Janikas <mjanikas at esri.com> wrote:
I wonder if my last statement is essentially the only answer... which I wanted to avoid...

Should I just build the corrcoef() for combinations of the columns (then check whether NaNs are present), or use the condition number to detect the singularity? I wanted to avoid a combinatorial search over subsets of the columns.
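One way to avoid the combinatorial search is a single left-to-right pass over the columns. This is offered as an alternative sketch rather than a method from the thread. Keep a column if it raises the rank of the columns kept so far. Otherwise, flag it as a combination of them and recover the coefficients with least squares. This costs one rank computation per column. The helper name dependent_columns is hypothetical, and np.linalg.matrix_rank is used with its default SVD-based tolerance. Which column gets flagged depends on the column order.

```python
import numpy as np

def dependent_columns(x):
    """Hypothetical helper: flag columns of x (n_obs x k) that are linear
    combinations of earlier independent columns.

    Returns a list of (j, kept, coeffs) with x[:, j] ~= x[:, kept] @ coeffs.
    """
    kept = []
    flagged = []
    for j in range(x.shape[1]):
        if np.linalg.matrix_rank(x[:, kept + [j]]) > len(kept):
            kept.append(j)  # column j adds a new direction
        elif kept:
            # Column j lies in the span of the kept columns: find the combination.
            coeffs, *_ = np.linalg.lstsq(x[:, kept], x[:, j], rcond=None)
            flagged.append((j, list(kept), coeffs))
        else:
            flagged.append((j, [], np.zeros(0)))  # an all-zero leading column
    return flagged

zt = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
               [0.25, 0.1, 0.2, 0.25, 0.5],
               [0.75, 0.9, 0.8, 0.75, 0.5],
               [3.0, 8.0, 0.0, 5.0, 0.0]])

result = dependent_columns(zt.T)  # zt.T: observations as rows, variables as columns
for j, kept, coeffs in result:
    print(j, kept, coeffs)  # column 2 from columns [0, 1] with coefficients ~[1, -1]
```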

MJ

-----Original Message-----
From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Mark Janikas
Sent: Friday, August 26, 2011 10:35 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix

I actually use the VIF when the design matrix can be inverted... I do it the quick and dirty way, as opposed to running the step (auxiliary) regressions:

1. Calculate the correlation matrix of the columns (without the intercept).
2. Return the diagonal of the inverse of the correlation matrix from step 1.
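The two steps above might look like this in NumPy. This is a sketch, and vif_from_corr is a hypothetical name. With rowvar=False, each column of x is treated as a variable. For a perfectly collinear set the correlation matrix is singular, so np.linalg.inv either raises LinAlgError or returns meaningless huge values. That is the limitation described next.

```python
import numpy as np

def vif_from_corr(x):
    """Hypothetical helper: variance inflation factors as the diagonal of the
    inverse correlation matrix. x is n_obs x k, intercept column excluded."""
    corr = np.corrcoef(x, rowvar=False)  # columns of x are the variables
    return np.diag(np.linalg.inv(corr))

# Columns 1 and 3 of zt.T (intercept and the redundant column 2 dropped)
x = np.array([[0.25, 3.0],
              [0.1, 8.0],
              [0.2, 0.0],
              [0.25, 5.0],
              [0.5, 0.0]])
print(vif_from_corr(x))  # both entries equal 1 / (1 - r**2), about 1.83
```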

Again, the problem lies in relationships among multiple columns... I wouldn't be able to run the sub-regressions at all when the columns are perfectly collinear.
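The per-column regressions can still be run when columns are perfectly collinear if they are solved with np.linalg.lstsq. For a rank-deficient design it returns a minimum-norm least-squares solution instead of failing. This is a sketch, not the method used in the thread. auxiliary_r2 is a hypothetical helper that regresses each column on an intercept plus the remaining columns and reports R². An R² of essentially 1, where VIF = 1/(1 - R²) is unbounded, marks a column that is an exact combination of the others.

```python
import numpy as np

def auxiliary_r2(x):
    """Hypothetical helper: R^2 from regressing each column of x (n_obs x k,
    intercept excluded) on an intercept plus all other columns.
    Assumes no column is constant, so the total sum of squares is > 0."""
    n, k = x.shape
    r2 = np.empty(k)
    for j in range(k):
        y = x[:, j]
        design = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
        # lstsq copes with a rank-deficient design (minimum-norm solution)
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        sst = np.sum((y - y.mean()) ** 2)
        r2[j] = 1.0 - (resid @ resid) / sst
    return r2

x = np.array([[0.25, 0.75, 3.0],
              [0.1, 0.9, 8.0],
              [0.2, 0.8, 0.0],
              [0.25, 0.75, 5.0],
              [0.5, 0.5, 0.0]])  # zt.T without the intercept column
print(auxiliary_r2(x))  # first two ~1.0, third about 0.45
```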

MJ

-----Original Message-----
From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Skipper Seabold
Sent: Friday, August 26, 2011 10:28 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix

On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas <mjanikas at esri.com> wrote:
> Hello All,
>
> I am trying to identify columns of a matrix that are perfectly collinear.
> It is not that difficult to identify when two columns are identical or have
> zero variance, but I do not know how to identify when the culprit is of a
> higher order, e.g. columns 1 + 2 + 3 = column 4. NUM.corrcoef(matrix.T) will
> return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide
> a very large condition number, but they do not tell me which columns are
> causing the problem. For example (zt is the transposed matrix, so each row
> below is one column):
>
> zt = numpy.array([[ 1.  ,  1.  ,  1.  ,  1.  ,  1.  ],
>                   [ 0.25,  0.1 ,  0.2 ,  0.25,  0.5 ],
>                   [ 0.75,  0.9 ,  0.8 ,  0.75,  0.5 ],
>                   [ 3.  ,  8.  ,  0.  ,  5.  ,  0.  ]])
>
> How can I identify that columns 0, 1, 2 are the issue because column 1 +
> column 2 = column 0?
>
> Any input would be greatly appreciated. Thanks much,
>
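The symptoms described above can be reproduced as follows. This sketch assumes NumPy imported as np rather than the NUM/LA aliases. Each check reports that something is degenerate, but none says which variables are involved. With this zt, the NaNs come from the constant first row (zero variance) rather than from the collinearity itself.

```python
import numpy as np

zt = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
               [0.25, 0.1, 0.2, 0.25, 0.5],
               [0.75, 0.9, 0.8, 0.75, 0.5],
               [3.0, 8.0, 0.0, 5.0, 0.0]])

# Rows of zt are the variables (np.corrcoef's default rowvar=True).
with np.errstate(divide="ignore", invalid="ignore"):
    corr = np.corrcoef(zt)  # the zero-variance row gives 0/0 -> NaN

print(np.isnan(corr).any())        # True
print(np.linalg.cond(zt))          # enormous (possibly inf): zt is rank-deficient
print(np.linalg.matrix_rank(zt))   # 3: only three of the four rows are independent
```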

The way that I know to do this in a regression context for (near
perfect) multicollinearity is VIF. It's long been on my todo list for
statsmodels.

http://en.wikipedia.org/wiki/Variance_inflation_factor

Maybe there are other ways with decompositions. I'd be happy to hear about them.

Please post back if you write any code to do this.

Why not svd?

In [13]: u,d,v = svd(zt)

In [14]: d
Out[14]:
array([  1.01307066e+01,   1.87795095e+00,   3.03454566e-01,
         3.29253945e-16])

In [15]: u[:,3]
Out[15]: array([ 0.57735027, -0.57735027, -0.57735027,  0.        ])

In [16]: dot(u[:,3], zt)
Out[16]:
array([ -7.77156117e-16,  -6.66133815e-16,  -7.21644966e-16,
        -7.77156117e-16,  -8.88178420e-16])
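Taking only the last column of U assumes there is exactly one relation. The following sketch is slightly more general and keeps every column of U beyond the numerical rank. collinear_variables is a hypothetical name, and both thresholds are assumptions: the rank tolerance mirrors the default used by np.linalg.matrix_rank, and the 1e-8 cut-off is arbitrary. With two or more relations, the individual null vectors are not unique, because any rotation of them is equally valid. Only their span, and the union of the rows they involve, is meaningful.

```python
import numpy as np

def collinear_variables(a, tol=None):
    """Hypothetical helper generalising the SVD check above.

    a has one variable per row (like zt). Returns (rank, null_vectors, involved):
    each column w of null_vectors satisfies w @ a ~ 0, and involved lists the
    rows that carry a non-negligible weight in any of them.
    """
    u, s, vh = np.linalg.svd(a)  # full_matrices=True, so u is square
    if tol is None:
        # same default threshold that np.linalg.matrix_rank applies
        tol = s.max() * max(a.shape) * np.finfo(s.dtype).eps
    rank = int(np.sum(s > tol))
    null_vectors = u[:, rank:]  # also covers more variables than observations
    # columns of u are unit vectors, so an absolute cut-off is reasonable
    involved = np.flatnonzero(np.any(np.abs(null_vectors) > 1e-8, axis=1))
    return rank, null_vectors, involved

zt = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
               [0.25, 0.1, 0.2, 0.25, 0.5],
               [0.75, 0.9, 0.8, 0.75, 0.5],
               [3.0, 8.0, 0.0, 5.0, 0.0]])
rank, w, involved = collinear_variables(zt)       # rank 3, involved [0 1 2]

# Add a second relation: new row 4 = row 0 + row 3.
zt2 = np.vstack([zt, zt[0] + zt[3]])
rank2, w2, involved2 = collinear_variables(zt2)   # rank 3, two null vectors
print(rank2, w2.shape[1], involved2)              # 3 2 [0 1 2 3 4]
```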

Chuck
