# [Numpy-discussion] Efficient orthogonalisation with scipy/numpy

Charles R Harris charlesr.harris at gmail.com
Tue Jan 19 21:47:04 EST 2010

```On Tue, Jan 19, 2010 at 6:08 PM, <josef.pktd at gmail.com> wrote:

> On Tue, Jan 19, 2010 at 6:48 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> >
> >
> > On Tue, Jan 19, 2010 at 2:34 PM, <josef.pktd at gmail.com> wrote:
> >>
> >> On Tue, Jan 19, 2010 at 4:29 PM, Charles R Harris
> >> <charlesr.harris at gmail.com> wrote:
> >> >
> >> >
> >> > On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux
> >> > <gael.varoquaux at normalesup.org> wrote:
> >> >>
> >> >> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote:
> >> >> > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0])
> >> >>
> >> >> > > with np = numpy and linalg = scipy.linalg where scipy calls
> ATLAS.
> >> >>
> >> >> > For clarification, are you trying to find the components of the y
> >> >> > vectors that are perpendicular to the space spanned by the 10
> >> >> > orthonormal vectors in confounds?
> >> >>
> >> >> Yes. Actually, what I am doing is calculating partial correlation
> >> >> between
> >> >> x and y conditionally to confounds, with the following code:
> >> >>
> >> >> def cond_partial_cor(y, x, confounds=[]):
> >> >>    """ Returns the partial correlation of y and x, conditionning on
> >> >>        confounds.
> >> >>    """
> >> >>    # First orthogonalise y and x relative to confounds
> >> >>    if len(confounds):
> >> >>        y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0])
> >> >>        x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0])
> >> >>    return np.dot(x, y)/sqrt(np.dot(y, y)*np.dot(x, x))
> >> >>
> >> >> I am not sure that what I am doing is optimal.
> >> >>
> >> >> > > Most of the time is spent in linalg.lstsq. The length of the
> >> >> > > vectors
> >> >> > > is
> >> >> > > 810, and there are about 10 confounds.
> >> >>
> >> >> > Exactly what are the shapes? y.shape = (810, N); confounds.shape =
> >> >> > (810,
> >> >> > 10)?
> >> >>
> >> >> Sorry, I should have been more precise:
> >> >>
> >> >> y.shape = (810, )
> >> >> confounds.shape = (10, 810)
> >> >>
> >> >
> >> > Column stack the bunch so that the last column is y, then do a qr
> >> > decomposition. The last column of q is the (normalized) orthogonal
> >> > vector
> >> > and its amplitude is the last (bottom right) component of r.
> >>
> >> do you have to do qr twice, once with x and once with y in the last
> >> column or can this be combined?
> >>
> >> I was trying to do something similar for partial autocorrelation for
> >> timeseries but didn't manage or try anything better than repeated
> >> leastsq or a variant.
> >>
> >
> > Depends on what you want to do. The QR decomposition is essentially
> > Gram-Schmidt on the columns. So if you just want an orthonormal basis for
> > the subspace spanned by a bunch of columns, the columns of Q are they. To
> > get the part of y orthogonal to that subspace you can do y - Q*Q.T*y,
> which
> > is probably what you want if the x's are fixed and the y's vary. If there
> is
> > just one y, then putting it as the last column lets the QR algorithm do
> that
> > last bit of projection.
>
> Gram-Schmidt (looking at it for the first time) looks a lot like
> sequential least squares projection. So, I'm trying to figure out if I
> can use the partial results up to a specific column as partial least
> squares and then work my way to the end by including/looking at more
> columns.
>
>
I don't the QR factorization would work for normal PLS. IIRC, one of the
algorithms does a svd of the cross correlation matrix. The difference is
that in some sense the svd picks out the best linear combination of columns,
while the qr factorization without column pivoting just takes them in order.
The QR factorization used to be the method of choice for least squares
because it is straight forward to compute, no iterating needed as in svd,
but these days that advantage is pretty much gone. It is still a common
first step in the svd, however. The matrix is factored to Q*R, then the svd
of R is computed.

> But unfortunately I don't have time to play with it long enough to
> figure out whether and how it works, but I keep this in mind for the
> future.
>
>
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20100119/61ce594a/attachment.html>
```