[SciPy-user] OLS matrix-f(x) = 0 problem (Was: linear regression)

Thu May 28 16:43:33 EDT 2009

On Thu, May 28, 2009 at 3:37 PM, Gael Varoquaux
<gael.varoquaux at normalesup.org> wrote:
> On Wed, May 27, 2009 at 06:27:29PM -0400, josef.pktd at gmail.com wrote:
>> Sounds like a recursive system of linear (simultaneous) equations with
>> linear restrictions to me. If you want an unbiased estimator, then
>> going row by row, and solving each linear OLS, linalg.lstsq, would be
>> the standard way to go, substituting the previous estimates of the Y's
>> into the next step.
>
> Oops, I realise I forgot to answer.
>
> You are right, this is a way to interpret it, and I was solving the
> system as you suggest. What I didn't like is that the solution I was
> getting was dependent on the order of the variables, but I had forgotten
> that the lower triangular matrix was an approximation. The
> non-permutation-invariance came from this approximation, not the way I
> was solving the system.
>
> Unfortunately, it seems that the solution to the complete problem is
> still an open research question (FYI, the problem is to find the OLS
> solution to "M X = X + e", with M positive definite and with a given
> support).
>
> X's dimensions range anywhere from (50, 50) to (300, 500), including
> the bad situation (300, 50).
>
> This is related to sparse covariance matrix estimation. I don't think
> there is (yet) an easy answer.
>
> Thanks for your answer, it brought me back to Earth, making me realize
> that I was already doing the right thing and should look for the
> problem elsewhere.
>
> Gaël

I'm not sure I understand anymore.

When estimating the parameters of a simultaneous system of equations
with least squares, we need a lot of identifying restrictions; the
lower triangular parameter matrix is the simplest one. And you don't
get permutation invariance because the ordering of your equations is
what identifies the parameters. In your case, you need to have enough
identifying restrictions on the support of M, and given that you don't
have any additional exogenous variables, the identifying restrictions
might require that it can be reordered to a lower triangular form.
(Disclaimer: After I mixed up the bias yesterday, I should mention
that I haven't looked at this in a pretty long time.)
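
To make the ordering point concrete, here is a minimal sketch of the
equation-by-equation approach under a strictly lower triangular
restriction. It is only an illustration: I am assuming observations in
rows and variables in columns, using simulated data, and regressing each
variable on the observed (not fitted) earlier variables. The names
sequential_ols, B, and perm are made up for this example.

```python
import numpy as np

def sequential_ols(X):
    """Regress each column of X on the columns that precede it.

    Returns a strictly lower triangular B such that
    X[:, j] is approximated by X[:, :j] @ B[j, :j].
    """
    n_obs, n_vars = X.shape
    B = np.zeros((n_vars, n_vars))
    for j in range(1, n_vars):
        # One ordinary least squares problem per equation.
        coef, _, _, _ = np.linalg.lstsq(X[:, :j], X[:, j], rcond=None)
        B[j, :j] = coef
    return B

# Simulated data (an assumption, not the data from this thread).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
X[:, 2] += 0.8 * X[:, 0]
X[:, 3] += 0.5 * X[:, 2]

B = sequential_ols(X)

# Reverse the variable order, estimate again, and relabel the result
# back to the original variable indices.
perm = [3, 2, 1, 0]
B_perm = sequential_ols(X[:, perm])
inv = np.argsort(perm)
B_back = B_perm[np.ix_(inv, inv)]

print(np.round(B, 2))
print(np.round(B_back, 2))
```

After relabelling, the second estimate is strictly upper triangular in
the original ordering, so the two coefficient matrices cannot agree: the
ordering itself is the identifying restriction.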

For the rest I'm a bit vague:
If you don't want to impose the sequential identifying restriction,
then you are just looking for a subspace that spans your X matrix with
certain properties.

Given that you have an X that can have more rows than columns or the
reverse, you have either more or fewer equations than unknowns, which
should already create a large multiplicity of solutions for some
cases. Also, I expect your X'X (or in numpy np.dot(X.T, X)) matrix to be
singular (maybe it is np.dot(X, X.T) in your notation).
So I would think that the solution will depend more on the eigenvector
decomposition, or SVD, or pinv of X'X, and there might be many
possibilities to span the space of X. I'm not sure how to get the
subspace that satisfies your support restrictions on M, if M is not
lower triangular.
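
A rough sketch of the singularity and multiplicity point, again with
simulated data. I am assuming 50 observations in rows and 300 variables
in columns, and y is just a stand-in right-hand side; neither comes from
your actual problem. It does not handle the support restriction on M at
all, it only shows why the unrestricted solution is not unique.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 50, 300          # fewer observations than variables (assumed)
X = rng.standard_normal((n_obs, n_vars))
y = rng.standard_normal(n_obs)   # stand-in right-hand side

# X'X is n_vars x n_vars, but its rank cannot exceed n_obs.
gram = X.T @ X
rank = np.linalg.matrix_rank(gram)

# The pseudo-inverse picks the minimum-norm least squares solution.
b_min = np.linalg.pinv(X) @ y

# The trailing right singular vectors span the null space of X, so
# adding one of them gives a different solution with the same fit.
_, s, Vt = np.linalg.svd(X, full_matrices=True)
null_vec = Vt[-1]
b_other = b_min + null_vec

resid_min = np.linalg.norm(X @ b_min - y)
resid_other = np.linalg.norm(X @ b_other - y)
print(rank, resid_min, resid_other)
print(np.linalg.norm(b_min), np.linalg.norm(b_other))
```

Both coefficient vectors fit equally well, and pinv simply selects the
one with the smallest norm. Any extra restriction (support, symmetry,
ordering) amounts to a different rule for choosing among them.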

I don't really understand what permutation-invariance you want, but if
you want to impose some kind of symmetry maybe this gives you
identification of a unique solution.

Josef