[SciPy-user] linear regression

josef.pktd at gmail.com
Wed May 27 16:29:45 EDT 2009


On Wed, May 27, 2009 at 3:37 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Wed, May 27, 2009 at 14:22,  <josef.pktd at gmail.com> wrote:
>> On Wed, May 27, 2009 at 3:03 PM, Robert Kern <robert.kern at gmail.com> wrote:
>>> On Wed, May 27, 2009 at 13:28,  <josef.pktd at gmail.com> wrote:
>>>> On Wed, May 27, 2009 at 12:35 PM, ms <devicerandom at gmail.com> wrote:
>>>>> josef.pktd at gmail.com wrote:
>>>>>>> Have a look here <http://www.scipy.org/Cookbook/LinearRegression>
>>>>>>
>>>>>> y = Beta0 + Beta1 * x + Beta2 * x**2   is a second-order polynomial.
>>>>>>
>>>>>> I should have looked too: polyfit returns the polynomial coefficients
>>>>>> but doesn't calculate the variance-covariance matrix or standard
>>>>>> errors of the OLS estimate.
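For reference, a minimal numpy sketch of computing the variance-covariance
matrix and standard errors of an OLS polynomial fit by hand (the data and
coefficients here are made up):

import numpy as np

# made-up data for a quadratic y = 1 + 2*x + 0.5*x**2 plus noise
x = np.linspace(0., 10., 30)
y = 1.0 + 2.0 * x + 0.5 * x**2 + np.random.normal(size=x.shape)

X = np.vander(x, 3)                       # design matrix: [x**2, x, 1]
beta, res, rank, sv = np.linalg.lstsq(X, y)
resid = y - np.dot(X, beta)
sigma2 = np.dot(resid, resid) / (len(y) - X.shape[1])  # error variance
vcov = sigma2 * np.linalg.inv(np.dot(X.T, X))  # variance-covariance matrix
se = np.sqrt(np.diag(vcov))                    # standard errors of beta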
>>>>>
>>>>> AFAIK, the ODR fitting routines return all these parameters, so one can
>>>>> maybe use that for linear fitting too.
>>>>
>>>> you mean scipy.odr?
>>>>
>>>> I never looked at it in detail. Conceptually it is very similar to
>>>> standard regression, but I've never seen an application for it, nor do
>>>> I know the probability-theoretic or econometric background of it.
>>>
>>> ODR is nonlinear least-squares with errors in both variables (e.g.
>>> minimizing the weighted sum of squared distances from each point to
>>> the corresponding closest points on the curve rather than "straight
>>> down" as in OLS). scipy.odr implements both ODR and OLS. It also
>>> implements implicit regression, where the relationship between
>>> variables is not expressed as "y=f(x)" but "f(x,y)=0" such as fitting
>>> an ellipse.
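For anyone trying this, here is a minimal sketch of an explicit scipy.odr
fit with uncertainties on both variables (the data, error scales, and
starting values are made up):

import numpy as np
from scipy.odr import ODR, Model, RealData

def f(beta, x):
    # explicit model y = f(x), here a straight line
    return beta[0] + beta[1] * x

x = np.linspace(0., 10., 50)
y = f([1.0, 2.0], x) + np.random.normal(scale=0.5, size=x.shape)

data = RealData(x, y, sx=0.1, sy=0.5)  # std. dev. of the x and y errors
odr = ODR(data, Model(f), beta0=[0., 1.])
out = odr.run()
# out.beta, out.sd_beta, out.cov_beta hold the estimates,
# their standard errors, and the parameter covariance matrix

The same setup can be switched to ordinary least squares by calling
odr.set_job(fit_type=2) before run().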
>>>
>>>> The
>>>> results for many cases will be relatively close to standard least
>>>> squares.
>>>> A Google search shows links to curve fitting but not to any
>>>> econometric theory. On the other hand, there is a very large
>>>> literature on how to treat measurement errors and endogeneity of
>>>> regressors for (standard) least squares and maximum likelihood.
>>>
>>> The extension is straightforward. ODR is really just a generalization
>>> of least-squares. Unfortunately, the links to the relevant papers seem
>>> to have died. I've put them up here:
>>>
>>> http://www.mechanicalkern.com/static/odr_vcv.pdf
>>> http://www.mechanicalkern.com/static/odr_ams.pdf
>>> http://www.mechanicalkern.com/static/odrpack_guide.pdf
>>>
>>
>> Thanks for the links. I finally also found it on Wikipedia under
>> "Total least squares". Under "Errors-in-variables model" it says
>>
>> "
>> Error-in-variables models can be estimated in several different ways.
>> Besides those outlined here, see:
>>        * total least squares for a method of fitting which does not
>> arise from a statistical model;
>> "
>>
>> From a brief reading, I think that the main limitation is that it
>> doesn't allow you to explicitly model the joint error structure. It
>> looks like this will be done implicitly by the scaling factors and
>> other function parameters. But this is just my first impression.
>
> For "y=f(x)" models, this is true. Both y and x can be multivariate,
> and you can express the covariance of the uncertainties for each, but
> not covariance between the y and x uncertainties. This is because of
> the numerical tricks used for efficient implementation.

In that situation, OLS would still be unbiased in the linear case, but
maybe not efficient. I don't know about the non-linear case.

> However,
> "f(x)=0" models can express covariances between all dimensions of x.

When I initially saw the implicit function estimation, I thought it
might be pretty useful.

But I will have to play with scipy.odr to see how much it can be used for
more "traditional" statistical analysis.

Josef


