[SciPy-user] linear regression

Robert Kern robert.kern at gmail.com
Wed May 27 19:35:10 EDT 2009


On Wed, May 27, 2009 at 15:29,  <josef.pktd at gmail.com> wrote:
> On Wed, May 27, 2009 at 3:37 PM, Robert Kern <robert.kern at gmail.com> wrote:

>> For "y=f(x)" models, this is true. Both y and x can be multivariate,
>> and you can express the covariance of the uncertainties for each, but
>> not covariance between the y and x uncertainties. This is because of
>> the numerical tricks used for efficient implementation
>
> In this case, OLS would still be unbiased in the linear case, but
> maybe not efficient.

Are you sure? I see significant deviations using a simple example
(albeit one which is utterly rigged in ODR's favor). The X
uncertainties start small and grow with increasing X. The Y
uncertainties start large and shrink with increasing X. Plotting the
estimates shows some strange structure in the OLS estimates.


import numpy as np

from scipy.odr import RealData, ODR
from scipy.odr.models import unilinear


beta_true = np.array([-0.47960828215176365,  5.47674024758398481])
p_x = np.array([0.,.9,1.8,2.6,3.3,4.4,5.2,6.1,6.5,7.4])
p_y = beta_true[0] * p_x + beta_true[1]
p_sx = np.array([.03,.03,.04,.035,.07,.11,.13,.22,.74,1.])
p_sy = np.array([1.,.74,.5,.35,.22,.22,.12,.12,.1,.04])


def random_betas(n=500, prng=np.random):
    """ Compute random parameter vectors from both ODR and OLS by generating
    random data and fitting it.
    """
    odr_betas = []
    ols_betas = []
    for i in range(n):
        x = np.random.normal(p_x, p_sx)
        y = np.random.normal(p_y, p_sy)

        # ODR:
        data = RealData(x, y, sx=p_sx, sy=p_sy)
        odr = ODR(data, unilinear, beta0=[1., 1.])
        odr_out = odr.run()
        odr_betas.append(odr_out.beta)

        # Weighted OLS:
        A = np.ones((len(x), 2))
        A[:,0] = x
        # Weight by the Y error.
        A /= p_sy[:,np.newaxis]
        b, res, rank, s = np.linalg.lstsq(A, y/p_sy)
        ols_betas.append(b)

        # Alternately:
        #ols = ODR(data, unilinear, beta0=[1., 1.])
        #ols.set_job(fit_type=2)
        #ols_out = ols.run()
        #ols_betas.append(ols_out.beta)
    return np.array(odr_betas).T, np.array(ols_betas).T

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco



More information about the SciPy-User mailing list