[Numpy-discussion] lstsq functionality

Mon Jul 19 23:40:56 EDT 2010

On Mon, Jul 19, 2010 at 8:27 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Mon, Jul 19, 2010 at 9:02 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>
>> On Mon, Jul 19, 2010 at 6:53 PM, Joshua Holbrook
>> <josh.holbrook at gmail.com> wrote:
>> > On Mon, Jul 19, 2010 at 5:50 PM, Charles R Harris
>> > <charlesr.harris at gmail.com> wrote:
>> >> Hi All,
>> >>
>> >> I'm thinking about adding some functionality to lstsq because I find
>> >> myself
>> >> doing the same fixes over and over. List follows.
>> >>
>> >> Add weights so data points can be weighted.
>> >> Use column scaling so condition numbers make more sense.
>> >> Compute covariance approximation?
>> >>
>> >> Unfortunately, the last will require using svd since there no linear
>> >> least
>> >> squares routines in LAPACK that also compute the covariance, at least
>> >> that
>> >> google knows about.
>> >>
>> >> Thoughts?
>> >
>> > Maybe make 2 functions--one which implements 1 and 2, and another
>> > which implements 3? I think weights is an excellent idea!
>>
>> I'd like a lstsq that did less, like not calculate the sum of squared
>> residuals. That's useful in tight loops. So I also think having two
>> lstsq makes sense. One as basic as possible; one with bells. How does
>> scipy's lstsq fit into all this?
>
> I think the computation of the residues is cheap in lstsq. The algolrithm
> used starts by reducing the design matrix to bidiagonal from and reduces the
> rhs at the same time. In other words an mxn problem becomes a (n+1)xn
> problem. That's why the summed square of residuals is available but not the
> individual residuals, after the reduction there is only one residual and its
> square it the residue.

That does sound good. But it must take some time. There's indexing,
array creation, if statement, summing:

        if results['rank'] == n and m > n:
            resids = sum((transpose(bstar)[n:,:])**2, axis=0).astype(result_t)

Here are the timings after removing the sum of squared residuals:

>> x = np.random.rand(1000,10)
>> y = np.random.rand(1000)
>> timeit np.linalg.lstsq(x, y)
1000 loops, best of 3: 369 us per loop
>> timeit np.linalg.lstsq2(x, y)
1000 loops, best of 3: 344 us per loop
>>
>> x = np.random.rand(10,2)
>> y = np.random.rand(10)
>> timeit np.linalg.lstsq(x, y)
10000 loops, best of 3: 102 us per loop
>> timeit np.linalg.lstsq2(x, y)
10000 loops, best of 3: 77 us per loop