[SciPy-User] Least squares speed
Nathaniel Smith
njs at pobox.com
Wed Oct 2 15:58:06 EDT 2013
On Tue, Oct 1, 2013 at 10:45 PM, <josef.pktd at gmail.com> wrote:
> On Tue, Oct 1, 2013 at 5:24 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> Here are several ways to solve the least squares problem XB = Y:
>>
>> scipy.linalg.lstsq(x, y)
>> np.linalg.lstsq(x, y)
>> np.dot(scipy.linalg.pinv(x), y)
>> np.dot(np.linalg.pinv(x), y)
>>
>> From the documentation, I would expect these to be ordered by speed,
>> fastest up top and slowest at the bottom. It turns out they *are*
>> ordered by speed, except, on my system (linux x86-64, numpy 1.7.1,
>> scipy 0.12.0, ubuntu-installed atlas), it's exactly backwards, with
>> very substantial differences (more than a factor of 20!):
>>
>> # Typical smallish problem for me:
>> In [40]: x = np.random.randn(780, 2)
>>
>> In [41]: y = np.random.randn(780, 32 * 275)
>>
>> In [42]: %timeit scipy.linalg.lstsq(x, y)
>> 1 loops, best of 3: 494 ms per loop
>>
>> In [43]: %timeit np.linalg.lstsq(x, y)
>> 1 loops, best of 3: 356 ms per loop
>>
>> In [44]: %timeit np.dot(scipy.linalg.pinv(x), y)
>> 10 loops, best of 3: 62.2 ms per loop
>>
>> In [45]: %timeit np.dot(np.linalg.pinv(x), y)
>> 10 loops, best of 3: 23.2 ms per loop
>>
>> Is this expected? I'm particularly confused about why scipy's lstsq
>> should be almost 40% slower than numpy's. (And shouldn't we have a
>> fast one-function way to solve least squares problems?)
>>
>> Any reason not to use the last option? Is it as numerically stable as lstsq?
>>
>> Is there any other version that's even faster or even more numerically stable?
>
> If you have a very long x, then using the normal equations is faster for univariate y.
>
> There are many different ways to calculate pinv
> https://github.com/scipy/scipy/pull/289
>
> np.linalg.lstsq breaks on rank-deficient x, IIRC, and uses different
> LAPACK functions than scipy's lstsq.
>
> In very badly scaled cases (the worst NIST case), scipy's pinv was a
> bit more accurate than numpy's, but that may just come down to
> different defaults. numpy's pinv was also faster than scipy's in the
> cases that I tried.
>
> There is only a single NIST test case that can fail with numpy's pinv
> using the defaults.
>
> (what about using qr or cho_solve?)
>
> There are lots of different ways to solve this, and I never figured
> out a ranking across the different criteria: speed, precision, and
> robustness to near-singularity.
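
On josef's point about pinv defaults: both pinvs expose the cutoff for
small singular values, so the defaults can at least be equalized before
comparing accuracy. A rough sketch using the documented parameters
(1e-15 is just numpy's documented default, not a recommendation):

import numpy as np
import scipy.linalg

x = np.random.randn(780, 2)

# numpy: singular values below rcond * (largest singular value) are
# treated as zero.
pinv_np = np.linalg.pinv(x, rcond=1e-15)

# scipy: the `cond` argument plays the same cutoff role here.
pinv_sp = scipy.linalg.pinv(x, cond=1e-15)
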
Amazingly enough, in this case doing a full rank-revealing QR,
including calculating the rank via the condition numbers of leading
submatrices, is tied for the fastest method:

In [66]: %timeit q, r, p = scipy.linalg.qr(x, mode="economic", pivoting=True); [np.linalg.cond(r[:i, :i]) for i in xrange(1, r.shape[0])]; np.linalg.solve(r[:, p], np.dot(q.T, y))
10 loops, best of 3: 22 ms per loop
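
Spelled out as a function, the same approach looks roughly like this
(an untested sketch, with made-up names; scipy documents the pivot as
x[:, p] == np.dot(q, r), so the coefficients have to be scattered back
through p at the end; for a two-column x any pivot permutation is its
own inverse, which is why the one-liner above can solve against
r[:, p] and land on the same answer):

import numpy as np
import scipy.linalg

def lstsq_rrqr(x, y):
    # Illustrative helper, not a library function.
    # Column-pivoted ("rank-revealing") QR: x[:, p] == np.dot(q, r).
    q, r, p = scipy.linalg.qr(x, mode="economic", pivoting=True)
    # Backsolve against the upper-triangular r; this gives the
    # coefficients of the *pivoted* columns of x.
    z = scipy.linalg.solve_triangular(r, np.dot(q.T, y))
    # Undo the pivoting: column j of np.dot(q, r) is column p[j] of x,
    # so z[j] belongs in row p[j] of the result.
    b = np.empty_like(z)
    b[p] = z
    return b
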
Also tied for fastest with np.linalg.pinv (above) are a direct solve
of the normal equations, cho_solve on the normal equations, and a
non-rank-revealing QR with a backsolve against the upper-triangular r:

In [70]: %timeit np.linalg.solve(np.dot(x.T, x), np.dot(x.T, y))
10 loops, best of 3: 21.4 ms per loop

In [71]: %timeit c = scipy.linalg.cho_factor(np.dot(x.T, x)); scipy.linalg.cho_solve(c, np.dot(x.T, y))
10 loops, best of 3: 21.2 ms per loop

In [72]: %timeit q, r = scipy.linalg.qr(x, mode="economic"); scipy.linalg.solve_triangular(r, np.dot(q.T, y))
10 loops, best of 3: 22.4 ms per loop
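
For what it's worth, on a well-conditioned x all of these should agree
to within rounding; a quick self-contained check (a sketch, with the
same shapes as the benchmark and made-up variable names):

import numpy as np
import scipy.linalg

x = np.random.randn(780, 2)
y = np.random.randn(780, 32 * 275)

b_lstsq = np.linalg.lstsq(x, y)[0]  # SVD-based reference solution
b_norm = np.linalg.solve(np.dot(x.T, x), np.dot(x.T, y))  # normal equations
c = scipy.linalg.cho_factor(np.dot(x.T, x))
b_chol = scipy.linalg.cho_solve(c, np.dot(x.T, y))  # Cholesky on x'x
q, r = scipy.linalg.qr(x, mode="economic")
b_qr = scipy.linalg.solve_triangular(r, np.dot(q.T, y))  # plain QR

for b in (b_norm, b_chol, b_qr):
    assert np.allclose(b_lstsq, b)
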
But AFAIK QR is the gold standard for precision on the kinds of linear
regression problems I care about (and definitely has better numerical
stability than the versions that involve dot(x.T, x)), and in my case
I actually want to detect and reject ill-conditioned problems rather
than impose some secondary disambiguating constraint, so... RRQR seems
to be the obvious choice.
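
Concretely, the detect-and-reject version just adds that submatrix
scan to the lstsq_rrqr sketch above (again untested; the 1e8 cutoff is
an arbitrary placeholder, not a recommendation):

import numpy as np
import scipy.linalg

def lstsq_rrqr_checked(x, y, max_cond=1e8):
    q, r, p = scipy.linalg.qr(x, mode="economic", pivoting=True)
    # Pivoting makes the magnitude of r's diagonal non-increasing, so
    # the condition numbers of the leading submatrices show where (if
    # anywhere) x turns numerically rank deficient.  (A 1x1 matrix
    # always has condition number 1, so the scan starts at 2.)
    for i in range(2, r.shape[0] + 1):
        if np.linalg.cond(r[:i, :i]) > max_cond:
            raise ValueError("x is ill-conditioned at pivoted column %d" % i)
    z = scipy.linalg.solve_triangular(r, np.dot(q.T, y))
    b = np.empty_like(z)
    b[p] = z  # undo the column pivoting, as before
    return b
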
Not sure if there's anything that could or should be done to make
these trade-offs more obvious to people less obsessive than me...
-n