[SciPy-User] Least squares speed
josef.pktd at gmail.com
josef.pktd at gmail.com
Tue Oct 1 17:54:46 EDT 2013
On Tue, Oct 1, 2013 at 5:45 PM, <josef.pktd at gmail.com> wrote:
> On Tue, Oct 1, 2013 at 5:24 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> Here are several ways to solve the least squares problem XB = Y:
>>
>> scipy.linalg.lstsq(x, y)
>> np.linalg.lstsq(x, y)
>> np.dot(scipy.linalg.pinv(x), y)
>> np.dot(np.linalg.pinv(x), y)
>>
>> >From the documentation, I would expect these to be ordered by speed,
>> fastest up top and slowest at the bottom. It turns out they *are*
>> ordered by speed, except, on my system (linux x86-64, numpy 1.7.1,
>> scipy 0.12.0, ubuntu-installed atlas), it's exactly backwards, with
>> very substantial differences (more than a factor of 20!):
>>
>> # Typical smallish problem for me:
>> In [40]: x = np.random.randn(780, 2)
>>
>> In [41]: y = np.random.randn(780, 32 * 275)
>>
>> In [42]: %timeit scipy.linalg.lstsq(x, y)
>> 1 loops, best of 3: 494 ms per loop
>>
>> In [43]: %timeit np.linalg.lstsq(x, y)
>> 1 loops, best of 3: 356 ms per loop
>>
>> In [44]: %timeit np.dot(scipy.linalg.pinv(x), y)
>> 10 loops, best of 3: 62.2 ms per loop
>>
>> In [45]: %timeit np.dot(np.linalg.pinv(x), y)
>> 10 loops, best of 3: 23.2 ms per loop
>>
>> Is this expected? I'm particularly confused at why scipy's lstsq
>> should be almost 40% slower than numpy's. (And shouldn't we have a
>> fast one-function way to solve least squares problems?)
>>
>> Any reason not to use the last option? Is it as numerically stable as lstsq?
>>
>> Is there any other version that's even faster or even more numerically stable?
>
> If you have very long x, then using normal equation is faster for univariate y.
>
> There are many different ways to calculate pinv
> https://github.com/scipy/scipy/pull/289
>
> np.lstsq breaks on rank deficient x IIRC, uses different Lapack
> functions than scipy's lstsq
>
> In very badly scaled cases (worst NIST case), scipy's pinv was a bit
> more accurate than numpy's, but maybe just different defaults.
> numpy's pinv was also faster than scipy's in the cases that I tried.
>
> There is only a single NIST case that can fail using the defaults with
> numpy pinv.
for example, when I was playing with precision problems
Addendum 2: scipy.linalg versus numpy.linalg
http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html
Josef
>
> (what about using qr or chol_solve ?)
>
> Lots of different ways to solve this and I never figured out a ranking
> across different cases, speed, precision, robustness to
> near-singularity.
>
> Josef
>
>
>>
>> -n
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
More information about the SciPy-User
mailing list