[Numpy-discussion] svd error checking vs. speed

Sebastian Berg sebastian at sipsolutions.net
Sat Feb 15 18:06:49 EST 2014


On Sat, 2014-02-15 at 17:35 -0500, josef.pktd at gmail.com wrote:
> On Sat, Feb 15, 2014 at 5:12 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> > On Sat, Feb 15, 2014 at 5:08 PM, <josef.pktd at gmail.com> wrote:
> >>
> >> On Sat, Feb 15, 2014 at 4:56 PM, Sebastian Berg
> >> <sebastian at sipsolutions.net> wrote:
> >> > On Sat, 2014-02-15 at 16:37 -0500, alex wrote:
> >> >> Hello list,
> >> >>
> >> >> Here's another idea resurrection from numpy github comments that I've
> >> >> been advised could be posted here for re-discussion.
> >> >>
> >> >> The proposal would be to make np.linalg.svd more like scipy.linalg.svd
> >> >> with respect to input checking.  The argument against the change is
> >> >> raw speed; if you know that you will never feed non-finite input to
> >> >> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd.  An
> >> >> argument for the change could be to avoid issues reported on github
> >> >> like crashes, hangs, spurious non-convergence exceptions, etc. from
> >> >> the undefined behavior of svd of non-finite input.
> >> >>
> >> >
> >> > +1. Unless this is a huge speed penalty, correctness (and decent error
> >> > messages) should come first in my opinion; this is Python after all. If
> >> > there is a noticeable speed difference, a kwarg may be an option (but I
> >> > would think about that some more).
> >>
> >> maybe -1
> >>
> >> statsmodels is using np.linalg.pinv, which uses svd.
> >> I never heard of any crash (*), and the only time I compared with
> >> scipy I didn't like the slowdown.
> >> I didn't do any serious timings, just a few examples.
> >>
> >> (*) not converged, ...
> >>
> >> pinv(x.T).dot(x) -> pinv(x.T, please_don_t_check=True).dot(y)
> >>
> >> numbers ?
> >
> >
> > FWIW, I see this spurious "SVD did not converge" warning very frequently
> > with ARMA when a nan has crept in. I usually know where to find
> > the problem, but I think it'd be nice if this error message were a little
> > better.
> 
> maybe I'm +1
> 
> While we don't see crashes, when I run Alex's example I see 13% CPU
> usage for a hanging process, which looks very familiar to me; I see it
> reasonably often when I'm debugging code.
> 
> I never tried to track down where it hangs.
> 

If this does not cause real hangs or crashes (just "not converged" after a
long time or so), then maybe we should just check the input for non-finite
values afterwards, to give the user a better idea of where to look for the
error. I think I remember people running into this and being confused (but
without a crash or hang).
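
A minimal sketch of that check-afterwards idea, to make it concrete. This
is not how numpy implements anything; `svd_checked` and its `svd_func`
parameter are hypothetical names chosen for illustration. The finiteness
scan runs only when the decomposition has already failed, so the normal
path pays nothing extra:

```python
import numpy as np


def svd_checked(a, svd_func=np.linalg.svd, **kwargs):
    """Run the SVD first and inspect the input only if it fails.

    `svd_checked` and `svd_func` are hypothetical names for this sketch.
    """
    a = np.asarray(a)
    try:
        return svd_func(a, **kwargs)
    except np.linalg.LinAlgError as err:
        # The scan for NaN/inf is paid only on the failure path.
        if not np.isfinite(a).all():
            raise np.linalg.LinAlgError(
                "SVD did not converge; the input contains NaN or inf"
            ) from err
        # Finite input that still failed: re-raise the original error.
        raise
```

Note that this only helps when the LAPACK call actually returns; it does
nothing for a true hang, which would still need a check before the call.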

- Sebastian

> Josef
> 
> >
> > Skipper
> >
> >>
> >>
> >> grep: we also use scipy.linalg.pinv in some cases
> >>
> >> Josef
> >>
> >>
> >> >
> >> > - Sebastian
> >> >
> >> >> """
> >> >> [...] the following numpy code hangs until I `kill -9` it.
> >> >>
> >> >> ```
> >> >> $ python runtests.py --shell
> >> >> $ python
> >> >> Python 2.7.5+
> >> >> [GCC 4.8.1] on linux2
> >> >> >>> import numpy as np
> >> >> >>> np.__version__
> >> >> '1.9.0.dev-e3f0f53'
> >> >> >>> A = np.array([[1e3, 0], [0, 1]])
> >> >> >>> B = np.array([[1e300, 0], [0, 1]])
> >> >> >>> C = np.array([[1e3000, 0], [0, 1]])
> >> >> >>> np.linalg.svd(A)
> >> >> (array([[ 1.,  0.],
> >> >>        [ 0.,  1.]]), array([ 1000.,     1.]), array([[ 1.,  0.],
> >> >>        [ 0.,  1.]]))
> >> >> >>> np.linalg.svd(B)
> >> >> (array([[ 1.,  0.],
> >> >>        [ 0.,  1.]]), array([  1.00000000e+300,   1.00000000e+000]),
> >> >> array([[ 1.,  0.],
> >> >>        [ 0.,  1.]]))
> >> >> >>> np.linalg.svd(C)
> >> >> [hangs forever]
> >> >> ```
> >> >> """
> >> >>
> >> >> Alex
> >> >> _______________________________________________
> >> >> NumPy-Discussion mailing list
> >> >> NumPy-Discussion at scipy.org
> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >> >>




