[Numpy-discussion] svd error checking vs. speed

alex argriffi at ncsu.edu
Mon Feb 17 10:03:38 EST 2014


On Mon, Feb 17, 2014 at 4:49 AM, Dave Hirschfeld <novin01 at gmail.com> wrote:
> alex <argriffi <at> ncsu.edu> writes:
>
>>
>> Hello list,
>>
>> Here's another idea resurrection from numpy github comments that I've
>> been advised could be posted here for re-discussion.
>>
>> The proposal would be to make np.linalg.svd more like scipy.linalg.svd
>> with respect to input checking.  The argument against the change is
>> raw speed; if you know that you will never feed non-finite input to
>> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd.  An
>> argument for the change could be to avoid issues reported on github
>> like crashes, hangs, spurious non-convergence exceptions, etc. from
>> the undefined behavior of svd of non-finite input.
>>
>> """
>> [...] the following numpy code hangs until I `kill -9` it.
>>
>> ```
>> $ python runtests.py --shell
>> $ python
>> Python 2.7.5+
>> [GCC 4.8.1] on linux2
>> >>> import numpy as np
>> >>> np.__version__
>> '1.9.0.dev-e3f0f53'
>> >>> A = np.array([[1e3, 0], [0, 1]])
>> >>> B = np.array([[1e300, 0], [0, 1]])
>> >>> C = np.array([[1e3000, 0], [0, 1]])
>> >>> np.linalg.svd(A)
>> (array([[ 1.,  0.],
>>        [ 0.,  1.]]), array([ 1000.,     1.]), array([[ 1.,  0.],
>>        [ 0.,  1.]]))
>> >>> np.linalg.svd(B)
>> (array([[ 1.,  0.],
>>        [ 0.,  1.]]), array([  1.00000000e+300,   1.00000000e+000]),
>> array([[ 1.,  0.],
>>        [ 0.,  1.]]))
>> >>> np.linalg.svd(C)
>> [hangs forever]
>> ```
>> """
>>
>> Alex
>>
>
> I'm -1 on checking finiteness - if there's one place you usually want
> maximum performance it's linear algebra operations.
>
> It certainly shouldn't crash or hang though and for me at least it doesn't -
> it returns NaN

btw, when I use the python/numpy/openblas packages from Ubuntu, I also
get NaN.  The infinite loop appears when I build numpy myself and let it
use its bundled lapack_lite.  I don't know which LAPACK Josef uses to
get the weird behavior he observes ("13% cpu usage for a hanging process").

This is consistent with the scipy svd docstring describing its
check_finite flag, where it warns "Disabling may give a performance
gain, but may result in problems (crashes, non-termination) if the
inputs do contain infinities or NaNs."  I think this caveat also
applies to most numpy linalg functions that connect more or less
directly to lapack.
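
For anyone who wants the check today without touching numpy itself, here
is a minimal sketch of an opt-in wrapper.  The name checked_svd is
hypothetical (it is not part of numpy or scipy); the only assumption is
that np.asarray_chkfinite is acceptable as the validation step, since it
raises ValueError when the input contains infs or NaNs.

```python
import numpy as np


def checked_svd(a, full_matrices=True, compute_uv=True):
    """Hypothetical helper: validate the input, then call np.linalg.svd."""
    # np.asarray_chkfinite raises ValueError if `a` contains inf or NaN,
    # so non-finite input never reaches the LAPACK routine.
    a = np.asarray_chkfinite(a)
    return np.linalg.svd(a, full_matrices=full_matrices, compute_uv=compute_uv)


A = np.array([[1e3, 0], [0, 1]])
C = np.array([[1e3000, 0], [0, 1]])  # the literal 1e3000 overflows to inf

u, s, vt = checked_svd(A)
print(s)

try:
    checked_svd(C)
except ValueError as exc:
    print("rejected:", exc)
```

The check costs one extra pass over the array, which is the speed
tradeoff under discussion; callers who know their input is finite can
keep calling np.linalg.svd directly.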


