Problem with numpy.linalg.eig?
![](https://secure.gravatar.com/avatar/010a29141d99dad896beef506bfa3dca.jpg?s=120&d=mm&r=g)
Hi all, The following code calling numpy v1.0.4 fails to terminate on my machine, which was not the case with v1.0.3.1 from numpy import arange, float64 from numpy.linalg import eig a = arange(13*13, dtype = float64) a.shape = (13,13) a = a%17 eig(a) Regards, Peter
![](https://secure.gravatar.com/avatar/6a1dc50b8d79fe3b9a5e9f5d8a118901.jpg?s=120&d=mm&r=g)
On Nov 12, 2007 10:10 AM, Peter Creasey <p.e.creasey.00@googlemail.com> wrote:
It sounds like the same problem that was reported in this thread: http://thread.gmane.org/gmane.comp.python.numeric.general/17456/focus=17465 A friend of mine says the windows binary of 1.0.4 also hangs on eigh and lstsq (so linalg in general). I don't have that problem on 1.0.5 compiled on GNU/Linux.
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On Nov 13, 2007 3:37 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
Could you friend try the binaries there: http://projects.scipy.org/pipermail/numpy-discussion/2007-November/029811.ht... This may be a problem related to the blas/lapack used for the binaries. The binaries posted above use non optimized BLAS/LAPACK (I can update to 1.0.4 if this is a problem). cheers, David
![](https://secure.gravatar.com/avatar/9820b5956634e5bbad7f4ed91a232822.jpg?s=120&d=mm&r=g)
Keith Goodman wrote:
Here we are: http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy-1.0.4.win32-p... David
![](https://secure.gravatar.com/avatar/6a1dc50b8d79fe3b9a5e9f5d8a118901.jpg?s=120&d=mm&r=g)
On Nov 13, 2007 8:42 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Thank you. He said it worked. He didn't even notice a slow down without ATLAS. On some calculations the results were different (between 1.0.2 and 1.0.4) in the last three decimal places. But that's to be expected, right? ATLAS must give different results than the non optimized alternative.
![](https://secure.gravatar.com/avatar/9820b5956634e5bbad7f4ed91a232822.jpg?s=120&d=mm&r=g)
Keith Goodman wrote:
So all in all, I think it worths considering just using netlib BLAS/LAPACK instead of ATLAS for binaries, at least on windows (I don't know who is responsible for the windows binaries); note that we still do not know why the official binaries hang, which is bothering. cheers, David
![](https://secure.gravatar.com/avatar/9820b5956634e5bbad7f4ed91a232822.jpg?s=120&d=mm&r=g)
Geoffrey Zhu wrote: previously, it is hard to know for sure without being able to reproduce the bug on our own workstation, but if this is a problem between fortran and C argument passing, then the result can be pretty random since the problem reduced to a pointer pointing at a wrong address (crash, wrong value, etc...). cheers, David
![](https://secure.gravatar.com/avatar/11bc81e43c7f85543e5245030195455d.jpg?s=120&d=mm&r=g)
On Nov 12, 2007 10:26 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Yes, with the MSI I can always reproduce the problem with numpy.test(). It always hangs.With the egg it does not hang. Pointer problems are usually random, but not random if we are using the same binaries in EGG and MSI and variables are always initialized to certain value.
![](https://secure.gravatar.com/avatar/9820b5956634e5bbad7f4ed91a232822.jpg?s=120&d=mm&r=g)
Geoffrey Zhu wrote:
Ok, could you try this: http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy-1.0.4.win32-p... I built it with mingwin, and with NETLIB BLAS/LAPACK. This is just for testing, do not use it for anything else. David
![](https://secure.gravatar.com/avatar/11bc81e43c7f85543e5245030195455d.jpg?s=120&d=mm&r=g)
On Nov 13, 2007 2:37 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
I just tried. Your new MSI does not hang on numpy.test() and the OP's test case, and it seems faster. The EGG version does not hang on numpy.test() but does on OP's test. The original MSI version hangs on numpy.test() if I open IDLE and type import numpy numpy.test() If I try the OP's test first, once it hang on "from numpy.linalg import eig" and the other time it ran successfully. After it ran successfully, it ran numpy.test() successfully, too. As you said, it is random.
![](https://secure.gravatar.com/avatar/d1189ce0778ab7465ef33b4883250d01.jpg?s=120&d=mm&r=g)
On 13 Nov 2007, at 9:43 AM, Geoffrey Zhu wrote:
I have also been having random problems with the latest numpy from svn built on an Intel core 2 Duo Linux box running in 64 bit mode under Red Hat 3.4.6-8 with the gcc 3.4.6 20060404 and ATLAS 3.8.0. I am having a problem with numpy.linalg.eigh and complex Hermitian matrices. Randomly, I get seemingly correct answers, and then eigenvectors full of Nan's (though not completely. The first row the the eigenvectors seem to be numbers, but incorrect.) Sometimes, I can stop just after the error with pdb and "play". Calling eigh from the debugger sometimes gives a correct answer, and then other times gives eigenvalues and eigenvectors full of Nan's (not completely full mind you). For example: (Pdb) p numpy.linalg.eigh(HH) (array([-50.50589438, -45.86305013, -40.56713543, -35.57216233, 38.1497506 , 40.17291371, 43.35773763, 46.59527636, 49.42413434, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN]), array([[-0.00072424 +0.j, -0.00136655 +0.j, 0.00200233 +0.j, ..., 0.00000000 +0.j, 0.00000000 +0.j, 0.00000000 +0.j], [ NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj], [ NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj], ..., [ NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj], [ NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj], [ NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj]])) (Pdb) p numpy.linalg.eigh(HH) (array([-51.06208813, -48.50332834, -48.49643331, -46.25814405, -46.25813858, -44.33668063, -44.33668063, -42.73548661, -42.73548661, -41.45454929, -41.45454929, -40.49386126, -40.49386126, -39.85344006, -39.85344006, -39.53308677, -39.53308677, 37.91885011, 37.91885011, 38.2392034 , 38.2392034 , 38.8796246 , 38.8796246 , 39.84031263, 39.84031263, 41.12124995, 41.12124995, 42.72244397, 42.72244398, 44.64390192, 44.6439074 , 46.88219666, 46.88909168, 49.44785148]), array([[ -5.28060016e-04 +0.00000000e+00j, -3.92271866e-05 +0.00000000e+00j, 7.72453920e-04 +0.00000000e +00j, ..., -3.36896226e-01 +0.00000000e+00j, 6.28651296e-02 +0.00000000e+00j, -2.42202473e-01 +0.00000000e+00j], [ 1.48818848e-03 +2.78190640e-04j, 1.06069959e-03 +1.98279117e-04j, -1.88322135e-03 -3.52035081e-04j, ..., 2.86677919e-01 +5.35893907e-02j, -1.77188491e-01 -3.31222694e-02j, 2.38244862e-01 +4.45356831e-02j], [ -2.14234988e-03 -8.29950766e-04j, -2.44246082e-03 -9.46214364e-04j, 1.92200953e-03 +7.44590459e-04j, ..., -1.92999931e-01 -7.47685718e-02j, 2.55119386e-01 +9.88337767e-02j, -2.26238355e-01 -8.76452055e-02j], ..., [ 2.06281453e-01 -1.27724068e-01j, -2.32614835e-01 +1.44029008e-01j, -1.75975052e-01 +1.08959139e-01j, ..., 1.75246553e-03 -1.08508072e-03j, 2.22700685e-03 -1.37890426e-03j, 1.95336925e-03 -1.20947504e-03j], [ -2.26004880e-01 +8.75547569e-02j, 1.68085319e-01 -6.51165996e-02j, 2.71949658e-01 -1.05353859e-01j, ..., -1.78646965e-03 +6.92082029e-04j, -1.00620547e-03 +3.89806076e-04j, -1.41173185e-03 +5.46907831e-04j], [ 2.38078516e-01 -4.45045876e-02j, -6.17947313e-02 +1.15514373e-02j, -3.31159928e-01 +6.19045191e-02j, ..., 7.59301424e-04 -1.41938035e-04j, 3.85592692e-05 -7.20797663e-06j, 5.19068791e-04 -9.70307734e-05j]])) Here is the version info (Everything build from scratch, numpy from svn):
Using ATLAS-3.8.0. This is extremely annoying, and difficult to reproduce. I will try recompiling with some different versions and see if I can reproduce the problem. Running numpy.test() does *not* fail... Michael.
![](https://secure.gravatar.com/avatar/d1189ce0778ab7465ef33b4883250d01.jpg?s=120&d=mm&r=g)
On 16 Nov 2007, at 1:46 AM, Michael McNeil Forbes wrote:
Just an update. I am still having this problem, along with some additional problems where occasionally even dot returns nan's. I have confirmed that without ATLAS everything seems to be fine, and that the problem still remains with newer versions of ATLAS, Python, gcc etc. ATLAS was configured with ../configure --prefix=${BASE}/apps/${ATLAS}_${SUFFIX}\ --with-netlib-lapack=${BASE}/src/${LAPACK}_${SUFFIX}/ lapack_LINUX.a\ -A Core2Duo64SSE3\ --cflags=-fPIC\ -Fa alg -fPIC and it passed all the tests. The problem still exists with ATLAS version 3.8.1, gcc 4.3.0, and recent versions of numpy.
I have managed to extract a matrix that causes this failure repeatedly once every two or four times eigh is called, so hopefully I should be able to run gdb and track down the problem...
![](https://secure.gravatar.com/avatar/6a1dc50b8d79fe3b9a5e9f5d8a118901.jpg?s=120&d=mm&r=g)
On Nov 12, 2007 10:10 AM, Peter Creasey <p.e.creasey.00@googlemail.com> wrote:
It sounds like the same problem that was reported in this thread: http://thread.gmane.org/gmane.comp.python.numeric.general/17456/focus=17465 A friend of mine says the windows binary of 1.0.4 also hangs on eigh and lstsq (so linalg in general). I don't have that problem on 1.0.5 compiled on GNU/Linux.
![](https://secure.gravatar.com/avatar/59bdb3784070f0a6836aca9ee03ad817.jpg?s=120&d=mm&r=g)
On Nov 13, 2007 3:37 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
Could you friend try the binaries there: http://projects.scipy.org/pipermail/numpy-discussion/2007-November/029811.ht... This may be a problem related to the blas/lapack used for the binaries. The binaries posted above use non optimized BLAS/LAPACK (I can update to 1.0.4 if this is a problem). cheers, David
![](https://secure.gravatar.com/avatar/9820b5956634e5bbad7f4ed91a232822.jpg?s=120&d=mm&r=g)
Keith Goodman wrote:
Here we are: http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy-1.0.4.win32-p... David
![](https://secure.gravatar.com/avatar/6a1dc50b8d79fe3b9a5e9f5d8a118901.jpg?s=120&d=mm&r=g)
On Nov 13, 2007 8:42 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Thank you. He said it worked. He didn't even notice a slow down without ATLAS. On some calculations the results were different (between 1.0.2 and 1.0.4) in the last three decimal places. But that's to be expected, right? ATLAS must give different results than the non optimized alternative.
![](https://secure.gravatar.com/avatar/9820b5956634e5bbad7f4ed91a232822.jpg?s=120&d=mm&r=g)
Keith Goodman wrote:
So all in all, I think it worths considering just using netlib BLAS/LAPACK instead of ATLAS for binaries, at least on windows (I don't know who is responsible for the windows binaries); note that we still do not know why the official binaries hang, which is bothering. cheers, David
![](https://secure.gravatar.com/avatar/9820b5956634e5bbad7f4ed91a232822.jpg?s=120&d=mm&r=g)
Geoffrey Zhu wrote: previously, it is hard to know for sure without being able to reproduce the bug on our own workstation, but if this is a problem between fortran and C argument passing, then the result can be pretty random since the problem reduced to a pointer pointing at a wrong address (crash, wrong value, etc...). cheers, David
![](https://secure.gravatar.com/avatar/11bc81e43c7f85543e5245030195455d.jpg?s=120&d=mm&r=g)
On Nov 12, 2007 10:26 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Yes, with the MSI I can always reproduce the problem with numpy.test(). It always hangs.With the egg it does not hang. Pointer problems are usually random, but not random if we are using the same binaries in EGG and MSI and variables are always initialized to certain value.
![](https://secure.gravatar.com/avatar/9820b5956634e5bbad7f4ed91a232822.jpg?s=120&d=mm&r=g)
Geoffrey Zhu wrote:
Ok, could you try this: http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy-1.0.4.win32-p... I built it with mingwin, and with NETLIB BLAS/LAPACK. This is just for testing, do not use it for anything else. David
![](https://secure.gravatar.com/avatar/11bc81e43c7f85543e5245030195455d.jpg?s=120&d=mm&r=g)
On Nov 13, 2007 2:37 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
I just tried. Your new MSI does not hang on numpy.test() and the OP's test case, and it seems faster. The EGG version does not hang on numpy.test() but does on OP's test. The original MSI version hangs on numpy.test() if I open IDLE and type import numpy numpy.test() If I try the OP's test first, once it hang on "from numpy.linalg import eig" and the other time it ran successfully. After it ran successfully, it ran numpy.test() successfully, too. As you said, it is random.
![](https://secure.gravatar.com/avatar/d1189ce0778ab7465ef33b4883250d01.jpg?s=120&d=mm&r=g)
On 13 Nov 2007, at 9:43 AM, Geoffrey Zhu wrote:
I have also been having random problems with the latest numpy from svn built on an Intel core 2 Duo Linux box running in 64 bit mode under Red Hat 3.4.6-8 with the gcc 3.4.6 20060404 and ATLAS 3.8.0. I am having a problem with numpy.linalg.eigh and complex Hermitian matrices. Randomly, I get seemingly correct answers, and then eigenvectors full of Nan's (though not completely. The first row the the eigenvectors seem to be numbers, but incorrect.) Sometimes, I can stop just after the error with pdb and "play". Calling eigh from the debugger sometimes gives a correct answer, and then other times gives eigenvalues and eigenvectors full of Nan's (not completely full mind you). For example: (Pdb) p numpy.linalg.eigh(HH) (array([-50.50589438, -45.86305013, -40.56713543, -35.57216233, 38.1497506 , 40.17291371, 43.35773763, 46.59527636, 49.42413434, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN]), array([[-0.00072424 +0.j, -0.00136655 +0.j, 0.00200233 +0.j, ..., 0.00000000 +0.j, 0.00000000 +0.j, 0.00000000 +0.j], [ NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj], [ NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj], ..., [ NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj], [ NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj], [ NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj]])) (Pdb) p numpy.linalg.eigh(HH) (array([-51.06208813, -48.50332834, -48.49643331, -46.25814405, -46.25813858, -44.33668063, -44.33668063, -42.73548661, -42.73548661, -41.45454929, -41.45454929, -40.49386126, -40.49386126, -39.85344006, -39.85344006, -39.53308677, -39.53308677, 37.91885011, 37.91885011, 38.2392034 , 38.2392034 , 38.8796246 , 38.8796246 , 39.84031263, 39.84031263, 41.12124995, 41.12124995, 42.72244397, 42.72244398, 44.64390192, 44.6439074 , 46.88219666, 46.88909168, 49.44785148]), array([[ -5.28060016e-04 +0.00000000e+00j, -3.92271866e-05 +0.00000000e+00j, 7.72453920e-04 +0.00000000e +00j, ..., -3.36896226e-01 +0.00000000e+00j, 6.28651296e-02 +0.00000000e+00j, -2.42202473e-01 +0.00000000e+00j], [ 1.48818848e-03 +2.78190640e-04j, 1.06069959e-03 +1.98279117e-04j, -1.88322135e-03 -3.52035081e-04j, ..., 2.86677919e-01 +5.35893907e-02j, -1.77188491e-01 -3.31222694e-02j, 2.38244862e-01 +4.45356831e-02j], [ -2.14234988e-03 -8.29950766e-04j, -2.44246082e-03 -9.46214364e-04j, 1.92200953e-03 +7.44590459e-04j, ..., -1.92999931e-01 -7.47685718e-02j, 2.55119386e-01 +9.88337767e-02j, -2.26238355e-01 -8.76452055e-02j], ..., [ 2.06281453e-01 -1.27724068e-01j, -2.32614835e-01 +1.44029008e-01j, -1.75975052e-01 +1.08959139e-01j, ..., 1.75246553e-03 -1.08508072e-03j, 2.22700685e-03 -1.37890426e-03j, 1.95336925e-03 -1.20947504e-03j], [ -2.26004880e-01 +8.75547569e-02j, 1.68085319e-01 -6.51165996e-02j, 2.71949658e-01 -1.05353859e-01j, ..., -1.78646965e-03 +6.92082029e-04j, -1.00620547e-03 +3.89806076e-04j, -1.41173185e-03 +5.46907831e-04j], [ 2.38078516e-01 -4.45045876e-02j, -6.17947313e-02 +1.15514373e-02j, -3.31159928e-01 +6.19045191e-02j, ..., 7.59301424e-04 -1.41938035e-04j, 3.85592692e-05 -7.20797663e-06j, 5.19068791e-04 -9.70307734e-05j]])) Here is the version info (Everything build from scratch, numpy from svn):
Using ATLAS-3.8.0. This is extremely annoying, and difficult to reproduce. I will try recompiling with some different versions and see if I can reproduce the problem. Running numpy.test() does *not* fail... Michael.
![](https://secure.gravatar.com/avatar/d1189ce0778ab7465ef33b4883250d01.jpg?s=120&d=mm&r=g)
On 16 Nov 2007, at 1:46 AM, Michael McNeil Forbes wrote:
Just an update. I am still having this problem, along with some additional problems where occasionally even dot returns nan's. I have confirmed that without ATLAS everything seems to be fine, and that the problem still remains with newer versions of ATLAS, Python, gcc etc. ATLAS was configured with ../configure --prefix=${BASE}/apps/${ATLAS}_${SUFFIX}\ --with-netlib-lapack=${BASE}/src/${LAPACK}_${SUFFIX}/ lapack_LINUX.a\ -A Core2Duo64SSE3\ --cflags=-fPIC\ -Fa alg -fPIC and it passed all the tests. The problem still exists with ATLAS version 3.8.1, gcc 4.3.0, and recent versions of numpy.
I have managed to extract a matrix that causes this failure repeatedly once every two or four times eigh is called, so hopefully I should be able to run gdb and track down the problem...
participants (6)
-
David Cournapeau
-
David Cournapeau
-
Geoffrey Zhu
-
Keith Goodman
-
Michael McNeil Forbes
-
Peter Creasey