Problem with numpy.linalg.eig?
Hi all,
The following code calling numpy v1.0.4 fails to terminate on my machine, which was not the case with v1.0.3.1:
    from numpy import arange, float64
    from numpy.linalg import eig
    a = arange(13*13, dtype = float64)
    a.shape = (13,13)
    a = a%17
    eig(a)
Regards,
Peter
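The failure here is a hang rather than an exception, so a reproducer is easier to share if it runs the snippet in a separate interpreter with a time limit. The following is a minimal sketch that assumes a modern Python 3. It relies on `subprocess.run` with `timeout=` and `capture_output=`, neither of which existed in the Python 2.4/2.5 era of this thread. `REPRO` and `run_repro` are hypothetical names and are not part of numpy.

```python
import subprocess
import sys

# Peter's reproducer, executed verbatim in a child interpreter.
REPRO = """
from numpy import arange, float64
from numpy.linalg import eig
a = arange(13*13, dtype=float64)
a.shape = (13, 13)
a = a % 17
eig(a)
"""


def run_repro(timeout_s=30.0):
    """Hypothetical helper: return 'ok', 'hang' or 'error' for REPRO."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", REPRO],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return "hang"
    # A non-zero exit status covers crashes and import failures alike.
    return "ok" if proc.returncode == 0 else "error"


if __name__ == "__main__":
    print(run_repro())
```

Running the same script against two installs (for example 1.0.3.1 vs 1.0.4, or MSI vs EGG) gives the kind of side-by-side comparison the rest of the thread relies on.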
On Nov 12, 2007 10:10 AM, Peter Creasey <p.e.creasey.00@googlemail.com> wrote:
The following code calling numpy v1.0.4 fails to terminate on my machine, which was not the case with v1.0.3.1
    from numpy import arange, float64
    from numpy.linalg import eig
    a = arange(13*13, dtype = float64)
    a.shape = (13,13)
    a = a%17
    eig(a)
It sounds like the same problem that was reported in this thread:
http://thread.gmane.org/gmane.comp.python.numeric.general/17456/focus=17465
A friend of mine says the windows binary of 1.0.4 also hangs on eigh and lstsq (so linalg in general). I don't have that problem on 1.0.5 compiled on GNU/Linux.
On Nov 13, 2007 3:37 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
It sounds like the same problem that was reported in this thread:
http://thread.gmane.org/gmane.comp.python.numeric.general/17456/focus=17465
A friend of mine says the windows binary of 1.0.4 also hangs on eigh and lstsq (so linalg in general). I don't have that problem on 1.0.5 compiled on GNU/Linux.
Could your friend try the binaries there:
http://projects.scipy.org/pipermail/numpy-discussion/2007-November/029811.ht...
This may be a problem related to the BLAS/LAPACK used for the binaries. The binaries posted above use non-optimized BLAS/LAPACK (I can update to 1.0.4 if this is a problem).
cheers,
David
On Nov 12, 2007 10:51 AM, David Cournapeau <cournape@gmail.com> wrote:
Could your friend try the binaries there:
http://projects.scipy.org/pipermail/numpy-discussion/2007-November/029811.ht...
This may be a problem related to the blas/lapack used for the binaries. The binaries posted above use non optimized BLAS/LAPACK (I can update to 1.0.4 if this is a problem).
He tried, but he is using Python 2.4 (he said your binary was for 2.5).
Keith Goodman wrote:
He tried, but he is using Python 2.4 (he said your binary was for 2.5).
Here we are:
http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy1.0.4.win32p...
David
On Nov 13, 2007 8:42 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Here we are:
http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy1.0.4.win32p...
Thank you. He said it worked. He didn't even notice a slowdown without ATLAS. On some calculations the results were different (between 1.0.2 and 1.0.4) in the last three decimal places. But that's to be expected, right? ATLAS must give different results than the non-optimized alternative.
Keith Goodman wrote:
Thank you. He said it worked. He didn't even notice a slowdown without ATLAS. On some calculations the results were different (between 1.0.2 and 1.0.4) in the last three decimal places. But that's to be expected, right?
I don't think there is any chance of getting exactly the same results (compiler, OS, CPU and BLAS all enter the equation). ATLAS will not change numpy's general performance much: it only matters for some functions (numpy.dot) and, of course, linear algebra, and then only for relatively large sizes. For example, in my use case (linear algebra with at most a few tens of dimensions), ATLAS does not give much outside numpy.dot. And anyway, if you want good performance from ATLAS, you should compile it yourself (ATLAS performance seems to depend heavily on the size of the L1 cache, for example).
So all in all, I think it is worth considering just using netlib BLAS/LAPACK instead of ATLAS for the binaries, at least on Windows (I don't know who is responsible for the Windows binaries); note that we still do not know why the official binaries hang, which is bothersome.
cheers,
David
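David's point is that bit-for-bit agreement across compilers, CPUs and BLAS builds is not expected. That suggests checking a decomposition by its residual rather than comparing digits against another build. A minimal sketch, assuming a current numpy; `check_eig` and the `1e-10` tolerance are illustrative choices, not anything numpy defines:

```python
import numpy as np


def check_eig(a, rtol=1e-10):
    """Hypothetical check: is numpy.linalg.eig(a) finite with a small residual?

    Column i of a @ v should equal w[i] * v[:, i]; v * w scales column i by w[i].
    The residual is reported relative to the Frobenius norm of a.
    """
    w, v = np.linalg.eig(a)
    if not (np.all(np.isfinite(w)) and np.all(np.isfinite(v))):
        return False, float("nan")
    residual = np.linalg.norm(a @ v - v * w) / np.linalg.norm(a)
    return bool(residual < rtol), float(residual)


# The matrix from the original report.
a = (np.arange(13 * 13, dtype=np.float64) % 17).reshape(13, 13)
ok, residual = check_eig(a)
print(ok, residual)
```

Two builds can both pass a check like this while still disagreeing in the last few printed digits, because each is only required to be accurate relative to rounding error, not identical to the other.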
On Nov 12, 2007 12:37 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
It sounds like the same problem that was reported in this thread:
http://thread.gmane.org/gmane.comp.python.numeric.general/17456/focus=17465
A friend of mine says the windows binary of 1.0.4 also hangs on eigh and lstsq (so linalg in general). I don't have that problem on 1.0.5 compiled on GNU/Linux.
_______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpydiscussion
The code hangs on my machine too. In the thread you mentioned above, I wrote that using the EGG instead of MSI appears to fix the numpy.test() problem, but maybe it just somehow hides it.
Geoffrey Zhu wrote:
On Nov 12, 2007 12:37 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
It sounds like the same problem that was reported in this thread:
http://thread.gmane.org/gmane.comp.python.numeric.general/17456/focus=17465
The code hangs on my machine too. In the thread you mentioned above, I wrote that using the EGG instead of MSI appears to fix the numpy.test() problem, but maybe it just somehow hides it.
When you use the MSI, can you always reproduce the problem? As I said previously, it is hard to know for sure without being able to reproduce the bug on our own workstations, but if this is a problem with argument passing between Fortran and C, then the result can be pretty random, since the problem reduces to a pointer pointing at a wrong address (crash, wrong value, etc.).
cheers,
David
On Nov 12, 2007 10:26 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
When you use the MSI, can you always reproduce the problem? As I said previously, it is hard to know for sure without being able to reproduce the bug on our own workstations, but if this is a problem with argument passing between Fortran and C, then the result can be pretty random, since the problem reduces to a pointer pointing at a wrong address (crash, wrong value, etc.).
cheers,
David
Yes, with the MSI I can always reproduce the problem with numpy.test(). It always hangs. With the EGG it does not hang. Pointer problems are usually random, but not random if we are using the same binaries in EGG and MSI and variables are always initialized to a certain value.
Geoffrey Zhu wrote:
Yes, with the MSI I can always reproduce the problem with numpy.test(). It always hangs. With the EGG it does not hang. Pointer problems are usually random, but not random if we are using the same binaries in EGG and MSI and variables are always initialized to a certain value.
Ok, could you try this:
http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy1.0.4.win32p...
I built it with MinGW, and with NETLIB BLAS/LAPACK. This is just for testing; do not use it for anything else.
David
On Nov 13, 2007 2:37 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Ok, could you try this:
http://www.ar.media.kyoto-u.ac.jp/members/david/archives/numpy1.0.4.win32p...
I built it with MinGW, and with NETLIB BLAS/LAPACK. This is just for testing; do not use it for anything else.
David
I just tried. Your new MSI hangs on neither numpy.test() nor the OP's test case, and it seems faster. The EGG version does not hang on numpy.test() but does hang on the OP's test. The original MSI version hangs on numpy.test() if I open IDLE and type
    import numpy
    numpy.test()
If I try the OP's test first, once it hung on "from numpy.linalg import eig" and the other time it ran successfully. After it ran successfully, it ran numpy.test() successfully, too. As you said, it is random.
On 13 Nov 2007, at 9:43 AM, Geoffrey Zhu wrote:
On Nov 13, 2007 2:37 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Geoffrey Zhu wrote:
Pointer problems are usually random...
...
The original MSI version hangs on numpy.test() if I open IDLE and type
    import numpy
    numpy.test()
If I try the OP's test first, once it hung on "from numpy.linalg import eig" and the other time it ran successfully. After it ran successfully, it ran numpy.test() successfully, too.
As you said, it is random.
I have also been having random problems with the latest numpy from svn built on an Intel Core 2 Duo Linux box running in 64-bit mode under Red Hat 3.4.6-8 with gcc 3.4.6 20060404 and ATLAS 3.8.0.
I am having a problem with numpy.linalg.eigh and complex Hermitian matrices. Randomly, I get seemingly correct answers, and then eigenvectors full of NaNs (though not completely: the first row of the eigenvectors seems to be numbers, but incorrect ones).
Sometimes I can stop just after the error with pdb and "play". Calling eigh from the debugger sometimes gives a correct answer, and other times gives eigenvalues and eigenvectors full of NaNs (not completely full, mind you). For example:
(Pdb) p numpy.linalg.eigh(HH)
(array([50.50589438, 45.86305013, 40.56713543, 35.57216233, 38.1497506 ,
        40.17291371, 43.35773763, 46.59527636, 49.42413434, NaN,
        NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
        NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
        NaN, NaN, NaN, NaN]),
 array([[0.00072424 +0.j, 0.00136655 +0.j, 0.00200233 +0.j, ...,
         0.00000000 +0.j, 0.00000000 +0.j, 0.00000000 +0.j],
        [NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj],
        [NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj],
        ...,
        [NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj],
        [NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj],
        [NaN NaNj, NaN NaNj, NaN NaNj, ..., NaN NaNj, NaN NaNj, NaN NaNj]]))
(Pdb) p numpy.linalg.eigh(HH)
(array([51.06208813, 48.50332834, 48.49643331, 46.25814405, 46.25813858,
        44.33668063, 44.33668063, 42.73548661, 42.73548661, 41.45454929,
        41.45454929, 40.49386126, 40.49386126, 39.85344006, 39.85344006,
        39.53308677, 39.53308677, 37.91885011, 37.91885011, 38.2392034 ,
        38.2392034 , 38.8796246 , 38.8796246 , 39.84031263, 39.84031263,
        41.12124995, 41.12124995, 42.72244397, 42.72244398, 44.64390192,
        44.6439074 , 46.88219666, 46.88909168, 49.44785148]),
 array([[5.28060016e-04 +0.00000000e+00j, 3.92271866e-05 +0.00000000e+00j,
         7.72453920e-04 +0.00000000e+00j, ...,
         3.36896226e-01 +0.00000000e+00j, 6.28651296e-02 +0.00000000e+00j,
         2.42202473e-01 +0.00000000e+00j],
        [1.48818848e-03 +2.78190640e-04j, 1.06069959e-03 +1.98279117e-04j,
         1.88322135e-03 -3.52035081e-04j, ...,
         2.86677919e-01 +5.35893907e-02j, 1.77188491e-01 -3.31222694e-02j,
         2.38244862e-01 +4.45356831e-02j],
        [2.14234988e-03 -8.29950766e-04j, 2.44246082e-03 -9.46214364e-04j,
         1.92200953e-03 +7.44590459e-04j, ...,
         1.92999931e-01 -7.47685718e-02j, 2.55119386e-01 +9.88337767e-02j,
         2.26238355e-01 -8.76452055e-02j],
        ...,
        [2.06281453e-01 -1.27724068e-01j, 2.32614835e-01 +1.44029008e-01j,
         1.75975052e-01 +1.08959139e-01j, ...,
         1.75246553e-03 -1.08508072e-03j, 2.22700685e-03 -1.37890426e-03j,
         1.95336925e-03 -1.20947504e-03j],
        [2.26004880e-01 +8.75547569e-02j, 1.68085319e-01 -6.51165996e-02j,
         2.71949658e-01 -1.05353859e-01j, ...,
         1.78646965e-03 +6.92082029e-04j, 1.00620547e-03 +3.89806076e-04j,
         1.41173185e-03 +5.46907831e-04j],
        [2.38078516e-01 -4.45045876e-02j, 6.17947313e-02 +1.15514373e-02j,
         3.31159928e-01 +6.19045191e-02j, ...,
         7.59301424e-04 -1.41938035e-04j, 3.85592692e-05 -7.20797663e-06j,
         5.19068791e-04 -9.70307734e-05j]]))
Here is the version info (everything built from scratch, numpy from svn):
>>> sys.version
'2.5.1 (r251:54863, Nov 10 2007, 00:44:16) \n[GCC 3.4.6 20060404 (Red Hat 3.4.6-8)]'
>>> numpy.version.version
'1.0.5.dev4427'
>>> scipy.version.version
'0.7.0.dev3511'
Using ATLAS 3.8.0.
This is extremely annoying, and difficult to reproduce. I will try recompiling with some different versions and see if I can reproduce the problem. Running numpy.test() does *not* fail...
Michael
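Because the NaNs appear only on some calls, one way to catch them is to call eigh repeatedly on the same Hermitian matrix and validate each result on its own terms (finite values, small residual), instead of trusting the first call as a reference. A minimal sketch, assuming a current numpy. The random 34 x 34 matrix is a hypothetical stand-in for Michael's `HH` (34 matches the number of eigenvalues in his pdb output), and `random_hermitian`, `eigh_result_ok` and `stress_eigh` are illustrative names:

```python
import numpy as np


def random_hermitian(n, seed=0):
    """Random complex Hermitian matrix, standing in for the real HH."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (x + x.conj().T) / 2


def eigh_result_ok(h, w, v, tol=1e-10):
    """True if (w, v) are finite and h @ v matches v * w to within tol * ||h||."""
    if not (np.all(np.isfinite(w)) and np.all(np.isfinite(v))):
        return False
    return bool(np.linalg.norm(h @ v - v * w) <= tol * np.linalg.norm(h))


def stress_eigh(h, trials=50):
    """Call numpy.linalg.eigh repeatedly; return the indices of bad calls."""
    bad = []
    for k in range(trials):
        w, v = np.linalg.eigh(h)
        if not eigh_result_ok(h, w, v):
            bad.append(k)
    return bad


print(stress_eigh(random_hermitian(34)))
```

On a healthy build the returned list should be empty. On an affected build, a non-empty list would show how often the intermittent failure occurs without needing a debugger session.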
Michael McNeil Forbes wrote:
I have also been having random problems with the latest numpy from svn built on an Intel Core 2 Duo Linux box running in 64-bit mode under Red Hat 3.4.6-8 with gcc 3.4.6 20060404 and ATLAS 3.8.0.
I am having a problem with numpy.linalg.eigh and complex Hermitian matrices. Randomly, I get seemingly correct answers, and then eigenvectors full of NaNs (though not completely: the first row of the eigenvectors seems to be numbers, but incorrect ones).
Which Fortran compiler are you using?
David
Michael McNeil Forbes wrote:
I have also been having random problems with the latest numpy from svn built on an Intel Core 2 Duo Linux box running in 64-bit mode under Red Hat 3.4.6-8 with gcc 3.4.6 20060404 and ATLAS 3.8.0.
Could you try without ATLAS? Also, how did you configure ATLAS when building it? It seems that ATLAS is definitely part of the problem (everybody having the problem uses ATLAS), and that it involves the Core 2 Duo.
David
I have also been having random problems with the latest numpy from svn built on an Intel Core 2 Duo Linux box running in 64-bit mode under Red Hat 3.4.6-8 with gcc 3.4.6 20060404 and ATLAS 3.8.0.
Could you try without ATLAS? Also, how did you configure ATLAS when building it? It seems that ATLAS is definitely part of the problem (everybody having the problem uses ATLAS), and that it involves the Core 2 Duo.
I will try that. I made a mistake: it is a Core 2, not a Core 2 Duo...
model name : Intel(R) Core(TM)2 CPU 6700 @ 2.66GHz
Michael
On 15 Nov 2007, at 8:23 PM, David Cournapeau wrote:
Could you try without ATLAS? Also, how did you configure ATLAS when building it? It seems that ATLAS is definitely part of the problem (everybody having the problem uses ATLAS), and that it involves the Core 2 Duo.
David
It seems to work fine without ATLAS, but then again, it is a somewhat "random" error. I will let some code run tonight and see if I detect anything.
Michael
On 16 Nov 2007, at 1:46 AM, Michael McNeil Forbes wrote:
It seems to work fine without ATLAS, but then again, it is a somewhat "random" error. I will let some code run tonight and see if I detect anything.
Just an update. I am still having this problem, along with some additional problems where occasionally even dot returns NaNs. I have confirmed that without ATLAS everything seems to be fine, and that the problem still remains with newer versions of ATLAS, Python, gcc, etc. ATLAS was configured with
    ../configure --prefix=${BASE}/apps/${ATLAS}_${SUFFIX} \
        --with-netlib-lapack=${BASE}/src/${LAPACK}_${SUFFIX}/lapack_LINUX.a \
        -A Core2Duo64SSE3 \
        --cflags=-fPIC \
        -Fa alg -fPIC
and it passed all the tests. The problem still exists with ATLAS version 3.8.1, gcc 4.3.0, and recent versions of numpy.
>>> sys.version
'2.5.2 (r252:60911, Mar 29 2008, 02:55:47) \n[GCC 4.3.0]'
>>> numpy.version.version
'1.0.5.dev4915'
I have managed to extract a matrix that causes this failure repeatedly once every two or four times eigh is called, so hopefully I should be able to run gdb and track down the problem...
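For the report that numpy.dot occasionally returns NaNs, a similar repeated-call check can compare the BLAS-backed result against a plain-Python product that does not go through ATLAS at all. It can also save the inputs the first time a mismatch appears, so the case can be replayed under gdb. A minimal sketch, assuming a current numpy; `reference_matmul`, `dot_mismatches` and the `dot_failure.npz` file name are hypothetical:

```python
import numpy as np


def reference_matmul(a, b):
    """Plain-Python matrix product of nested lists; uses no BLAS."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def dot_mismatches(a, b, trials=20, save_to=None):
    """Count calls where numpy.dot(a, b) is non-finite or disagrees with the reference."""
    ref = np.array(reference_matmul(a.tolist(), b.tolist()))
    bad = 0
    for _ in range(trials):
        c = np.dot(a, b)
        if not (np.all(np.isfinite(c)) and np.allclose(c, ref, rtol=1e-12, atol=1e-12)):
            bad += 1
            if save_to is not None and bad == 1:
                # Keep the first failing inputs so they can be reloaded with np.load.
                np.savez(save_to, a=a, b=b)
    return bad


rng = np.random.default_rng(0)
a = rng.standard_normal((34, 34))
b = rng.standard_normal((34, 34))
print(dot_mismatches(a, b, save_to="dot_failure.npz"))
```

If a failure is recorded, `data = np.load("dot_failure.npz")` gives back `data["a"]` and `data["b"]`, which is one way to keep an offending matrix around for a gdb session.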
Yes, with the MSI I can always reproduce the problem with numpy.test(). It always hangs. With the EGG it does not hang. Pointer problems are usually random, but not random if we are using the same binaries in EGG and MSI and variables are always initialized to a certain value.
I can consistently reproduce this problem with the EGG. It disappears, however, when I replace lapack_lite.pyd with the version from the 1.0.3.1 EGG (Python 2.4).
cheers,
Peter
participants (6)

David Cournapeau

David Cournapeau

Geoffrey Zhu

Keith Goodman

Michael McNeil Forbes

Peter Creasey