Hi All,

I've isolated numarray.ieeespecial as the cause of a strange error which I posted on c.l.py:
http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=41198B29.7090600%40NOSPAM-DELETE-THIS.astraw.com&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26group%3Dcomp.lang.python

My non-expert guess as to an explanation: some statements early in numarray.ieeespecial (e.g. _na.array(1.0)/_na.array(0.0)) result in a floating point exception being set which lies dormant until I call certain functions in the Intel IPP library (e.g. ippiAddWeighted_8u32f_C1IR), at which point the latent exception bit takes effect and my Python program is killed while the console says "Floating point exception".

I've come to this tentative conclusion based on 3 minimal Pyrex programs:

----------------------------------------------------
Version 1:

import numarray.numarrayall as _na
plus_inf = inf = _na.array(1.0)/_na.array(0.0)
cimport avg
avg.work() # calls IPP, terminates with "Floating point exception".

----------------------------------------------------
Version 2:

import numarray.numarrayall as _na
# Define *ieee special values*
_na.Error.pushMode(all="ignore")
plus_inf = inf = _na.array(1.0)/_na.array(0.0)
_na.Error.popMode()
cimport avg
avg.work() # calls IPP, terminates with "Floating point exception".

----------------------------------------------------
Version 3:

import numarray.numarrayall as _na
cimport avg
avg.work() # calls IPP, terminates normally

In the short term, I'll try and work around this problem by not importing numarray.ieeespecial, but in the long term, I think we should try and fix this. I'm happy to try to debug further if someone gives me advice on where to go from here.

Cheers!
Andrew
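The guess above can be pictured with a toy Python model of a sticky floating point status flag. This is only an illustration of the hypothesis, not how numarray or IPP work internally: the `ToyFPU` class, its method names, and the "fire on the next operation once traps are enabled" rule are all assumptions made up for this sketch.

```python
class ToyFPU:
    """Toy model of a sticky FP status flag (hypothetical, illustration only)."""

    def __init__(self):
        self.flags = set()         # sticky status flags left behind by operations
        self.trap_enabled = False  # whether a pending flag is allowed to "fire"

    def _check_pending(self):
        # A stale flag only causes trouble once some later code runs with traps on.
        if self.trap_enabled and self.flags:
            raise FloatingPointError("latent flag(s): %s" % sorted(self.flags))

    def divide(self, a, b):
        self._check_pending()
        if b == 0.0:
            if a == 0.0:
                self.flags.add("invalid")
                return float("nan")
            self.flags.add("divbyzero")  # record the event and carry on
            return float("inf") if a > 0 else float("-inf")
        return a / b

    def clear(self):
        self.flags.clear()


fpu = ToyFPU()
inf = fpu.divide(1.0, 0.0)   # plays the role of _na.array(1.0)/_na.array(0.0)
fpu.trap_enabled = True      # a later library call runs with traps enabled
try:
    fpu.divide(6.0, 3.0)     # harmless on its own, but the stale flag fires here
    outcome = "ok"
except FloatingPointError:
    outcome = "killed"
```

In this model, calling `fpu.clear()` before the second division is what would let it complete normally, which is the role the thread later assigns to clearing the floating point error state.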
Hi Andrew,

So far you mentioned debian, linux-2.6.7, Python-2.3.4, and the Intel IPP library. What compiler are you using? More comments below.

On Wed, 2004-08-11 at 02:26, Andrew Straw wrote:
My non-expert guess as to an explanation: some statements early in numarray.ieeespecial (e.g. _na.array(1.0)/_na.array(0.0)) result in a floating point exception being set which lies dormant until I call certain functions in the Intel IPP library (e.g. ippiAddWeighted_8u32f_C1IR), at which point the latent exception bit takes effect and my Python program is killed while the console says "Floating point exception".
It sounds to me like the IPP library is aborting as a result of previously irrelevant floating point error state. My guess is that the problem is not confined to ieeespecial, but rather to floating point exception handling in general in combination with the IPP library and/or whatever compiler you're using.

The key numarray code for clearing the floating point error state is NA_checkFPErrors() in numarray/Src/newarray.ch.

Some things that occur to me are:

1. Verify that any array divide by zero results in the same failure.

   from numarray import arange, zeros
   x = arange(10,20)/zeros((10,))   # any array divide by zero
   call_your_ipp_code()             # placeholder for the IPP call that gets killed

2. Look in the IPP library for routines related to clearing floating point exceptions.

3. Look in your compiler runtime library for routines related to clearing floating point exceptions. Modify NA_checkFPErrors accordingly and re-build numarray with

   % python setup.py install --genapi

4. Walk through the Pyrex generated wrapper code and make sure there's nothing going on there. I'm doubtful that this has anything to do with the problem.

More ideas will come as you let us know what your compiler is and verify item 1.

Regards,
Todd
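On item 3: in a C99 runtime the standard routines for inspecting and clearing floating point exception flags are declared in <fenv.h> (feclearexcept, fetestexcept, feraiseexcept). A minimal sketch of poking at them from Python through ctypes follows. It assumes a Unix-like system whose math library exports these as real functions (glibc does), and the numeric FE_* values are the x86 ones, which is an assumption to check against your own <fenv.h>. It only shows that the flags are sticky until cleared; it is not a patch for NA_checkFPErrors.

```python
import ctypes
import ctypes.util

# Assumption: libm (or libc as a fallback) exports the C99 <fenv.h> functions.
_lib = ctypes.CDLL(ctypes.util.find_library("m") or ctypes.util.find_library("c"))

# Assumption: x86 values of the <fenv.h> macros; other CPUs use different bits.
FE_DIVBYZERO = 0x04
FE_ALL_EXCEPT = 0x3D

for _name in ("feclearexcept", "feraiseexcept", "fetestexcept"):
    _fn = getattr(_lib, _name)
    _fn.argtypes = [ctypes.c_int]
    _fn.restype = ctypes.c_int


def pending_fp_flags(mask=FE_ALL_EXCEPT):
    """Return the subset of `mask` whose status flags are currently set."""
    return _lib.fetestexcept(mask)


def clear_fp_flags():
    """Clear every floating point status flag for the calling thread."""
    _lib.feclearexcept(FE_ALL_EXCEPT)


clear_fp_flags()
_lib.feraiseexcept(FE_DIVBYZERO)              # stand-in for an earlier 1.0/0.0
stale = pending_fp_flags(FE_DIVBYZERO)        # the flag stays set until cleared
clear_fp_flags()                              # what one would do before the IPP call
after_clear = pending_fp_flags(FE_DIVBYZERO)
```

Calling something like `clear_fp_flags()` right before the IPP call would be one way to test whether stale status flags are really what the library is reacting to.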
Dear Todd,

I'll respond to your other comments when I get a chance to investigate further.

Todd Miller wrote:
So far you mentioned debian, linux-2.6.7, Python-2.3.4, and the Intel IPP library. What compiler are you using?
astraw@flygate:~$ gcc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.4/specs
Configured with: ../src/configure -v --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --enable-__cxa_atexit --enable-clocale=gnu --enable-debug --enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.4 (Debian)

Geez, I hope that's enough info to tell you gcc 3.3.4. :)

(Also, I could mention this is with numarray-1.0 -- tested both with whatever Debian gives me in addition to a source build of the 1.0 tarball.)

Cheers!
Andrew
I've isolated a bug I first reported on this mailing list in August. I've now confined it to a small code snippet using entirely open-source software (previously I saw it while using Intel's IPP). In a nutshell, importing numarray.ieeespecial triggers a floating point exception (which kills my program) when I call Numeric's singular_value_decomposition() function:

import Numeric
from LinearAlgebra import singular_value_decomposition

want_FPE = True   # set to False to skip the import and avoid the crash
if want_FPE:
    import numarray.ieeespecial

A = [[-5.7, 2.2, -0.53, 46.0],
     [-2.3, -5.5, -1.0, 1091.0],
     [5.9, 1.4, -0.1, -142.0],
     [-1.3, 5.7, -1.5, 2673.0]]
A = Numeric.array(A)
u,s,v = singular_value_decomposition(A) # FPE triggered here

Here's my setup:

$ python
Python 2.3.4 (#2, Sep 24 2004, 08:39:09)
[GCC 3.3.4 (Debian 1:3.3.4-12)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric
>>> Numeric.__version__
'23.6'
>>> import numarray
>>> numarray.__version__
'1.2a'

$ gcc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.4/specs
Configured with: ../src/configure -v --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --enable-__cxa_atexit --enable-clocale=gnu --enable-debug --enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.4 (Debian 1:3.3.4-13)

Now, for the clue: the above error is ONLY triggered when I compile Numeric to use system blas and friends, not when I use lapack_lite included with Numeric. This leads me to suspect it is related to the SSE2 unit -- I have Debian sarge's atlas3-base, atlas3-sse, atlas3-sse2, blas, lapack, lapack3, and refblas3 packages installed on my P4 machine.

So, to propose a hypothesis: numarray.ieeespecial sets the FPE bit in the SSE2 hardware, but for some reason this does not raise SIGFPE. However, when the next call that touches SSE2 happens, the kernel sees that error bit and throws the signal. Does this explanation make sense? Is it easy to fix?

Cheers!
Andrew
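Because the failure kills the interpreter outright, comparing the two variants (with and without the numarray.ieeespecial import) is easier from a parent process. Below is a minimal sketch of such a harness, assuming a POSIX system and a modern Python with `subprocess.run` (not the Python 2.3.4 used in this thread). The helper name `killed_by_sigfpe` and the two stand-in child programs are hypothetical; on an affected machine the real snippet would be passed in instead.

```python
import signal
import subprocess
import sys


def killed_by_sigfpe(source):
    """Run `source` in a fresh interpreter; True if the child died from SIGFPE."""
    result = subprocess.run([sys.executable, "-c", source])
    # On POSIX a negative return code means "terminated by that signal number".
    return result.returncode == -signal.SIGFPE


# Stand-ins for the two variants: one child sends itself SIGFPE, one exits cleanly.
crashing = "import os, signal; os.kill(os.getpid(), signal.SIGFPE)"
benign = "x = 1.0 + 2.0"

crashing_was_killed = killed_by_sigfpe(crashing)
benign_was_killed = killed_by_sigfpe(benign)
```

Running each variant in its own child also keeps any leftover floating point state from one run out of the next.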
Just a small addendum (which I hope will spur on bug-fixing once Todd et al. are back from the conference -- let me know if I should file a sourceforge bug report):

Numeric is not necessary to trigger the bug in the code below -- numarray is sufficient on its own. Furthermore, I can confirm that merely removing the "atlas3-sse2" Debian package from my system causes the code, whether or not numarray.ieeespecial is imported, to run without being killed by an FPE.

Andrew Straw wrote:
I've isolated a bug I first reported on this mailing list in August. I've now confined it to a small code snippet using entirely open-source software (previously I saw it while using Intel's IPP). In a nutshell, importing numarray.ieeespecial triggers a floating point exception (which kills my program) when I call Numeric's singular_value_decomposition() function:
import Numeric
from LinearAlgebra import singular_value_decomposition

if want_FPE:
    import numarray.ieeespecial

A = [[-5.7, 2.2, -0.53, 46.0],
     [-2.3, -5.5, -1.0, 1091.0],
     [5.9, 1.4, -0.1, -142.0],
     [-1.3, 5.7, -1.5, 2673.0]]
A = Numeric.array(A)
u,s,v = singular_value_decomposition(A) # FPE triggered here
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion
I hunted down the error and modified Doc/INSTALL.txt (patch below, please modify numarray sources) to indicate that the problem is not confined to Fedora Core1 but to any i386 linux with libc < 2.3.3 and SSE-using code.

With my newly patched and compiled libc, my system runs great, and no more mysterious Floating Point Exceptions!

Cheers!
Andrew

Index: Doc/INSTALL.txt
===================================================================
RCS file: /cvsroot/numpy/numarray/Doc/INSTALL.txt,v
retrieving revision 1.9
diff -r1.9 INSTALL.txt
209,212c209,212
< 1. Fedora Core1 -- if compiling against GNU libc on i386 and enabling
< SSE processor functions (with something like "-march=athlon-xp") then
< libc versions above 2.3.3 will be needed.
<
---
> 1. i386 linux -- if compiling against GNU libc on i386 and enabling
> SSE processor functions (with something like "-march=athlon-xp" or
> using other libraries that utilize SSE such as atlas or Intel IPP)
> then libc version 2.3.3 or above will be needed.
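The "libc version 2.3.3 or above" requirement in the patched note can be checked from Python. The sketch below uses the standard library's `platform.libc_ver()`; the helper names `parse_version` and `glibc_is_new_enough` are made up for this example, and the check says nothing about whether the machine is i386 or uses SSE, which the note also requires for the problem to apply.

```python
import platform


def parse_version(text):
    """Turn '2.3.2' into (2, 3, 2); anything after the leading digits is dropped."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def glibc_is_new_enough(minimum=(2, 3, 3)):
    """None if the libc is not glibc (or is unknown), otherwise True or False."""
    name, version = platform.libc_ver()
    if name != "glibc" or not version:
        return None
    return parse_version(version) >= minimum


status = glibc_is_new_enough()
```

Comparing tuples of integers rather than strings avoids ranking a version such as 2.10 below 2.3.3.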
Thanks for the update Andrew. This is in CVS now.

Regards,
Todd

On Tue, 2004-11-02 at 11:58, Andrew Straw wrote:
I hunted down the error and modified Doc/INSTALL.txt (patch below, please modify numarray sources) to indicate that the problem is not confined to Fedora Core1 but to any i386 linux with libc < 2.3.3 and SSE-using code.