[Numpy-discussion] floating point exception weirdness

Fri Oct 22 11:25:58 EDT 2004

I've isolated a bug I first reported on this mailing list in August.  
I've now confined it to a small code snippet using entirely open-source 
software (previously I saw it while using Intel's IPP).  In a nutshell, 
importing numarray.ieeespecial triggers a floating point exception 
(which kills my program) when I call Numeric's 
singular_value_decomposition() function:

import Numeric
from LinearAlgebra import singular_value_decomposition

if want_FPE:
    import numarray.ieeespecial

A= [[-5.7, 2.2, -0.53, 46.0],
    [-2.3, -5.5, -1.0, 1091.0],
    [5.9, 1.4, -0.1, -142.0],
    [-1.3, 5.7, -1.5, 2673.0]]
A=Numeric.array(A)
u,s,v = singular_value_decomposition(A) # FPE triggered here

Here's my setup:

$ python
Python 2.3.4 (#2, Sep 24 2004, 08:39:09)
[GCC 3.3.4 (Debian 1:3.3.4-12)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> import Numeric
 >>> Numeric.__version__
'23.6'
 >>> import numarray
 >>> numarray.__version__
'1.2a'

$ gcc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.4/specs
Configured with: ../src/configure -v 
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr 
--mandir=/usr/share/man --infodir=/usr/share/info 
--with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared 
--with-system-zlib --enable-nls --without-included-gettext 
--enable-__cxa_atexit --enable-clocale=gnu --enable-debug 
--enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.4 (Debian 1:3.3.4-13)

Now, for the clue:  the above error is ONLY triggered when I compile 
Numeric to use system blas and friends, not when I use lapack_lite 
included with Numeric.  This leads me to suspect it is related to the 
SSE2 unit -- I have Debian sarge's atlas3-base, atlas3-see, atlas3-sse2, 
blas, lapack, lapack3, and refblas3 packages installed on my P4 machine.

So, to propose a hypothesis: numarray.ieeespecial sets the FPE bit in 
the SSE2 hardware, but for some reason this does not raise SIGFPE.  
However, when the next call that touches SSE2 happens, the kernel sees 
that error bit and throws the signal.  Does this explanation make 
sense?  Is it easy to fix?

Cheers!
Andrew