Re: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter
Arkaitz Bitorika wrote:
I think that numpy only accesses the SSE units through ATLAS or other external library. So, build numpy without ATLAS. But I'm not 100% sure anymore if there aren't any optimizations that directly use SSE if it's available.
Or alternatively, how would I check if my program is messing with the SSE bits?
Hmm, I think that's a bit hairy. I'd suggest simply asking the C++ library's mailing list if they alter the error bits on the control registers of the SSE unit. (Out of curiosity, what library is it?) If you want hairy, though, I think you'd have to check from C with the appropriate calls -- I'd start with the source code in that bug report. It looks like they're inlining an assembly statement to query an SSE control register.
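As a rough illustration of what such a check from C could look like without inline assembly: the sketch below reads the SSE control/status register (MXCSR) through the `_mm_getcsr()` intrinsic from `<xmmintrin.h>` and reports which exceptions are unmasked, meaning they would trap. The function names `sse_unmasked_exceptions` and `report_sse_traps` are made up for this example. It only covers the SSE unit; the x87 FPU has its own control word, which this sketch does not inspect.

```c
#include <stdio.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0
#endif

/* Returns the MXCSR mask bits of the exceptions that are currently
 * UNMASKED (i.e. would trap), or -1 if MXCSR is not available on
 * this target. 0 means every SSE exception is masked. */
int sse_unmasked_exceptions(void)
{
#if HAVE_MXCSR
    unsigned int csr = _mm_getcsr();
    /* A set mask bit means "masked", so invert within the mask field. */
    return (int)(~csr & _MM_MASK_MASK);
#else
    return -1;
#endif
}

/* Hypothetical helper: print a one-line report, for example right
 * before the host program calls PyImport_ImportModule("numpy"). */
void report_sse_traps(const char *where)
{
    int unmasked = sse_unmasked_exceptions();
    if (unmasked < 0) {
        printf("%s: MXCSR not available on this target\n", where);
        return;
    }
#if HAVE_MXCSR
    printf("%s: overflow trap %s, divide-by-zero trap %s, invalid trap %s\n",
           where,
           (unmasked & _MM_MASK_OVERFLOW) ? "ON" : "off",
           (unmasked & _MM_MASK_DIV_ZERO) ? "ON" : "off",
           (unmasked & _MM_MASK_INVALID) ? "ON" : "off");
#endif
}
```

Calling `report_sse_traps("before numpy import")` from the embedding program, and again from a plain `main()`, would show whether the two environments differ in their SSE trap settings.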
On Wed, 19 Apr 2006 11:29:49 -0700 Andrew Straw <strawman@astraw.com> wrote:
We had to disable atlas-sse on our debian system for these exact reasons.

Simon.

--
Simon Burton, B.Sc.
Licensed PO Box 8066
ANU Canberra 2601
Australia
Ph. 61 02 6249 6940
http://arrowtheory.com
Simon Burton wrote:
If you're using debian sarge and the problem is your glibc, you can fix it: http://www.its.caltech.edu/~astraw/coding.html#id3
I've tried getting rid of all atlas, blas and lapack packages in my system and rebuilding numpy to use its own unoptimised lapack_lite, but no luck. Just trying to import numpy with PyImport_ImportModule("numpy") causes the program to crash with just a "Floating point exception" message output.

The program I'm embedding Python in is the NS Network Simulator (http://www.isi.edu/nsnam/ns/). It's a complex C++ beast with its own Object-Tcl interpreter, but it's been working fine with embedded Python except for this numpy crash. I've used Numeric before and it worked fine as well.

I'm lost now regarding what to work on to find a solution. Does anyone familiar with numpy internals have any suggestions?

Thanks,
Arkaitz
bitorika@cs.tcd.ie wrote:
OK, going back to your original gdb traceback, it looks like the SIGFPE originated in the following function in umathmodule.c:

    static double pinf_init(void)
    {
        double mul = 1e10;
        double tmp = 0.0;
        double pinf;

        pinf = mul;
        for (;;) {
            pinf *= mul;
            if (pinf == tmp) break;
            tmp = pinf;
        }
        return pinf;
    }

If you try just that function (instead of the whole Python interpreter and numpy module) and still get the exception, you'll be that much closer to narrowing down the issue.
Andrew,

I've verified that the function causes the exception when embedded in the program but not when used from a simple C program with just a main() function. The successful version iterates 31 times over the for loop, while the crashing one fails the 30th time that it does "pinf *= mul".

Now we know exactly where the crash is, but have no idea how to fix it ;). It doesn't look like it should be related to SSE2 flags, since it's just doing a big multiplication, but I don't know enough about low-level C and floating point operations to understand why it may be throwing the exception there. Any idea how I could avoid that function crashing?

Thanks,
Arkaitz

On 22 Apr 2006, at 20:12, Andrew Straw wrote:
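For what it's worth, the iteration counts reported above are consistent with plain IEEE 754 double overflow. After 30 multiplications the exact value would be 1e310, which is beyond the largest finite double (about 1.8e308), so the 30th multiplication is the first one to overflow. With floating-point traps masked (the usual default) that step quietly produces infinity and the loop ends one iteration later; with the overflow trap enabled, the same step would raise SIGFPE instead. A minimal sketch in C that instruments the loop is below. `trace_pinf_init` and `struct pinf_trace` are illustrative names, not numpy code, and the sketch assumes `double` is an IEEE 754 binary64.

```c
#include <math.h>

/* Hypothetical result record, not part of numpy. */
struct pinf_trace {
    double value;        /* final value of pinf */
    int multiplications; /* how many times "pinf *= mul" ran */
    int first_infinite;  /* multiplication that first produced infinity, 0 if none */
};

/* Same loop as pinf_init(), with counters added. */
struct pinf_trace trace_pinf_init(void)
{
    /* volatile keeps the compiler from folding the loop at compile
     * time and forces each product to be rounded to a double. */
    volatile double mul = 1e10;
    volatile double pinf = mul;
    double tmp = 0.0;
    struct pinf_trace t = { 0.0, 0, 0 };

    for (;;) {
        pinf *= mul;
        t.multiplications++;
        if (t.first_infinite == 0 && isinf(pinf))
            t.first_infinite = t.multiplications;
        if (pinf == tmp)
            break;
        tmp = pinf;
    }
    t.value = pinf;
    return t;
}
```

Run from a plain `main()` with default floating-point settings, this should report the first infinite result at multiplication 30 and 31 multiplications in total, matching the numbers seen in the thread.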
This doesn't seem like an issue with numpy. Your test proved that. I'm curious what the outcome is, but I'm afraid there's not much we can do. At this point I think you should write to the ns2 people and see what they say. Their program seems to be responsible for twiddling the FPU/SSE flags, so I think the issue is better solved, or at least discussed, by them.

Cheers!
Andrew

Arkaitz Bitorika wrote:
Arkaitz Bitorika <arkaitz.bitorika <at> gmail.com> writes:
Hi,

I just found this old thread, and it looks like I've got the very same problem: it turns out that Borland C++ Builder (which I'm using, and you most probably are as well) can't get to infinity by multiplying a number by 1E10 over and over, but throws an exception instead when exceeding the number space:

On 22 Apr 2006, at 20:12, Andrew Straw wrote:
My proposal is to ask the numpy people to patch numpy as follows: Don't multiply, but instead create pinf according to IEEE 754 specifications:

    char inf_string[9] = "\x00\x00\x00\x00\x00\x00\xF0\x7F";
    double pinf = ((double*)inf_string)[0];

This will get rid of the overflow for little endian machines. For big endian architectures, just reverse the byte order in inf_string. I already submitted a bug report to their bugtracker.

Cheers,
Thomas
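A variation on the same idea, offered as a sketch and not as what numpy adopted: casting a `char` array to `double*` can run into alignment and aliasing problems on some compilers, and the byte string has to be reversed by hand on big-endian machines. Copying the bit pattern from a 64-bit integer with `memcpy` avoids both. The sketch assumes `double` is an IEEE 754 binary64 and that integers and doubles share the same byte order, which holds on common platforms. The function name `pinf_from_bits` is made up here.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Build +infinity from its IEEE 754 binary64 bit pattern:
 * sign bit 0, exponent all ones (0x7FF), fraction 0.
 * This is the same pattern as the byte string above, written as
 * one integer so no manual byte reversal is needed. */
double pinf_from_bits(void)
{
    const uint64_t bits = UINT64_C(0x7FF0000000000000);
    double pinf;

    /* memcpy copies the bytes without an aliasing cast. */
    memcpy(&pinf, &bits, sizeof pinf);
    return pinf;
}
```

No multiplication takes place, so no overflow exception can be raised while the value is built. Where the compiler's `<math.h>` provides `HUGE_VAL` (C89) or `INFINITY` (C99), those macros are another way to get the value on IEEE 754 implementations.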
participants (5)
- Andrew Straw
- Arkaitz Bitorika
- bitorika@cs.tcd.ie
- Simon Burton
- Thomas Schreiner