troubling math bug under IRIX 6.5

Hey Folks,

One of these days I'll figure that SourceForge stuff out so I can submit a real bug report, but this one is freaky enough that I thought I'd just send it right out...

from the latest CVS (as of 9:30am pacific) I run 'make test' and get:

    ...
    PYTHONPATH= ./python -tt ./Lib/test/regrtest.py -l
    make: *** [test] Bus error (core dumped)

a quick search around shows that just importing regrtest.py seg faults, and further simply importing random.py seg faults (which regrtest.py does).

it all boils down to this line in random.py

    NV_MAGICCONST = 4 * _exp(-0.5)/_sqrt(2.0)

and the problem can be further reduced thusly:

but it isn't the math.exp that's the problem, it's multiplying the result times 4!

    (tommy@mace)/u0/tommy/pycvs/python/dist/src$ ./python
    Python 2.1a2 (#2, Feb 13 2001, 09:49:17) [C] on irix6
    Type "copyright", "credits" or "license" for more information.

is it just me?  any guesses what might be the cause of this?

[Flying Cougar Burnette]
> ...
> >>> 4 * math.exp(-0.5)
> Bus error (core dumped)

Now let's look at the important <wink> part:

> (tommy@mace)/u0/tommy/pycvs/python/dist/src$ ./python
> Python 2.1a2 (#2, Feb 13 2001, 09:49:17) [C] on irix6
                                                  ^^^^^

The first thing to try on any SGI box is to recompile Python with optimization turned off.  After that confirms it's the compiler's fault, we can try to figure out where it's screwing up.  Do either of these blow up too?

    >>> 4 * 0.60653065971263342
    >>> 4.0 * math.exp(-0.5)

Reason for asking:  "NV_MAGICCONST = 4 * _exp(-0.5)/_sqrt(2.0)" is the first time random.py does either of a floating multiply or an int-to-float conversion.

> is it just me?

Yup.  So long as you use SGI software, it always will be <wink>.

and-i-say-that-as-an-sgi-shareholder-ly y'rs  - tim
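Tim's narrowing question can be sketched as a small script that evaluates the random.py constant one operation at a time, so a crash would point at a single step. This is only an illustration: the helper name nv_magicconst_steps is made up here, and on a healthy build every step simply succeeds.

```python
import math


def nv_magicconst_steps():
    """Evaluate 4 * exp(-0.5) / sqrt(2.0) one operation at a time."""
    e = math.exp(-0.5)      # libm call only, no multiply yet
    r = math.sqrt(2.0)      # second libm call
    mixed = 4 * e           # int * float: int-to-float conversion plus a multiply
    pure = 4.0 * e          # float * float: a multiply with no conversion
    const = mixed / r       # the value random.py stores as NV_MAGICCONST
    return e, r, mixed, pure, const


if __name__ == "__main__":
    labels = ("exp(-0.5)", "sqrt(2.0)", "4 * exp", "4.0 * exp", "constant")
    for label, value in zip(labels, nv_magicconst_steps()):
        print(label, value)
```

If the interpreter dies partway through, the last label printed shows which operation was the final one to succeed.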

Tim Peters writes:
| [Flying Cougar Burnette]
| > ...
| > >>> 4 * math.exp(-0.5)
| > Bus error (core dumped)
|
| Now let's look at the important <wink> part:
|
| > (tommy@mace)/u0/tommy/pycvs/python/dist/src$ ./python
| > Python 2.1a2 (#2, Feb 13 2001, 09:49:17) [C] on irix6
|                                                   ^^^^^

figgered as much...

| The first thing to try on any SGI box is to recompile Python with
| optimization turned off.  After that confirms it's the compiler's fault, we
| can try to figure out where it's screwing up.  Do either of these blow up
| too?
|
|     >>> 4 * 0.60653065971263342
|     >>> 4.0 * math.exp(-0.5)

yup.

| > is it just me?
|
| Yup.  So long as you use SGI software, it always will be <wink>.
|
| and-i-say-that-as-an-sgi-shareholder-ly y'rs  - tim

one of these days sgi...  Pow!  Right to the Moon!  ;)

Okay, I recompiled after blanking OPT= in Makefile and things now work.  This is where I start swearing "But, this has never happened to me before!" and the kind, gentle response is "Don't worry, it happens to lots of guys..."  ;)

And the next step is... ?
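For anyone repeating the "blank OPT= and rebuild" experiment, it can help to confirm which flags a given interpreter was actually built with. The sketch below uses the standard sysconfig module; it assumes a Makefile-based build, and on platforms without one the values may come back as None. The function name build_flags is made up for illustration.

```python
import sysconfig


def build_flags():
    """Return the optimization-related Makefile variables of this interpreter."""
    # These are the recorded values of OPT and CFLAGS from the build's Makefile;
    # either may be None when the platform does not record them.
    return {
        "OPT": sysconfig.get_config_var("OPT"),
        "CFLAGS": sysconfig.get_config_var("CFLAGS"),
    }


if __name__ == "__main__":
    for name, value in build_flags().items():
        print(name, "=", value)
```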

[Tommy turns off optimization, and all is well]

>> Do either of these blow up too?
>>
>>     >>> 4 * 0.60653065971263342
>>     >>> 4.0 * math.exp(-0.5)

> yup.

OK.  Does the first one blow up?  Does the second one blow up?  Or do both blow up?

Fourth question:  does

    4.0 * 0.60653065971263342

blow up?

> ...
> And the next step is... ?

Stop making me pull your teeth <wink>.  I'm trying to narrow down where it's screwing up.  At worst, then, you can disable optimization only for that particular file, and create a tiny bug case to send off to SGI World Headquarters so they fix this someday.  At best, perhaps a tiny bit of code rearrangement will unstick your compiler (I'm good at guessing what might work in that respect, but need to narrow it down to a single function within Python first), and I can check that in for 2.1.
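One way to answer all of Tim's questions in a single pass is to run each candidate expression in its own child interpreter, so a bus error kills only the child and the harness can report which cases died. This is a minimal sketch, not something from the thread: the names CASES, run_case and run_all are hypothetical, and it assumes a Python recent enough to have subprocess.run with capture_output.

```python
import subprocess
import sys

# The expressions asked about in the thread, one per child process.
CASES = [
    "4 * 0.60653065971263342",
    "4.0 * 0.60653065971263342",
    "import math; 4 * math.exp(-0.5)",
    "import math; 4.0 * math.exp(-0.5)",
]


def run_case(code, python=sys.executable):
    """Run one snippet in a fresh interpreter and return its exit status."""
    proc = subprocess.run([python, "-c", code], capture_output=True)
    return proc.returncode


def run_all(cases=CASES, python=sys.executable):
    """Map each snippet to the exit status of the interpreter that ran it."""
    return {code: run_case(code, python) for code in cases}


if __name__ == "__main__":
    for code, status in run_all().items():
        # On POSIX, a negative status means the child was killed by a signal.
        print("%-40s -> %d" % (code, status))
```

Passing the path of a freshly built ./python as the python argument would exercise that build rather than the interpreter running the harness.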

sorry- BOTH blew up until I turned off optimization.

now neither does.  shall I turn opts back on and try a few more cases?

Tim Peters writes:
| [...]

[Tommy]
> sorry- BOTH blew up until I turned off optimization.

OK, that rules out int->float conversion as the cause (one of the examples didn't do any conversions).  That multiplication by 4 triggered it also rules out IEEE exceptions as the culprit (mult by 4 doesn't even trigger the IEEE "inexact" exception).

> now neither does.  shall I turn opts back on and try a few more cases?

Yes, please, one more:

    4.0 * 3.1

Or, if that works, go back to the failing

    4.0 * math.exp(-0.5)

In any failing case, can you jump into a debugger and get a stack trace?

Do you happen to have

    WANT_SIGFPE_HANDLER

#define'd when you compile Python on this platform?  If so, it complicates the code a lot.  I wonder about that because you got a "bus error", and when WANT_SIGFPE_HANDLER is #defined we get a whole pile of ugly setjmp/longjmp code that doesn't show up on my box.

Another tack, as a temporary workaround:  try disabling optimization only for Objects/floatobject.c.  That will probably fix the problem, and if so that's enough of a workaround to get you unstuck while pursuing these other irritations.
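Tim's remark that multiplying by 4 cannot raise the IEEE "inexact" exception can be checked without touching any floating-point flags: compare the float product against the exact rational product. The sketch below does that with the standard fractions module; the helper name multiply_is_exact is made up for illustration.

```python
from fractions import Fraction
import math


def multiply_is_exact(a, b):
    """True if the float product a * b equals the exact rational product."""
    # Fraction(float) is exact, so the right-hand side carries no rounding.
    return Fraction(a * b) == Fraction(a) * Fraction(b)


x = math.exp(-0.5)

# Scaling by a power of two only changes the exponent, so nothing is rounded.
print(multiply_is_exact(4.0, x))

# A general product needs more mantissa bits than a double has, so it rounds.
print(multiply_is_exact(3.1, x))
```

The same idea shows up in math.frexp: 4.0 * x has the same mantissa as x, with the exponent raised by two.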

Tim Peters writes:
|
| > now neither does.  shall I turn opts back on and try a few more
| > cases?
|
| Yes, please, one more:
|
|     4.0 * 3.1
|
| Or, if that works, go back to the failing
|
|     4.0 * math.exp(-0.5)

both of these work, but changing the 4.0 to an integer 4 produces the bus error.  so it is definitely a conversion to double/float that's the problem.

| In any failing case, can you jump into a debugger and get a stack trace?

sure.  I've included an entire dbx session at the end of this mail.

| Do you happen to have
|
|     WANT_SIGFPE_HANDLER
|
| #define'd when you compile Python on this platform?  If so, it complicates
| the code a lot.  I wonder about that because you got a "bus error", and when
| WANT_SIGFPE_HANDLER is #defined we get a whole pile of ugly setjmp/longjmp
| code that doesn't show up on my box.

a peek at config.h shows the WANT_SIGFPE_HANDLER define commented out.  should I turn it on and see what happens?

| Another tack, as a temporary workaround:  try disabling optimization only
| for Objects/floatobject.c.  That will probably fix the problem, and if so
| that's enough of a workaround to get you unstuck while pursuing these other
| irritations.

this one works just fine.  workarounds aren't a problem for me right now since I'm in no hurry to get this version in use here.  I'm just trying to help debug this version for irix users in general.

------------%< snip %<----------------------%< snip %<------------

(tommy@mace)/u0/tommy/pycvs/python/dist/src$ dbx python
dbx version 7.3 65959_Jul11 patchSG0003841 Jul 11 2000 02:29:30
Executable /usr/u0/tommy/pycvs/python/dist/src/python
(dbx) run
Process 563746 (python) started
Python 2.1a2 (#6, Feb 13 2001, 17:43:32) [C] on irix6
Type "copyright", "credits" or "license" for more information.
>>> 3 * 4.0
12.0
>>> import math
>>> 4 * math.exp(-.5)
Process 563746 (python) stopped on signal SIGBUS: Bus error (default) at [float_mul:383 +0x4,0x1004c158]
    383  CONVERT_TO_DOUBLE(v, a);
(dbx) l
>*  383  CONVERT_TO_DOUBLE(v, a);
    384  CONVERT_TO_DOUBLE(w, b);
    385  PyFPE_START_PROTECT("multiply", return 0)
    386  a = a * b;
    387  PyFPE_END_PROTECT(a)
    388  return PyFloat_FromDouble(a);
    389  }
    390
    391  static PyObject *
    392  float_div(PyObject *v, PyObject *w)
(dbx) t
>  0 float_mul(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/floatobject.c":383, 0x1004c158]
   1 binary_op1(0x100b69fc, 0x10116788, 0x0, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/abstract.c":337, 0x1003ac5c]
   2 binary_op(0x100b69fc, 0x10116788, 0x8, 0x0, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/abstract.c":373, 0x1003ae70]
   3 PyNumber_Multiply(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/abstract.c":544, 0x1003b5a4]
   4 eval_code2(0x1012c688, 0x0, 0xffffffec, 0x0, 0x0, 0x0, 0x0, 0x0) ["/usr/u0/tommy/pycvs/python/dist/src/Python/ceval.c":896, 0x10034a54]
   5 PyEval_EvalCode(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/ceval.c":336, 0x10031768]
   6 run_node(0x100f88c0, 0x10116788, 0x0, 0x0, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":931, 0x10040444]
   7 PyRun_InteractiveOne(0x0, 0x100b1878, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":540, 0x1003f1f0]
   8 PyRun_InteractiveLoop(0xfb4a398, 0x100b1878, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":486, 0x1003ef84]
   9 PyRun_AnyFileEx(0xfb4a398, 0x100b1878, 0x0, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":461, 0x1003eeac]
   10 Py_Main(0x1, 0x0, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Modules/main.c":292, 0x1000bba4]
   11 main(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Modules/python.c":10, 0x1000b7bc]
More (n if no)?y
   12 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/crt1text.s":177, 0x1000b558]
(dbx)
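The trace shows the int operand arriving in float_mul and the crash happening at the conversion step. A rough modern analogue can be seen from Python itself: the int type declines the mixed multiply, and the float type takes over and converts the int operand. This sketch assumes current Python 3 dispatch rules, which are not the same code path as the 2.1a2 build in the trace; the function name trace_mixed_multiply is made up for illustration.

```python
def trace_mixed_multiply(i, f):
    """Show which operand type ends up handling int * float."""
    # int.__mul__ does not handle a float operand and says so.
    direct = i.__mul__(f)
    # float.__rmul__ converts the int to a double and then multiplies.
    reflected = f.__rmul__(i)
    return direct, reflected


if __name__ == "__main__":
    direct, reflected = trace_mixed_multiply(4, 0.5)
    print("int.__mul__   ->", direct)
    print("float.__rmul__ ->", reflected)
```

So even though the expression is written with the int first, the work, including the int-to-double conversion, is done by the float implementation.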

As an extra datapoint:

I just tried this (4 * math.exp(-0.5)) on my SGI O2 and on our SGI file server with the current CVS version of Python, compiled with -O.  I don't get a crash.

I am running IRIX 6.5.10m on the O2 and 6.5.2m on the server.  What version are you running?

On Tue, Feb 13 2001 Flying Cougar Burnette wrote:
> [...]

-- Sjoerd Mullender <sjoerd.mullender@oratrix.com>

'uname -a' tells me I'm running plain old 6.5 on my R10k O2 with version 7.3.1.1m of the sgi compiler.

Which version of the compiler do you have?  That might be the real culprit here.

in fact...  I just hopped onto a co-worker's machine that has version 7.3.1.2m of the compiler, remade everything, and the problem is gone.

I think we can chalk this up to a compiler bug and take no further action.

Thanks for listening...

Sjoerd Mullender writes:
| As an extra datapoint:
|
| I just tried this (4 * math.exp(-0.5)) on my SGI O2 and on our SGI
| file server with the current CVS version of Python, compiled with -O.
| I don't get a crash.
|
| I am running IRIX 6.5.10m on the O2 and 6.5.2m on the server.  What
| version are you running?
|
| [...]
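Since the compiler version turned out to be the deciding variable, a report like the following is a cheap thing to attach to any "does it crash for you too?" message. It uses only the standard platform and sys modules; the function name build_report and the choice of fields are assumptions made for this sketch, and the compiler string is whatever the build recorded, which may be less detailed than the vendor's own version output.

```python
import platform
import sys


def build_report():
    """Collect the interpreter, compiler and OS details worth comparing."""
    return {
        "python": sys.version.split()[0],          # interpreter version
        "compiler": platform.python_compiler(),    # compiler that built it
        "system": platform.system(),               # OS name
        "release": platform.release(),             # OS release
    }


if __name__ == "__main__":
    for key, value in build_report().items():
        print("%-9s %s" % (key + ":", value))
```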

[Flying Cougar Burnette]
Oh, of course.  Why didn't you say so?  Micro-micro version 7.3.1.2m of the SGI compiler fixed a bus error when doing int->float conversion.

What?  You don't believe me?  Harrumph -- you just proved it <wink>.

thanks-for-playing-and-pick-up-a-fabulous-prize-at-the-door-ly y'rs  - tim

[Flying Cougar Burnette]
Now let's look at the important <wink> part:
The first thing to try on any SGI box is to recompile Python with optimization turned off. After that confirms it's the compiler's fault, we can try to figure out where it's screwing up. Do either of these blow up too?
4 * 0.60653065971263342 4.0 * math.exp(-0.5)
Reason for asking: "NV_MAGICCONST = 4 * _exp(-0.5)/_sqrt(2.0)" is the first time random.py does either of a floating multiply or an int-to-float conversion.
is it just me?
Yup. So long as you use SGI software, it always will be <wink>. and-i-say-that-as-an-sgi-shareholder-ly y'rs - tim

Tim Peters writes: | [Flying Cougar Burnette] | > ... | > >>> 4 * math.exp(-0.5) | > Bus error (core dumped) | | Now let's look at the important <wink> part: | | > (tommy@mace)/u0/tommy/pycvs/python/dist/src$ ./python | > Python 2.1a2 (#2, Feb 13 2001, 09:49:17) [C] on irix6 | ^^^^^ figgered as much... | | The first thing to try on any SGI box is to recompile Python with | optimization turned off. After that confirms it's the compiler's fault, we | can try to figure out where it's screwing up. Do either of these blow up | too? | | >>> 4 * 0.60653065971263342 | >>> 4.0 * math.exp(-0.5) yup. | | > is it just me? | | Yup. So long as you use SGI software, it always will be <wink>. | | and-i-say-that-as-an-sgi-shareholder-ly y'rs - tim one these days sgi... Pow! Right to the Moon! ;) Okay, I recompiled after blanking OPT= in Makefile and things now work. This is where I start swearing "But, this has never happened to me before!" and the kind, gentle response is "Don't worry, it happens to lots of guys..." ;) And the next step is... ?

[Tommy turns off optimization, and all is well]
yup.
OK. Does the first one blow up? Does the second one blow up? Or do both blow up? Fourth question: does
4.0 * 0.60653065971263342
blow up?
... And the next step is... ?
Stop making me pull your teeth <wink>. I'm trying to narrow down where it's screwing up. At worst, then, you can disable optimization only for that particular file, and create a tiny bug case to send off to SGI World Headquarters so they fix this someday. At best, perhaps a tiny bit of code rearrangement will unstick your compiler (I'm good at guessing what might work in that respect, but need to narrow it down to a single function within Python first), and I can check that in for 2.1.

sorry- BOTH blew up until I turned off optimization. now neither does. shall I turn opts back on and try a few more cases? Tim Peters writes: | [Tommy turns off optimization, and all is well] | | >> Do either of these blow up too? | >> | >> >>> 4 * 0.60653065971263342 | >> >>> 4.0 * math.exp(-0.5) | | > yup. | | OK. Does the first one blow up? Does the second one blow up? Or do both | blow up? | | Fourth question: does | | >> 4.0 * 0.60653065971263342 | | blow up? | | > ... | > And the next step is... ? | | Stop making me pull your teeth <wink>. I'm trying to narrow down where it's | screwing up. At worst, then, you can disable optimization only for that | particular file, and create a tiny bug case to send off to SGI World | Headquarters so they fix this someday. At best, perhaps a tiny bit of code | rearrangement will unstick your compiler (I'm good at guessing what might | work in that respect, but need to narrow it down to a single function within | Python first), and I can check that in for 2.1.

[Tommy]
sorry- BOTH blew up until I turned off optimization.
OK, that rules out int->float conversion as the cause (one of the examples didn't do any conversions). That multiplication by 4 triggered it rules out that any IEEE exceptions are to blame either (mult by 4 doesn't even trigger the IEEE "inexact" exception).
now neither does. shall I turn opts back on and try a few more cases?
Yes, please, one more: 4.0 * 3.1 Or, if that works, go back to the failing 4.0 * math.exp(-0.5) In any failing case, can you jump into a debubber and get a stack trace? Do you happen to have WANT_SIGFPE_HANDLER #define'd when you compile Python on this platform? If so, it complicates the code a lot. I wonder about that because you got a "bus error", and when WANT_SIGFPE_HANDLER is #defined we get a whole pile of ugly setjmp/longjmp code that doesn't show up on my box. Another tack, as a temporary workaround: try disabling optimization only for Objects/floatobject.c. That will probably fix the problem, and if so that's enough of a workaround to get you unstuck while pursuing these other irritations.

Tim Peters writes: | | > now neither does. shall I turn opts back on and try a few more | > cases? | | Yes, please, one more: | | 4.0 * 3.1 | | Or, if that works, go back to the failing | | 4.0 * math.exp(-0.5) both of these work, but changing the 4.0 to an integer 4 produces the bus error. so it is definitely a conversion to double/float thats the problem. | | In any failing case, can you jump into a debubber and get a stack trace? sure. I've included an entire dbx session at the end of this mail. | | Do you happen to have | | WANT_SIGFPE_HANDLER | | #define'd when you compile Python on this platform? If so, it complicates | the code a lot. I wonder about that because you got a "bus error", and when | WANT_SIGFPE_HANDLER is #defined we get a whole pile of ugly setjmp/longjmp | code that doesn't show up on my box. a peek at config.h shows the WANT_SIGFPE_HANDLER define commented out. should I turn it on and see what happens? | | Another tack, as a temporary workaround: try disabling optimization only | for Objects/floatobject.c. That will probably fix the problem, and if so | that's enough of a workaround to get you unstuck while pursuing these other | irritations. this one works just fine. workarounds aren't a problem for me right now since I'm in no hurry to get this version in use here. I'm just trying to help debug this version for irix users in general. ------------%< snip %<----------------------%< snip %<------------ (tommy@mace)/u0/tommy/pycvs/python/dist/src$ dbx python dbx version 7.3 65959_Jul11 patchSG0003841 Jul 11 2000 02:29:30 Executable /usr/u0/tommy/pycvs/python/dist/src/python (dbx) run Process 563746 (python) started Python 2.1a2 (#6, Feb 13 2001, 17:43:32) [C] on irix6 Type "copyright", "credits" or "license" for more information. 
>>> 3 * 4.0 12.0 >>> import math >>> 4 * math.exp(-.5) Process 563746 (python) stopped on signal SIGBUS: Bus error (default) at [float_mul:383 +0x4,0x1004c158] 383 CONVERT_TO_DOUBLE(v, a); (dbx) l >* 383 CONVERT_TO_DOUBLE(v, a); 384 CONVERT_TO_DOUBLE(w, b); 385 PyFPE_START_PROTECT("multiply", return 0) 386 a = a * b; 387 PyFPE_END_PROTECT(a) 388 return PyFloat_FromDouble(a); 389 } 390 391 static PyObject * 392 float_div(PyObject *v, PyObject *w) (dbx) t > 0 float_mul(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/floatobject.c":383, 0x1004c158] 1 binary_op1(0x100b69fc, 0x10116788, 0x0, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/abstract.c":337, 0x1003ac5c] 2 binary_op(0x100b69fc, 0x10116788, 0x8, 0x0, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/abstract.c":373, 0x1003ae70] 3 PyNumber_Multiply(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/abstract.c":544, 0x1003b5a4] 4 eval_code2(0x1012c688, 0x0, 0xffffffec, 0x0, 0x0, 0x0, 0x0, 0x0) ["/usr/u0/tommy/pycvs/python/dist/src/Python/ceval.c":896, 0x10034a54] 5 PyEval_EvalCode(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/ceval.c":336, 0x10031768] 6 run_node(0x100f88c0, 0x10116788, 0x0, 0x0, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":931, 0x10040444] 7 PyRun_InteractiveOne(0x0, 0x100b1878, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":540, 0x1003f1f0] 8 PyRun_InteractiveLoop(0xfb4a398, 0x100b1878, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) 
["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":486, 0x1003ef84] 9 PyRun_AnyFileEx(0xfb4a398, 0x100b1878, 0x0, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":461, 0x1003eeac] 10 Py_Main(0x1, 0x0, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Modules/main.c":292, 0x1000bba4] 11 main(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Modules/python.c":10, 0x1000b7bc] More (n if no)?y 12 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/crt1text.s":177, 0x1000b558] (dbx)

As an extra datapoint:

I just tried this (4 * math.exp(-0.5)) on my SGI O2 and on our SGI
file server with the current CVS version of Python, compiled with -O.
I don't get a crash.

I am running IRIX 6.5.10m on the O2 and 6.5.2m on the server. What
version are you running?

On Tue, Feb 13 2001 Flying Cougar Burnette wrote:

> Tim Peters writes:
> |
> | > now neither does. shall I turn opts back on and try a few more
> | > cases?
> |
> | Yes, please, one more:
> |
> |     4.0 * 3.1
> |
> | Or, if that works, go back to the failing
> |
> |     4.0 * math.exp(-0.5)
>
> both of these work, but changing the 4.0 to an integer 4 produces the
> bus error. so it is definitely a conversion to double/float thats
> the problem.
>
> |
> | In any failing case, can you jump into a debubber and get a stack trace?
>
> sure. I've included an entire dbx session at the end of this mail.
>
> |
> | Do you happen to have
> |
> |     WANT_SIGFPE_HANDLER
> |
> | #define'd when you compile Python on this platform? If so, it complicates
> | the code a lot. I wonder about that because you got a "bus error", and when
> | WANT_SIGFPE_HANDLER is #defined we get a whole pile of ugly setjmp/longjmp
> | code that doesn't show up on my box.
>
> a peek at config.h shows the WANT_SIGFPE_HANDLER define commented
> out. should I turn it on and see what happens?
>
>
> |
> | Another tack, as a temporary workaround: try disabling optimization only
> | for Objects/floatobject.c. That will probably fix the problem, and if so
> | that's enough of a workaround to get you unstuck while pursuing these other
> | irritations.
>
> this one works just fine. workarounds aren't a problem for me right
> now since I'm in no hurry to get this version in use here. I'm just
> trying to help debug this version for irix users in general.
>
>
> ------------%< snip %<----------------------%< snip %<------------
>
> (tommy@mace)/u0/tommy/pycvs/python/dist/src$ dbx python
> dbx version 7.3 65959_Jul11 patchSG0003841 Jul 11 2000 02:29:30
> Executable /usr/u0/tommy/pycvs/python/dist/src/python
> (dbx) run
> Process 563746 (python) started
> Python 2.1a2 (#6, Feb 13 2001, 17:43:32) [C] on irix6
> Type "copyright", "credits" or "license" for more information.
> >>> 3 * 4.0
> 12.0
> >>> import math
> >>> 4 * math.exp(-.5)
> Process 563746 (python) stopped on signal SIGBUS: Bus error (default) at [float_mul:383 +0x4,0x1004c158]
>     383  CONVERT_TO_DOUBLE(v, a);
> (dbx) l
> >*  383  CONVERT_TO_DOUBLE(v, a);
>     384  CONVERT_TO_DOUBLE(w, b);
>     385  PyFPE_START_PROTECT("multiply", return 0)
>     386  a = a * b;
>     387  PyFPE_END_PROTECT(a)
>     388  return PyFloat_FromDouble(a);
>     389  }
>     390
>     391  static PyObject *
>     392  float_div(PyObject *v, PyObject *w)
> (dbx) t
> >  0 float_mul(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/floatobject.c":383, 0x1004c158]
>    1 binary_op1(0x100b69fc, 0x10116788, 0x0, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/abstract.c":337, 0x1003ac5c]
>    2 binary_op(0x100b69fc, 0x10116788, 0x8, 0x0, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/abstract.c":373, 0x1003ae70]
>    3 PyNumber_Multiply(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Objects/abstract.c":544, 0x1003b5a4]
>    4 eval_code2(0x1012c688, 0x0, 0xffffffec, 0x0, 0x0, 0x0, 0x0, 0x0) ["/usr/u0/tommy/pycvs/python/dist/src/Python/ceval.c":896, 0x10034a54]
>    5 PyEval_EvalCode(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/ceval.c":336, 0x10031768]
>    6 run_node(0x100f88c0, 0x10116788, 0x0, 0x0, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":931, 0x10040444]
>    7 PyRun_InteractiveOne(0x0, 0x100b1878, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":540, 0x1003f1f0]
>    8 PyRun_InteractiveLoop(0xfb4a398, 0x100b1878, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":486, 0x1003ef84]
>    9 PyRun_AnyFileEx(0xfb4a398, 0x100b1878, 0x0, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Python/pythonrun.c":461, 0x1003eeac]
>    10 Py_Main(0x1, 0x0, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Modules/main.c":292, 0x1000bba4]
>    11 main(0x100b69fc, 0x10116788, 0x8, 0x100a1318, 0x10050000, 0x10116788, 0x100a1318, 0x100a1290) ["/usr/u0/tommy/pycvs/python/dist/src/Modules/python.c":10, 0x1000b7bc]
> More (n if no)?y
>    12 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/crt1text.s":177, 0x1000b558]
> (dbx)
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
>

-- Sjoerd Mullender <sjoerd.mullender@oratrix.com>
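The cases tried by hand in this thread could be re-run as a batch. The following is only a sketch, assuming a modern Python 3 interpreter (not the 2.1a2 build discussed here); the names CASES, run_case and run_all are made up for illustration. Each expression runs in its own child interpreter, so a crash such as the bus error above would show up as a non-zero status instead of killing the script.

```python
import subprocess
import sys

# Expressions taken from the thread. Each one runs in a separate child
# interpreter so that a crash in one case cannot take the harness down.
CASES = [
    "4 * 0.60653065971263342",
    "4.0 * 3.1",
    "import math; 4.0 * math.exp(-0.5)",
    "import math; 4 * math.exp(-0.5)",
    "3 * 4.0",
]


def run_case(source, python=sys.executable):
    """Run one snippet in a child interpreter and return its exit status.

    On POSIX systems a negative status means the child was killed by a
    signal (for example SIGBUS).
    """
    completed = subprocess.run([python, "-c", source])
    return completed.returncode


def run_all(cases=CASES, python=sys.executable):
    """Return a dict mapping each snippet to its exit status."""
    return {source: run_case(source, python) for source in cases}


if __name__ == "__main__":
    for source, status in run_all().items():
        label = "ok" if status == 0 else "FAILED (status %d)" % status
        print("%-40s %s" % (source, label))
```

Passing a different interpreter path as `python` would allow the same cases to be compared across two builds, for instance one compiled with optimization and one without.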

'uname -a' tells me I'm running plain old 6.5 on my R10k O2 with
version 7.3.1.1m of the sgi compiler.

Which version of the compiler do you have? That might be the real
culprit here. in fact... I just hopped onto a co-worker's machine
that has version 7.3.1.2m of the compiler, remade everything, and the
problem is gone.

I think we can chalk this up to a compiler bug and take no further
action. Thanks for listening...

Sjoerd Mullender writes:
| As an extra datapoint:
|
| I just tried this (4 * math.exp(-0.5)) on my SGI O2 and on our SGI
| file server with the current CVS version of Python, compiled with -O.
| I don't get a crash.
|
| I am running IRIX 6.5.10m on the O2 and 6.5.2m on the server. What
| version are you running?
|
| -- Sjoerd Mullender <sjoerd.mullender@oratrix.com>
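Since the difference turned out to be the compiler, it can help to include the interpreter's recorded build details when comparing machines. This is a rough sketch, assuming a modern Python 3 with the standard platform module; build_info is a made-up helper name. Note that the compiler string is whatever was recorded at build time and may not carry a patch level as fine-grained as 7.3.1.1m versus 7.3.1.2m.

```python
import platform
import sys


def build_info():
    """Collect the build details that the interpreter banner also shows."""
    return {
        "version": platform.python_version(),
        # Compiler identification recorded when the interpreter was built.
        "compiler": platform.python_compiler(),
        # Tuple of (build number, build date), both as strings.
        "build": platform.python_build(),
        "platform": sys.platform,
    }


if __name__ == "__main__":
    for key, value in build_info().items():
        print("%-10s %s" % (key, value))
```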

[Flying Cougar Burnette]
Oh, of course. Why didn't you say so? Micro-micro version 7.3.1.2m of the
SGI compiler fixed a bus error when doing int->float conversion. What? You
don't believe me? Harrumph -- you just proved it <wink>.

thanks-for-playing-and-pick-up-a-fabulous-prize-at-the-door-ly y'rs - tim
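For readers wondering why only the integer 4 tripped the bug: a mixed int * float multiply is the case where the float code has to convert an int operand to a double first. The sketch below shows the analogous dispatch in a modern CPython 3, which is an assumption on my part; the 2.1a2 build in this thread reached float_mul through binary_op1, as the stack trace shows, and its internals differ.

```python
import math

x = math.exp(-0.5)

# int's own multiply does not handle a float operand...
assert (4).__mul__(x) is NotImplemented

# ...so the interpreter falls back to float's reflected multiply, which
# converts the int operand to a double before multiplying.
mixed = x.__rmul__(4)

# The int-to-float conversion of 4 is exact, so all three spellings agree.
assert mixed == 4 * x == 4.0 * x
assert float(4) == 4.0
```

A float * float multiply such as 4.0 * 3.1 never needs that conversion, which fits the observation that only the integer form crashed.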
participants (3)
- Flying Cougar Burnette
- Sjoerd Mullender
- Tim Peters