
Well, it turns out that disabling fused-add-mul indeed fixes the problem. The CodeWarrior manual warns that results may be slightly different with and without fused instructions, but the example they give is with operations apparently done in higher precision with the fused instructions. No word about nonstandard behaviour for +0.0 and -0.0.

As this seems to be a PowerPC issue, not a MacOS issue, it is something that other PowerPC porters may want to look out for too (does AIX still exist?).

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

[Jack Jansen]
Well, it turns out that disabling fused-add-mul indeed fixes the problem. The CodeWarrior manual warns that results may be slightly different with and without fused instructions, but the example they give is with operations apparently done in higher precision with the fused instructions. No word about nonstandard behaviour for +0.0 and -0.0.
As this seems to be a PowerPC issue, not a MacOS issue, it is something that other PowerPC porters may want to look out for too (does AIX still exist?).
The PowerPC architecture's fused instructions are wonderful for experts, because in a*b+c (assuming IEEE doubles w/ 53 bits of precision) they compute the a*b part to 106 bits of precision internally, and the add of c gets to see all of them. This is great if you *know* c is pretty much the negation of the high-order 53 bits of the product, because it lets you get at the *lower* 53 bits too; e.g.,

    hipart = a*b;
    lopart = a*b - hipart; /* assuming fused mul-sub is generated */

gives a pair of doubles (hipart, lopart) whose mathematical (not f.p.) sum hipart + lopart is exactly equal to the mathematical (not f.p.) product a*b. In the hands of an expert, this can, e.g., be used to write ultra-fast high-precision math libraries: it gives a very cheap way to get the effect of computing with about twice the native precision.

So that's the kind of thing they're warning you about: without the fused mul-sub, "lopart" above is always computed to be exactly 0.0, and so is useless. Contrarily, some fp algorithms *depend* on cancelling out oodles of leading bits in intermediate results, and in the presence of fused mul-add deliver totally bogus results.

However, screwing up 0's sign bit has nothing to do with any of that, and if the HW is producing -0 for a fused (+anything)*(+0)-(+0), it can't be called anything other than a HW bug (assuming it's not in the to-minus-infinity rounding mode).

When a given compiler generates fused instructions (when available) is a x-compiler crap-shoot, and the compiler you're using *could* have generated them before with the same end result. There's really nothing portable we can do in the source code to convince a compiler never to generate them. So looks like you're stuck with a compiler switch here.

not-the-outcome-i-was-hoping-for-but-i'll-take-it<wink>-ly y'rs - tim