[Python-Dev] test_coercion failing

Tim Peters tim.one@home.com
Mon, 26 Mar 2001 17:08:30 -0500


[Jack Jansen]
> Well, it turns out that disabling fused-add-mul indeed fixes the
> problem. The CodeWarrior manual warns that results may be slightly
> different with and without fused instructions, but the example they
> give is with operations apparently done in higher precision with the
> fused instructions. No word about nonstandard behaviour for +0.0 and
> -0.0.
>
> As this seems to be a PowerPC issue, not a MacOS issue, it is
> something that other PowerPC porters may want to look out for too
> (does AIX still exist?).

The PowerPC architecture's fused instructions are wonderful for experts,
because in a*b+c (assuming IEEE doubles w/ 53 bits of precision) they compute
the a*b part to 106 bits of precision internally, and the add of c gets to
see all of them.  This is great if you *know* c is pretty much the negation
of the high-order 53 bits of the product, because it lets you get at the
*lower* 53 bits too; e.g.,

    hipart = a*b;
    lopart = a*b - hipart;  /* assuming fused mul-sub is generated */

gives a pair of doubles (hipart, lopart) whose mathematical (not f.p.) sum
hipart + lopart is exactly equal to the mathematical (not f.p.) product a*b.
In the hands of an expert, this can, e.g., be used to write ultra-fast
high-precision math libraries:  it gives a very cheap way to get the effect
of computing with about twice the native precision.

So that's the kind of thing they're warning you about:  without the fused
mul-sub, "lopart" above is always computed to be exactly 0.0, and so is
useless.  Conversely, some fp algorithms *depend* on cancelling out oodles of
leading bits in intermediate results, and in the presence of fused mul-add
deliver totally bogus results, because the product is no longer rounded to
53 bits before the add sees it.

However, screwing up 0's sign bit has nothing to do with any of that, and if
the HW is producing -0 for a fused (+anything)*(+0)-(+0), it can't be called
anything other than a HW bug (assuming it's not in the to-minus-infinity
rounding mode).

Whether a given compiler generates fused instructions (when available) is a
cross-compiler crap-shoot, and the compiler you're using *could* have generated
them before with the same end result.  There's really nothing portable we can
do in the source code to convince a compiler never to generate them.  So it
looks like you're stuck with a compiler switch here.

not-the-outcome-i-was-hoping-for-but-i'll-take-it<wink>-ly y'rs  - tim