On 1 April 2010 03:15, David Cournapeau david@silveregg.co.jp wrote:

Anne Archibald wrote:

Particularly given the comments in the boost source code, I'm leery of this fix; who knows what an optimizing compiler will do with it?

But the current code *is* wrong: it is not true that u == 1 implies u - 1 == 0 (and that (u-1) != 0 -> u != 1), because the spacing between two consecutive floats is much bigger at 1 than at 0. And the current code relies on this wrong assumption: at least with the correction, we test for what we care about.

I don't think this is true for IEEE floats, at least in the case we're interested in where u is approximately 1. After all, the issue is representation: if you take an x very close to zero and add 1 to it, you get exactly 1 (which is what you're pointing out); but if you then subtract 1 again, you get exactly zero. After all, u==1 means that the bit patterns for u and 1 are identical, so if you subtract 1 from u you get exactly zero. If u is not equal to 1 (and we're talking about IEEE floats) and you subtract 1, you will get something that is not zero.

The problem is that what we're dealing with is not IEEE floating-point: it resembles IEEE floating-point, but arithmetic is often done with extra digits, which are discarded or not at the whim of the compiler. If the extra digits were kept we'd still have (u-1)==0 iff u==1, but if the compiler discards the extra digits in one expression but not the other - which is a question about FPU register scheduling - funny things can happen.

So I would say the code correction I suggest is at least better in that respect.

Now the "correction" is completely bogus.

I am not sure to understand why ? It is at least not worse than the current code.

Well, it would be hard to be worse than inappropriate NaNs. But the point of the correction is that, through some deviousness I don't fully understand, the roundoff error involved in the log is compensated for by the roundoff error in the calculation of (1+x)-1. If the compiler uses a different number of digits to calculate the log than it uses to caiculate (1+x)-1, the "correction" will make a hash of things.

Since this is a subtle issue, I vote for delegating it (and log10_1p) to log1p. If *that* has trouble, well, at least it'll only be on non-C99 machines, and we can try compiler gymnastics.

Yes, we could do that. I note that on glibc, the function called is an intrinsic for log1p (FYL2XP1) if x is sufficiently small.

Even if we end up having to implement this function by hand, we should at least implement it only once - log1p - and not three times (log1p, log10_1p, log2_1p).

Clearly the optimizing compiler is inserting the DRDB (drain David's battery) opcode.

:)

That was tongue-in-cheek, but there was a real point: optimizing compilers do all sorts of peculiar things, particularly with floating-point. For one thing, many Intel CPUs have two (logically-)separate units for doing double-precision calculations - the FPU, which keeps extra digits, and SSE2, which doesn't. Who knows which one gets used for which operations? -ffloat-store is supposed to force the constant discarding of extra digits, but we certainly don't want to apply that to all of numpy. So if we have to get log1p working ourselves it'll be an exercise in suppressing specific optimizations in all optimizing compilers that might compile numpy.

Anne