Python -- floating point arithmetic

Nobody nobody at nowhere.com
Wed Jul 7 12:29:55 EDT 2010


On Wed, 07 Jul 2010 15:08:07 +0200, Thomas Jollans wrote:

> you should never rely on a floating-point number to have exactly a
> certain value.

"Never" is an overstatement. There are situations where you can rely
upon a floating-point number having exactly a certain value.

First, floating-point values are exact. They may be an approximation
to some other value, but they are still exact values, not some kind of
indeterminate quantum state. Specifically, every finite floating-point
value is a rational number whose denominator is a power of two.
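
For instance, Python can show you the exact rational value of any float
(the fractions module performs the conversion exactly):

    from fractions import Fraction

    x = 0.1
    # The float stored for 0.1 is an exact rational with a power-of-two
    # denominator -- it just isn't the rational 1/10.
    print(Fraction(x))           # 3602879701896397/36028797018963968
    print(x.as_integer_ratio())  # (3602879701896397, 36028797018963968)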

Second, if the result of performing a primitive arithmetic operation
(addition, subtraction, multiplication, division, remainder) or the
square-root function on the equivalent rational values is exactly
representable as a floating-point number, then the result will be exactly
that value.
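
A quick illustration (assuming IEEE-754 doubles, which is what CPython's
float is on essentially all current platforms):

    import math

    # Each exact rational result below is representable as a double,
    # so every comparison is guaranteed to hold.
    print(0.5 + 0.25 == 0.75)     # True: 3/4 has a power-of-two denominator
    print(1.5 * 2.0 == 3.0)       # True
    print(7.0 / 2.0 == 3.5)       # True
    print(math.sqrt(9.0) == 3.0)  # True: sqrt is correctly rounded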

Third, if the result of performing a primitive arithmetic operation or the
square-root function on the equivalent rational values *isn't* exactly
representable as a floating-point number, then the floating-point result
will be obtained by rounding the exact value according to the FPU's
current rounding mode.
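
The famous 0.1 + 0.2 != 0.3 example is just this rule in action (again
assuming IEEE-754 doubles with round-to-nearest):

    from fractions import Fraction

    # Neither 0.1 nor 0.2 is exactly representable; their exact rational
    # sum is rounded to the nearest double, which is not the double
    # nearest to 3/10.
    print(0.1 + 0.2 == 0.3)                # False
    exact = Fraction(0.1) + Fraction(0.2)  # exact sum of the stored values
    print(float(exact) == 0.1 + 0.2)       # True: the result is exactly
                                           # the rounded exact sum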

All of this is entirely deterministic, and follows relatively simple
rules. Even if the CPU has a built-in random number generator, it will
*not* be used to generate the least-significant bits of a floating-point
arithmetic result.
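
You can check the determinism directly by comparing bit patterns (struct
exposes the raw 64 bits of a double):

    import struct

    def bits(x):
        # The raw 64-bit pattern of a double, for exact comparison.
        return struct.pack('<d', x)

    # Every repetition produces the identical bit pattern.
    results = {bits(0.1 * 3.0 - 0.3) for _ in range(1000)}
    print(len(results))  # 1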

The second and third cases above assume that floating-point arithmetic
follows IEEE-754 (the second case is likely to hold even for systems
which don't strictly adhere to IEEE-754). This is true for most modern
architectures (a quick Python-level sanity check follows the list),
provided that:

1. You aren't using Borland C, which forcibly "optimises" x/y to x*(1/y),
so 12/3 need not come out exactly equal to 4, since 1/3 isn't exactly
representable. Real compilers won't apply this sort of approximation
unless specifically instructed to (e.g. -funsafe-math-optimizations for
gcc).

2. You aren't using one of the early Pentium chips with the infamous
FDIV bug.
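
From Python you can at least confirm that the platform's double has the
IEEE-754 binary64 parameters (a rough sanity check, not a full
conformance test):

    import sys

    # CPython's float is a C double; on an IEEE-754 system these are
    # the binary64 parameters.
    print(sys.float_info.mant_dig)  # 53 significand bits
    print(sys.float_info.max_exp)   # 1024
    print(sys.float_info.radix)     # 2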

In spite of this, there are some "gotcha"s. E.g. on x86, the x87 FPU
computes results to 80-bit (long double) precision internally; these are
rounded to 64 bits when stored to memory. Whether and when that narrowing
occurs is largely up to the compiler, although it can be forced with
gcc's -ffloat-store option.
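
Note that by the time a value is visible from Python it has already been
narrowed: CPython stores every intermediate result as a 64-bit double, so
the extended-precision effects live entirely in the C layer. You can
inspect the stored 64-bit pattern directly:

    import struct

    x = 0.1 + 0.2
    # Whatever precision the FPU used internally, the Python-level value
    # is exactly these eight bytes (a little-endian binary64 double).
    print(struct.pack('<d', x).hex())  # 343333333333d33f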

More complex functions (trigonometric, etc.) are only accurate to within
a small relative error (e.g. +/- one unit in the last place), as it isn't
always possible to determine the correctly rounded value of the least
significant bit for a given rounding mode -- the "table-maker's dilemma"
-- and even where it is theoretically possible, there is no bound on the
number of bits of intermediate precision which might be required.
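
One way to see this from Python is to compare a libm result against a
high-precision reference from the decimal module, measuring the error in
units in the last place (math.ulp needs Python 3.9+):

    import math
    from decimal import Decimal, getcontext

    # Compare math.exp(1) against a 50-digit reference value.
    getcontext().prec = 50
    reference = Decimal(1).exp()
    computed = Decimal(math.exp(1.0))
    error_ulps = abs(computed - reference) / Decimal(math.ulp(math.exp(1.0)))
    print(float(error_ulps))  # typically well under 1 ulp, but
                              # platform-dependent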




