Floating point equality [was Re: What exactly is "exact" (was Clean Singleton Docstrings)]

Wed Jul 20 01:42:50 EDT 2016

On Tuesday 19 July 2016 14:58, Rustom Mody wrote:

> So I again ask: You say «"Never compare floats for equality" is a pernicious
> myth»

It is the word *never* which makes it superstition. If people said "Take care 
with using == for floats, it's often not what you want" I would have no argument 
with the statement.

I'd even (reluctantly) accept "usually not what you want". But "never" is out-
and-out cargo-cult programming.

> Given that for Chris’ is_equal we get
> is_equal(.1+.1+.1, .3) is True
> whereas for python builtin == its False
> 
> What (non)myth do you suggest for replacement?

Floating point maths is hard. Think carefully about what you are doing and 
whether it is appropriate to use == or a fuzzy almost-equal comparison, or 
whether equality is the right test at all.

"But thinking is hard, can't you just tell me the answer?"

No. But I can give some guidelines:

Floating point arithmetic is deterministic; it doesn't just randomly mix in 
error out of spite or malice. So in principle, you can always estimate the 
rounding error from any calculation -- and sometimes there is none.

Arithmetic on integer-valued floats (e.g. 1.0) is always exact, up to a limit 
of 2**53, or approximately 9e15. (That's why most Javascript programmers fail 
to notice that they don't have an integer type.) So long as you're using 
operations that only produce integer values from integer arguments (such as 
+ - * // but not / ), and every result stays within that limit, all 
calculations are exact. It is a waste of time to do:

x = 2.0
y = x*1002.0
is_equal(y, 2004.0, 1e-16)

when you can just do y == 2004.0.
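
A quick way to convince yourself, as a minimal sketch (the variable names 
here are just for illustration and are not from the thread): integer-valued 
floats behave exactly until you step past 2**53.

```python
# Integer-valued floats are exact while every result fits within 2**53.
x = 2.0
y = x * 1002.0
exact_product = (y == 2004.0)            # no rounding error at all

limit = 2.0 ** 53                        # 9007199254740992.0
still_exact = ((limit - 1.0) + 1.0 == limit)

# 2**53 + 1 is not representable as a float, so the addition rounds
# back down to 2**53 and the 1.0 is silently lost.
precision_lost = (limit + 1.0 == limit)

print(exact_product, still_exact, precision_lost)   # True True True
```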

If you do decide to use an absolute error, e.g.:

abs(x - y) < tolerance

keep in mind that your tolerance needs to be chosen relative to the magnitudes 
of x and y. For large values of x and y, the smallest possible difference 
between two distinct floats may be very large:

py> x = 1e80
py> delta = 2**-1000
py> assert delta
py> while x + delta == x:
...     delta *= 2
... else:
...     print(delta)
... 
6.58201822928e+63

So if you're comparing two numbers around 1e80 or so, doing a "fuzzy 
comparison" using an absolute tolerance of less than 6.5e63 or so is just a 
slow and complicated way of performing an exact comparison using the == 
operator.
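
If you would rather ask for the spacing directly than search for it with a 
loop, here is a sketch that assumes Python 3.9 or later, where math.ulp and 
math.nextafter are available. The tolerance of 1e60 is an arbitrary value 
picked for illustration.

```python
import math

x = 1e80
gap = math.ulp(x)                      # distance from x to the next float above it
neighbour = math.nextafter(x, math.inf)

# Anything smaller than half the gap vanishes when added to x.
vanishes = (x + gap / 4 == x)

# An absolute tolerance far below the gap cannot tell "close" from "equal":
# the nearest distinct float is already outside the tolerance.
tolerance = 1e60
fuzzy_match = abs(x - neighbour) < tolerance

print(gap, vanishes, fuzzy_match)
```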

Absolute tolerance is faster and easier to understand, and works when the 
numbers are on opposite sides of zero, or if one (or both) is zero. But 
generally speaking, relative tolerance of one form or another:

abs(x - y) <= abs(x)*relative_tolerance
abs(x - y) <= abs(y)*relative_tolerance
abs(x - y) <= min(abs(x), abs(y))*relative_tolerance
abs(x - y) <= max(abs(x), abs(y))*relative_tolerance

is probably better, though slower.
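
As a rough sketch of the max() form above, compared against math.isclose 
from the standard library (Python 3.5 and later): close_relative and its 
default tolerance of 1e-9 are hypothetical, chosen only for this example.

```python
import math

def close_relative(x, y, relative_tolerance=1e-9):
    """Hypothetical helper: the max() variant of the relative test."""
    return abs(x - y) <= max(abs(x), abs(y)) * relative_tolerance

a = 0.1 + 0.1 + 0.1
b = 0.3

exact = (a == b)                                   # False
relative = close_relative(a, b)                    # True
stdlib = math.isclose(a, b, rel_tol=1e-9)          # True

# A purely relative test breaks down when one value is zero, which is
# where an absolute tolerance earns its keep.
near_zero_relative = close_relative(1e-12, 0.0)                 # False
near_zero_absolute = math.isclose(1e-12, 0.0, abs_tol=1e-9)     # True

print(exact, relative, stdlib, near_zero_relative, near_zero_absolute)
```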

A nice, simple technique is just to round:

if round(x, 6) == round(y, 6):

but that's not quite the same as abs(x-y) < 1e-6.
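
To see the difference, take two values that sit on either side of a rounding 
boundary. The particular numbers below are just an illustration:

```python
x = 4e-7
y = 6e-7

# The two values differ by about 2e-7, well inside a 1e-6 tolerance...
within_tolerance = abs(x - y) < 1e-6

# ...but x rounds down to 0.0 and y rounds up to 1e-06, so the
# rounded comparison says they differ.
rounded_equal = (round(x, 6) == round(y, 6))

print(within_tolerance, rounded_equal)   # True False
```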

For library code that cares greatly about precision, using "Unit in the Last 
Place" (ULP) calculations is probably best. But that's a whole different story.
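
For a flavour of what that looks like, here is one common way to count how 
many representable floats lie between two values. Treat it as a sketch 
under stated assumptions, not a vetted library routine: the function names 
are hypothetical, it assumes IEEE 754 doubles, and it makes no attempt to 
handle NaN.

```python
import struct

def float_to_ordered_int(x):
    """Map a float to an int so that adjacent floats map to adjacent ints."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats have the sign bit set, so they unpack as negative
    # ints. Mirror their magnitude bits below zero to keep the mapping
    # monotonic; this also sends both -0.0 and +0.0 to 0.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def ulp_distance(x, y):
    """Number of representable steps between x and y (hypothetical helper)."""
    return abs(float_to_ordered_int(x) - float_to_ordered_int(y))

print(ulp_distance(0.1 + 0.1 + 0.1, 0.3))   # 1: they are adjacent floats
print(ulp_distance(1.0, 1.0))               # 0
print(ulp_distance(-0.0, 0.0))              # 0
```

A comparison such as ulp_distance(x, y) <= 4 then scales with the magnitude 
of the operands automatically, which a fixed absolute tolerance cannot do.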

-- 
Steve