[Python-ideas] Way to check for floating point "closeness"?

Steven D'Aprano steve at pearwood.info
Tue Jan 13 02:34:53 CET 2015


On Mon, Jan 12, 2015 at 09:02:17AM -0800, Chris Barker wrote:
> Now that we're talking about floating point conveniences (math.nan,
> linspace):
> 
> What about putting an
> 
> almost_equal(x,y,tol=1e14)
[...]
> Anyone else think this would be a good idea to add to the stdlib?

I do, and I have already done so!

It's an implementation detail of the statistics module (to be specific, 
its test suite), but it covers both relative and absolute error 
tolerances and handles infinities and NANs.
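As a rough illustration of what such a function has to deal with, here is a minimal sketch. It is not the code at the link below: the name approx_equal, the parameter names tol and rel, and the default values are assumptions made for this example only.

```python
import math

def approx_equal(x, y, tol=1e-12, rel=1e-7):
    """Return True if x and y agree to within an absolute error tol
    or a relative error rel, whichever allows the larger difference.

    Illustrative sketch only; the defaults are placeholders.
    """
    if tol < 0 or rel < 0:
        raise ValueError("error tolerances must be non-negative")
    # A NAN is never approximately equal to anything, including itself.
    if math.isnan(x) or math.isnan(y):
        return False
    # Exact equality also covers two infinities of the same sign.
    if x == y:
        return True
    # Any infinity left over is infinitely far from the other value.
    if math.isinf(x) or math.isinf(y):
        return False
    actual_error = abs(x - y)
    allowed_error = max(tol, rel * max(abs(x), abs(y)))
    return actual_error <= allowed_error
```

Checking for NANs and infinities before doing any subtraction avoids computing inf - inf, which would itself produce a NAN.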

https://hg.python.org/cpython/file/1b145e8ae4be/Lib/test/test_statistics.py#l41

The default tolerances are more or less plucked out of thin air and 
probably should be discussed.

Ideally it should also handle ULP comparisons:

https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/

Unfortunately a naive ULP comparison has trouble with NANs, INFs, and 
numbers close to zero, especially if they have opposite signs. The 
smallest representable denormalised floats larger, and smaller, than 
zero are:

5e-324
-5e-324

These are the smallest magnitude floats apart from zero, so we might 
hope that they are considered "close together", but comparing their bit 
patterns as integers says they differ by 9223372036854775808 (2**63) 
ULP. Ouch.
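The figure can be reproduced with a naive ULP distance that reinterprets each double as a 64-bit integer and subtracts. The helper names below are made up for this sketch, and it deliberately makes no attempt to fix the sign problem.

```python
import struct

def float_to_bits(x):
    """Return the bit pattern of the IEEE 754 double x as an unsigned int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def naive_ulp_distance(x, y):
    """Naive ULP distance: the difference between the raw bit patterns.

    This is only meaningful when x and y have the same sign.
    """
    return abs(float_to_bits(x) - float_to_bits(y))

# Adjacent floats of the same sign behave as expected.
print(naive_ulp_distance(1.0, 1.0 + 2**-52))   # 1

# The sign bit dominates for the two denormals either side of zero.
print(naive_ulp_distance(5e-324, -5e-324))     # 9223372036854775808
```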

I have some ideas for dealing with that, and if anyone is interested I'm 
happy to talk about it, but they're not ready for production yet.

I think that Bruce Dawson is right. Floating point comparisons are 
hard, really hard. I know that I've still got a lot to learn about it. I 
can think of at least five different ways to compare floats for 
equality, and they all have their uses:

- exact equality using ==
- absolute error tolerances
- relative error tolerances
- ULP comparisons
- the method unittest uses, using round()
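To make the list concrete, here are the five approaches applied to the same pair of values. The tolerances (1e-9, 4 ULP) are arbitrary choices for this sketch, and the bits helper is a made-up name; the round() line mirrors what unittest's assertAlmostEqual does with its default of 7 places.

```python
import struct

def bits(x):
    """Bit pattern of the IEEE 754 double x as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

a = 0.1 + 0.2   # 0.30000000000000004
b = 0.3

# 1. Exact equality.
exact = a == b                                         # False

# 2. Absolute error tolerance.
absolute = abs(a - b) <= 1e-9                          # True

# 3. Relative error tolerance, scaled by the larger magnitude.
relative = abs(a - b) <= 1e-9 * max(abs(a), abs(b))    # True

# 4. ULP comparison (valid here because both values are positive).
ulps = abs(bits(a) - bits(b)) <= 4                     # True, they are adjacent

# 5. Rounding the difference, as unittest does.
rounded = round(abs(a - b), 7) == 0                    # True
```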


I'm explicitly including == because it is a floating point superstition 
that one should never under any circumstances compare floats for exact 
equality. As general advice, "don't use == unless you know what you are 
doing" is quite reasonable, but it's the "never use" that turns it into 
superstition. As Bruce Dawson says, "Floating-point numbers aren’t 
cursed", and throwing epsilons into a problem where no epsilon is needed 
is a bad idea.
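A few cases where exact comparison is well defined and a tolerance would add nothing, given as a small sketch rather than an exhaustive list:

```python
# Sums of powers of two that fit in the 53-bit significand are exact.
assert 0.5 + 0.25 == 0.75

# Integers up to 2**53 are represented exactly.
assert float(2**53) == 2**53

# A value that has only been copied or round-tripped through repr(),
# not recomputed, is bit-for-bit identical to the original.
x = 0.1
assert float(repr(x)) == x
```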

https://randomascii.wordpress.com/2012/06/26/doubles-are-not-floats-so-dont-compare-them/


Aside: I'm reminded of APL, which mandates fuzzy equality (i.e. with a 
tolerance) of floating point numbers:

    In an early talk Ken [Iverson] was explaining the advantages
    of tolerant comparison. A member of the audience asked 
    incredulously, “Surely you don’t mean that when A=B and B=C, 
    A may not equal C?” Without skipping a beat, Ken replied, 
    “Any carpenter knows that!” and went on to the next question.
    - Paul Berry
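The carpenter's point is easy to demonstrate with any fixed tolerance. In this sketch the function name tolerant_eq and the tolerance of 0.1 are arbitrary, and the values are chosen so that the subtractions are exact.

```python
def tolerant_eq(a, b, tol=0.1):
    """Fuzzy equality with an absolute tolerance (illustrative only)."""
    return abs(a - b) <= tol

A, B, C = 1.0, 1.0625, 1.125

print(tolerant_eq(A, B))   # True:  they differ by 0.0625
print(tolerant_eq(B, C))   # True:  they differ by 0.0625
print(tolerant_eq(A, C))   # False: they differ by 0.125
```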



-- 
Steve

