[Python-ideas] Way to check for floating point "closeness"?
Steven D'Aprano
steve at pearwood.info
Tue Jan 13 02:34:53 CET 2015
On Mon, Jan 12, 2015 at 09:02:17AM -0800, Chris Barker wrote:
> Now that we're talking about floating point conveniences (math.nan,
> linspace):
>
> What about putting an
>
> almost_equal(x,y,tol=1e14)
[...]
> Anyone else think this would be a good idea to add to the stdlib?
I do, and I have already done so!
It's an implementation detail of the statistics module (to be specific,
its test suite), but it covers both relative and absolute error
tolerances and handles infinities and NANs.
https://hg.python.org/cpython/file/1b145e8ae4be/Lib/test/test_statistics.py#l41
The default tolerances are more or less plucked out of thin air and
probably should be discussed.
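To make the discussion concrete, here is a minimal sketch of a comparison that combines absolute and relative tolerances and special-cases infinities and NANs. It is not the code from test_statistics.py; the name approx_equal, the parameter names and the default values are all assumptions chosen for illustration.

```python
import math

def approx_equal(x, y, abs_tol=1e-12, rel_tol=1e-7):
    """Return True if x and y are equal to within either tolerance.

    Hypothetical sketch: the defaults are placeholders, not recommendations.
    """
    if abs_tol < 0 or rel_tol < 0:
        raise ValueError("tolerances must be non-negative")
    # A NAN is never close to anything, including another NAN.
    if math.isnan(x) or math.isnan(y):
        return False
    # Exact equality also covers infinities of the same sign.
    if x == y:
        return True
    # Any remaining infinity is infinitely far from the other value.
    if math.isinf(x) or math.isinf(y):
        return False
    diff = abs(x - y)
    # Pass if either the absolute or the relative test succeeds.
    return diff <= abs_tol or diff <= rel_tol * max(abs(x), abs(y))
```

Accepting a match on either tolerance is one possible design; the absolute test is what lets values near zero compare as close, where a purely relative test would fail.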
Ideally it should also handle ULP comparisons:
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
Unfortunately a naive ULP comparison has trouble with NANs, INFs, and
numbers close to zero, especially if they have opposite signs. The
denormalised floats immediately above and below zero are:
5e-324
-5e-324
These are the smallest magnitude floats apart from zero, so we might
hope that they are considered "close together", but they actually differ
by 9223372036854775808 ULP. Ouch.
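The figure above can be reproduced with a naive ULP distance that reinterprets each float's 64 bits as a signed integer and subtracts. This is a sketch of the naive approach only, assuming IEEE 754 doubles; the helper names are made up for illustration.

```python
import struct

def float_to_int64(x):
    # Reinterpret the 8 bytes of a double as a signed 64-bit integer.
    return struct.unpack("<q", struct.pack("<d", x))[0]

def naive_ulp_distance(x, y):
    # Naive: makes no allowance for opposite signs, NANs or INFs.
    return abs(float_to_int64(x) - float_to_int64(y))

print(naive_ulp_distance(1.0, 1.0000000000000002))  # adjacent floats: 1
print(naive_ulp_distance(5e-324, -5e-324))          # 9223372036854775808
```

The sign bit is the top bit of the integer, so flipping the sign moves the integer by 2**63 no matter how small the magnitude is.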
I have some ideas for dealing with that, and if anyone is interested I'm
happy to talk about it, but they're not ready for production yet.
I think that Bruce Dawson is right. Floating point comparisons are
hard, really hard. I know that I've still got a lot to learn about it. I
can think of at least five different ways to compare floats for
equality, and they all have their uses:
- exact equality using ==
- absolute error tolerances
- relative error tolerances
- ULP comparisons
- the method unittest uses, using round()
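As a rough illustration, the five approaches can be applied to the familiar pair 0.1 + 0.2 and 0.3. The tolerance values are arbitrary assumptions, the ULP calculation is only valid here because both values are finite and positive, and the last line mirrors the rounding test that unittest's assertAlmostEqual applies with its default of 7 places.

```python
import struct

def bits(x):
    # Reinterpret a double as a signed 64-bit integer (IEEE 754 assumed).
    return struct.unpack("<q", struct.pack("<d", x))[0]

a = 0.1 + 0.2
b = 0.3

exact = (a == b)                                     # exact equality
absolute = abs(a - b) <= 1e-9                        # absolute tolerance
relative = abs(a - b) <= 1e-9 * max(abs(a), abs(b))  # relative tolerance
ulps = abs(bits(a) - bits(b))                        # ULP distance
rounded = round(abs(a - b), 7) == 0                  # unittest-style round()

print(exact, absolute, relative, ulps, rounded)
```

Only the exact test rejects the pair; the two values are a single ULP apart.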
I'm explicitly including == because it is a floating point superstition
that one should never under any circumstances compare floats for exact
equality. As general advice, "don't use == unless you know what you are
doing" is quite reasonable, but it's the "never use" that turns it into
superstition. As Bruce Dawson says, "Floating-point numbers aren’t
cursed", and throwing epsilons into a problem where no epsilon is needed
is a bad idea.
https://randomascii.wordpress.com/2012/06/26/doubles-are-not-floats-so-dont-compare-them/
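A small sketch of both halves of that argument: sums of exactly representable values can safely be compared with ==, while an unnecessary epsilon can declare clearly different values equal. The fuzzy_equal helper and its default epsilon are hypothetical.

```python
# These values are exactly representable in binary, so == is reliable.
print(0.5 + 0.25 == 0.75)       # True
print(float(2**53) == 2**53)    # True

def fuzzy_equal(x, y, eps=1e-9):
    # Hypothetical absolute-epsilon comparison with an arbitrary epsilon.
    return abs(x - y) <= eps

# One value is twice the other, yet the epsilon test calls them equal.
print(fuzzy_equal(1e-10, 2e-10))  # True
print(1e-10 == 2e-10)             # False
```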
Aside: I'm reminded of APL, which mandates fuzzy equality (i.e. with a
tolerance) of floating point numbers:
    In an early talk Ken [Iverson] was explaining the advantages
    of tolerant comparison. A member of the audience asked
    incredulously, “Surely you don’t mean that when A=B and B=C,
    A may not equal C?” Without skipping a beat, Ken replied,
    “Any carpenter knows that!” and went on to the next question.

    - Paul Berry
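The carpenter's point is easy to reproduce with any tolerant comparison. A minimal sketch, using a made-up helper and an arbitrary absolute tolerance:

```python
def close(x, y, tol=1.5):
    # Hypothetical tolerant comparison with an arbitrary tolerance.
    return abs(x - y) <= tol

a, b, c = 0.0, 1.0, 2.0
print(close(a, b))  # True
print(close(b, c))  # True
print(close(a, c))  # False: tolerant equality is not transitive
```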
--
Steve