[Python-ideas] Fwd: Way to check for floating point "closeness"?

Chris Barker chris.barker at noaa.gov
Sun Jan 18 21:14:04 CET 2015


On Sun, Jan 18, 2015 at 11:27 AM, Ron Adam <ron3200 at gmail.com> wrote:

I'm going to try to summarise what I got out of this discussion.  Maybe it
> will help bring some focus to the topic.
>
> I think there are two cases to consider.
>
>      # The most common case.
>

why do you think this is the most common case?


>      rel_is_good(actual, expected, delta)   # value +- %delta.
>
>      # Testing for possible equivalence?
>      rel_is_close(value1, value2, delta)    # %delta close to each other.
>
> I don't think they are quite the same thing.
>
>      rel_is_good(9, 10, .1) --> True
>      rel_is_good(10, 9, .1) --> False
>
>      rel_is_close(9, 10, .1) --> True
>      rel_is_close(10, 9, .1) --> True
>
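
The two quoted behaviours can be sketched directly (the names and semantics are taken from the quoted message; this is an illustration, not a proposed API):

```python
def rel_is_good(actual, expected, delta):
    # Asymmetric: the tolerance is scaled by the *expected* value only.
    return abs(actual - expected) <= delta * abs(expected)

def rel_is_close(value1, value2, delta):
    # Symmetric: the tolerance is scaled by the larger magnitude, so
    # the result is the same whichever order the arguments come in.
    return abs(value1 - value2) <= delta * max(abs(value1), abs(value2))

print(rel_is_good(9, 10, .1))   # True:  |9-10| = 1 <= 0.1*10
print(rel_is_good(10, 9, .1))   # False: |10-9| = 1 >  0.1*9
print(rel_is_close(9, 10, .1))  # True
print(rel_is_close(10, 9, .1))  # True
```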

agreed -- they are not the same thing. But I'm not convinced that they are
all that different from a practical perspective -- 0.1 is a very large
relative tolerance (I wouldn't call it "delta" for a relative measure). If
you use a more common, much smaller tolerance (1e-8 to 1e-12, maybe?), then
the difference between them becomes pretty minimal. And for the most part,
this kind of testing is looking for an "approximation" -- so you can't get
really upset about exactly where the cut-off is.

Though, given my thoughts on that, I suppose if other people want to be
able to clearly specify which of the two values should be used to scale the
relative tolerance, then it won't make much difference anyway.

> The next issue is: where do the numeric accuracy of the data, significant
> digits, and the language's accuracy (ULPs) come into the picture?
>
> My intuition.. I need to test the idea to make a firmer claim.. is that in
> the case of is_good, you want to exclude the uncertain parts, but with
> is_close, you want to include the uncertain parts.
>

I think this is pretty irrelevant -- you can't do better than the
precision of the data type, and it doesn't make a difference which
definition you are using -- they only change which value is used to scale
the relative tolerance.

There are cases where ULPs, etc. are key -- those are for testing the
precision of algorithms, and I think that's a special use case that would
require a different function -- no need to try to cram it all into one
function.
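
For completeness, a ULP-based comparison (the special use case above) might look like the following sketch. `ulp_diff` is a hypothetical helper, not a stdlib function; it relies on the fact that for finite IEEE 754 doubles of the same sign, adjacent floats have adjacent bit patterns when reinterpreted as integers. (Later Pythons, 3.9+, grew `math.ulp` and `math.nextafter` for this territory.)

```python
import math
import struct

def ulp_diff(a, b):
    """Number of representable doubles between a and b (a rough sketch)."""
    def as_int(x):
        # Reinterpret the float's bits as a signed 64-bit integer.
        n = struct.unpack('<q', struct.pack('<d', x))[0]
        # Map negative floats onto a continuous integer range so that
        # distances across zero come out right.
        return n if n >= 0 else -(n & 0x7FFFFFFFFFFFFFFF)
    return abs(as_int(a) - as_int(b))

print(ulp_diff(1.0, 1.0))                        # 0
print(ulp_diff(1.0, math.nextafter(1.0, 2.0)))   # 1: adjacent doubles
```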


> Two values "are close" if you can't tell one from the other with
> certainty.  The is_close range includes any uncertainty.
>

I think "uncertainly" is entirely use-case dependent -- floating point
calculations are not uncertain -- with a defined rounding procedure, etc,
they are completely deterministic. IT can often make things easier to think
about if you think of the errors as random, but they are, in fact, not
random. And if you care about the precision down to close to limits of
representation, then you care about ULPS, etc, and you'd better be
reasoning about the errors carefully.

> This is where taking an absolute delta into consideration comes in. The
> minimum range for both is the uncertainty of the data. But is_close and
> is_good do different things with it.
>

an absolute delta is simply a different use case -- sometimes you know
exactly how much difference you care about, and sometimes you only care
that the numbers are relatively close (or, in any case, you need to test a
wide variety of magnitudes of numbers, so don't want to have to calculate
an absolute delta for each one). This is akin to saying that the numbers
are the same to a certain number of significant figures -- a common and
useful way to think about it.
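
Combining the two use cases -- a relative tolerance with an absolute floor for comparisons near zero, where any purely relative tolerance collapses to nothing -- can be sketched like this (the name and defaults are illustrative):

```python
def is_close(a, b, rel_tol=1e-9, abs_tol=0.0):
    # Close if within rel_tol of the larger magnitude, OR within abs_tol.
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

print(is_close(1e9, 1e9 + 1))              # True: relatively tiny difference
print(is_close(1e-12, 0.0))                # False: rel_tol scales to ~0 here
print(is_close(1e-12, 0.0, abs_tol=1e-9))  # True: absolute floor applies
```

This thread eventually fed into PEP 485, which added `math.isclose` -- with exactly this symmetric rel_tol/abs_tol shape -- to the stdlib in Python 3.5.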

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov