Mailman 3 Floating point "closeness" Proposal Outline - Python-ideas

19 Jan 2015

      OK folks,

There has been a lot of chatter about this, which I think has served to
provide some clarity, at least to me. However, I'm concerned that the
upshot, at least for folks not deep into the discussion, will be: clearly
there are too many use-case specific details to put any one thing in the
std lib. But I still think we can provide something that is useful for most
use-cases, and would like to propose what that is, and what the decision
points are:

A function for the math module, called somethign like "is_close",
"approx_equal", etc. It will compute a relative tolerance, with a  default
maybe around 1-e12, with the user able to specify the tolerance they want.

Optionally, the user can specify an "minimum absolute tolerance", it will
default to zero, but can be set so that comparisons to zero can be handled
gracefully.

The relative tolerance will be computed from the smallest of the two input
values, so as to get symmetry : is_close(a,b) == is_close(b,a). (this is
the Boost "strong" definition, and what is used by Steven D'Aprano's code
in the statistics test module)

Alternatively, the relative error could be computed against a particular
one of the input values (the second one?). This would be asymmetric, but be
more clear exactly how "relative" is defined, and be closer to what people
may expect when using it as a "actual vs expected" test. --- "expected"
would be the scaling value. If the tolerance is small, it makes very little
difference anyway, so I'm happy with whatever consensus moves us to. Note
that if we go this way, then the parameter names should make it at least a
little more clear -- maybe "actual" and "expected", rather than x and y or
a and b or... and the function name should be something like is_close_to,
rather than just is_close.

It will be designed for floating point numbers, and handle inf, -inf, and
NaN "properly". But is will also work with other numeric types, to the
extent that duck typing "just works" (i.e. division and comparisons all
work).

complex numbers will be handled by:
is_close(x.real, y.real) and is_close(x.imag, y.imag)
(but i haven't written any code for that yet)

It will not do a simple absolute comparison -- that is the job of a
different function, or, better yet, folks just write it themselves:

abs(x - y) <= delta

really isn't much harder to write than a function call:

absolute_diff(x,y,delta)

Here is a gist with a sample implementation:

https://gist.github.com/PythonCHB/6e9ef7732a9074d9337a

I need to add more tests, and make the test proper unit tests, but it's a
start.

I also need to see how it does with other data types than float --
hopefully, it will "just work" with the core set.

I hope we can come to some consensus that something like this is the way to
go.

-Chris

On Sun, Jan 18, 2015 at 11:27 AM, Ron Adam  wrote:
...
On 01/17/2015 11:37 PM, Chris Barker wrote:
...
(Someone claimed that 'nothing is close to zero'.  This is
    nonsensical both in applied math and everyday life.)
I'm pretty sure someone (more than one of use) asserted that "nothing is
*relatively* close to zero -- very different.
Yes, that is the case.
And I really wanted a way to have a default behavior that would do a
...
reasonable transition to an absolute tolerance near zero, but I no longer
thing that's possible. (numpy's implimentaion kind of does that, but it is
really wrong for small numbers, and if you made the default min_tolerance
the smallest possible representable number, it really wouldn't be useful.
I'm going to try to summarise what I got out of this discussion.  Maybe it
will help bring some focus to the topic.
I think there are two case's to consider.
# The most common case.
     rel_is_good(actual, expected, delta)   # value +- %delta.
# Testing for possible equivalence?
     rel_is_close(value1, value2, delta)    # %delta close to each other.
I don't think they are quite the same thing.
rel_is_good(9, 10, .1) --> True
     rel_is_good(10, 9, .1) --> False
rel_is_close(9, 10, .1) --> True
     rel_is_close(10, 9, .1) --> True
In the "is close" case, it shouldn't matter what order the arguments are
given. The delta is the distance from the larger number the smaller number
is.  (of the same sign)
So when calculating the relative error from two values, you want it to be
consistent with the rel_is_close function.
rel_is_close(a, b, delta) <---> rel_err(a, b) <= delta
And you should not use the rel_err function in the rel_is_good function.
The next issue is, where does the numeric accuracy of the data,
significant digits, and the languages accuracy (ULPs), come into the
picture.
My intuition.. I need to test the idea to make a firmer claim.. is that in
the case of is_good, you want to exclude the uncertain parts, but with
is_close, you want to include the uncertain parts.
Two values "are close" if you can't tell one from the other with
certainty.  The is_close range includes any uncertainty.
A value is good if it's within a range with certainty.  And this excludes
any uncertainty.
This is where taking in consideration of an absolute delta comes in. The
minimum range for both is the uncertainty of the data. But is_close and
is_good do different things with it.
Of course all of this only applies if you agree with these definitions of
is_close, and is_good. ;)
Cheers,
   Ron
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

Floating point "closeness" Proposal Outline

Chris Barker

Stephen J. Turnbull

Chris Barker

Ethan Furman

Chris Barker

Chris Angelico

Terry Reedy

Chris Barker

Ron Adam

Chris Barker

Andrew Barnert

Chris Barker - NOAA Federal

Ron Adam

Chris Barker - NOAA Federal

Neil Girdhar

Paul Moore

Steven D'Aprano

Neil Girdhar

Chris Barker

Ron Adam

Joao S. O. Bueno

Steven D'Aprano

Joao S. O. Bueno

Joao S. O. Bueno

Chris Barker - NOAA Federal

Chris Barker - NOAA Federal

Terry Reedy

Ethan Furman

Nathaniel Smith

Andrew Barnert

Chris Barker

Chris Barker

Ethan Furman

Neil Girdhar

Chris Barker

tags

participants (13)