[Python-ideas] Floating point "closeness" Proposal Outline

Tue Jan 20 17:56:58 CET 2015

On Mon, Jan 19, 2015 at 8:29 PM, Neil Girdhar <mistersheik at gmail.com> wrote:

> Also for complex numbers, I think comparing the magnitude (distance from
> the origin, or absolute value) of (x-y) to the size of x or y makes more
> sense than calling is_close on the real and imaginary parts.  What if the
> real parts are much larger than the imaginary parts, e.g.   x=1e5+1e-5j,
> y=1e5-1e-5j.  Do you think x and y are not close?
>

I've vacillated on this one. Personally I have no use case in mind, so it's
really hard to know what's best. But your example case was exactly what I
was thinking of -- if the magnitude of one component is very different from
that of the other, then I'm not sure you would want it to swamp the answer,
and requiring both components to be "close" would be the more conservative
approach.

On the other hand, there are cases where the exact result of a computation
on complex numbers is, in fact, a pure real or imaginary number. In these
cases, the computed result may be a large value in one component and a tiny
value in the other, and you would test against the "actual" value, which
would have a zero in there, so the relative error of that component by
itself would never be "small".

So yes, it's probably better to use the absolute value(s) (magnitudes) to scale the relative tolerance.
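A minimal sketch of the magnitude-based approach, assuming the symmetric scaling from the outline quoted below (the smaller of abs(x) and abs(y)). The name is_close_complex and its parameters are hypothetical, not taken from the gist. It shows a computed result with rounding noise in its imaginary part still comparing as close to a purely real expected value.

```python
def is_close_complex(x, y, rel_tol=1e-12, abs_tol=0.0):
    """Hypothetical sketch: compare complex numbers by |x - y|.

    abs() of a complex number is its magnitude, so the whole difference
    is scaled by the smaller of |x| and |y|, rather than testing the real
    and imaginary parts separately.
    """
    if x == y:
        return True
    diff = abs(x - y)
    return diff <= max(rel_tol * min(abs(x), abs(y)), abs_tol)


# A result that should be purely real, with noise in the imaginary part:
computed = 3.0 + 1e-17j
expected = 3 + 0j
print(is_close_complex(computed, expected))  # True

# Neil's example: the relative difference is about 2e-10.
x, y = 1e5 + 1e-5j, 1e5 - 1e-5j
print(is_close_complex(x, y, rel_tol=1e-9))   # True
print(is_close_complex(x, y, rel_tol=1e-12))  # False
```

Note that with a default around 1e-12, Neil's pair would still not count as close; the magnitude approach only stops the tiny component from failing on its own.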

But if anyone has some use-cases that would suggest the more strict
approach, speak now.

-Chris

> Best,
>
> Neil
>
>
> On Monday, January 19, 2015 at 1:33:44 AM UTC-5, Chris Barker wrote:
>>
>> OK folks,
>>
>> There has been a lot of chatter about this, which I think has served to
>> provide some clarity, at least to me. However, I'm concerned that the
>> upshot, at least for folks not deep into the discussion, will be: clearly
>> there are too many use-case specific details to put any one thing in the
>> std lib. But I still think we can provide something that is useful for most
>> use-cases, and would like to propose what that is, and what the decision
>> points are:
>>
>> A function for the math module, called something like "is_close",
>> "approx_equal", etc. It will use a relative tolerance, with a default of
>> maybe around 1e-12, which the user can override with the tolerance they want.
>>
>> Optionally, the user can specify a "minimum absolute tolerance"; it will
>> default to zero, but can be set so that comparisons to zero are handled
>> gracefully.
>>
>> The relative tolerance will be scaled by the smaller (in magnitude) of the
>> two input values, so as to get symmetry: is_close(a,b) == is_close(b,a). (This
>> is the Boost "strong" definition, and what is used by Steven D'Aprano's
>> code in the statistics test module.)
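A minimal sketch of the symmetric test described above, assuming the parameter names rel_tol and abs_tol (hypothetical here; the gist may use different names or logic). It scales by the smaller of the two magnitudes, treats the absolute tolerance as a floor on the allowed difference, and special-cases infinities so that inf and -inf are not reported as close.

```python
import math


def is_close(a, b, rel_tol=1e-12, abs_tol=0.0):
    """Symmetric ("strong") closeness test; a sketch, not the gist's code.

    The allowed difference is rel_tol times the *smaller* magnitude, so
    is_close(a, b) == is_close(b, a). abs_tol (default 0.0) is a floor on
    the allowed difference, so values near zero can still compare as close.
    """
    if rel_tol < 0 or abs_tol < 0:
        raise ValueError("tolerances must be non-negative")
    if a == b:  # exact match, including inf == inf
        return True
    if math.isinf(a) or math.isinf(b):
        return False  # an infinity is only close to itself
    # NaN falls through: any comparison involving NaN is False.
    diff = abs(a - b)
    return diff <= max(rel_tol * min(abs(a), abs(b)), abs_tol)


print(is_close(1.0, 1.0 + 1e-13))           # True
print(is_close(1.0 + 1e-13, 1.0))           # True (symmetric)
print(is_close(0.0, 1e-20))                 # False: nothing is relatively close to zero
print(is_close(0.0, 1e-20, abs_tol=1e-15))  # True
```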
>>
>> Alternatively, the relative error could be computed against a particular
>> one of the input values (the second one?). This would be asymmetric, but it
>> would make it clearer exactly how "relative" is defined, and be closer to
>> what people may expect when using it as an "actual vs. expected" test --
>> "expected" would be the scaling value. If the tolerance is small, it makes
>> very little difference anyway, so I'm happy with whatever consensus moves us
>> to. Note that if we go this way, then the parameter names should make it at
>> least a little more clear -- maybe "actual" and "expected", rather than x
>> and y or a and b or... and the function name should be something like
>> is_close_to, rather than just is_close.
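A sketch of the asymmetric alternative, assuming the names is_close_to, actual and expected suggested above (hypothetical; no such function exists yet). Only the expected value sets the scale, so swapping the arguments can change the answer. Infinity and NaN handling is omitted to keep it short.

```python
def is_close_to(actual, expected, rel_tol=1e-12, abs_tol=0.0):
    """Hypothetical asymmetric variant: scale only by the expected value."""
    if actual == expected:
        return True
    diff = abs(actual - expected)
    return diff <= max(rel_tol * abs(expected), abs_tol)


# With a large tolerance the asymmetry is easy to see:
print(is_close_to(9, 10, rel_tol=0.1))  # True:  |9 - 10| <= 0.1 * 10
print(is_close_to(10, 9, rel_tol=0.1))  # False: |10 - 9| >  0.1 * 9
```

With a tolerance near 1e-12 the two argument orders almost never disagree, which matches the point that a small tolerance makes very little difference.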
>>
>> It will be designed for floating point numbers, and handle inf, -inf, and
>> NaN "properly". But it will also work with other numeric types, to the
>> extent that duck typing "just works" (i.e. division and comparisons all
>> work).
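A quick sketch of what duck typing gives for free, assuming the same symmetric formula written as a plain-Python function (hypothetical, not the gist's code). This version multiplies the tolerance rather than dividing, so the operations it needs are subtraction, abs(), multiplication and comparison. It works unchanged for fractions.Fraction, but decimal.Decimal refuses to multiply by a float tolerance, so a Decimal tolerance has to be passed in.

```python
from decimal import Decimal
from fractions import Fraction


def is_close(a, b, rel_tol=1e-12, abs_tol=0.0):
    # Symmetric test using only subtraction, abs(), multiplication and
    # comparison, so it can work with any type that supports those.
    if a == b:
        return True
    diff = abs(a - b)
    return diff <= max(rel_tol * min(abs(a), abs(b)), abs_tol)


third = Fraction(1, 3)
print(is_close(third, third + Fraction(1, 10**15)))  # True

d = Decimal("1.0")
# Decimal and float can be compared but not multiplied, so the default
# float rel_tol raises TypeError; a Decimal tolerance works.
try:
    is_close(d, Decimal("1.1"))
except TypeError:
    print("TypeError with a float rel_tol")
print(is_close(d, d + Decimal("1e-15"), rel_tol=Decimal("1e-12")))  # True
```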
>>
>> Complex numbers will be handled by:
>> is_close(x.real, y.real) and is_close(x.imag, y.imag)
>> (but I haven't written any code for that yet)
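A sketch of the componentwise rule as written above, with hypothetical names (the outline notes no code exists for it yet). Applied to Neil's example, the opposite-signed imaginary parts fail the relative test even with a loose rel_tol of 1e-3, and only pass once an absolute tolerance is supplied.

```python
def is_close(a, b, rel_tol=1e-12, abs_tol=0.0):
    # Symmetric scalar test, scaled by the smaller magnitude.
    if a == b:
        return True
    return abs(a - b) <= max(rel_tol * min(abs(a), abs(b)), abs_tol)


def is_close_complex_componentwise(x, y, rel_tol=1e-12, abs_tol=0.0):
    # Both the real and the imaginary parts must be close on their own.
    x, y = complex(x), complex(y)
    return (is_close(x.real, y.real, rel_tol, abs_tol)
            and is_close(x.imag, y.imag, rel_tol, abs_tol))


x, y = 1e5 + 1e-5j, 1e5 - 1e-5j
print(is_close_complex_componentwise(x, y, rel_tol=1e-3))  # False
print(is_close_complex_componentwise(x, y, abs_tol=1e-4))  # True
```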
>>
>> It will not do a simple absolute comparison -- that is the job of a
>> different function, or, better yet, folks just write it themselves:
>>
>> abs(x - y) <= delta
>>
>> really isn't much harder to write than a function call:
>>
>> absolute_diff(x,y,delta)
>>
>> Here is a gist with a sample implementation:
>>
>> https://gist.github.com/PythonCHB/6e9ef7732a9074d9337a
>>
>> I need to add more tests, and make the tests proper unit tests, but it's a
>> start.
>>
>> I also need to see how it does with data types other than float --
>> hopefully, it will "just work" with the core set.
>>
>> I hope we can come to some consensus that something like this is the way
>> to go.
>>
>> -Chris
>>
>> On Sun, Jan 18, 2015 at 11:27 AM, Ron Adam <ron... at gmail.com> wrote:
>>
>>>
>>>
>>> On 01/17/2015 11:37 PM, Chris Barker wrote:
>>>
>>>>        (Someone claimed that 'nothing is close to zero'.  This is
>>>>     nonsensical both in applied math and everyday life.)
>>>>
>>>>
>>>> I'm pretty sure someone (more than one of us) asserted that "nothing is
>>>> *relatively* close to zero" -- very different.
>>>>
>>>
>>> Yes, that is the case.
>>>
>>>
>>>> And I really wanted a way to have a default behavior that would do a
>>>> reasonable transition to an absolute tolerance near zero, but I no longer
>>>> think that's possible. (numpy's implementation kind of does that, but it is
>>>> really wrong for small numbers, and if you made the default min_tolerance
>>>> the smallest possible representable number, it really wouldn't be
>>>> useful.)
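For comparison, a pure-Python sketch of the formula the numpy documentation gives for numpy.isclose, scalar case only, with its default rtol=1e-05 and atol=1e-08. The function name is hypothetical. Because atol is added to the relative term, any two values much smaller than atol compare as close, which is the behaviour described above as wrong for small numbers.

```python
def numpy_style_isclose(a, b, rtol=1e-05, atol=1e-08):
    # Scalar version of the formula documented for numpy.isclose:
    #     absolute(a - b) <= (atol + rtol * absolute(b))
    # The tolerances are added together, and only b sets the scale.
    return abs(a - b) <= atol + rtol * abs(b)


print(numpy_style_isclose(1e-10, 2e-10))  # True, although b is twice a
print(numpy_style_isclose(1.0, 1.001))    # False
```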
>>>>
>>>
>>> I'm going to try to summarise what I got out of this discussion.  Maybe
>>> it will help bring some focus to the topic.
>>>
>>> I think there are two cases to consider.
>>>
>>>      # The most common case.
>>>      rel_is_good(actual, expected, delta)   # value +- %delta.
>>>
>>>      # Testing for possible equivalence?
>>>      rel_is_close(value1, value2, delta)    # %delta close to each other.
>>>
>>> I don't think they are quite the same thing.
>>>
>>>      rel_is_good(9, 10, .1) --> True
>>>      rel_is_good(10, 9, .1) --> False
>>>
>>>      rel_is_close(9, 10, .1) --> True
>>>      rel_is_close(10, 9, .1) --> True
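One way to reproduce the four results above, assuming rel_is_good scales by the expected value and rel_is_close scales by the larger of the two magnitudes. Both functions are sketches of the pseudocode, not an existing API.

```python
def rel_is_good(actual, expected, delta):
    # "value +- %delta": the allowed range is centred on, and scaled by,
    # the expected value, so argument order matters.
    return abs(actual - expected) <= delta * abs(expected)


def rel_is_close(value1, value2, delta):
    # Scaled by the larger magnitude, so argument order does not matter.
    return abs(value1 - value2) <= delta * max(abs(value1), abs(value2))


print(rel_is_good(9, 10, .1))   # True
print(rel_is_good(10, 9, .1))   # False
print(rel_is_close(9, 10, .1))  # True
print(rel_is_close(10, 9, .1))  # True
```

Scaling rel_is_close by the smaller magnitude instead, as in the symmetric "strong" definition, would make rel_is_close(9, 10, .1) False, since 1 > 0.1 * 9.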
>>>
>>>
>>> In the "is close" case, it shouldn't matter what order the arguments are
>>> given. The delta is how far the smaller number may be from the larger
>>> one, as a fraction of the larger (for numbers of the same sign).
>>>
>>> So when calculating the relative error from two values, you want it to
>>> be consistent with the rel_is_close function.
>>>
>>>      rel_is_close(a, b, delta) <---> rel_err(a, b) <= delta
>>>
>>> And you should not use the rel_err function in the rel_is_good function.
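A sketch of a rel_err that satisfies that equivalence, assuming the same larger-magnitude scaling as rel_is_close (both functions are hypothetical, written for illustration). Dividing versus multiplying can disagree by rounding exactly at the boundary, so the equivalence holds up to that.

```python
def rel_err(a, b):
    # Relative error scaled by the larger magnitude, to match rel_is_close.
    scale = max(abs(a), abs(b))
    if scale == 0:
        return 0.0  # both values are zero
    return abs(a - b) / scale


def rel_is_close(a, b, delta):
    return abs(a - b) <= delta * max(abs(a), abs(b))


# The two tests agree (up to rounding exactly at the boundary):
for a, b, delta in [(9, 10, .1), (1.0, 1.5, .2), (1.0, 1.1, .2), (0, 0, .1)]:
    print(rel_is_close(a, b, delta), rel_err(a, b) <= delta)
```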
>>>
>>>
>>>
>>> The next issue is where the numeric accuracy of the data (significant
>>> digits) and the language's accuracy (ULPs) come into the picture.
>>>
>>> My intuition (I need to test the idea before making a firmer claim) is
>>> that in the case of is_good, you want to exclude the uncertain parts, but
>>> with is_close, you want to include the uncertain parts.
>>>
>>> Two values "are close" if you can't tell one from the other with
>>> certainty.  The is_close range includes any uncertainty.
>>>
>>> A value is good if it's within a range with certainty.  And this
>>> excludes any uncertainty.
>>>
>>> This is where taking an absolute delta into consideration comes in. The
>>> minimum range for both is the uncertainty of the data, but is_close and
>>> is_good do different things with it.
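One possible reading of this idea, offered only as a hypothetical sketch (the intuition above is explicitly untested): treat the data's uncertainty as an absolute amount that widens the allowed difference for is_close and narrows it for is_good. The uncertainty parameter and both function names are assumptions for illustration.

```python
def is_close_with_uncertainty(a, b, delta, uncertainty):
    # Hypothetical: "close" if the values can't be told apart, so the
    # data's uncertainty *widens* the allowed difference.
    return abs(a - b) <= delta * max(abs(a), abs(b)) + uncertainty


def is_good_with_uncertainty(actual, expected, delta, uncertainty):
    # Hypothetical: "good" only if within range *with certainty*, so the
    # uncertainty *narrows* the allowed difference.
    return abs(actual - expected) + uncertainty <= delta * abs(expected)


print(is_close_with_uncertainty(10.8, 10, 0.1, 0.3))  # True
print(is_good_with_uncertainty(10.8, 10, 0.1, 0.0))   # True
print(is_good_with_uncertainty(10.8, 10, 0.1, 0.3))   # False
```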
>>>
>>> Of course all of this only applies if you agree with these definitions
>>> of is_close, and is_good. ;)
>>>
>>> Cheers,
>>>    Ron
>>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> Chris.... at noaa.gov
>>
>
