[Python-ideas] Floating point "closeness" Proposal Outline

Neil Girdhar mistersheik at gmail.com
Tue Jan 20 05:29:37 CET 2015


Also, for complex numbers, I think comparing the magnitude (distance from 
the origin, i.e. absolute value) of (x - y) to the size of x or y makes more 
sense than calling is_close separately on the real and imaginary parts.  What 
if the real parts are much larger than the imaginary parts, e.g. 
x = 1e5 + 1e-5j, y = 1e5 - 1e-5j?  Do you think x and y are not close?
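For instance, a magnitude-based check along these lines (just a sketch, 
with an illustrative tolerance):

    def is_close_complex(x, y, rel_tol=1e-9):
        # Compare the magnitude of the difference to the magnitudes of the
        # inputs themselves; abs() of a complex number is its modulus.
        return abs(x - y) <= rel_tol * max(abs(x), abs(y))

    # |x - y| is 2e-5, but |x| and |y| are both about 1e5, so the relative
    # difference is ~2e-10 and the two are reported as close -- while a
    # component-wise test on the imaginary parts would say "not close".
    print(is_close_complex(1e5 + 1e-5j, 1e5 - 1e-5j))  # True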

Best,

Neil


On Monday, January 19, 2015 at 1:33:44 AM UTC-5, Chris Barker wrote:
>
> OK folks,
>
> There has been a lot of chatter about this, which I think has served to 
> provide some clarity, at least to me. However, I'm concerned that the 
> upshot, at least for folks not deep into the discussion, will be: clearly 
> there are too many use-case specific details to put any one thing in the 
> std lib. But I still think we can provide something that is useful for most 
> use-cases, and would like to propose what that is, and what the decision 
> points are:
>
> A function for the math module, called something like "is_close", 
> "approx_equal", etc. It will compute a relative tolerance, with a default 
> of maybe around 1e-12, and with the user able to specify the tolerance they 
> want.
>
> Optionally, the user can specify a "minimum absolute tolerance". It will 
> default to zero, but can be set so that comparisons to zero can be handled 
> gracefully.
>
> The relative tolerance will be computed from the smaller of the two input 
> values, so as to get symmetry: is_close(a, b) == is_close(b, a). (This is 
> the Boost "strong" definition, and what is used by Steven D'Aprano's code 
> in the statistics test module.)
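> A minimal sketch of that symmetric check (the names, default, and abs_tol 
> handling here are illustrative, not necessarily what the gist implements):
>
>     def is_close(a, b, rel_tol=1e-12, abs_tol=0.0):
>         # Symmetric ("strong") test: the difference must be small relative
>         # to both inputs, i.e. relative to the smaller magnitude. abs_tol
>         # gives a floor so comparisons against zero can still succeed.
>         return abs(a - b) <= max(rel_tol * min(abs(a), abs(b)), abs_tol)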
>
> Alternatively, the relative error could be computed against a particular 
> one of the input values (the second one?). This would be asymmetric, but it 
> would make it clearer exactly how "relative" is defined, and be closer to 
> what people may expect when using it as an "actual vs. expected" test -- 
> "expected" would be the scaling value. If the tolerance is small, it makes 
> very little difference anyway, so I'm happy with whatever consensus moves us 
> to. Note that if we go this way, then the parameter names should make it at 
> least a little clearer -- maybe "actual" and "expected", rather than x and y 
> or a and b or... -- and the function name should be something like 
> is_close_to, rather than just is_close.
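> A hypothetical sketch of that asymmetric form (parameter names as 
> suggested above):
>
>     def is_close_to(actual, expected, rel_tol=1e-12):
>         # Asymmetric test: "expected" alone is the scaling value, so the
>         # result depends on argument order.
>         return abs(actual - expected) <= rel_tol * abs(expected)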
>
> It will be designed for floating point numbers, and will handle inf, -inf, 
> and NaN "properly". But it will also work with other numeric types, to the 
> extent that duck typing "just works" (i.e. division and comparisons all 
> work).
>
> Complex numbers will be handled by:
> is_close(x.real, y.real) and is_close(x.imag, y.imag)
> (but I haven't written any code for that yet)
>
> It will not do a simple absolute comparison -- that is the job of a 
> different function, or, better yet, folks just write it themselves:
>
> abs(x - y) <= delta
>
> really isn't much harder to write than a function call:
>
> absolute_diff(x, y, delta)
>
> Here is a gist with a sample implementation:
>
> https://gist.github.com/PythonCHB/6e9ef7732a9074d9337a
>
> I need to add more tests, and make them proper unit tests, but it's a 
> start.
>
> I also need to see how it does with other data types than float -- 
> hopefully, it will "just work" with the core set.
>
> I hope we can come to some consensus that something like this is the way 
> to go.
>
> -Chris
>
> On Sun, Jan 18, 2015 at 11:27 AM, Ron Adam <ron... at gmail.com> wrote:
>
>>
>>
>> On 01/17/2015 11:37 PM, Chris Barker wrote:
>>
>>>        (Someone claimed that 'nothing is close to zero'.  This is
>>>     nonsensical both in applied math and everyday life.)
>>>
>>>
>>> I'm pretty sure someone (more than one of us) asserted that "nothing is
>>> *relatively* close to zero" -- very different.
>>>
>>
>> Yes, that is the case.
>>
>>
>>  And I really wanted a way to have a default behavior that would do a
>>> reasonable transition to an absolute tolerance near zero, but I no longer
>>> think that's possible. (numpy's implementation kind of does that, but it
>>> is really wrong for small numbers, and if you made the default
>>> min_tolerance the smallest possible representable number, it really
>>> wouldn't be useful.)
>>>
>>
>> I'm going to try to summarise what I got out of this discussion.  Maybe 
>> it will help bring some focus to the topic.
>>
>> I think there are two cases to consider.
>>
>>      # The most common case.
>>      rel_is_good(actual, expected, delta)   # value +- %delta.
>>
>>      # Testing for possible equivalence?
>>      rel_is_close(value1, value2, delta)    # %delta close to each other.
>>
>> I don't think they are quite the same thing.
>>
>>      rel_is_good(9, 10, .1) --> True
>>      rel_is_good(10, 9, .1) --> False
>>
>>      rel_is_close(9, 10, .1) --> True
>>      rel_is_close(10, 9, .1) --> True
>>
>>
>> In the "is close" case, it shouldn't matter in what order the arguments are 
>> given. The delta is how far the smaller number is from the larger number 
>> (assuming values of the same sign).
>>
>> So when calculating the relative error from two values, you want it to be 
>> consistent with the rel_is_close function.
>>
>>      rel_is_close(a, b, delta) <---> rel_err(a, b) <= delta
>>
>> And you should not use the rel_err function in the rel_is_good function.
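>> A sketch of these definitions as I understand them (the implementations 
>> here are illustrative):
>>
>>     def rel_is_good(actual, expected, delta):
>>         # Asymmetric: the tolerance is a fraction of the expected value only.
>>         return abs(actual - expected) <= delta * abs(expected)
>>
>>     def rel_err(a, b):
>>         # Scaled by the larger magnitude, so rel_err(a, b) == rel_err(b, a).
>>         return abs(a - b) / max(abs(a), abs(b))
>>
>>     def rel_is_close(a, b, delta):
>>         # Symmetric: consistent with rel_err(a, b) <= delta.
>>         return rel_err(a, b) <= delta
>>
>> These reproduce the examples above: rel_is_good(9, 10, .1) is True while 
>> rel_is_good(10, 9, .1) is False, and rel_is_close gives True in both orders.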
>>
>>
>>
>> The next issue is where the numeric accuracy of the data, significant 
>> digits, and the language's accuracy (ULPs) come into the picture.
>>
>> My intuition -- I need to test the idea to make a firmer claim -- is that 
>> in the case of is_good, you want to exclude the uncertain parts, but with 
>> is_close, you want to include the uncertain parts.
>>
>> Two values "are close" if you can't tell one from the other with 
>> certainty.  The is_close range includes any uncertainty.
>>
>> A value is good if it's within a range with certainty.  And this excludes 
>> any uncertainty.
>>
>> This is where taking an absolute delta into consideration comes in. The 
>> minimum range for both is the uncertainty of the data, but is_close and 
>> is_good do different things with it.
>>
>> Of course all of this only applies if you agree with these definitions of 
>> is_close, and is_good. ;)
>>
>> Cheers,
>>    Ron
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python... at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
>
> -- 
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.... at noaa.gov
>  