[Python-ideas] Way to check for floating point "closeness"?
abarnert at yahoo.com
Wed Jan 14 11:44:27 CET 2015
On Jan 14, 2015, at 1:08, Ron Adam <ron3200 at gmail.com> wrote:
> On 01/14/2015 12:10 AM, Andrew Barnert wrote:
>> On Jan 13, 2015, at 21:12, Ron Adam<ron3200 at gmail.com> wrote:
>>> On 01/13/2015 09:25 PM, Chris Barker - NOAA Federal wrote:
>>>> On Jan 13, 2015, at 6:58 PM, Ron Adam <ron3200 at gmail.com> wrote:
>>>>> maybe we could specify an absolute
>>>>> tolerance near zero, and a relative tolerance elsewhere, both at once.
>>>>> Tricky to document, even if possible.
>>>>> Doesn't this problem come up at any boundary comparison, and not just zero?
>>>> Zero is special because you lose the ability to use a relative
>>>> tolerance. Everything is huge compared to zero.
>>> After I posted I realised that when you compare anything you subtract what you are comparing to, and if the result is equal to zero, then it's equal to what you are comparing to. So testing against zero is fundamental to all comparisons. Is this correct?
>>> Wouldn't a relative tolerance be set relative to some value that is *not* zero? And then used when comparing any close values, including zero.
>> I think you're missing the meanings of these terms.
>> Absolute tolerance means a fixed tolerance, like 1e-5; relative tolerance means you pick a tolerance that's relative to the values being compared--think of it as a percentage, say, 0.01%. Plenty of values are within ± 1e-5 of 0. But the only value within ± 0.01% of 0 is 0 itself.
>> Another way to look at it: x is close to y with absolute tolerance 1e-5 if abs(x-y) < 1e-5. x is close to y with relative tolerance 1e-5 if abs(x-y)/abs(y) < 1e-5. So, no nonzero value is ever close to 0 within any relative tolerance.
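To make that concrete, here's a minimal sketch of the two tolerance styles. (These are throwaway names, not any stdlib function; the 1e-5 default is just for illustration.)

```python
def abs_close(x, y, tol=1e-5):
    # Absolute tolerance: a fixed window. Plenty of values pass near zero.
    return abs(x - y) < tol

def rel_close(x, y, tol=1e-5):
    # Relative tolerance: the window scales with the magnitude of y.
    # When y == 0 the window collapses, so nothing nonzero passes.
    return abs(x - y) < tol * abs(y)

print(abs_close(1e-6, 0.0))  # True: within the fixed 1e-5 window
print(rel_close(1e-6, 0.0))  # False: tol * abs(0) == 0
```

The asymmetry of dividing by y (rather than, say, max(abs(x), abs(y))) is one of the design questions in this thread.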
>>>>> So isn't the issue about any n distance from any floating point number that is less than 1 ulp?
>>>> I'm still a bit fuzzy on Ulps, but it seems the goal here is to define
>>>> a tolerance larger than an ulp. This is for the use case where we
>>>> expect multiple rounding errors -- many more than one ulp.
>>>> That's why I think the use case for ulp comparisons is more about
>>>> assessment of accuracy of algorithms than "did I introduce a big old
>>>> bug?" or, "is this computed value close enough to what I measured?"
>>> I haven't looked into the finer aspects of ulps myself. It seems to me ulps
>>> only matter if the exponent part of two floating point numbers are equal
>>> and the value part is within 1 (or a small few) ulps of each other. Then
>>> there may be problems determining if they are equal, or one is greater or
>>> less than the other. And only if there is no greater tolerance value set.
>> No, this is wrong.
>> First, 1.1e1 and 1.0e10 (binary) are only off by 1 ulp even though they have different exponents.
> I see your point about the exponent.
> To be clear, Are we referring to significant digits here, or ...
> 1.0000...1e1 and 1.0000...0e10
No, those are pretty far apart. We're referring to
1.1111…1e1 and 1.0000…e10
I think your confusion here is entirely my fault. For simplicity, it's often helpful to look at tiny float representations--e.g., a 4-bit float with 1 sign bit, 1 mantissa bit, and 2 exponent bits (that, if both 1, mean inf/nan), because writing 51 0's tends to obscure what you're looking at. But it's pretty stupid to do that without mentioning that you're doing so, or how it extends to larger representations if non-obvious, and I think I was just that stupid.
> What I've read indicates ULP usually refers to the limits of the implementation/device.
Yes, it means "Unit of Least Precision" or "Unit in Least Place". There are a few ways to define this, but one definition is:
Ignoring zeroes and denormals, two numbers are 1 ulp apart if they're finite and have the same sign and either (a) they have the same exponent and mantissas that differ by one, or (b) their exponents differ by one, the smaller number has the max mantissa, and the larger the min mantissa.
For zeroes, you can either define pos and neg zero as 1 ulp from each other and from the smallest denormal of the same sign, or as 0 ulp from each other and both 1 ulp from the smallest denormal of either sign.
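For anyone who wants to count ulps directly, here's one way to do it in Python, using the standard trick of reinterpreting the IEEE 754 bits as an integer. It follows the second zero convention above (+0.0 and -0.0 are 0 ulp apart); ulp_diff is a made-up name, and math.nextafter needs Python 3.9+.

```python
import math
import struct

def float_to_ordinal(x):
    # Reinterpret the 64-bit float's bits as a signed integer, then
    # flip the negative half so that adjacent floats map to adjacent
    # integers across the whole number line. Ignores NaN; maps both
    # zeroes to 0, so +0.0 and -0.0 come out 0 ulp apart.
    n = struct.unpack('<q', struct.pack('<d', x))[0]
    return n if n >= 0 else -(n & 0x7FFFFFFFFFFFFFFF)

def ulp_diff(x, y):
    # Number of representable doubles you'd have to step through
    # to get from x to y.
    return abs(float_to_ordinal(x) - float_to_ordinal(y))

print(ulp_diff(1.0, math.nextafter(1.0, 2.0)))  # 1: adjacent floats
```

Note this also handles case (b) above for free: crossing an exponent boundary is still a difference of 1 in the ordinal.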
> Significant digits has more to do with error of measurements, (and estimates), while ULPs is about accuracy limits of the hardware/software ability to calculate.
And, importantly, to represent your values in the first place. If you have a value that's, say, exactly 0.3 (or, for that matter, 0.3 to 28 significant digits), ± 1 ulp is larger than your measurement error, but it's the minimum error range you can store.
>> Second, there's never any problem determining if two finite numbers are
>> equal or which one is greater. The issue is determining whether two
>> numbers ± their error bars are too close to call vs. unambiguously
>> greater or lesser. For example, if I get 1.0e1 and 1.1e1 (assuming 1 bit
>> mantissa for ease of discussion), the latter is clearly greater--but if
>> I have 2 ulp of error, or an absolute error of 1e1, or a relative error
>> of 50%, the fact that the latter is greater is irrelevant--each value is
>> within the other value's error range.
> You would use the larger of the three. And possibly give a warning if the 2 ulp error is the largest. (if the application is set to do so.)
You use whichever is/are relevant to your error analysis and ignore the others. (Technically I guess you could say you're just using 0 as the error for the two you don't care about, and then it's guaranteed that the one you do care about is largest, but I don't think that's the way you'd normally think about it.) Also, you have to be careful about how that extends to inequality.
> I'm presuming the 2 ulp is twice the limit of the floating point precision here.
Yes, 2 ulp means off by 2 units of least precision. Of course for binary, that actually means off by 1 in the unit of penultimate precision.
> 50% accuracy of data
> 1e1 limit of significant digits/measurement
> 2 ulp twice floating point unit of least precision
>> But if I get 1.0e1 and 1.0e10 with
>> the same error, then I can say that the latter is unambiguously greater.
> Yes, this was the point I was alluding to earlier.
>> I think Chris is right that ulp comparisons usually only come up in
>> testing an algorithm. You have to actually do an error analysis, and you
>> have to have input data with a precision specified in ulp, or you're not
>> going to get a tolerance in ulp. When you want to verify that you've
>> correctly implemented an algorithm that guarantees to multiply input ulp by
>> no more than 4, you can feed in numbers with ± 1 ulp error (just typing in
>> decimal numbers does that) and verify that the results are within ± 4 ulp.
>> But when you have real data, or inherent rounding issues, or an error
>> analysis that's partly made up of rules of thumb and winging it, you're
>> almost always going to end up with absolute or relative error instead. (Or,
>> occasionally, something more complicated that you have to code up
>> manually, like logarithmic relative error.)
> If the algorithm doesn't track error accumulation, then yes.
> This is interesting but I'm going to search for some examples of how to use some of this. I'm not sure I can add to the conversation much, but thanks for taking the time to explain some of it.
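In case it helps as a starting point for those examples, here's a tiny sketch of ulp-based verification using math.ulp (Python 3.9+; within_ulps is a made-up helper, and the 4-ulp bound is just the example guarantee discussed above):

```python
import math

def within_ulps(actual, expected, n):
    # Is `actual` within n units of least precision of `expected`?
    # math.ulp(expected) is the gap between expected and the next
    # representable float of larger magnitude.
    return abs(actual - expected) <= n * math.ulp(expected)

# Typing in a decimal literal already carries up to half an ulp of
# representation error, so an algorithm that promises not to amplify
# input error by more than 4x should land within a few ulp.
print(within_ulps(0.1 + 0.2, 0.3, 4))  # True: off by 1 ulp, within 4
```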