[Python-ideas] Floating point "closeness" Proposal Outline

Tue Jan 20 00:16:54 CET 2015

On 01/19/2015 02:28 PM, Chris Barker wrote:
> On Mon, Jan 19, 2015 at 11:51 AM, Ron Adam <ron3200 at gmail.com> wrote:
>
>     Here is a gist with a sample implementation:
>
>
>         https://gist.github.com/PythonCHB/6e9ef7732a9074d9337a
>
>
>     For the most part I think it looks good.
>
>     Boost describes both a weak and a strong version, but I didn't see why
>     they chose the strong version.
>
>
> Actually, Boost didn't choose; there is a flag you can set to select which
> one to use. I don't want to do that: if you really want something special,
> write it yourself. I did choose the "strong" version; it just seemed more
> conservative. Also, it's what I think Steven chose for his version (though
> it isn't written with "and", the result is the same).

The two different cases probably should be two different functions, and not 
use a flag.  I'm not suggesting we need both.
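A minimal sketch of what "two functions, no flag" could look like. The names `is_close_weak` and `is_close_strong` are hypothetical (they are not taken from the gist), and both assume a single relative tolerance `tol`. The weak test scales `tol` by the larger magnitude and the strong test scales it by the smaller one. Scaling by the smaller magnitude gives the same result as requiring the difference to be within `tol` of both values (the "and" form).

```python
def is_close_weak(a: float, b: float, tol: float) -> bool:
    """Weak test: the difference is within tol of the LARGER magnitude."""
    return abs(a - b) <= tol * max(abs(a), abs(b))


def is_close_strong(a: float, b: float, tol: float) -> bool:
    """Strong test: the difference is within tol of the SMALLER magnitude.

    Same result as: abs(a - b) <= tol * abs(a) and abs(a - b) <= tol * abs(b)
    """
    return abs(a - b) <= tol * min(abs(a), abs(b))


# 100 and 105 differ by 5; with tol=0.048 the thresholds are
# about 0.048 * 105 = 5.04 (weak) and 0.048 * 100 = 4.8 (strong).
print(is_close_weak(100.0, 105.0, 0.048))    # True
print(is_close_strong(100.0, 105.0, 0.048))  # False
```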

> Since, in the common case, the tolerance is approximate and usually small,
> it doesn't matter which version we use: strong, weak, or one that declares
> which value to scale to. But I prefer a symmetric version, as I suspect that
> will be the least surprising: it's good to get the same answer every
> time, even if it is approximate!
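The symmetry point can be made concrete with a sketch. It contrasts a test that scales to one declared value with the symmetric strong test. Here `is_close_to` is a hypothetical asymmetric test that scales `tol` by `expected` only. Swapping the arguments can change its answer, but it never changes the symmetric one.

```python
def is_close_to(actual: float, expected: float, tol: float) -> bool:
    """Asymmetric test (hypothetical): tol is scaled by the expected value only."""
    return abs(actual - expected) <= tol * abs(expected)


def is_close_strong(a: float, b: float, tol: float) -> bool:
    """Symmetric strong test: tol is scaled by the smaller magnitude."""
    return abs(a - b) <= tol * min(abs(a), abs(b))


a, b, tol = 100.0, 105.0, 0.048
# Asymmetric: the answer depends on which value is called "expected".
print(is_close_to(a, b, tol), is_close_to(b, a, tol))          # True False
# Symmetric: the same answer either way.
print(is_close_strong(a, b, tol), is_close_strong(b, a, tol))  # False False
```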

Well, approximate-in, approximate-out.  Unfortunately that applies to all 
math.  If you use approximations in your math, you will get approximate 
answers.  But many computer programmers like things to be a bit more precise.

So I suggest not using the words "approximate" or "estimate" in the docs.  The
calculation isn't an approximation, even if the values you supply to it are.
It actually is a well-defined range test.
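Writing the range out makes this concrete. The following is a short derivation that assumes one value b > 0 is held fixed and the relative tolerance satisfies 0 <= t < 1. The strong test scales by the smaller magnitude and the weak test by the larger. Split the cases on a >= b and 0 < a < b; any a <= 0 fails both tests under these assumptions. Each test then accepts exactly one closed interval of a:

```latex
\begin{align*}
\text{strong:}\quad |a-b| \le t\,\min(|a|,|b|) &\iff \frac{b}{1+t} \le a \le b\,(1+t) \\
\text{weak:}\quad |a-b| \le t\,\max(|a|,|b|) &\iff b\,(1-t) \le a \le \frac{b}{1-t}
\end{align*}
```

Because (1 + t)(1 - t) = 1 - t^2 <= 1, the strong interval always lies inside the weak one.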

>         I hope we can come to some consensus that something like this is
>         the way to go.
>
>     Good examples will help with this.  It may also help with choosing a
>     good name.
>
>
> You mean use-case examples, rather than specific value examples?

Yes, specific values don't indicate how something should be used.

>     To me, the strong version is an "is-good" test, and the weak version is
>     an "is-close" test.  I think it could be important to some people.
>
>     I like the idea of being able to use these as a teaching tool to
>     demonstrate how our ideas of closeness, equality, and inequality can be
>     subjective.
>
>
> Are you suggesting that we allow a flag for the user to set to choose
> whether to use the weak or the strong version? I'd rather not; I see this as a
> practical, works-most-of-the-time thing, not a teaching tool or a
> "provides every use case" tool.

No flag; I'm just saying it needs to be well defined, and the docs shouldn't mix
explanations of the use of one with the other.  Pick one, and then document how
to use it correctly.  At some point maybe someone will add the other if it's needed.

It is possible to use one for the other if you take the differences into 
account in the arguments.
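One way to read this can be sketched under some assumptions: both values have the same sign (or one is zero), and 0 <= tol < 1. For such values, the larger magnitude equals the smaller one plus the difference. So the weak test with `tol` accepts exactly the same pairs as the strong test with `tol / (1 - tol)`. Likewise, the strong test with `tol` matches the weak test with `tol / (1 + tol)`. The function names are hypothetical. Floating-point rounding can still flip the result for a pair sitting exactly on the boundary.

```python
def is_close_weak(a: float, b: float, tol: float) -> bool:
    """Weak test: tol scaled by the larger magnitude."""
    return abs(a - b) <= tol * max(abs(a), abs(b))


def is_close_strong(a: float, b: float, tol: float) -> bool:
    """Strong test: tol scaled by the smaller magnitude."""
    return abs(a - b) <= tol * min(abs(a), abs(b))


def weak_via_strong(a: float, b: float, tol: float) -> bool:
    """Weak test expressed through the strong one (same-sign values, 0 <= tol < 1)."""
    return is_close_strong(a, b, tol / (1.0 - tol))


def strong_via_weak(a: float, b: float, tol: float) -> bool:
    """Strong test expressed through the weak one (same-sign values, tol >= 0)."""
    return is_close_weak(a, b, tol / (1.0 + tol))


tol = 0.048
for a, b in [(100.0, 104.0), (100.0, 105.0), (100.0, 106.0)]:
    # Each pair of columns should agree: weak vs. weak_via_strong, strong vs. strong_via_weak.
    print(a, b,
          is_close_weak(a, b, tol), weak_via_strong(a, b, tol),
          is_close_strong(a, b, tol), strong_via_weak(a, b, tol))
```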

>     There are two cases...
>
>     1: (The weak version is required for this to work.)
>
>     Two numbers are definitely not equivalent if they are further apart
>     than the largest error amount.  (The larger number better indicates the
>     largeness of the possible relative error.)
>
>     And two numbers are close if you can't determine if they are
>     equivalent, or not-equivalent with certainty.*
>
>     (* "close numbers" may include equivalent numbers if you define it as a
>     set of all definitely not-equivalent numbers.)
>
>     2: (The strong version is required for this to work.)
>
>     A value is good if it's within a valid range with certainty, i.e., the
>     difference is less than the smaller relative range of the two numbers.
>     The smaller number better indicates the magnitude of smallness.
>
>     So case 1 should be used to test for errors, and case 2 should be used
>     to test for valid ranges.
>
>     It seems you have the 2nd case in mind, and that's fine.  Some of us
>     were thinking of the first case, possibly switching from one to
>     the other during the discussion, which is probably why it got confusing
>     or repetitious at some points.
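The two cases above can be read as code in one possible way, assuming a single relative tolerance `tol`. The function name `classify` and the labels it returns are hypothetical. In this reading, passing the strong test means "good" (within range with certainty). Failing the weak test means "definitely not equivalent". Anything in between is only "close".

```python
def classify(a: float, b: float, tol: float) -> str:
    """Classify a pair of values using both the strong and the weak test."""
    diff = abs(a - b)
    if diff <= tol * min(abs(a), abs(b)):
        return "good"            # case 2: within the valid range with certainty
    if diff <= tol * max(abs(a), abs(b)):
        return "close"           # case 1: can't tell equivalent from not-equivalent
    return "not equivalent"      # case 1: further apart than the largest error amount


for b in (104.0, 105.0, 106.0):
    print(100.0, b, classify(100.0, b, 0.048))
```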
>
>
> Yes, I suppose I do; and again, in the common use case, where the
> tolerance is also approximate, it really doesn't matter.

I'm curious: to what degree can it matter, given different-sized values and
tolerances?
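The question can be explored with a rough sketch. It assumes same-sign values, a single relative tolerance 0 <= tol < 1, and the definitions used in this thread (the strong test scales by the smaller magnitude, the weak test by the larger). Relative to a fixed value b > 0, the strong test accepts [b/(1+tol), b*(1+tol)] and the weak test accepts [b*(1-tol), b/(1-tol)]. The two tests therefore disagree only on two slivers, of relative width tol^2/(1+tol) and tol^2/(1-tol). Because both tests are relative, the slivers scale with the values, so only the tolerance matters. The function name `disagreement_band` is hypothetical.

```python
import sys
from typing import Tuple


def disagreement_band(tol: float) -> Tuple[float, float]:
    """Relative widths (as fractions of b > 0) of the two slivers where the
    weak test passes but the strong test fails, assuming 0 <= tol < 1."""
    below = tol * tol / (1.0 + tol)   # between b*(1 - tol) and b/(1 + tol)
    above = tol * tol / (1.0 - tol)   # between b*(1 + tol) and b/(1 - tol)
    return below, above


for tol in (1e-2, 1e-5, 1e-9):
    below, above = disagreement_band(tol)
    print(f"tol={tol:g}: below={below:.3g}, above={above:.3g}, "
          f"machine epsilon={sys.float_info.epsilon:.3g}")
```

With tol = 1e-9 the slivers are about 1e-18 of the value, well below double-precision resolution (about 2.2e-16). For tolerances that small, the two tests are practically indistinguishable. With tol = 1e-2 the slivers are on the order of 1e-4 of the value.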

>     I think both of these are useful, but you definitely need to be clear
>     which one you are implementing, and to document it clearly.
>
>
> yup.

Cheers,
    Ron