[Python-ideas] PEP 485: A Function for testing approximate equality

Nick Coghlan ncoghlan at gmail.com
Tue Jan 27 15:28:18 CET 2015


On 27 January 2015 at 19:49, Paul Moore <p.f.moore at gmail.com> wrote:
> On 27 January 2015 at 06:24, Chris Barker <chris.barker at noaa.gov> wrote:
>> If you ask the question: "are these two values close to each other?" then the
>> symmetric test makes the most sense -- obviously if a is close to b then b
>> is close to a.
>>
>> whereas:
>>
>> If you ask the question: "is this value within a defined relative difference
>> of an expected value?" (i.e. within 10% of b), then you want an asymmetric
>> test -- you are clearly defining "relative" to the known value.
>>
>> However, and I've tried to say this multiple times -- and no one disagreed
>> (or agreed) yet:
>>
>> I don't care much which we use, because
>>
>> IT DOES NOT MATTER in most cases which you choose.
>
> All of that makes sense to me (I'm leaving out the detail of "it does
> not matter" as I'm sort of trusting you guys with the expertise to
> tell me that ;-))
>
> But what *does* matter to me is usability and how the behaviour
> matches people's intuition. Not because the mathematical results will
> differ, but because it makes it easy for people to *think* about what
> they are doing, and whether it's OK.
>
> I would say that the correct approach is to make the default case as
> easy to use as possible. For that, a symmetrical are_close(a,b) is a
> no-brainer IMO. (Of course it has to work when either a or b is
> zero.) It works either way - if one value is a known "target", or if
> both values are approximations (e.g. when looking at convergence).

Translate that into explicit English and I'm not sure a symmetric
definition reads more clearly:

"a and b are close to each other"
"a is close to b"
"b is close to a"

Given that the "is close to" formulation also simplifies the
calculation of a relative tolerance (it's always relative to the
right-hand operand), it has quite a bit to recommend it.
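
In rough code (the names and the strict '<' below are mine, purely for
illustration rather than the PEP's spelling):

    def is_close_to_asymmetric(a, b, rel_tol=1e-8):
        # "a is close to b": the tolerance is scaled by b alone, so
        # swapping the operands can change the answer near the boundary.
        return abs(a - b) < rel_tol * abs(b)

    def are_close_symmetric(a, b, rel_tol=1e-8):
        # "a and b are close to each other": the tolerance is scaled by
        # the larger magnitude, so the result is order independent.
        return abs(a - b) < rel_tol * max(abs(a), abs(b))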

> Once we have that as a basis, look at how people might want to tweak it:
>
> are_close(a, b, within_abs=1e-8)  # Within a specific distance of each other (absolute tolerance)
> are_close(a, b, within_rel=0.1)   # Within 10% of each other
>
> In the relative case, I'd like "the experts" to decide for me what
> precisely "within 10% of each other" means (document the details,
> obviously, but don't bother me with them unless I go looking for
> them).
>
> In either case, I'd be happy to assume that if you change the
> defaults, you understand the implications (they can be explained in
> the documentation) such as relative tolerances being unstable near
> zero. I don't think it's a problem that the default behaviour can't be
> expressed in terms of explicit settings for the tolerance arguments
> (it's a wart, and could be annoying, but it's not a showstopper for me
> - allow setting both explicitly to None to mean "default" if it
> matters that much).

With an asymmetric comparison, another alternative would be to have an
explicit threshold value for the reference at which the check switches
from relative to absolute tolerance checking. That is:

    def is_close_to(value, reference, *, error_ratio=1e-8,
                    near_zero_threshold=1e-6, near_zero_tolerance=1e-14):
        """Check if the given value is close to a reference value.

        In most cases, the two values are close if
        'abs(value - reference) < abs(reference) * error_ratio'.
        If abs(reference) < near_zero_threshold, or near_zero_threshold
        is None, the values are close if
        'abs(value - reference) < near_zero_tolerance'.
        """
        # Absolute check: the reference is in the near zero region, or
        # the relative check was switched off with near_zero_threshold=None.
        if near_zero_threshold is None or abs(reference) < near_zero_threshold:
            return abs(value - reference) < near_zero_tolerance
        # Relative check: the allowed error scales with the reference.
        return abs(value - reference) < abs(reference) * error_ratio

Setting near_zero_threshold to 0 would force a relative comparison
(even near zero), while setting it to None would force an absolute one
(even far away from zero).
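
For instance, using the sketch above (the specific numbers are only
illustrative):

    is_close_to(1.0 + 1e-9, 1.0)                            # relative branch -> True
    is_close_to(1e-15, 0.0)                                 # absolute branch -> True
    is_close_to(1e-15, 0.0, near_zero_threshold=0)          # forced relative -> False
    is_close_to(1.0 + 1e-9, 1.0, near_zero_threshold=None)  # forced absolute -> False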

If you look at the default values, this is actually a very similar
definition to the one Chris has in PEP 485, as the default near zero
tolerance is the default error ratio multiplied by the default near
zero threshold, although I'm not sure as to the suitability of those
numbers.

The difference is that this takes the cutoff point between using a
relative error definition (to handle the dynamic range issues of a
floating point representation) and an absolute error definition (to
handle the instability of relative difference near zero) and *gives it
a name*, rather than deriving it from a confusing combination of the
reference value, the error ratio and the near zero tolerance.
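
To make that derived cutoff concrete (illustrative numbers only, not a
claim about the PEP's exact formula):

    # With a combined check along the lines of
    #   abs(value - reference) < max(error_ratio * abs(reference), near_zero_tolerance)
    # the absolute term takes over once abs(reference) falls below
    # near_zero_tolerance / error_ratio:
    error_ratio = 1e-8            # default relative tolerance
    near_zero_tolerance = 1e-14   # default absolute tolerance near zero
    implicit_cutoff = near_zero_tolerance / error_ratio   # mathematically 1e-6
    # ... which is exactly the value near_zero_threshold names explicitly.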

> That's it. Anyone wanting to specify both parameters together, or
> wanting the defaults to still apply "as well as" an explicitly
> specified tolerance, is deemed an "expert" and should be looking for a
> more specialised function (or writing their own).

I believe breaking out the cutoff point as a separately named
parameter makes the algorithm easy enough to explain that restricting
it isn't necessary.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

