[Python-ideas] PEP 485: A Function for testing approximate equality

Paul Moore p.f.moore at gmail.com
Tue Jan 27 16:45:54 CET 2015


On 27 January 2015 at 14:28, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Translate that into explicit English and I'm not sure a symmetric
> definition reads more clearly:
>
> "a and b are close to each other"
> "a is close to b"
> "b is close to a"

However, in programming terms,

are_close(a, b)
is_close_to(a, b)
is_close_to(b, a)

the latter two have the "which is the target" issue. And yes, real
code will have more obvious argument names. It's not a huge deal, I
agree. I'm just saying that the first form takes less mental effort to
parse while reading through a block of code.

Enough said. It's not a big deal; someone (not me) ultimately needs to
make the decision. I've explained my view, so I'll stop.

> Given that the "is close to" formulation also simplifies the
> calculation of a relative tolerance (it's always relative to the right
> hand operand), it has quite a bit to recommend it.

Agreed. It's a trade-off, and my expectation is that most code will
simply use the defaults, so making the default case read better is a
good choice. If you believe that most people will explicitly set a
tolerance of some form, the asymmetric choice may well be better.
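
For what it's worth, the difference only shows up when the tolerance is
loose relative to the gap between the values. A quick sketch (toy
definitions of my own, not anything from the PEP):

    def is_close_to(value, reference, rel_tol=1e-8):
        # Asymmetric: the tolerance is scaled by the reference alone.
        return abs(value - reference) <= rel_tol * abs(reference)

    def are_close(a, b, rel_tol=1e-8):
        # Symmetric: scaled by the larger magnitude, so argument
        # order doesn't matter.
        return abs(a - b) <= rel_tol * max(abs(a), abs(b))

    # With a loose tolerance, the asymmetric form depends on which
    # argument is treated as the reference:
    print(is_close_to(9.0, 10.0, rel_tol=0.1))  # True:  1.0 <= 0.1 * 10.0
    print(is_close_to(10.0, 9.0, rel_tol=0.1))  # False: 1.0 >  0.1 * 9.0
    print(are_close(9.0, 10.0, rel_tol=0.1))    # True, either order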

> With an asymmetric comparison, another alternative would be to have an
> explicit threshold value for the reference where it switched from
> relative to absolute tolerance checking. That is:
>
>     def is_close_to(value, reference, *, error_ratio=1e-8,
>                     near_zero_threshold=1e-6, near_zero_tolerance=1e-14):
>         """Check if the given value is close to a reference value
>
>         In most cases, the two values are close if
>         'abs(value-reference) < reference*error_ratio'.
>         If abs(reference) < near_zero_threshold, or near_zero_threshold
>         is None, the values are close if
>         'abs(value-reference) < near_zero_tolerance'.
>         """

Eep. All I can say is that I never expect to write code where I'd even
consider changing the parameters as documented there. I don't think I
could understand the implications well enough to trust my judgement.
Remember, my intuition hits its limit at "within 1 millionth of a
percent of each other" (that's the 1e-8) or "numbers under 1e-6 have
to differ by no more than 1e-14" (the other two). And I'd punt on what
might happen if both conditions apply.

> Setting near_zero_threshold to 0 would force a relative comparison
> (even near zero), while setting it to None would force an absolute one
> (even far away from zero).

The latter choice would make the name "near_zero_tolerance" a pretty
odd thing to see...

> If you look at the default values, this is actually a very similar
> definition to the one Chris has in PEP 485, as the default near zero
> tolerance is the default near zero threshold multiplied by the
> default error ratio, although I'm not sure as to the suitability of
> those numbers.
>
> The difference is that this takes the cutoff point between using a
> relative error definition (to handle the dynamic range issues of a
> floating point representation) and an absolute error definition (to
> handle the instability of relative difference near zero) and *gives it
> a name*, rather than deriving it from a confusing combination of the
> reference value, the error ratio and the near zero tolerance.
>
>> That's it. Anyone wanting to specify both parameters together, or
>> wanting the defaults to still apply "as well as" an explicitly
>> specified tolerance, is deemed an "expert" and should be looking for a
>> more specialised function (or writing their own).
>
> I believe breaking out the cutoff point as a separately named
> parameter makes the algorithm easy enough to explain that restricting
> it isn't necessary.

I'd love to see the proposed documentation, as I think it would
probably read as "complicated stuff, leave well alone" to most people.
But I *am* assuming a target audience that currently uses
"abs(x-y)<1e-8" [1], and unittest's assertAlmostEqual, and doesn't
think they need anything different. The rules are different if the
target audience is assumed to know more than that.

Paul

[1] Someone, it may have been Chris or it may have been someone else,
used that snippet, and I've seen 1e-8 turn up elsewhere in books on
numerical algorithms. I'm not sure why people choose 1e-8 (the
precision of a C float?), or how it relates to the 1e-6, 1e-8 and 1e-14 you
chose for your definition. It feels like the new function may be a lot
stricter than the code people naively (or otherwise) write today. Is
that fair, or am I reading too much into some arbitrary numbers? (Note
- I won't understand the explanation, I'm happy with just "that's a
good point" or "no, the numbers ultimately chosen will be fine" :-))

