
On 27 January 2015 at 14:28, Nick Coghlan <ncoghlan@gmail.com> wrote:
Translate that into explicit English and I'm not sure a symmetric definition reads more clearly:
"a and b are close to each other" "a is close to b" "b is close to a"
However, in programming terms,

    are_close(a, b)
    is_close_to(a, b)
    is_close_to(b, a)

the latter two have the "which is the target" issue. And yes, real code will have more obvious argument names. It's not a huge deal, I agree. I'm just saying that the first form takes less mental effort to parse while reading through a block of code. Enough said. It's not a big deal; someone (not me) ultimately needs to make the decision. I've explained my view, so I'll stop.
Given that the "is close to" formulation also simplifies the calculation of a relative tolerance (it's always relative to the right hand operand), it has quite a bit to recommend it.
Agreed. It's a trade-off, and my expectation is that most code will simply use the defaults, so making that read better is a good choice. If you believe that most people will explicitly set a tolerance of some form, the asymmetric choice may well be better.
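To make that asymmetry concrete, here is a minimal sketch of a check whose relative tolerance is taken against the right hand operand only (the name and the bare-bones logic are illustrative, not code from the PEP):

    def is_close_to(value, reference, rel_tol=1e-8):
        # Sketch only: the allowed error scales with the reference
        # (right hand) operand, so swapping the arguments can change
        # the answer.
        return abs(value - reference) <= rel_tol * abs(reference)

    # With a deliberately large tolerance the asymmetry is easy to see:
    is_close_to(9.05, 10.0, rel_tol=0.1)   # True:  0.95 <= 0.1 * 10.0
    is_close_to(10.0, 9.05, rel_tol=0.1)   # False: 0.95 >  0.1 * 9.05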
With an asymmetric comparison, another alternative would be to have an explicit threshold value for the reference where it switched from relative to absolute tolerance checking. That is:
    def is_close_to(value, reference, *, error_ratio=1e-8,
                    near_zero_threshold=1e-6,
                    near_zero_tolerance=1e-14):
        """Check if the given value is close to a reference value

        In most cases, the two values are close if
        'abs(value-reference) < reference*error_ratio'.
        If abs(reference) < near_zero_threshold, or near_zero_threshold
        is None, the values are close if
        'abs(value-reference) < near_zero_tolerance'.
        """
Eep. All I can say is that I never expect to write code where I'd even consider changing the parameters as documented there. I don't think I could understand the implications well enough to trust my judgement. Remember, my intuition hits its limit at "within 1 millionth of a percent of each other" (that's the 1e-8) or "numbers under 1e-6 have to differ by no more than 1e-14" (the other two). And I'd punt on what might happen if both conditions apply.
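For what it's worth, one possible body for that signature, written directly from the docstring (this is just one reading of it, not code from the thread; abs(reference) is used so that a negative reference still gives a positive allowed error):

    def is_close_to(value, reference, *, error_ratio=1e-8,
                    near_zero_threshold=1e-6,
                    near_zero_tolerance=1e-14):
        # Check the None case first, since 'abs(reference) < None' is a
        # TypeError in Python 3.
        if near_zero_threshold is None or abs(reference) < near_zero_threshold:
            return abs(value - reference) < near_zero_tolerance
        return abs(value - reference) < abs(reference) * error_ratio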
Setting near_zero_threshold to 0 would force a relative comparison (even near zero), while setting it to None would force an absolute one (even far away from zero).
The latter choice would make the name "near_zero_tolerance" a pretty odd thing to see...
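Continuing the sketch above, those two settings would behave like this (illustrative only):

    # near_zero_threshold=0 forces the relative test even against zero,
    # so nothing but an exact match is close to 0.0:
    is_close_to(1e-15, 0.0, near_zero_threshold=0)    # False
    is_close_to(1e-15, 0.0)                           # True (default near-zero handling)

    # near_zero_threshold=None forces the absolute test everywhere, so a
    # small relative error on a large reference now fails:
    is_close_to(1e6 + 0.001, 1e6, near_zero_threshold=None)  # False: 0.001 >= 1e-14
    is_close_to(1e6 + 0.001, 1e6)                             # True:  0.001 < 1e6 * 1e-8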
If you look at the default values, this is actually a very similar definition to the one Chris has in PEP 485, as the default near zero tolerance is the default error ratio multiplied by the default near zero threshold (1e-8 * 1e-6 = 1e-14), although I'm not sure as to the suitability of those numbers.
The difference is that this takes the cutoff point between using a relative error definition (to handle the dynamic range issues of a floating point representation) and an absolute error definition (to handle the instability of relative difference near zero) and *gives it a name*, rather than deriving it from a confusing combination of the reference value, the error ratio and the near zero tolerance.
That's it. Anyone wanting to specify both parameters together, or wanting the defaults to still apply "as well as" an explicitly specified tolerance, is deemed an "expert" and should be looking for a more specialised function (or writing their own).
I believe breaking out the cutoff point as a separately named parameter makes the algorithm easy enough to explain that restricting it isn't necessary.
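To spell out where that cutoff sits in the derived formulation (assuming the relative allowance error_ratio * abs(reference) is simply weighed against a fixed near zero tolerance), the switchover happens where the two allowances are equal:

    # Rough arithmetic only, using the defaults quoted above:
    error_ratio = 1e-8
    near_zero_tolerance = 1e-14
    crossover = near_zero_tolerance / error_ratio   # roughly 1e-06
    # i.e. the same value as the default near_zero_threshold; the
    # proposal above just gives that point an explicit name.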
I'd love to see the proposed documentation, as I think it would probably read as "complicated stuff, leave well alone" to most people. But I *am* assuming a target audience that currently uses "abs(x-y)<1e-8" [1], and unittest's assertAlmostEqual, and doesn't think they need anything different. The rules are different if the target audience is assumed to know more than that.

Paul

[1] Someone, it may have been Chris or it may have been someone else, used that snippet, and I've seen 1e-8 turn up elsewhere in books on numerical algorithms. I'm not sure why people choose 1e-8 (precision of a C float?), and how it relates to the 1e-6, 1e-8 and 1e-14 you chose for your definition. It feels like the new function may be a lot stricter than the code people naively (or otherwise) write today. Is that fair, or am I reading too much into some arbitrary numbers? (Note - I won't understand the explanation, I'm happy with just "that's a good point" or "no, the numbers ultimately chosen will be fine" :-))
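To put the "is it stricter?" question in numbers, here is a rough comparison of the naive absolute check against a 1e-8 relative check (using the simple relative test from the sketch earlier, not whatever the final function does):

    x, y = 1e10, 1e10 + 1.0
    abs(x - y) < 1e-8            # False: the absolute test rejects a 1e-10 relative error
    abs(x - y) < 1e-8 * abs(y)   # True:  the relative test accepts it

    x, y = 1e-12, 2e-12
    abs(x - y) < 1e-8            # True:  the absolute test accepts values differing by a factor of two
    abs(x - y) < 1e-8 * abs(y)   # False: the relative test rejects them

    # So a relative check is stricter than abs(x-y) < 1e-8 for small
    # numbers and looser for large ones; which effect bites depends on
    # the magnitudes involved.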