On 26 January 2015 at 05:54, Steven D'Aprano
I really don't think that setting one or two error tolerances is "floating point arcana".
The hundreds of messages on this topic would tend to imply otherwise :-( And to many users (including me, apparently - I expected the first one to give False), the following is "floating point arcana":
>>> 0.1*10 == 1.0
True
>>> sum([0.1]*10)
0.9999999999999999
>>> sum([0.1]*10) == 1
False
I don't think that having to explicitly decide on what counts as "close" (as either an absolute difference or a relative difference) is especially onerous: surely anyone writing code will be able to cope with one or two decisions:
- close enough means they differ by no more than X
- close enough means they differ by no more than X%, expressed as a fraction
This does seem relatively straightforward, though in the second case you glossed over the question of X% of *what*, which is the root of the "comparison to zero" question, and is precisely where the discussion explodes into complexity that I can't follow. So maybe that's precisely the bit of "floating point arcana" that the naive user doesn't catch on to. I'm not saying that you are being naive, rather that readers of the docs (and hence users of the function) will be, and will find it confusing for precisely this reason.
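To make the two decisions above concrete, here is a rough sketch of what each criterion amounts to. The names close_abs and close_rel are mine, purely for illustration; nothing here is a settled API:

```python
def close_abs(actual, expected, abs_tol):
    """Close enough: the values differ by no more than abs_tol."""
    return abs(actual - expected) <= abs_tol

def close_rel(actual, expected, rel_tol):
    """Close enough: the difference is no more than rel_tol (a
    fraction, e.g. 0.01 for 1%) of the expected value."""
    return abs(actual - expected) <= rel_tol * abs(expected)

# The sum([0.1]*10) example above passes either test comfortably:
print(close_abs(sum([0.1] * 10), 1.0, 1e-9))   # True
print(close_rel(sum([0.1] * 10), 1.0, 1e-9))   # True

# But relative closeness to an expected value of zero never holds
# for a nonzero actual, which is exactly the problem case:
print(close_rel(1e-12, 0.0, 1e-9))             # False
```

Note how the relative test needs a nonzero expected value to scale against, which is where the "X% of *what*" question bites.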
I'm almost inclined to not set any defaults, except perhaps zero for both (in which case "close to" cleanly degrades down to "exactly equal" except slower) and force the user to explicitly choose a value.
But the function (in the default case) then won't mean "close to" (at least not in any sense that people will expect). Maybe making it mandatory to specify one or the other parameter, and making them keyword-only parameters, would be sufficiently explicit. But see below.
Arguments in favour of setting some defaults:
- People who want a "zero-thought" solution will get one, even if it does the wrong thing for their specific application, but at least they didn't have to think about it.
- The defaults might occasionally be appropriate.
Arguments against:
- There is no default error tolerance we can pick, whether relative or absolute, which will suit everyone all the time. Unless the defaults are appropriate (say) 50% of the time or more, they will just be an attractive nuisance (see zero-thought above).
I'm not sure what you're saying here - by "not setting defaults" do you mean making it mandatory for the user to supply a tolerance, as I suggested above?
I really think that having three tolerances, one of which is nearly always ignored, is poor API design. The user usually knows when they are comparing against an expected value of zero and can set an absolute error tolerance.
Agreed.
How about this?
- Absolute tolerance defaults to zero (which is equivalent to exact equality).
- Relative tolerance defaults to something (possibly zero) to be determined after sufficient bike-shedding.
Starting the bike-shedding now, -1 on zero. Having is_close default to something that most users won't think of as behaving like their naive expectation of "is close" (as opposed to "equals") would be confusing.
- An argument for setting both values to zero by default is that it will make it easy to choose one of "absolute or relative". You just supply a value for the one that you want, and let the other take the default of zero.
Just make it illegal to set both. What happens when you have both set is another one of the things that triggers discussions that make my head explode. Setting just one implies the other is zero, setting neither implies whatever default is agreed on.
- At the moment, I'm punting on the behaviour when both abs and rel tolerances are provided. That can be bike-shedded later.
Don't allow it, it's too confusing for the target audience.
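The "set exactly one tolerance" rule suggested above could look something like the following sketch. The name is_close_to and both parameter names are hypothetical, chosen only to illustrate the keyword-only, mutually-exclusive design:

```python
def is_close_to(actual, expected, *, rel_tol=None, abs_tol=None):
    """Illustrative sketch: the caller must supply exactly one of
    rel_tol or abs_tol; supplying both, or neither, is an error."""
    if (rel_tol is None) == (abs_tol is None):
        raise TypeError("specify exactly one of rel_tol or abs_tol")
    if rel_tol is not None:
        return abs(actual - expected) <= rel_tol * abs(expected)
    return abs(actual - expected) <= abs_tol

print(is_close_to(0.1 + 0.2, 0.3, rel_tol=1e-9))  # True
print(is_close_to(1e-12, 0.0, abs_tol=1e-9))      # True
```

Making both parameters keyword-only forces the caller to state which kind of closeness they mean at the call site, which also makes the code self-documenting.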
Setting both defaults to zero means that the zero-thought version:
if is_close(x, y): ...
will silently degrade to x == y, which is no worse than what people do now (except slower). We can raise a warning in that case.
It is worse, because it no longer says what it means.
The only tricky situation might be if you *may* be comparing against zero, but don't know so in advance.
This can probably be handled by sufficiently good documentation. Once it was clear to me that this was an asymmetric operation, and that you were comparing whether X is close to a known value Y, I stopped finding the requirement that you know what Y is to make sense of the function odd. Having said that, I don't think the name "is_close" makes the asymmetry clear enough. Maybe "is_close_to" would work better (there's still room for bikeshedding over which of the 2 arguments is implied as the "expected" value in that case)?
My claim wasn't that is_close_to(x, 0.0) provides a mathematically ill-defined result. I agree that that's a reasonable definition of "relatively close to" (though one could make an argument that zero is not relatively close to itself -- after all, abs(actual - expected)/expected is ill-defined).
Don't write it that way. Write it this way:
abs(actual - expected) <= relative_tolerance*abs(expected)
Now if expected is zero, the condition is true if and only if actual==expected.
I would call out this edge case explicitly in the documentation. It's a straightforward consequence of the definition, but it *will* be surprising for many users. Personally, I'd not thought of the implication till it was pointed out here.
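A short demonstration of that edge case (using an illustrative helper name, not a proposed API): with expected == 0, a purely relative test degenerates to exact equality, since the right-hand side of the comparison is zero.

```python
def rel_close(actual, expected, rel_tol):
    """Relative-closeness test from the formula above (illustrative)."""
    return abs(actual - expected) <= rel_tol * abs(expected)

print(rel_close(0.0, 0.0, 1e-9))     # True: exactly equal
print(rel_close(1e-300, 0.0, 1e-9))  # False: any nonzero actual fails
```

So is_close(a, a) still returns True for a == 0.0, but no other value is ever "relatively close" to zero, however tiny.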
It would be bizarre for is_close(a, a) to return False (or worse, raise an exception!) for any finite number. NANs, of course, are allowed to be bizarre. Zero is not :-)
Definitely agreed.
Instead, my point was that if the user is asking "is this close to 0?" instead of "is this exactly equal to zero?" then they probably are expecting that there exist some inputs for which those two questions give different answers. Saying "well TECHNICALLY this is a valid definition of 'close to'" is certainly true but somewhat unkind.
I agree, but I think this is a symptom of essential complexity in the problem domain. Ultimately, "is close" is ill-defined, and *somebody* has to make the decision what that will be, and that decision won't satisfy everyone always. We can reduce the complexity in one place:
* provide sensible default values that work for expected != 0
but only by increasing the complexity elsewhere:
* when expected == 0 the intuition that is_close is different from exact equality fails
We can get rid of that complexity, but only by adding it back somewhere else:
Agreed. The essential complexity here may not seem that complex to specialists, but I can assure you that it is for at least this user :-) Overall, I think it would be better to simplify the proposed function in order to have it better suit the expectations of its intended audience, rather than trying to dump too much functionality in it on the grounds of making it "general". If there's one clear lesson from this thread, it's that floating point closeness can mean a lot of things to people - and overloading one function with all of those meanings doesn't seem like a good way of having a clean design.

Paul