On 26 January 2015 at 05:54, Steven D'Aprano
I really don't think that setting one or two error tolerances is "floating point arcana".
The hundreds of messages on this topic would tend to imply otherwise :-( And to many users (including me, apparently - I expected the first one to give False), the following is "floating point arcana":
>>> 0.1*10 == 1.0
True
>>> sum([0.1]*10)
0.9999999999999999
>>> sum([0.1]*10) == 1
False
I don't think that having to explicitly decide on what counts as "close" (as either an absolute difference or a relative difference) is especially onerous: surely anyone writing code will be able to cope with one or two decisions:
- close enough means they differ by no more than X
- close enough means they differ by no more than X%, expressed as a fraction
This does seem relatively straightforward, though in the second case you glossed over the question of X% of *what*, which is the root of the "comparison to zero" question, and is precisely where the discussion explodes into complexity that I can't follow. So maybe that's precisely the bit of "floating point arcana" that the naive user doesn't catch on to. I'm not saying that you are being naive, rather that readers of the docs (and hence users of the function) will be, and will find it confusing for precisely this reason.
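To make the two decisions above concrete, here is a rough sketch of what each criterion amounts to. The names close_abs and close_rel are mine, purely for illustration; nothing here is a settled API:

```python
def close_abs(actual, expected, abs_tol):
    """Close enough: the values differ by no more than abs_tol."""
    return abs(actual - expected) <= abs_tol

def close_rel(actual, expected, rel_tol):
    """Close enough: the difference is no more than rel_tol (a
    fraction, e.g. 0.01 for 1%) of the expected value."""
    return abs(actual - expected) <= rel_tol * abs(expected)

# The sum([0.1]*10) example above passes either test comfortably:
print(close_abs(sum([0.1] * 10), 1.0, 1e-9))   # True
print(close_rel(sum([0.1] * 10), 1.0, 1e-9))   # True

# But relative closeness to an expected value of zero never holds
# for a nonzero actual, which is exactly the problem case:
print(close_rel(1e-12, 0.0, 1e-9))             # False
```

Note how the relative test needs a nonzero expected value to scale against, which is where the "X% of *what*" question bites.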
I'm almost inclined to not set any defaults, except perhaps zero for both (in which case "close to" cleanly degrades down to "exactly equal" except slower) and force the user to explicitly choose a value.
But the function (in the default case) then won't mean "close to" (at least not in any sense that people will expect). Maybe making it mandatory to specify one or the other parameter, and making them keyword-only parameters, would be sufficiently explicit. But see below.
Arguments in favour of setting some defaults:
- People who want a "zero-thought" solution will get one, even if it does the wrong thing for their specific application, but at least they didn't have to think about it.
- The defaults might occasionally be appropriate.
Arguments against:
- There is no default error tolerance we can pick, whether relative or absolute, which will suit everyone all the time. Unless the defaults are appropriate (say) 50% of the time or more, they will just be an attractive nuisance (see zero-thought above).
I'm not sure what you're saying here - by "not setting defaults" do you mean making it mandatory for the user to supply a tolerance, as I suggested above?
I really think that having three tolerances, one of which is nearly always ignored, is poor API design. The user usually knows when they are comparing against an expected value of zero and can set an absolute error tolerance.
Agreed.
How about this?
- Absolute tolerance defaults to zero (which is equivalent to exact equality).
- Relative tolerance defaults to something (possibly zero) to be determined after sufficient bike-shedding.
Starting the bike-shedding now, -1 on zero. Having is_close default to something that most users won't think of as behaving like their naive expectation of "is close" (as opposed to "equals") would be confusing.
- An argument for setting both values to zero by default is that it will make it easy to choose one of "absolute or relative". You just supply a value for the one that you want, and let the other take the default of zero.
Just make it illegal to set both. What happens when you have both set is another one of the things that triggers discussions that make my head explode. Setting just one implies the other is zero, setting neither implies whatever default is agreed on.
- At the moment, I'm punting on the behaviour when both abs and rel tolerances are provided. That can be bike-shedded later.
Don't allow it, it's too confusing for the target audience.
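The "set exactly one tolerance" rule suggested above could look something like the following sketch. The name is_close_to and both parameter names are hypothetical, chosen only to illustrate the keyword-only, mutually-exclusive design:

```python
def is_close_to(actual, expected, *, rel_tol=None, abs_tol=None):
    """Illustrative sketch: the caller must supply exactly one of
    rel_tol or abs_tol; supplying both, or neither, is an error."""
    if (rel_tol is None) == (abs_tol is None):
        raise TypeError("specify exactly one of rel_tol or abs_tol")
    if rel_tol is not None:
        return abs(actual - expected) <= rel_tol * abs(expected)
    return abs(actual - expected) <= abs_tol

print(is_close_to(0.1 + 0.2, 0.3, rel_tol=1e-9))  # True
print(is_close_to(1e-12, 0.0, abs_tol=1e-9))      # True
```

Making both parameters keyword-only forces the caller to state which kind of closeness they mean at the call site, which also makes the code self-documenting.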
Setting both defaults to zero means that the zero-thought version:
if is_close(x, y): ...
will silently degrade to x == y, which is no worse than what people do now (except slower). We can raise a warning in that case.
It is worse, because it no longer says what it means.
The only tricky situation might be if you *may* be comparing against zero, but don't know so in advance.
This can probably be handled by sufficiently good documentation. Once it was clear to me that this was an asymmetric operation, and that you were comparing whether X is close to a known value Y, I stopped finding the requirement that you know what Y is to make sense of the function odd. Having said that, I don't think the name "is_close" makes the asymmetry clear enough. Maybe "is_close_to" would work better (there's still room for bikeshedding over which of the 2 arguments is implied as the "expected" value in that case)?
My claim wasn't that is_close_to(x, 0.0) provides a mathematically ill-defined result. I agree that that's a reasonable definition of "relatively close to" (though one could make an argument that zero is not relatively close to itself -- after all, abs(actual - expected)/expected is ill-defined).
Don't write it that way. Write it this way:
abs(actual - expected) <= relative_tolerance*abs(expected)
Now if expected is zero, the condition is true if and only if actual==expected.
I would call out this edge case explicitly in the documentation. It's a straightforward consequence of the definition, but it *will* be surprising for many users. Personally, I'd not thought of the implication till it was pointed out here.
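A short demonstration of that edge case (using an illustrative helper name, not a proposed API): with expected == 0, a purely relative test degenerates to exact equality, since the right-hand side of the comparison is zero.

```python
def rel_close(actual, expected, rel_tol):
    """Relative-closeness test from the formula above (illustrative)."""
    return abs(actual - expected) <= rel_tol * abs(expected)

print(rel_close(0.0, 0.0, 1e-9))     # True: exactly equal
print(rel_close(1e-300, 0.0, 1e-9))  # False: any nonzero actual fails
```

So is_close(a, a) still returns True for a == 0.0, but no other value is ever "relatively close" to zero, however tiny.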
It would be bizarre for is_close(a, a) to return False (or worse, raise an exception!) for any finite number. NANs, of course, are allowed to be bizarre. Zero is not :-)
Definitely agreed.
Instead, my point was that if the user is asking "is this close to 0?" instead of "is this exactly equal to zero?" then they probably are expecting that there exist some inputs for which those two questions give different answers. Saying "well TECHNICALLY this is a valid definition of 'close to'" is certainly true but somewhat unkind.
I agree, but I think this is a symptom of essential complexity in the problem domain. Ultimately, "is close" is ill-defined, and *somebody* has to make the decision what that will be, and that decision won't satisfy everyone always. We can reduce the complexity in one place:
* provide sensible default values that work for expected != 0
but only by increasing the complexity elsewhere:
* when expected == 0 the intuition that is_close is different from exact equality fails
We can get rid of that complexity, but only by adding it back somewhere else:
Agreed. The essential complexity here may not seem that complex to specialists, but I can assure you that it is for at least this user :-) Overall, I think it would be better to simplify the proposed function in order to have it better suit the expectations of its intended audience, rather than trying to dump too much functionality in it on the grounds of making it "general". If there's one clear lesson from this thread, it's that floating point closeness can mean a lot of things to people - and overloading one function with all of those meanings doesn't seem like a good way of having a clean design.

Paul