[Python-ideas] PEP 485: A Function for testing approximate equality

Paul Moore p.f.moore at gmail.com
Mon Jan 26 08:07:25 CET 2015


On 26 January 2015 at 05:54, Steven D'Aprano <steve at pearwood.info> wrote:

> I really don't think that setting one or two error tolerances is
> "floating point arcana".

The hundreds of messages on this topic would tend to imply otherwise :-(

And to many users (including me, apparently - I expected the first one
to give False), the following is "floating point arcana":

>>> 0.1*10 == 1.0
True
>>> sum([0.1]*10)
0.9999999999999999
>>> sum([0.1]*10) == 1
False

> I don't think that having to explicitly decide
> on what counts as "close" (as either an absolute difference or a
> relative difference) is especially onerous: surely anyone writing code
> will be able to cope with one or two decisions:
>
> - close enough means they differ by no more than X
>
> - close enough means they differ by no more than X%, expressed
>   as a fraction

This does seem relatively straightforward. In the second case, though,
you glossed over the question of X% of *what*, which is the root of
the "comparison to zero" question, and is precisely where the
discussion explodes into complexity that I can't follow. So maybe
that is exactly the bit of "floating point arcana" that the naive
user doesn't catch on to. I'm not saying that you are being naive,
rather that readers of the docs (and hence users of the function) will
be, and will find it confusing for precisely this reason.
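
For what it's worth, here is a rough sketch of how I read the two
decisions above (the helper names are mine, and scaling the relative
test by the "expected" value is only one of the options being
discussed):

    def within_absolute(x, y, tol):
        # "Close enough" means they differ by no more than tol.
        return abs(x - y) <= tol

    def within_relative(actual, expected, tol):
        # "Close enough" means they differ by no more than some
        # fraction of... something.  Scaling by the expected value is
        # one choice; scaling by the larger of the two magnitudes is
        # another, and that choice is exactly what the zero case
        # hinges on.
        return abs(actual - expected) <= tol * abs(expected)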

> I'm almost inclined to not set any defaults, except perhaps zero for
> both (in which case "close to" cleanly degrades down to "exactly equal"
> except slower) and force the user to explicitly choose a value.

But the function (in the default case) then won't mean "close to" (at
least not in any sense that people will expect). Maybe making it
mandatory to specify one or the other parameter, and making them
keyword-only parameters, would be sufficiently explicit. But see
below.

> Arguments in favour of setting some defaults:
>
> - People who want a "zero-thought" solution will get one, even
>   if it does the wrong thing for their specific application, but
>   at least they didn't have to think about it.
>
> - The defaults might occasionally be appropriate.
>
>
> Arguments against:
>
> - There is no default error tolerance we can pick, whether relative
>   or absolute, which will suit everyone all the time. Unless the
>   defaults are appropriate (say) 50% of the time or more, they will
>   just be an attractive nuisance (see zero-thought above).

I'm not sure what you're saying here - by "not setting defaults" do
you mean making it mandatory for the user to supply a tolerance, as I
suggested above?

> I really think that having three tolerances, one of which is nearly
> always ignored, is poor API design. The user usually knows when they are
> comparing against an expected value of zero and can set an absolute
> error tolerance.

Agreed.

> How about this?
>
> - Absolute tolerance defaults to zero (which is equivalent to
>   exact equality).
>
> - Relative tolerance defaults to something (possibly zero) to be
>   determined after sufficient bike-shedding.

Starting the bike-shedding now, -1 on zero. Having is_close default to
something that most users won't think of as behaving like their naive
expectation of "is close" (as opposed to "equals") would be confusing.

> - An argument for setting both values to zero by default is that
>   it will make it easy to choose one of "absolute or relative". You
>   just supply a value for the one that you want, and let the other
>   take the default of zero.

Just make it illegal to set both. What happens when you have both set
is another one of the things that triggers discussions that make my
head explode. Setting just one implies the other is zero, setting
neither implies whatever default is agreed on.
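
Something along these lines would match what I have in mind (purely a
sketch; the names, the placeholder default, and the choice of
exception are mine, not the PEP's spec):

    def is_close(actual, expected, *, rel_tol=None, abs_tol=None):
        # Keyword-only tolerances; supplying both is rejected outright.
        if rel_tol is not None and abs_tol is not None:
            raise TypeError("specify rel_tol or abs_tol, not both")
        if abs_tol is not None:
            # Absolute tolerance only; relative is implicitly zero.
            return abs(actual - expected) <= abs_tol
        if rel_tol is None:
            rel_tol = 1e-9  # whatever default the bike-shedding settles on
        return abs(actual - expected) <= rel_tol * abs(expected)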

> - At the moment, I'm punting on the behaviour when both abs and rel
>   tolerances are provided. That can be bike-shedded later.

Don't allow it, it's too confusing for the target audience.

> Setting both defaults to zero means that the zero-thought version:
>
>     if is_close(x, y): ...
>
> will silently degrade to x == y, which is no worse than what people
> do now (except slower). We can raise a warning in that case.

It is worse, because it no longer says what it means.

> The only tricky situation might be if you *may* be comparing against
> zero, but don't know so in advance.

This can probably be handled by sufficiently good documentation. Once
it was clear to me that this is an asymmetric operation, and that you
are asking whether X is close to a known value Y, I stopped finding
it odd that you need to know what Y is in order to make sense of the
function.

Having said that, I don't think the name "is_close" makes the
asymmetry clear enough. Maybe "is_close_to" would work better (though
there's still room for bikeshedding over which of the two arguments is
implied as the "expected" value in that case)?
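
For illustration, this is the kind of asymmetry I mean, with a
hypothetical helper that scales the tolerance by its second
("expected") argument:

    def is_close_to(actual, expected, rel_tol=0.1):
        # Relative tolerance measured against the second argument.
        return abs(actual - expected) <= rel_tol * abs(expected)

    is_close_to(9.0, 10.0)   # True:  1.0 <= 0.1 * 10.0
    is_close_to(10.0, 9.0)   # False: 1.0 >  0.1 * 9.0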

>> My claim wasn't that is_close_to(x, 0.0) provides a mathematically
>> ill-defined result. I agree that that's a reasonable definition of
>> "relatively close to" (though one could make an argument that zero is
>> not relatively close to itself -- after all, abs(actual -
>> expected)/expected is ill-defined).
>
> Don't write it that way. Write it this way:
>
> abs(actual - expected) <= relative_tolerance*expected
>
> Now if expected is zero, the condition is true if and only if
> actual==expected.

I would call out this edge case explicitly in the documentation. It's
a straightforward consequence of the definition, but it *will* be
surprising for many users. Personally, I'd not thought of the
implication till it was pointed out here.
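
A quick illustration of that edge case, using the inequality exactly
as written above:

    rel_tol = 1e-9
    expected = 0.0
    # The right-hand side collapses to zero when expected == 0, so only
    # an exact zero (or -0.0) counts as "close":
    abs(1e-300 - expected) <= rel_tol * expected   # False, however tiny the value
    abs(0.0 - expected) <= rel_tol * expected      # True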

> It would be bizarre for is_close(a, a) to return False (or worse,
> raise an exception!) for any finite number. NANs, of course, are
> allowed to be bizarre. Zero is not :-)

Definitely agreed.

>> Instead, my point was that if the
>> user is asking "is this close to 0?" instead of "is this exactly equal
>> to zero?" then they probably are expecting that there exist some
>> inputs for which those two questions give different answers. Saying
>> "well TECHNICALLY this is a valid definition of 'close to'" is
>> certainly true but somewhat unkind.
>
> I agree, but I think this is a symptom of essential complexity in the
> problem domain. Ultimately, "is close" is ill-defined, and *somebody*
> has to make the decision what that will be, and that decision won't
> satisfy everyone always. We can reduce the complexity in one place:
>
>     * provide sensible default values that work for expected != 0
>
> but only by increasing the complexity elsewhere:
>
>    * when expected == 0 the intuition that is_close is different
>      from exact equality fails
>
> We can get rid of that complexity, but only by adding it back somewhere
> else:

Agreed. The essential complexity here may not seem that complex to
specialists, but I can assure you that it is for at least this user
:-)

Overall, I think it would be better to simplify the proposed function
so that it matches the expectations of its intended audience, rather
than trying to pack too much functionality into it in the name of
generality. If there's one clear lesson from this thread, it's that
floating point closeness can mean a lot of different things to
different people - and overloading one function with all of those
meanings doesn't seem like a good way to end up with a clean design.

Paul

