
On Mon, Jan 26, 2015 at 07:43:20AM +0000, Paul Moore wrote:
> On 26 January 2015 at 06:39, Steven D'Aprano <steve@pearwood.info> wrote:
> > I suggest that in the interest of not flooding everyone's inboxes, we take that off-list until we have either a consensus or at least agreement that we cannot reach consensus.
> Does it need to go off-list?
And now you know why there are hundreds of messages in this thread ;-) No, it doesn't need to go off-list, but I'm suffering badly from email fatigue, not just because of this thread, though it is one of the major causes, and I'm sure I'm not the only one.
> I'm still unclear about the arguments over asymmetric vs symmetric (I suspect, as you alluded to earlier, that they reflect a more fundamental problem, which is that there are 2 different types of use case with different constraints) so I'd like to at least be aware of the content of any discussion...
Symmetry and asymmetry of "close to" is a side-effect of the way you calculate the fuzzy comparison. In real life, "close to" is always symmetric because distance is the same whether you measure from A to B or from B to A. The distance between two numbers is their difference, which is another way of saying the error between them:

    delta = abs(x - y)

(delta being the traditional name for this quantity in mathematics), and obviously delta doesn't depend on the order of x and y.

But if we express that difference as a fraction of some base value, i.e. as a relative error, the result depends on which base value you choose:

    delta/x != delta/y

so suddenly we introduce an asymmetry which doesn't reflect any physical difference. The error between x and y is the same whichever way you measure, but that error might be 10% of x and 12.5% of y (say).

What *fundamentally* matters is the actual error, delta. But to decide whether any specific value for delta is too much or not, you need to pick a maximum acceptable delta, and that depends on context: a maximum acceptable delta of 0.0001 is probably too big if your x and y are around a billionth, and way too small if they are around a billion. Hence we often prefer to work with relative tolerances ("give or take 1%"), since they automatically scale with the size of x and y, but that introduces an asymmetry.

Asymmetry is bad, because it is rather surprising and counter-intuitive that "x is close to y" but "y is not close to x". It may also be bad in a practical sense, because people will forget which order they need to give x and y and will give them in the wrong order. I started off with an approx_equal function in test_statistics that was asymmetric, and I could never remember which way the arguments went.

(We can mitigate the practical failure with explicit argument names, "actual" and "expected", instead of generic ones. But who wants to be using keyword arguments for this all the time?)

Example: suppose the user supplies a relative tolerance of 0.01 ("plus or minus one percent"), with x=100.0 and y=99.0. Then delta = 1.0. Is that close? If we use x as the base, 1 <= 0.01*100 returns True, but if we use y as the base, 1 <= 0.01*99 returns False.

Instead, Bruce Dawson recommends using the larger of x and y:

https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-number...

Quote:

    To compare f1 and f2 calculate diff = fabs(f1-f2). If diff is
    smaller than n% of max(abs(f1),abs(f2)) then f1 and f2 can be
    considered equal.

This is especially appropriate when you just want to know whether x and y differ from each other by an acceptably small amount, without specifying which is the "true" value and which the "true value plus or minus some error".

Other alternatives would be to take the smaller, or the average, of x and y. Time permitting, over the next day or so I'll draw up some diagrams to show how each of these tactics changes what counts as close or not close.

--
Steve
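P.S. To make the worked example above concrete in code, here is a quick sketch of the two tests (the function names and signatures are purely illustrative, not a proposed API):

    def is_close_asymmetric(actual, expected, rel_tol):
        # Asymmetric: the error is measured relative to `expected` only,
        # so swapping the arguments can change the answer.
        return abs(actual - expected) <= rel_tol * abs(expected)

    def is_close_symmetric(x, y, rel_tol):
        # Symmetric, following Bruce Dawson's suggestion: the error is
        # measured relative to the larger of the two values.
        return abs(x - y) <= rel_tol * max(abs(x), abs(y))

    # With x=100.0, y=99.0 and a 1% relative tolerance:
    is_close_asymmetric(99.0, 100.0, 0.01)  # True:  1.0 <= 0.01*100
    is_close_asymmetric(100.0, 99.0, 0.01)  # False: 1.0 >  0.01*99
    is_close_symmetric(100.0, 99.0, 0.01)   # True, in either order
    is_close_symmetric(99.0, 100.0, 0.01)   # True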