[Python-ideas] PEP 485: A Function for testing approximate equality
steve at pearwood.info
Tue Jan 27 04:08:00 CET 2015
On Mon, Jan 26, 2015 at 07:43:20AM +0000, Paul Moore wrote:
> On 26 January 2015 at 06:39, Steven D'Aprano <steve at pearwood.info> wrote:
> > I suggest that in the interest of not flooding everyone's inboxes, we
> > take that off-list until we have either a consensus or at least
> > agreement that we cannot reach consensus.
> Does it need to go off-list?
And now you know why there are hundreds of messages in this thread ;-)
No, it doesn't need to go off-list, but I'm suffering badly from email
fatigue. This thread isn't the only reason, but it is one of the major
causes, and I'm sure I'm not the only one.
> I'm still unclear about the arguments
> over asymmetric vs symmetric (I suspect, as you alluded to earlier,
> that they reflect a more fundamental problem, which is that there are
> 2 different types of use case with different constraints) so I'd like
> to at least be aware of the content of any discussion...
Symmetry and asymmetry of "close to" is a side-effect of the way you
calculate the fuzzy comparison. In real life, "close to" is always
symmetric because distance is the same whether you measure from A to B
or from B to A. The distance between two numbers is their difference,
which is another way of saying the error between them:
    delta = abs(x - y)
(delta being the traditional name for this quantity in mathematics), and
obviously delta doesn't depend on the order of x and y.
But if we express that difference as a fraction of some base value, i.e.
as a relative error, the result depends on which base value you choose:
    delta/x != delta/y
so suddenly we introduce an asymmetry which doesn't reflect any physical
difference. The error between x and y is the same whichever way you
measure, but that error might be 20% of x and 25% of y (say, with
x = 100 and y = 80).
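To make the asymmetry concrete, here is a minimal Python sketch. The helper names `abs_error` and `rel_error` are hypothetical, chosen for illustration only. The absolute error comes out the same in both directions, while the relative error depends on which value is used as the base.

```python
def abs_error(x, y):
    """Absolute error between x and y: symmetric in its arguments."""
    return abs(x - y)

def rel_error(value, base):
    """Hypothetical helper: the error between value and base, as a fraction of base."""
    return abs(value - base) / abs(base)

x, y = 100.0, 80.0
print(abs_error(x, y), abs_error(y, x))  # 20.0 20.0
print(rel_error(y, x))                   # 0.2  (20% of x)
print(rel_error(x, y))                   # 0.25 (25% of y)
```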
What *fundamentally* matters is the actual error, delta. But to decide
whether any specific value for delta is too much or not, you need to
pick a maximum acceptable delta, and that depends on context: a maximum
acceptable delta of 0.0001 is probably too big if your x and y are
around a billionth, and way too small if they are around a billion.
Hence we often prefer to work with relative tolerances ("give or take
1%") since that automatically scales with the size of x and y, but that
introduces an asymmetry.
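A rough sketch of why a fixed absolute tolerance doesn't scale, reusing the 0.0001 figure from above. The helper names and the 1% relative tolerance are illustrative assumptions; the relative test here takes y as its base value, which is exactly what makes it asymmetric.

```python
ABS_TOL = 0.0001  # fixed maximum acceptable delta
REL_TOL = 0.01    # "give or take 1%"

def close_abs(x, y, abs_tol=ABS_TOL):
    # Fixed absolute tolerance: ignores the magnitude of x and y.
    return abs(x - y) <= abs_tol

def close_rel(x, y, rel_tol=REL_TOL):
    # Relative tolerance with y as the base value (so this test is asymmetric).
    return abs(x - y) <= rel_tol * abs(y)

# Around a billionth: values differing by a factor of 2 still pass the absolute test.
print(close_abs(1e-9, 2e-9), close_rel(1e-9, 2e-9))        # True False
# Around a billion: values differing by one part in a billion fail it.
print(close_abs(1e9, 1e9 + 1), close_rel(1e9, 1e9 + 1))    # False True
```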
Asymmetry is bad, because it is rather surprising and counter-intuitive
that "x is close to y", but "y is not close to x". It may also be bad in
a practical sense, because people will forget which order they need to
give x and y and will give them in the wrong order. I started off with
an approx_equal function in test_statistics that was asymmetric, and I
could never remember which way the arguments went.
(We can guard against the practical failure with explicit argument
names "actual" and "expected" instead of generic ones. But who wants to
be using keyword arguments for this all the time?)
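One way to sketch the explicit-argument-names idea in Python is with keyword-only parameters. The function below is hypothetical (it is not the approx_equal from test_statistics); it measures the error relative to `expected`, so swapping the two values can change the answer, but the caller always has to say which is which.

```python
def approx_equal(*, actual, expected, rel_tol):
    """Hypothetical asymmetric test: is actual within rel_tol of expected?

    The bare * makes every parameter keyword-only, so a call must name
    which value is the actual one and which is the expected one.
    """
    return abs(actual - expected) <= rel_tol * abs(expected)

print(approx_equal(actual=90.5, expected=100.0, rel_tol=0.1))  # True:  9.5 <= 10.0
print(approx_equal(actual=100.0, expected=90.5, rel_tol=0.1))  # False: 9.5 >  9.05
```

Calling it positionally, e.g. `approx_equal(90.5, 100.0, 0.1)`, raises a TypeError, which is the point and also the inconvenience.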
Example: suppose the user supplies a relative tolerance of 0.01 ("plus
or minus one percent"), with x=100.0 and y=99.0. Then delta = 1.0. Is
that close? If we use x as the base:
    1 <= 0.01*100
returns True, but if we use y as the base:
    1 <= 0.01*99
returns False.
Instead, Bruce Dawson recommends using the larger of x and y:
    To compare f1 and f2 calculate diff = fabs(f1-f2). If diff is
    smaller than n% of max(abs(f1),abs(f2)) then f1 and f2 can be
    considered equal.
This is especially appropriate when you just want to know whether x and
y differ from each other by an acceptably small amount, without
specifying which is the "true" value and which the "true value plus or
minus some error".
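Dawson's rule translates almost directly into Python. This is a minimal sketch: the name `close_max` is hypothetical, and `rel_tol` is given as a fraction (0.01 for 1%) rather than as a percentage n. Since max() doesn't care about argument order, the test is symmetric, and the 100 versus 99 example with a 1% tolerance now gives the same answer both ways round.

```python
def close_max(x, y, rel_tol):
    """Symmetric test: is |x - y| within rel_tol of the larger magnitude?"""
    diff = abs(x - y)
    return diff <= rel_tol * max(abs(x), abs(y))

print(close_max(100.0, 99.0, 0.01), close_max(99.0, 100.0, 0.01))  # True True
```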
Other alternatives would be to take the smaller, or the average, of x
and y as the base.
Time permitting, over the next day or so I'll draw up some diagrams to
show how each of these tactics changes what counts as close or not close.
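As a rough textual stand-in for such a comparison, the sketch below applies each choice of base value (x, y, the larger, the smaller, or the average) to the 100 versus 99 pair with a 1% tolerance. The helper `is_close` and the strategy labels are hypothetical, and "mean" is assumed here to be the mean of the two magnitudes.

```python
def is_close(x, y, rel_tol, base):
    """Relative-tolerance test with a selectable base value (hypothetical helper)."""
    bases = {
        "x": abs(x),
        "y": abs(y),
        "max": max(abs(x), abs(y)),
        "min": min(abs(x), abs(y)),
        "mean": (abs(x) + abs(y)) / 2,
    }
    return abs(x - y) <= rel_tol * bases[base]

for base in ("x", "y", "max", "min", "mean"):
    print(base, is_close(100.0, 99.0, 0.01, base))
# x True
# y False
# max True
# min False
# mean False
```

Only "x" and "y" depend on argument order. Of the three symmetric choices, "min" is always the strictest and "max" the most lenient, with "mean" in between.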