TL;DR -- I can live with, and would indeed be happy with, either a symmetric or an asymmetric test -- because for most of the use-cases it just doesn't matter.

But we have to pick one, so if you're interested -- read on ---


On Mon, Jan 26, 2015 at 7:08 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> I'm still unclear about the arguments
> over asymmetric vs symmetric (I suspect, as you alluded to earlier,
> that they reflect a more fundamental problem, which is that there are
> 2 different types of use case with different constraints) so I'd like
> to at least be aware of the content of any discussion...

Actually, I think this is exactly true:

If you ask the question: "are these two values close to each other?" then the symmetric test makes the most sense -- obviously, if a is close to b, then b is close to a.

whereas:

If you ask the question: "is this value within a defined relative difference of an expected value?" (e.g. within 10% of b), then you want an asymmetric test -- you are clearly defining "relative" to the known value.

However -- and I've tried to say this multiple times, and no one has disagreed (or agreed) yet:

I don't care much which we use, because

IT DOES NOT MATTER in most cases which you choose.

Why is that?

The most common use case for this kind of thing is a general check of "is my computed number in the right ballpark?" And I've never seen anyone define "ballpark" to a high degree of precision -- usually you are choosing between, say, 1e-8 and 1e-9 (more or less 8 or 9 significant figures). This is why the 10% example we keep throwing around is a bit misleading -- it makes the asymmetry seem far more important than it is.

Remember that we are talking about:

abs(a-b) <= tol*abs(b)
vs
abs(a-b) <= tol*abs(a)
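In code, the asymmetric version looks something like this (a sketch of what's in my draft -- the parameter names and the 1e-8 default are mine, nothing final -- and it's the is_close_to used in the sessions below):

def is_close_to(actual, expected, tol=1e-8):
    # asymmetric: "relative" is defined by the expected value
    return abs(actual - expected) <= tol * abs(expected)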


tol*abs(something) defines the absolute difference that can be tolerated. The two methods differ by tol*abs(abs(a) - abs(b)), which is at most tol*abs(a-b). In the "is 9 within 10% of 10" example, that's the difference between "tolerating" 0.9 or 1.0 as a difference -- seems pretty significant. But with a more realistic tolerance, like 1e-8, the two absolute tolerances differ by only about one part in 1e8 -- tiny. So you'll still get: 9.9999999 is close to 10, but 10 is not close to 9.9999999. But if you tack on even an extra 1e-8, you get it close both ways:

In [45]: is_close_to(10, 9.99999991)
Out[45]: True

In [46]: is_close_to(9.99999991, 10)
Out[46]: True

Same if you go down a bit:
In [47]: is_close_to(9.9999998, 10)
testing: 9.9999998 10
Out[47]: False

In [48]: is_close_to(10, 9.9999998)
testing: 10 9.9999998
Out[48]: False

So there is only this tiny range of values for which it is asymmetric.

Yes, it is still asymmetric, but remember that the usual use case is someone choosing between a rel_tolerance of 1e-8 or 1e-9, not between 1e-8 and 1.00000001e-8 -- so within the precision of the specified tolerance, they are the same.
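To put a number on "tiny": the two orderings can only disagree when abs(a-b) lands between tol*abs(a) and tol*abs(b), a band of width about tol*abs(a-b) -- i.e. about tol**2 relative to the values themselves. Back of the envelope:

tol, b = 1e-8, 10.0
width = tol * (tol * b)   # tol * abs(a-b), with abs(a-b) ~= tol*b at the edge
print(width)              # ~1e-15 -- far smaller than anyone's choice of tol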

OK -- but we need to choose one (or set a flag for selecting one -- but the point of this is to have something people can just use)

So -- there are some use cases where people may want to be testing against a specific value -- is the measured resistance within 1% of the nominal value? (is anyone ever going to write resistor-testing code in Python???). In this case, they really want the asymmetric test, and there is no way to simulate it with a symmetric one.
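Something like this (using the is_close_to sketch above -- the numbers are made up):

nominal = 330.0    # ohms -- the marked value
measured = 327.5   # ohms -- what the meter read
assert is_close_to(measured, nominal, tol=0.01)   # within 1% of nominal

The point being that tol*abs(nominal) is the tolerance you mean -- not tol*abs(measured).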

Granted, I think the use-cases where it would matter are rare, but what is very common is testing a computed value against an expected value -- so the asymmetric case makes more sense there, too, or is at least easier to explain.

So what I haven't seen yet is an example use case where you really need the symmetric case -- i.e. it matters that is_close(a,b) is guaranteed to be the same as is_close(b,a).

Does anyone have a use-case??

Note: I took a look at the tests for the statistics module -- as far as I could tell, all but one were comparing a computed value to an expected one -- in fact, I even see:

self.assertApproxEqual(actual, expected)

Happens to use the same names I used for the parameters ;-)

The one exception is testing against math.fsum, where it's testing whether two different implementations get (almost) the same result -- that arguably wants a symmetric test, though I can't imagine it would make a real difference: rel_tol is set to 1e-16 (by the way, a _very_ small tolerance for a Python float! -- this may be where an ULPs check would make sense)
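For the record, an ULPs check might look something like this (my own illustration, not anything in the PEP -- the bit-twiddling maps each double onto the integers so that adjacent floats differ by exactly 1):

import struct

def to_ordered_int(x):
    # reinterpret the double's bits as a signed 64-bit int, then
    # flip the negative range so the mapping is monotonic
    # (and -0.0 and 0.0 both map to 0)
    n = struct.unpack('<q', struct.pack('<d', x))[0]
    return n if n >= 0 else -(n & 0x7FFFFFFFFFFFFFFF)

def ulps_diff(a, b):
    return abs(to_ordered_int(a) - to_ordered_int(b))

# e.g. ulps_diff(1.0, 1.0 + 2**-52) == 1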

And I don't see a tolerance ever specified with more than one significant figure.

And I see a lot of 1e-8 (though not all, by any means), so maybe that's a good default.


> Asymmetry is bad, because it is rather surprising and counter-intuitive
> that "x is close to y", but "y is not close to x". It may also be bad in
> a practical sense, because people will forget which order they need to
> give x and y and will give them in the wrong order. I started off with
> an approx_equal function in test_statistics that was asymmetric, and I
> could never remember which way the arguments went.

My point is that it very rarely matters which order you give them in anyway. So I agree that asymmetry is esthetically "bad", but I'm still looking for a practical example where it matters -- again, for the fairly casual user.

> Instead, Bruce Dawson recommends using the larger of x and y:
>
> https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
>
> Quote:
>
>     To compare f1 and f2 calculate diff = fabs(f1-f2). If diff is
>     smaller than n% of max(abs(f1),abs(f2)) then f1 and f2 can be
>     considered equal.

Sure -- but he then jumps right to the whole ULPs thing, without explaining why that particular definition is best -- in fact, I'd probably go with:

n% of min(abs(f1),abs(f2)) -- it's a bit more rigorous -- this is the Boost "strong" test.

But again, these are really subtle differences in results, and if you know your allowed error that accurately, you probably should be doing the ULPs thing anyway.

In fact, the only use cases I can imagine, or that anyone has brought up, for using a tolerance as high as 1% or 10% are when you are testing against a known value -- and there the asymmetric case makes more sense.
 
> Time permitting, over the next day or so I'll draw up some diagrams to
> show how each of these tactics changes what counts as close or not close.

I'm not sure we need a whole lot more explanation (though maybe some folks do). But I think we do need at least one of:

Use cases for when it's important to have a symmetric test.

and/or

Pronouncements (from anyone) that s/he "can't live with" one or the other

Again -- "can't live with" means you think it's better to have nothing in the std lib.

I took the time to write the PEP, and I'd like to see this through -- but we need to pick something -- and any of the three options on the table are fine with me.

Three options (sketched in code below):
 - The asymmetric test in the PEP
 - The Boost "strong" test (max rel error)
 - The Boost "weak" test (min rel error)

-Chris


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov