On Thu, Jan 15, 2015 at 10:28 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Neil Girdhar writes:

 > The symmetric error that people are proposing in this thread has no
 > intuitive meaning to me.

There are many applications where the goal is to match two values,
neither of which is the obvious standard (e.g., statistical tests
comparing populations,

No. If you're trying to answer the question of whether two things belong to the same population as opposed to another, you should infer the population statistics from a and b together with your estimated overall population statistics, and then calculate cross entropies.  Using some symmetric relative error has no meaning here.
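To sketch what I mean (a toy illustration only, assuming both samples are modeled as Gaussians; the helper name is mine):

    import numpy as np

    def gaussian_cross_entropy(mu_p, var_p, mu_q, var_q):
        # H(P, Q) = E_{x~P}[-log Q(x)] for Gaussians P and Q.
        # Asymmetric in P and Q by construction.
        return 0.5 * (np.log(2 * np.pi * var_q)
                      + (var_p + (mu_p - mu_q) ** 2) / var_q)

    a = np.random.normal(0.0, 1.0, size=1000)
    b = np.random.normal(0.1, 1.0, size=1000)

    # Fit a Gaussian to each sample and compare the fits directly;
    # note that H(A, B) != H(B, A) in general.
    print(gaussian_cross_entropy(a.mean(), a.var(), b.mean(), b.var()))
    print(gaussian_cross_entropy(b.mean(), b.var(), a.mean(), a.var()))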
 
or even electrical circuits, where it may be
important that two components be matched to within 1%, although the
absolute value might be allowed to vary by up to 10%).  Symmetric
error is appropriate for those applications.  Symmetric error may be
less appropriate for applications where you want to hit an absolute
value, but it's (provably) not too bad.

By "provably not too bad" I mean that if you take the word "close" as
a qualitative predicate, then although you can make the "distance"
explode by taking the "actual" to be an order of magnitude distant in
absolute units, you'll still judge it "not close" (just more so, but
"more so" is meaningless in this qualitative context).  On the other
hand, for values that *are* close (with reasonable tolerances) it
doesn't much matter which value you choose as the standard: "most" of
the time you will get the "right" answer (and as the tolerance gets
tighter, "most" tends to a limit of 100%).

In statistics and machine learning, at least, many people have argued that the cross-entropy error is the most reasonable loss function.  When you have an observed value and an estimated value, the right way to compare them is a cross-entropy error, and that is what absolute error and relative error are doing: they correspond to the cross entropies of the minimally assumptive (maximum-entropy) distributions over the reals and the positive reals, respectively.
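Concretely (a minimal sketch; I'm reading "minimally assumptive" as maximum-entropy: a fixed-variance Gaussian on the reals, an exponential on the positive reals):

    import numpy as np

    def gaussian_nll(x, mu, sigma=1.0):
        # -log N(x; mu, sigma^2): up to constants, the squared absolute
        # error (x - mu) ** 2.
        return (0.5 * np.log(2 * np.pi * sigma ** 2)
                + (x - mu) ** 2 / (2 * sigma ** 2))

    def exponential_nll(x, mu):
        # -log of an exponential with mean mu: depends on x only through
        # x / mu, so it penalizes relative error; minimized at mu == x.
        return np.log(mu) + x / mu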

I think the numpy.allclose function almost always gives you what you want when you have an actual and an estimated value, which is the more usual case.
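Its test is abs(actual - desired) <= atol + rtol * abs(desired), which is asymmetric in its two arguments:

    import numpy as np

    # allclose(a, b) checks abs(a - b) <= atol + rtol * abs(b), so the
    # second argument plays the role of the desired value.
    print(np.allclose(1.0, 1.1, rtol=0.095, atol=0.0))  # True:  0.1 <= 0.095 * 1.1
    print(np.allclose(1.1, 1.0, rtol=0.095, atol=0.0))  # False: 0.1 >  0.095 * 1.0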
 

The generic "are_close()" function should be symmetric.  I suppose it
might also be useful to have an "is_close_to()" function that is
asymmetric.

I disagree. Since the usual case is to have an observed and an estimated value, the close function should not be symmetric.  Either you should have two functions, one for relative error and one for absolute error, or you should combine them as numpy did.
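That combination looks roughly like this (a sketch, not numpy's actual implementation):

    def is_close_to(actual, desired, rel_tol=1e-05, abs_tol=1e-08):
        # Absolute error dominates near zero; relative error dominates away
        # from it.  Asymmetric: 'desired' is the standard that 'actual' is
        # judged against.
        return abs(actual - desired) <= abs_tol + rel_tol * abs(desired)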

Best,

Neil