On Sun, Jan 25, 2015 at 7:32 AM, Nathaniel Smith <njs@pobox.com> wrote:
>> If you have a tolerance that you use only when expected is zero (or when
>> either is...) then you have the odd result that a small number will be
>> "close" to zero, but NOT close to a smaller number.
>>
>> And you get the odd result:
>>
>> In [9]: is_close_to(1e-9, 0.0)
>> Out[9]: True
>>
>> fine -- the default zero_tol is 1e-8
>>
>> In [10]: is_close_to(1e-9, 1e-12)
>> Out[10]: False
>>
>> but huh??? 1e-9 is close to zero, but not close to 1e-12????

> Yes that's.... the idea? :-)
>
> If someone says that their expected value is exactly zero, then using
> relative tolerance just makes no sense at all. If they wanted an exact
> test they'd have written ==. And this is reasonable, because even if
> you know that the exact answer is zero, then you can't expect to get
> that with floating point -- +/-1e-16 or so is often the best you can
> hope for.

Sure -- that's why I (and numpy, and Steven's statistics test function) put in an absolute tolerance as well. If you know you are testing near zero, then you set an abs_tolerance that defines what "near zero" or "small" means in this case.
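
Something along these lines is what I have in mind -- just a sketch, not PEP text; the names, defaults, and the asymmetric form are placeholders:

    def is_close_to(actual, expected, rel_tolerance=1e-8, abs_tolerance=0.0):
        # relative tolerance is scaled by the expected value;
        # abs_tolerance is the floor that defines "near zero"
        return abs(actual - expected) <= max(rel_tolerance * abs(expected),
                                             abs_tolerance)

    # with an explicit abs_tolerance there is no special case for zero:
    is_close_to(1e-13, 0.0,   abs_tolerance=1e-12)  # True  -- within "small"
    is_close_to(1e-13, 1e-12, abs_tolerance=1e-12)  # True  -- consistent with the zero case
    is_close_to(1e-9,  0.0,   abs_tolerance=1e-12)  # False -- 1e-9 isn't "small" here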

> But if someone says their expected value is 1e-12, then... well, it's
> possible that they'd be happy to instead get 0. But likely not. 0 is
> extremely far from 1e-12 in relative terms,

And 1e-12 is far from zero also, of course. Which is the trick here. Even with an asymmetric test, 0.0 is not relatively close to anything, and nothing is relatively close to zero (as long as the relative tolerance is less than 1 -- which it really should be). So I think we should use the zero_tolerance option if either input is zero, but then we get these discontinuities.

So it seems that if a user wants to use the same parameters to test a bunch of numbers, and some of them may be zero, they should define what "small" means to them by setting an abs_tolerance.

Though I guess I'd rather have a zero_tol that defaulted to non-zero than an abs_tol that did. So we might be able to satisfy your observation that a lot of use cases call for testing against zero.
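
(For reference, the zero_tol variant I'm describing would be roughly the following -- again only a sketch, with a hypothetical name and a made-up default -- and it is exactly where the odd results quoted at the top come from:)

    def is_close_to_zero_tol(actual, expected,
                             rel_tolerance=1e-8, zero_tolerance=1e-8):
        # fall back to an absolute check only when the expected value
        # is exactly zero; otherwise use a purely relative check
        if expected == 0.0:
            return abs(actual) <= zero_tolerance
        return abs(actual - expected) <= rel_tolerance * abs(expected)

    is_close_to_zero_tol(1e-9, 0.0)    # True  -- passes the zero_tolerance check
    is_close_to_zero_tol(1e-9, 1e-12)  # False -- fails the relative check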
 
> The example that came up in the numpy
> discussion of these defaults is that statsmodels has lots of tests to
> make sure that their computations of tail value probabilities are
> correct. These are often tiny (e.g., P(being 6 standard deviations off
> from the mean of a normal) = 9.9e-10), but emphatically different from
> zero. So it's definitely safer all around to stick to relative error
> by default for non-zero expected values.

But would you even need to test for zero in that case? And if so, wouldn't setting abs_tol to what you consider "very small" be the right thing to do? I note that Steven's testing code for the stdlib statistics library used a rel_tolerance and abs_tolerance approach as well. I haven't seen any example of special-casing zero anywhere.
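
To put numbers on both points, using the rel/abs sketch above (the values are illustrative only):

    expected_p = 9.9e-10                     # tiny, but emphatically non-zero
    is_close_to(0.0, expected_p)             # False -- the relative-only default
                                             #          catches a bogus zero result
    is_close_to(0.0, expected_p, abs_tolerance=1e-8)   # True -- a *default* abs_tolerance
                                                       #         this large would mask it
    is_close_to(1e-16, 0.0, abs_tolerance=1e-12)       # True -- an explicitly chosen
                                                       #         abs_tolerance handles zero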
 
> Admittedly I am leaning pretty heavily on the "testing" use case here,
> but that's because AFAICT that's the overwhelming use case for this
> kind of functionality.

I agree that it is as well -- sure, you could use it for a simple iterative solution to an implicit equation, but how many people whip those up, compared to either testing code or writing a custom comparison designed specifically for the case at hand?
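
(For what it's worth, that kind of use would look something like this -- purely a sketch, with a made-up helper name, re-using the is_close_to sketch above as a convergence check:)

    import math

    def solve_fixed_point(f, x0, rel_tolerance=1e-9, max_iter=100):
        # iterate x = f(x) until successive estimates are "close"
        x = x0
        for _ in range(max_iter):
            x_new = f(x)
            if is_close_to(x_new, x, rel_tolerance=rel_tolerance):
                return x_new
            x = x_new
        raise RuntimeError("did not converge")

    solve_fixed_point(math.cos, 1.0)   # the fixed point of x = cos(x), ~0.739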

>> I'd much rather require people to have to think about what makes sense for
>> their use case than get trapped by a default that's totally inappropriate.

> But this seems a strange reason to advocate for a default that's
> totally inappropriate. is_close_to(x, 0.0) simply doesn't mean
> anything sensible in the current PEP -- even giving an error would be
> better.

Sure it does -- it means nothing is relatively close to zero -- haven't we all agreed that that's the mathematically correct result? And if you write a test against zero, it will reliably fail the first time if you haven't set an abs_tolerance. So you will then be forced to decide what "near zero" means to you, and set an appropriate abs_tolerance.

I think this points to having a separate function for absolute tolerance compared to zero -- but that's just abs(val) <= zero_tolerance, so why bother?
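
To spell out "reliably fail", again with the sketch above (values illustrative):

    computed = 1e-16      # "should be zero", but floating point leaves some fuzz
    is_close_to(computed, 0.0)                       # False -- fails immediately, so you
                                                     #          must decide what "small" is
    is_close_to(computed, 0.0, abs_tolerance=1e-12)  # True  -- once "small" is defined

    # and a dedicated "close to zero" check really is just:
    abs(computed) <= 1e-12                           # True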

Or do you think there are common use cases where you would want a purely relative tolerance, down to values very close to zero, but a larger tolerance for zero itself, all in the same set of comparisons?


-Chris


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov