
On Sat, Jan 24, 2015 at 7:59 PM, Chris Barker <chris.barker@noaa.gov> wrote:
One option would be to add a zero_tol argument, which is an absolute
tolerance that is only applied if expected == 0.
OK -- now I know what the problem is here -- I thought I'd explored it already.
If you have a tolerance that you use only when expected is zero (or when either value is...), then you get the odd result that a small number will be "close" to zero, but NOT close to an even smaller number. I implemented this on a branch in github:
https://github.com/PythonCHB/close_pep/tree/zero_tol
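Roughly, the idea on that branch looks like this (a minimal sketch, not the actual branch code; the rel_tol parameter and its default are my guesses, and zero_tol defaults to 1e-8 as noted below):

    def is_close_to(actual, expected, rel_tol=1e-8, zero_tol=1e-8):
        # Asymmetric test: `expected` is the reference value.
        if expected == 0.0:
            # Absolute tolerance, applied only when the reference is exactly zero.
            return abs(actual) <= zero_tol
        # Otherwise a purely relative test against `expected`.
        return abs(actual - expected) <= rel_tol * abs(expected)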
And you get the odd result:
In [9]: is_close_to(1e-9, 0.0)
Out[9]: True
fine -- the default zero_tol is 1e-8
In [10]: is_close_to(1e-9, 1e-12)
Out[10]: False
but huh??? 1e-9 is close to zero, but not close to 1e-12????
Yes, that's... the idea? :-) If someone says that their expected value is exactly zero, then using relative tolerance just makes no sense at all. If they wanted an exact test they'd have written ==. And this is reasonable, because even if you know that the exact answer is zero, you can't expect to get that with floating point -- +/-1e-16 or so is often the best you can hope for.
But if someone says their expected value is 1e-12, then... well, it's possible that they'd be happy to instead get 0. But likely not. 0 is extremely far from 1e-12 in relative terms, and can easily cause qualitatively different behaviour downstream (e.g. log10(1e-12) == -12, log10(0) == error). The example that came up in the numpy discussion of these defaults is that statsmodels has lots of tests to make sure that their computations of tail-value probabilities are correct. These are often tiny (e.g., P(being 6 standard deviations off from the mean of a normal) = 9.9e-10), but emphatically different from zero. So it's definitely safer all around to stick to relative error by default for non-zero expected values.
Admittedly I am leaning pretty heavily on the "testing" use case here, but that's because AFAICT that's the overwhelming use case for this kind of functionality. Guido's optimization example is fine, but using a function like this isn't really the most obvious way to do optimization termination (as evidenced by the fact that AFAICT none of scipy's optimizers actually use a rel_tol+abs_tol comparison on two values -- maybe they should?). And I don't understand Skip's example at all. (I get that you need to quantize the prices and you want to minimize error in doing this, but I don't understand why it matters whether you're within 1% of a breakpoint versus 40% of a breakpoint -- either way you're going to have to round.)
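To make that concrete with a purely relative check (the rel_close helper and its tolerance here are just for illustration, not anything from the PEP):

    def rel_close(actual, expected, rel_tol=1e-8):
        # Relative error measured against the expected (reference) value.
        return abs(actual - expected) <= rel_tol * abs(expected)

    expected = 9.9e-10                            # tiny, but emphatically non-zero
    print(rel_close(9.9000000001e-10, expected))  # True:  relative error ~1e-11
    print(rel_close(0.0, expected))               # False: 0 is 100% off in relative terms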
I'd much rather require people to think about what makes sense for their use case than get trapped by a default that's totally inappropriate.
But this seems a strange reason to advocate for a default that's totally inappropriate. is_close_to(x, 0.0) simply doesn't mean anything sensible in the current PEP -- even giving an error would be better.

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org