[Python-ideas] PEP 485: A Function for testing approximate equality

Sun Jan 25 16:32:11 CET 2015

On Sat, Jan 24, 2015 at 7:59 PM, Chris Barker <chris.barker at noaa.gov> wrote:
>> One option would be to add a zero_tol argument, which is an absolute
>> tolerance that is only applied if expected == 0.
>
>
> OK -- now I know what the problem is here -- I thought I'd explored it
> already.
>
> If you have a tolerance that you use only when expected is zero (or when
> either is...) then you have the odd result that a small number will be
> "close" to zero, but NOT close to a smaller number. I implemented this on a
> branch in github:
>
> https://github.com/PythonCHB/close_pep/tree/zero_tol
>
> And you get the odd result:
>
> In [9]: is_close_to(1e-9, 0.0)
> Out[9]: True
>
> fine -- the default zero_tol is 1e-8
>
> In [10]: is_close_to(1e-9, 1e-12)
> Out[10]: False
>
> but huh??? 1e-9 is close to zero, but not close to 1e-12????

Yes that's.... the idea? :-)

If someone says that their expected value is exactly zero, then using
relative tolerance just makes no sense at all. If they wanted an exact
test they'd have written ==. And this is reasonable, because even if
you know that the exact answer is zero, then you can't expect to get
that with floating point -- +/-1e-16 or so is often the best you can
hope for.
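
To make the intended behaviour concrete, here is a minimal sketch of a
comparison with a zero_tol argument. It is not the code on the zero_tol
branch: the function body and the rel_tol default are assumptions made
for illustration, and only the zero_tol default of 1e-8 comes from the
session quoted above.

```python
def is_close_to(actual, expected, rel_tol=1e-8, zero_tol=1e-8):
    """Hypothetical sketch: relative test, except when expected is exactly 0."""
    if expected == 0.0:
        # Relative error is meaningless against zero, so use an absolute bound.
        return abs(actual) <= zero_tol
    # Otherwise measure the error relative to the expected value.
    return abs(actual - expected) <= rel_tol * abs(expected)


print(is_close_to(1e-9, 0.0))    # True
print(is_close_to(1e-9, 1e-12))  # False
```

Under these assumptions the two results are not contradictory: the first
call asks "is this indistinguishable from an exact zero?", while the second
asks "is this 1e-12 to within a relative error of 1e-8?".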

But if someone says their expected value is 1e-12, then... well, it's
possible that they'd be happy to instead get 0. But likely not. 0 is
extremely far from 1e-12 in relative terms, and can easily cause
qualitatively different behaviour downstream (e.g. log10(1e-12) ==
-12, log10(0) == error). The example that came up in the numpy
discussion of these defaults is that statsmodels has lots of tests to
make sure that their computations of tail value probabilities are
correct. These are often tiny (e.g., P(being 6 standard deviations off
from the mean of a normal) = 9.9e-10), but emphatically different from
zero. So it's definitely safer all around to stick to relative error
by default for non-zero expected values.
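
A rough sketch of the tail probability case, using only the standard
library. The "buggy" value and both tolerances are made up for
illustration; this is not how statsmodels writes its tests.

```python
import math

# One-sided tail probability of a standard normal beyond 6 standard deviations.
expected = 0.5 * math.erfc(6 / math.sqrt(2))  # roughly 9.9e-10

buggy = 0.0  # hypothetical wrong answer from a broken computation

# Absolute test with an illustrative tolerance of 1e-8: the bug slips through.
abs_ok = abs(buggy - expected) <= 1e-8

# Relative test with an illustrative tolerance of 1e-8: the bug is caught.
rel_ok = abs(buggy - expected) <= 1e-8 * abs(expected)

print(abs_ok, rel_ok)  # True False

# Downstream code can behave qualitatively differently on an exact zero.
try:
    math.log10(buggy)
    log_failed = False
except ValueError:
    log_failed = True
print(log_failed)  # True
```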

Admittedly I am leaning pretty heavily on the "testing" use case here,
but that's because AFAICT that's the overwhelming use case for this
kind of functionality. Guido's optimization example is fine, but using
a function like this isn't really the most obvious way to do
optimization termination (as evidenced by the fact that AFAICT none of
scipy's optimizers actually use a rel_tol+abs_tol comparison on
two values -- maybe they should?). And I don't understand Skip's
example at all. (I get that you need to quantize the prices and you
want to minimize error in doing this, but I don't understand why it
matters whether you're within 1% of a breakpoint versus 40% of a
breakpoint -- either way you're going to have to round.)

> I'd much rather require people to have to think about what makes sense for
> their use case than get trapped by a default that's totally inappropriate.

But this seems a strange reason to advocate for a default that's
totally inappropriate. is_close_to(x, 0.0) simply doesn't mean
anything sensible in the current PEP -- even giving an error would be
better.
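
As an illustration, assume a purely relative test scaled by the expected
value, with no absolute tolerance. The helper name rel_close and the
rel_tol default are hypothetical, and the PEP draft's exact formula may
differ. With an expected value of zero, such a test collapses to exact
equality:

```python
def rel_close(actual, expected, rel_tol=1e-8):
    """Hypothetical purely relative comparison, scaled by the expected value."""
    return abs(actual - expected) <= rel_tol * abs(expected)


# The allowed error is rel_tol * 0 == 0, so only an exact zero passes.
print(rel_close(1e-300, 0.0))  # False
print(rel_close(0.0, 0.0))     # True
```

So under this assumption no tolerance setting helps: however large rel_tol
is, the comparison against 0.0 behaves like ==.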

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org