On Sun, Jan 25, 2015 at 5:17 PM, Nathaniel Smith <njs@pobox.com> wrote:
>> Though I guess I'd rather a zero_tol that defaulted to non-zero than an
>> abs_tol that did. So we might be able to satisfy your observation that a lot
>> of use cases call for testing against zero.

> Yes, that's the idea -- defaulting rel_tol and zero_tol to non-zero
> values, and abs_tol to zero, gives you a set of defaults that will
> just work for people who want to write useful tests without having to
> constantly be distracted by floating point arcana.

OK -- I get it now: this is really about getting a default for a zero-tolerance test that doesn't mess up the relative test -- that may be a way to go.
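
Just to check that I'm reading you right, here is roughly what I think you're describing. Only a sketch, and the particular default numbers (1e-9, 1e-12) are pulled out of the air, not a proposal:

def is_close_to(actual, expected, rel_tol=1e-9, abs_tol=0.0, zero_tol=1e-12):
    # sketch only: ignores inf/nan handling, and the default values are
    # placeholders, not proposed values
    if expected == 0.0:
        # zero_tol kicks in only when the *expected* value is exactly zero
        return abs(actual) <= max(zero_tol, abs_tol)
    # otherwise a relative test, with abs_tol (default 0.0) as a floor
    return abs(actual - expected) <= max(rel_tol * abs(expected), abs_tol)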
 
> This does require that zero_tol is only applied for expected == 0.0,
> *not* for actual == 0.0, though. If you expected 1e-10 and got 0.0
> then this *might* be okay in your particular situation but it really
> requires the user to think things through; a generic tool should
> definitely flag this by default.

got it -- if they want that, they can set the abs_tolerance to what they need.
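
With a sketch like the one above, that asymmetry would look like this:

# zero_tol applies only when the *expected* value is exactly zero
is_close_to(1e-13, 0.0)    # True:  actual is within zero_tol of an expected zero
is_close_to(0.0, 1e-13)    # False: expected is non-zero, so the test stays relative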

>>> The example that came up in the numpy
>>> discussion of these defaults is that statsmodels has lots of tests to
>>> make sure that their computations of tail value probabilities are
>>> correct. These are often tiny (e.g., P(being 6 standard deviations off
>>> from the mean of a normal) = 9.9e-10), but emphatically different from
>>> zero. So it's definitely safer all around to stick to relative error
>>> by default for non-zero expected values.

Exactly why I don't think abs_tolerance should default to anything other than 0.0.
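
(To spell it out with the statsmodels numbers: with any non-zero absolute tolerance on by default, a hopelessly wrong tail probability can still pass. The 1e-8 and 1e-9 below are just stand-in tolerances, not proposed defaults.)

expected = 9.9e-10   # the P(6 sigma) value from the statsmodels example
actual = 0.0         # a completely wrong answer
print(abs(actual - expected) <= 1e-8)                  # True:  an absolute test hides the error
print(abs(actual - expected) <= 1e-9 * abs(expected))  # False: a relative test catches it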
 
>> But would you even need to test for zero then in that case? And if so,
>> wouldn't setting abs_tol to what you wanted for "very small" be the right
>> thing to do? I note that Steven's testing code for the stdlib statistics
>> library used a rel_tolerance and abs_tolerance approach as well. I haven't
>> seen any example of special casing zero anywhere.

> Right, this example came up when it was discovered that np.allclose()
> has a non-zero abs_tol by default, and that
> np.testing.assert_allclose() has a zero abs_tol by default. It's a
> terrible and accidental API design, but it turns out that people
> really are intentionally using one or the other depending on whether
> they expect to be dealing with exact zeros or to be dealing with
> small-but-non-zero values.

Why didn't they just override the defaults? But whatever.
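
(For anyone following along, that difference in defaults -- as I understand current numpy, allclose uses atol=1e-8 and assert_allclose uses atol=0 -- plays out like this:)

import numpy as np

# np.allclose defaults to atol=1e-8, so a tiny value "matches" an exact zero:
print(np.allclose(1e-9, 0.0))            # True: 1e-9 <= atol of 1e-8

# np.testing.assert_allclose defaults to atol=0, so the same comparison fails:
try:
    np.testing.assert_allclose(1e-9, 0.0)
except AssertionError:
    print("assert_allclose rejects the same inputs")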

> The whole motivation for zero_tol is to
> allow a single set of defaults that satisfies both groups.

OK -- I'm buying it. However, what is a sensible default for zero_tolerance? I agree it's less critical than for abs_tolerance, but what should it be? Can we safely figure that values of order of magnitude one are the most common, so that something in the 1e-8 to 1e-14 range makes sense? I suppose that wouldn't be surprising to most folks.

> Tests against zero won't necessarily fail -- sometimes rounding errors
> do cancel out, and you do get 0.0 instead of 1e-16 or whatever. At
> least for some sets of inputs, or until the code gets
> perturbed/refactored slightly, etc. That's why I said it might
> actually be better to unconditionally fail when expected==0.0 rather
> than knowingly perform a misleading operation.

I get it -- this seems rare, certainly rarer than the other case, where is_close_to passes for small numbers when it really shouldn't. And sure, you could get a pass the first time around, because you DID, indeed, get exactly zero -- that should pass. But when you do refactor and introduce a slightly different answer, you'll get a failure and can figure it out then.
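
(For example, whether you get an exact zero depends entirely on whether the particular inputs happen to round the error away:)

print(0.5 + 0.25 - 0.75 == 0.0)    # True:  every value is exactly representable, the error cancels
print(0.1 + 0.2 - 0.3 == 0.0)      # False: the leftover is about 5.5e-17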

Are you actually proposing that the function should raise an Exception if expected == 0.0 and abs_tolerance is also 0.0? (and I guess zero_tolerance, if there is one)

>> Or do you think there are common use cases where you would want purely
>> relative tolerance, down to very close to zero, but want a larger tolerance
>> for zero itself, all in the same comprehension?

> inf = float("inf")
> for (x, expected) in [
>     (inf, inf),
>     (100, 1e100),
>     (1, 10),
>     (0, 1),
>     (-1, 0.1),
>     (-100, 1e-100),
>     (-inf, 0),
>     ]:
>     assert is_close_to(10 ** x, expected)

I meant a case that wasn't contrived ;-)
 
> Though really what I'm arguing is that all in the same userbase people
> want relative tolerance down close to zero but a larger tolerance for
> zero itself.

Absolutely -- and adding a zero_tolerance may be a way to get everyone useful defaults.

-Chris



--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov