
On Sun, Jan 25, 2015 at 5:17 PM, Nathaniel Smith <njs@pobox.com> wrote:
Though I guess I'd rather have a zero_tol that defaulted to non-zero than an abs_tol that did. So we might be able to satisfy your observation that a lot of use cases call for testing against zero.
Yes, that's the idea -- defaulting rel_tol and zero_tol to non-zero values, and abs_tol to zero, gives you a set of defaults that will just work for people who want to write useful tests without having to constantly be distracted by floating point arcana.
OK -- I get it now -- this is really about getting a default for a zero tolerance test that does not mess up the relative test -- that may be a way to go.
This does require that zero_tol is only applied for expected == 0.0, *not* for actual == 0.0, though. If you expected 1e-10 and got 0.0 then this *might* be okay in your particular situation but it really requires the user to think things through; a generic tool should definitely flag this by default.
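To make the proposal concrete, here's a rough sketch of how the three tolerances might interact -- the names and default values are illustrative, not anyone's final spec -- with zero_tol consulted only when the *expected* value is exactly zero:

```python
def is_close_to(actual, expected, rel_tol=1e-8, abs_tol=0.0, zero_tol=1e-8):
    """Sketch of the proposed semantics; names and defaults are illustrative."""
    if expected == 0.0:
        # zero_tol applies only when the expected value is exactly zero;
        # a relative test against 0.0 could never pass otherwise.
        return abs(actual) <= max(zero_tol, abs_tol)
    # Otherwise use the relative test, with abs_tol as an optional floor.
    return abs(actual - expected) <= max(rel_tol * abs(expected), abs_tol)
```

Note that with these defaults, comparing a tiny non-zero expected value against an actual of 0.0 still fails, which is the "flag it by default" behavior described above.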
Got it -- if they want that, they can set abs_tolerance to what they need.
The example that came up in the numpy discussion of these defaults is that statsmodels has lots of tests to make sure that their computations of tail value probabilities are correct. These are often tiny (e.g., P(being 6 standard deviations off from the mean of a normal) = 9.9e-10), but emphatically different from zero. So it's definitely safer all around to stick to relative error by default for non-zero expected values.
Exactly why I don't think abs_tolerance should be anything other than 0.0
But would you even need to test for zero then in that case? And if so, wouldn't setting abs_tol to what you wanted for "very small" be the right thing to do? I note that Steven's testing code for the stdlib statistics library used a rel_tolerance and abs_tolerance approach as well. I haven't seen any example of special-casing zero anywhere.
Right, this example came up when it was discovered that np.allclose() has a non-zero abs_tol by default, and that np.testing.assert_allclose() has a zero abs_tol by default. It's a terrible and accidental API design, but it turns out that people really are intentionally using one or the other depending on whether they expect to be dealing with exact zeros or to be dealing with small-but-non-zero values.
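To make the statsmodels-style concern concrete, here's a small pure-Python illustration (the specific numbers are made up; the 1e-8 figure mirrors np.allclose's default atol) of how a non-zero default absolute tolerance silently hides a large *relative* error in a tiny expected value, while a pure relative test catches it:

```python
expected = 9.9e-10   # a tiny-but-meaningful tail probability
wrong = 5.0e-9       # off by roughly 5x -- a genuine bug

abs_tol = 1e-8       # np.allclose-style default absolute tolerance
rel_tol = 1e-8

# Absolute test: 4.01e-9 <= 1e-8, so the 5x error passes unnoticed.
passes_abs = abs(wrong - expected) <= abs_tol

# Relative test: 4.01e-9 <= 9.9e-18 is false, so the bug is caught.
passes_rel = abs(wrong - expected) <= rel_tol * abs(expected)
```

This is exactly why a non-zero abs_tol default is dangerous for people comparing small-but-non-zero values, and why the zero-special-case via zero_tol is attractive.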
Why didn't they just override the defaults? But whatever.
The whole motivation for zero_tol is to allow a single set of defaults that satisfies both groups.
OK -- I'm buying it. However, what is a sensible default for zero_tolerance? I agree it's less critical than for abs_tolerance, but what should it be? Can we safely figure that order of magnitude one is most common, and that something in the 1e-8 to 1e-14 range makes sense? I suppose that wouldn't be surprising to most folks.
Tests against zero won't necessarily fail -- sometimes rounding errors do cancel out, and you do get 0.0 instead of 1e-16 or whatever. At least for some sets of inputs, or until the code gets perturbed/refactored slightly, etc. That's why I said it might actually be better to unconditionally fail when expected == 0.0 rather than knowingly perform a misleading operation.
I get it -- it seems rare, certainly rarer than the other case, where is_close_to passes for small numbers when it really shouldn't. And sure, you could get a pass the first time around because, indeed, you DID get exactly zero -- that should pass. But when you do refactor and introduce a slightly different answer, you'll get a failure then and can figure it out then. Are you actually proposing that the function should raise an exception if expected == 0.0 and abs_tolerance is also 0.0? (And I guess zero_tolerance, if there is one.)
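If that "fail loudly" idea were adopted, it might look something like the following sketch (a hypothetical variant for illustration, not a proposed API; the name check_close and the error message are my own):

```python
def check_close(actual, expected, rel_tol=1e-8, abs_tol=0.0, zero_tol=0.0):
    """Hypothetical strict variant: refuse a pure relative test against 0.0."""
    if expected == 0.0:
        if abs_tol == 0.0 and zero_tol == 0.0:
            # With no absolute tolerance at all, a relative test against
            # zero can only ever pass for actual == 0.0, so force the
            # caller to say what they actually mean.
            raise ValueError(
                "expected == 0.0 with abs_tol == zero_tol == 0.0: a purely "
                "relative test is meaningless here; set a tolerance")
        return abs(actual) <= max(abs_tol, zero_tol)
    return abs(actual - expected) <= max(rel_tol * abs(expected), abs_tol)
```

The point of raising rather than silently comparing is that the failure shows up immediately, at test-writing time, rather than after a refactor perturbs an exact-zero result.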
Or do you think there are common use cases where you would want purely
relative tolerance, down to very close to zero, but want a larger tolerance for zero itself, all in the same comprehension?
    inf = float("inf")
    for (x, expected) in [
        (inf, inf),
        (100, 1e100),
        (1, 10),
        (0, 1),
        (-1, 0.1),
        (-100, 1e-100),
        (-inf, 0),
    ]:
        assert is_close_to(10 ** x, expected)
I meant a case that wasn't contrived ;-)
Though really what I'm arguing is that, all within the same userbase, people want relative tolerance down close to zero but a larger tolerance for zero itself.
Absolutely -- and adding a zero_tolerance may be a way to get everyone useful defaults.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov