
On Fri, Jan 23, 2015 at 12:40 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Existing Implementations
------------------------
The standard library includes the ``unittest.TestCase.assertAlmostEqual`` method, but it:
* Is buried in the unittest.TestCase class
* Is an assertion, so you can't use it as a general test (easily)
* Uses number of decimal digits or an absolute delta, which suit particular use cases but don't provide a general relative error.
I might phrase this a bit more strongly -- assertAlmostEqual is confusing and broken-by-default for common cases like comparing two small values, or comparing two large values.
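For illustration, with unittest's default of ``places=7`` both of these go wrong in opposite directions::

    import unittest

    class Demo(unittest.TestCase):
        def test_small_values(self):
            # Passes, even though 2e-9 is double 1e-9:
            # round(abs(1e-9 - 2e-9), 7) == 0.
            self.assertAlmostEqual(1e-9, 2e-9)

        def test_large_values(self):
            # Fails, even though the values agree to ~16 significant
            # digits: round(abs(1e15 - (1e15 + 1)), 7) == 1.0 != 0.
            self.assertAlmostEqual(1e15, 1e15 + 1)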
The numpy package has the ``allclose()`` and ``isclose()`` functions.
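For context, numpy's defaults (``rtol=1e-05``, ``atol=1e-08``, combined additively as ``abs(a - b) <= atol + rtol * abs(b)``) make these usable out of the box::

    import numpy as np

    np.isclose(1.0, 1.0 + 1e-6)                  # True
    np.allclose([1.0, 2.0], [1.0, 2.0 + 1e-6])   # True
    np.isclose(0.0, 1e-9)                        # True: atol handles zero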
The test suite for the statistics module includes an implementation used in its unit tests.
One can also find discussion and sample implementations on Stack Overflow, and other help sites.
These existing implementations indicate that this is a common need, and not trivial to write oneself, making it a candidate for the standard library.
Proposed Implementation
=======================
NOTE: this PEP is the result of an extended discussion on the python-ideas list [1]_.
The new function will have the following signature::
    is_close_to(actual, expected, tol=1e-8, abs_tol=0.0)
``actual``: is the value that has been computed, measured, etc.
``expected``: is the "known" value.
``tol``: is the relative tolerance -- it is the amount of error allowed, relative to the magnitude of the expected value.
``abs_tol``: is a minimum absolute tolerance level -- useful for comparisons near zero.
Modulo error checking, etc., the function will return the result of::

    abs(expected-actual) <= max(tol*abs(expected), abs_tol)
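A minimal sketch of that check (the handling of infinities here is an assumption, not part of the quoted draft)::

    import math

    def is_close_to(actual, expected, tol=1e-8, abs_tol=0.0):
        # Infinities are only "close" to themselves; NaN falls through
        # to the comparison below and correctly fails it.
        if math.isinf(actual) or math.isinf(expected):
            return actual == expected
        return abs(expected - actual) <= max(tol * abs(expected), abs_tol)

    is_close_to(9.999999999, 10.0)          # True: relative error ~1e-10
    is_close_to(1e-9, 0.0)                  # False: relative test fails at 0
    is_close_to(1e-9, 0.0, abs_tol=1e-8)    # True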
So for reference, it looks like the differences from numpy are:

1) kwarg names: "tol" and "abs_tol" versus "atol", "rtol". Numpy's names seem fine to me, but if you want the longer ones then probably "rel_tol", "abs_tol" would be better?

2) use of max() instead of + to combine the relative and absolute tolerance. I understand that you find the + conceptually offensive, but I'm not really sure why -- max() is maybe a bit better, but it seems like much of a muchness to me in practice. (Sure, like you say further down, the total error using + might end up being higher by a factor of two or so -- but either people are specifying the tolerances they want, in which case they can say what they mean either way, or else they're just accepting the defaults, in which case they don't care.) It might be worth switching to + just for compatibility.

3) The default tolerances. Numpy is inconsistent with itself on this point though (allclose vs. assert_allclose), so I wouldn't worry about it too much :-). However, a lot of the benefit of numpy.allclose is that it will do something mostly-reasonable out-of-the-box even if the users haven't thought things through at all. 99% of the benefit of having something like this available is that it makes it easy to write tests, and 99% of the benefit of a test is that it exists and makes sure that your values are not wildly incorrect. So that's nice. BUT if you want that kind of out-of-the-box utility then you need to have some kind of sensible default for comparisons to zero. (I just did a quick look at uses of assertAlmostEqual in Python code on github, and in my unscientific survey of reading the first page of results, 30.4% of the calls were comparisons against zero. IMO asking all these people to specify tolerances by hand on every call is not very nice.)

One option would be to add a zero_tol argument, which is an absolute tolerance that is only applied if expected == 0. [And a nice possible side-effect of this is that numpy could conceivably then add such an argument as well "for compatibility with the stdlib", and possibly use this as a lever to fix its weird allclose/assert_allclose discrepancy. The main blocker to making them consistent is that there is lots of code in the wild that assumes allclose handles comparisons-to-zero right, and also lots of code that assumes that assert_allclose is strict with very-small non-zero numbers, and with only rtol and atol you can't get both of these behaviours simultaneously.]
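To make the alternatives above concrete, here are rough sketches of the three combining rules (the names and defaults are illustrative only)::

    def is_close_max(actual, expected, rel_tol=1e-8, abs_tol=0.0):
        # PEP-draft style: the larger of the two tolerances wins.
        return abs(expected - actual) <= max(rel_tol * abs(expected), abs_tol)

    def is_close_sum(actual, expected, rel_tol=1e-8, abs_tol=0.0):
        # numpy style: tolerances add, so the allowed error can be up
        # to a factor of two larger than with max().
        return abs(expected - actual) <= abs_tol + rel_tol * abs(expected)

    def is_close_zero_tol(actual, expected, rel_tol=1e-8, zero_tol=1e-12):
        # zero_tol idea: a separate absolute tolerance that applies
        # only when the expected value is exactly zero.
        if expected == 0:
            return abs(actual) <= zero_tol
        return abs(expected - actual) <= rel_tol * abs(expected)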
Inappropriate uses
------------------
One use case for floating point comparison is testing the accuracy of a numerical algorithm. However, in this case, the numerical analyst ideally would be doing careful error propagation analysis, and should understand exactly what to test for. It is also likely that ULP (Unit in the Last Place) comparison may be called for. While this function may prove useful in such situations, it is not intended to be used in that way.
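For reference, a rough struct-based sketch of a ULP-difference check, along the lines of the one mentioned in the reply below (it assumes IEEE 754 binary64 floats, and the names are hypothetical)::

    import struct

    def _ordered_int(x):
        # Reinterpret the double's bits as a signed 64-bit integer, then
        # remap negative floats so integer order matches float order.
        n = struct.unpack('<q', struct.pack('<d', x))[0]
        return n if n >= 0 else -2**63 - n

    def ulp_diff(a, b):
        # Number of representable doubles between a and b.
        return abs(_ordered_int(a) - _ordered_int(b))

    ulp_diff(1.0, 1.0 + 2**-52)   # 1: adjacent doubles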
I'd strongly consider expanding the scope of this PEP a bit so that it's proposing both a relative/absolute-error-based function *and* a ULP-difference function. There was a plausible-looking one using struct posted in the other thread (sketched above); it would cover a wider variety of cases, and having both functions next to each other in the docs would provide a good opportunity to explain the differences and which might be preferred in which situation.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org