How to get "correct" values for unit tests

Hi,

while working on the tests for the new scipy.interpolate.{Smooth,LSQ}SphereBivariateSpline classes, I'm wondering how to come up with sensible TRUE example values to test against.

In the case mentioned (see https://github.com/scipy/scipy/pull/192), I simply wrapped a routine (sphere.f) from FITPACK, so I could write a direct FORTRAN program using sphere.f to calculate some "TRUE" values. However, that would just check that the wrapping actually works.

Is this considered enough? Ultimately, I would like a test to ensure that the results are correct. But for that, wouldn't it be "better" (whatever that means) to use a different library to calculate the TRUE results?

Sorry, this might be a confusing email. Cheers, Andreas.

On Wed, May 30, 2012 at 12:46 PM, Andreas Hilboll <lists@hilboll.de> wrote:
Hi,
while working on the tests for the new scipy.interpolate.{Smooth,LSQ}SphereBivariateSpline classes, I'm wondering how to come up with sensible TRUE example values to test against.
In the case mentioned (see https://github.com/scipy/scipy/pull/192), I simply wrapped a routine (sphere.f) from FITPACK. So I could write a direct FORTRAN program using sphere.f to calculate some "TRUE" values. However, that would just check that the wrapping actually works.
Is this considered enough? Ultimately, I would like a test to assure that the results are correct. But for that, wouldn't it be "better" (whatever that means) to use a different library to calculate the TRUE results?
It's better to verify against results from an outside library, but it's not always possible to find exactly the same algorithm. In that case, all we can test is whether the numbers are approximately the same (with low precision).

Many of the scipy.stats functions, and most of the statsmodels models, are now verified against R (or other packages). (lowess is now identical to R up to 6 decimals or so.) In some cases it's possible to verify against a theoretical or hand-calculated example, but I guess not in your case.

Josef
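A minimal sketch of the workflow Josef describes (the function, data, and reference value here are illustrative assumptions, not from the thread): compute a reference value once in an outside package such as R, hard-code it into the test, and compare at the precision the two implementations can be expected to share:

```python
import numpy as np
from numpy.testing import assert_allclose

def trimmed_mean(x, frac=0.2):
    """Stand-in for the function under test: the mean after dropping
    round(frac * n) points from each end of the sorted data."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(round(frac * x.size))
    return x[k:x.size - k].mean()

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Reference value computed externally once, e.g. with R's
# mean(x, trim=0.2), then hard-coded into the test. The tolerance is
# deliberately loose when the algorithms need not agree exactly.
r_reference = 3.0
assert_allclose(trimmed_mean(x), r_reference, rtol=1e-6)
```

The hard-coded reference keeps the test self-contained: the external package is only needed once, when the expected value is generated, not on every test run.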
Sorry, this might be a confusing email.
Cheers, Andreas.

On 30.05.2012 18:46, Andreas Hilboll wrote:
while working on the tests for the new scipy.interpolate.{Smooth,LSQ}SphereBivariateSpline classes, I'm wondering how to come up with sensible TRUE example values to test against.
In the case mentioned (see https://github.com/scipy/scipy/pull/192), I simply wrapped a routine (sphere.f) from FITPACK. So I could write a direct FORTRAN program using sphere.f to calculate some "TRUE" values. However, that would just check that the wrapping actually works.
Is this considered enough? Ultimately, I would like a test to assure that the results are correct. But for that, wouldn't it be "better" (whatever that means) to use a different library to calculate the TRUE results?
I think a useful philosophy for the tests is to ensure that the code, as a whole, does what is promised. (Not all decades-old Fortran code is reliable...) So, more in the direction of functional tests than unit tests; this also works as a QA step.

Testing interpolation is a bit more difficult than for other types of code, since what counts as a "good" result is fuzzier there, and the "correct" results are not fully well-defined. If I had to manually verify that the interpolation on a sphere works, what I'd try first would be: generate a random dataset (with a fixed random seed) and check (plot) that the result looks reasonable:

- the interpolant at the data points maps to the original data values
- continuity across the "edges" of the sphere
- checks for the flat-derivative options
- the interpolant is "nice" in some sense

The first three can be converted to assert_allclose style tests with some amount of work. The last relies on the eyeball-norm, but I could just pick a few data points out of a plot I think is reasonable and write a small test that checks against those (as a statement that someone actually looked at the output).

I'd guess the above would also catch essentially all possible problems in the wrapping. IMO testing just the wrapper is not very useful --- the above sort of tests are not much more difficult to write, and should catch a wider range of problems.

Pauli
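The first two checks above could be sketched roughly as follows, assuming the SmoothSphereBivariateSpline interface from the pull request (theta in [0, pi], phi in [0, 2*pi], both in radians); the test function, smoothing factor, and tolerances are illustrative choices, not values from the thread:

```python
import numpy as np
from scipy.interpolate import SmoothSphereBivariateSpline

# Fixed seed, so the "random" dataset is reproducible across test runs.
rng = np.random.RandomState(1234)
theta = rng.uniform(0.1, np.pi - 0.1, 100)   # colatitude, away from poles
phi = rng.uniform(0.0, 2 * np.pi, 100)       # longitude

# Smooth, analytically known test function on the sphere.
def f(t, p):
    return np.cos(t) + 0.5 * np.sin(t) * np.cos(p)

r = f(theta, phi)
spl = SmoothSphereBivariateSpline(theta, phi, r, s=0.1)

# Check 1: the interpolant approximately reproduces the data values
# (loose tolerance, since s > 0 means smoothing, not pure interpolation).
fitted = spl(theta, phi, grid=False)
assert np.sqrt(np.mean((fitted - r) ** 2)) < 0.1

# Check 2: continuity across the phi = 0 / 2*pi "edge" of the sphere.
t_edge = np.array([np.pi / 3, np.pi / 2, 2.0])
left = spl(t_edge, np.full(3, 1e-6), grid=False)
right = spl(t_edge, np.full(3, 2 * np.pi - 1e-6), grid=False)
assert np.allclose(left, right, atol=1e-3)
```

The remaining two checks would need the derivative evaluation options and a human-reviewed plot, respectively, so they are left out of this sketch.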
participants (3)
- Andreas Hilboll
- josef.pktd@gmail.com
- Pauli Virtanen