[SciPy-User] peer review of scientific software

Nathaniel Smith njs at pobox.com
Wed Jun 5 18:08:10 EDT 2013


On Wed, Jun 5, 2013 at 10:36 PM, Matt Newville
<newville at cars.uchicago.edu> wrote:
> The paper that Alan Isaac referred to that started this conversation
> seemed to advocate for unit testing in the sense of "don't trust the
> codes you're using, always test them".  At first reading, this seems
> like good advice. Since unit testing (or, at least, the phrase) is
> relatively new for software development, it gives the appearance of
> being new advice.  But the authors damage their case by going on to
> say not to trust analysis tools built by other scientists based on
> the reputation and prior use of these tools.  Here, they expose the
> weakness of favoring "unit tests" over "functional tests".  They are
> essentially advocating throwing out decades of proven, tested work
> (and claiming that the use of this work to date is not justified, as
> it derives from the undue reputation of the authors of prior work) for a
> fashionable new trend.  Science is deliberately conservative, and
> telling scientists that unit testing is all the rage among the cool
> programmers and they should jump on that bandwagon is not likely to
> gain much traction.

But... have you ever sat down and written tests for a piece of widely
used academic software? (Not LAPACK, but some random large package
that's widely used within a field but doesn't have a comprehensive
test suite of its own.) Everyone I've heard of who's done this
discovers bugs all over the place. Would you personally trip over them
if you didn't test the code? Who knows, maybe not. And probably most
of the rest -- off-by-one errors here and there, maybe an incorrect
normalizing constant, etc. -- end up not mattering too much. Or maybe
they do. How could you even tell?
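
For instance (a made-up illustration of the kind of bug I mean), a
sample-variance routine that divides by n instead of n - 1 looks
plausible on every dataset you feed it, but a unit test against a
known value catches it immediately:

    import numpy as np

    def sample_variance(x):
        x = np.asarray(x, dtype=float)
        # Bug: dividing by x.size (population variance) instead of
        # x.size - 1 (sample variance) -- a wrong normalizing constant.
        return np.sum((x - x.mean()) ** 2) / x.size

    def test_sample_variance():
        data = [1.0, 2.0, 3.0, 4.0]
        # The sample variance of 1, 2, 3, 4 is 5/3.
        assert np.isclose(sample_variance(data), 5.0 / 3.0)

    test_sample_variance()  # raises AssertionError, exposing the bug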

You should absolutely check scipy.optimize.leastsq before using it!
You could rewrite it too if you want, I guess, and if you write a
thorough test suite it might even work out. But it's pretty bizarre to
me to think that someone is going to think "ah-hah, writing my own
code + test suite will be easier than just writing a test suite!" Sure,
some people are going to find ways to procrastinate on the real
problem (*cough*grad students*cough*) and NIH ain't just a funding
body. But that's totally orthogonal to whether tests are good.
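
To make the "check leastsq before using it" point concrete, here's the
sort of minimal functional check I mean (the data, seed, and tolerance
are made-up illustrative choices, not an official test):

    import numpy as np
    from scipy.optimize import leastsq

    # Fit y = a*x + b to synthetic data with known parameters and
    # check that leastsq recovers them.
    rng = np.random.RandomState(0)
    a_true, b_true = 2.5, -1.0
    x = np.linspace(0.0, 10.0, 50)
    y = a_true * x + b_true + 0.01 * rng.randn(x.size)

    def residuals(params, x, y):
        a, b = params
        return y - (a * x + b)

    params, ier = leastsq(residuals, [1.0, 0.0], args=(x, y))
    assert ier in (1, 2, 3, 4)  # leastsq reports convergence
    assert np.allclose(params, [a_true, b_true], atol=0.05)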

Honestly I'm not even sure what unit-testing "bandwagon" you're
talking about. I insist on unit tests for my code because every time I
fail to write them I regret it sooner or later, and I'd rather it be
sooner. And because they pay for themselves ridiculously quickly: you
never have to debug more than about 15 lines of code at a time,
because you always know that everything those 15 lines depend on is
already working correctly.
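
To illustrate what I mean by never debugging more than 15 lines at a
time, here's a sketch (the function and tests are hypothetical): each
test pins down one behavior of one short function, so a failure
implicates only those few lines, not the whole program.

    import numpy as np

    def unit_vector(v):
        """Scale v to unit Euclidean length."""
        norm = np.linalg.norm(v)
        if norm == 0.0:
            raise ValueError("cannot normalize the zero vector")
        return v / norm

    def test_unit_vector_has_length_one():
        u = unit_vector(np.array([3.0, 4.0]))
        assert np.isclose(np.linalg.norm(u), 1.0)

    def test_unit_vector_rejects_zero():
        try:
            unit_vector(np.zeros(3))
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError for zero vector")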

Plus, white-box unit-testing can be comprehensive in a way that
black-box functional testing just can't be. The number of code paths
in a system grows like 2**n in the number of branches; you can
reasonably test all of them for a short function with n < 5, but not
for a whole system with n >> 100. And
white-box unit-testing is what lets you move quickly when programming,
because you can quickly isolate errors instead of spending all your
time tracing through stuff in a debugger. If you want to *know* your
code is correct, this kind of thorough testing is just a necessary
(not sufficient!) condition. (Building on libraries that have large
user bases is also very helpful!)
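
As a toy illustration of the 2**n point (the function here is made
up): a short function with three independent branches has 2**3 = 8
paths, and you can exercise every one of them; a system with 100 such
branches has ~1.3e30 paths, and you can't.

    from itertools import product

    def tag(a, b, c):
        tags = []
        if a:
            tags.append("a")
        if b:
            tags.append("b")
        if c:
            tags.append("c")
        return tags

    # Exhaustively test all 2**3 = 8 branch combinations -- feasible
    # for one short function, hopeless for a system with n >> 100.
    for a, b, c in product((False, True), repeat=3):
        expected = [name for name, flag in zip("abc", (a, b, c)) if flag]
        assert tag(a, b, c) == expected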

-n


