[SciPy-User] peer review of scientific software

josef.pktd at gmail.com
Tue May 28 18:16:23 EDT 2013


On Tue, May 28, 2013 at 5:52 PM, John Hassler <hasslerjc at comcast.net> wrote:
>
> On 5/28/2013 4:58 PM, Matt Newville wrote:
>> Hi,
>>
>> As others have said, I find the low average programming skill level
>> among scientists frustrating,  but I also found this article quite
>> frustrating.
>>
>> From my perspective, the authors' main complaint seems to be that there
>> is not enough independent checking of specialized scientific software
>> written by scientists.  They seem particularly unhappy about the
>> tendency to use existing packages written by other scientists based on
>> "trust", "reputation", "previous citations" and without independent
>> checking.  They also say:
>>
>>        A "well-respected" end-user developer will almost certainly have
>> earned that respect
>>        through scientific breakthroughs, perhaps not for their software
>> engineering skills
>>        (although agreement on what constitutes "appropriate" scientific
>> software engineering
>>        standards is still under debate).
>>
>> On this point in particular, and indeed in this whole line of
>> argument, I think the authors are misguided, perhaps even to the point
>> of fatally damaging their whole argument.   I believe the much more
>> common case is for the "well-respected" end-user developer to be known
>> for the programs written and supported, and less so for the scientific
>> breakthroughs (unless you count new programs as new instrumentation,
>> and so, well, breakthroughs, but it's pretty clear that the authors
>> are making a distinction).    It's too often the case that spending
>> any significant time on such programs is career suicide, as it takes
>> time and attention away from such breakthroughs.   It's perfectly
>> believable that the programming skills of such a scientific developer
>> may be incomplete, but I think it's fair to say that most supported
>> and well-used programs are likely the effort of people with
>> above-average programming skills and the interest and intent to
>> support such programs.   Indeed, I would argue that instead of being
>> unhappy about the reliance on trusted programs and developers, the
>> authors would better serve the scientific community by arguing that
>> the authors of such programs should be better supported, and given
>> access to tools and resources (ie, fund them) to improve their work
>> rather than treat them as untrustworthy programmers.
>>
>> I should admit to being one such author of a "well-respected" and
>> "trusted" package for a very small scientific discipline, and with the
>> proverbial "many citations etc" because of this.  So I would admit to
>> being just the sort of person the authors are unhappy about.  I
>> suspect many people on this mailing list are in the same category.   I
>> would like to think the trust and respect for certain packages have
>> been earned, and that people use such packages because they are "known
>> to work", both in the sense of actually having been tested on
>> idealized cases, and in producing verifiable results in real cases
>> (where "testing" would not always be possible).   Indeed, the small,
>> decentralized group of scientific programmers that I work with (mostly
>> trained as physicists, and learning to program in Fortran -- some of
>> us still use mostly Fortran, in fact) do test and verify such codes,
>> precisely because we know other people use them.   Of course errors
>> occur, and of course testing is important.   Modern techniques like
>> distributed version control and unit testing are very good tools to
>> use.   I agree they should be used more thoroughly, and that one
>> should always be willing to question the results of a computer
>> program.
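>>
>> (As a concrete illustration of the kind of unit test I mean, here is
>> a minimal sketch in Python, assuming numpy and a hypothetical
>> trapezoid() helper -- an example for illustration, not code from any
>> particular package:
>>
>>     import numpy as np
>>     from numpy.testing import assert_allclose
>>
>>     def trapezoid(y, x):
>>         # composite trapezoidal rule (hypothetical helper)
>>         return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0
>>
>>     def test_trapezoid_matches_analytic_result():
>>         # the integral of sin(x) over [0, pi] is exactly 2
>>         x = np.linspace(0.0, np.pi, 10001)
>>         assert_allclose(trapezoid(np.sin(x), x), 2.0, rtol=1e-6)
>>
>> A test runner such as nose or pytest would collect and run this
>> automatically.)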
>>
>> Then again, when was the last time I tested the correctness of results
>> from my handheld HP calculator?    Hmm, a very, very long time ago.
>> That's software.  I tend to believe the messages I read in my inbox
>> are actually the message sent, and hardly ever do a checksum on it.
>> But that's software.  Indeed, all science is a social enterprise and
>> so "trust", "reputation", and reliance on the literature (aka "past
>> experience") are not merely unfortunate outcomes of laziness, but an
>> important part of the process.
>>
>> I certainly am happy to support the notion that "more scientists
>> should be able to program better", so I am not going to say the
>> entire article is wrong, and I don't disagree with their main
>> conclusions.  But I think they have a fatal flaw in their assumptions
>> and arguments.
>>
>> --Matt Newville
>
> Exactly!   There is actually a question here that hasn't been made
> explicit.  For whom is this advice intended?  There are all levels of
> programming/programmers in STEM.  Some of my colleagues use Excel for
> everything.  (As in, EVERYTHING.)  Some fewer use Matlab.  Still fewer
> use C/Fortran/Java/C#/whatever.  So far as I know, I'm the one lone
> Pythonista.  Each group uses programming differently.
>
> I've been programming for more than 50 years.  I've taught programming
> to engineers in several contexts over the years.  For a time, I really
> wanted to 'do it right.'  (I even taught 'structured programming' and
> 'Warnier-Orr' at one point, but realized that it was worse than useless
> for the particular audience.)  I've come to realize that most engineers
> just want an answer.  They are not interested in how gracefully the
> answer was arrived at.  MOST programs written by MOST engineers are
> small, short, simple, and intended to solve one problem one time.  (The
> deficiency I've most often seen is the lack of error checking for the
> answer, and better programming techniques would not generally help much.)
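>
> (A tiny sketch of what checking the answer can look like, in Python
> with numpy; A and b here are made-up stand-ins for whatever the
> one-off script actually computes:
>
>     import numpy as np
>
>     # solve A x = b, then verify the answer instead of trusting it
>     A = np.array([[3.0, 1.0], [1.0, 2.0]])
>     b = np.array([9.0, 8.0])
>     x = np.linalg.solve(A, b)
>
>     # the relative residual should be near machine precision
>     residual = np.linalg.norm(A.dot(x) - b) / np.linalg.norm(b)
>     assert residual < 1e-10, "solution does not satisfy the equations"
>
> The few extra lines cost almost nothing and catch a whole class of
> silent mistakes.)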
>
> The problem is that nobody sets out to write a "well respected"
> program.  Someone sets out to scratch a particular itch ('one problem
> one time').  It expands.  Others find it useful.  It becomes widely
> used.  The original author, however, was solving his/her own particular
> problem, and was not at all interested in "proper" programming.  So, I
> guess my question is, how do we find that person who is going to write
> the "well respected" program and convince him/her to take time out and
> learn proper programming first? Because we are certainly not going to
> convince everybody to do it.
>
> john

I had the same impression as Matt about the article, but his writing
is clearer than my thinking.

For statistics and econometrics (and some economics), there are
researchers who develop methods and researchers who write tools, and
sometimes they are the same people.

R, Stata, SAS, and Matlab have support for user contributions:
journals, conferences, distribution channels.
Developers of new algorithms, statistical tests, or estimators have an
incentive to see that their code reaches potential users, because it
boosts adoption and, with it, the number of citations.

Some examples (open source, though maybe without source control, unit
tests, or a license):

http://ideas.repec.org/s/boc/bocode.html
http://www.feweb.vu.nl/econometriclinks/software.html#GAUSS
http://www.unc.edu/~jbhill/Gauss_by_code.htm
Alan Isaac had a Gauss program page, but I cannot find it anymore.

An example: bocode and the Stata Journal.
Stata is very good at supporting user code,
with peer review on the mailing lists (besides the articles);
and if everybody else is using it, then it must be "correct".

Josef



