[SciPy-User] peer review of scientific software

Sun Jun 2 14:38:09 EDT 2013

On Sun, Jun 2, 2013 at 12:00 PM, zetah <otrov at hush.ai> wrote:

> Thomas Kluyver wrote:
> >'type of users' might have been a more accurate phrase, but it has an
> >unfortunate negative ring that I wanted to avoid. There are a lot of people
> >doing important data analysis in quite risky and hard-to-maintain ways.
> >Using spreadsheets where some simple code might be more reliable is one
> >symptom of that, and there have been a couple of major examples from
> >economics where spreadsheet errors led to serious mistakes.
> >The discussion is revolving roughly around whether and how we can push
> >those users towards better tools and methods, like coding, version control
> >and testing.
>
> Thanks for the overview, Thomas. I read all the emails on the subject and
> will comment briefly, for the sake of participating, although the topic is
> huge.
>
> I don't have experience with critical modeling, but I do (and am still
> learning) data analysis, on historical data and in general.
>
> If we speak about errors, I think that most of them, as taught in a
> numerical analysis course, come from the human factor: not understanding
> data types, and the variety of data sources that represent data differently.
> A trivial example is that SQL and netCDF databases represent the same data
> in different formats. The same goes for other data sources, which in turn
> can be just plain-text dumps. If that is handled correctly and the user is
> familiar with the tool used, there shouldn't be any surprises.
>

At least when no one checks ;) The errors that the gods of analysis gift to
us are often hidden away and are easy to overlook. They also tend to creep
in when one is overconfident. It's all part of the divine sense of humor.
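
A minimal sketch of how such an error can hide, assuming (hypothetically) that
one source stores a value in single precision and another in double precision.
The names below are made up for illustration; only the standard library is
used.

```python
import math
import struct


def as_float32(x):
    """Round-trip x through a 4-byte float, as a single-precision store would."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical: the same measurement read back from two different sources.
stored_double = 0.1               # e.g. a double-precision column
stored_single = as_float32(0.1)   # e.g. a single-precision variable

# An exact comparison silently reports a mismatch.
exact_match = stored_double == stored_single

# Comparing with a tolerance suited to single precision does not.
close_match = math.isclose(stored_double, stored_single, rel_tol=1e-6)

print(exact_match, close_match)
```

Nothing raises here, which is the point: unless someone checks, the mismatch
just flows into whatever join or filter comes next.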

>
> If it is of any interest, I thought I would generalize my usual workflow, as
> a single-user example (hope it's not useless):
>  - collecting data: if it is not directly available I use Python and,
> depending on the source, validate it. I don't change the format if it's not
> necessary.
>  - pre-processing: if I preprocess (usually with Python), I store the data
> in a SQL server.
>  - using data: a single set or multiple datasets in PowerPivot (limited just
> by the amount of RAM), where DAX allows calculations on the values of
> pivoted views. I haven't yet found any other tool that allows such diverse
> views in such a short time.
>  - post-processing: when needed I export results to CSV, usually just to
> load them into a numpy array and plot with Matplotlib, or for 3D viewing in
> VisIt or Gephi.
>  - versioning: data in the source database(s) stays intact, and all
> calculations can be saved to a file (with values), and then opened again
> even if the data source is not available.
>
> So I mainly use Excel for data manipulation, going back and forth with
> Python. I also use additional tools for 3D visualization.
> I never liked learning about versioning systems, and I'm happy with my
> current scheme.
>
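
The Python side of the workflow above (validate, store in SQL, export to CSV)
can be sketched roughly as follows. This is only an illustration under stated
assumptions: an in-memory sqlite3 database stands in for the SQL server, and
the table, column and function names are hypothetical.

```python
import csv
import io
import sqlite3


def validate(row):
    """Return (station, value) with explicit types, or raise ValueError."""
    station, value = row
    if not station:
        raise ValueError("missing station name")
    return str(station), float(value)  # float() raises ValueError on junk


def run_pipeline(raw_rows):
    """Validate rows, store them in SQL, and export per-station means as CSV."""
    conn = sqlite3.connect(":memory:")  # assumption: stand-in for a SQL server
    conn.execute("CREATE TABLE obs (station TEXT, value REAL)")

    good, rejected = [], []
    for row in raw_rows:
        try:
            good.append(validate(row))
        except ValueError:
            rejected.append(row)  # keep rejects visible instead of dropping them
    conn.executemany("INSERT INTO obs VALUES (?, ?)", good)

    cursor = conn.execute(
        "SELECT station, AVG(value) FROM obs GROUP BY station ORDER BY station"
    )
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["station", "mean_value"])
    writer.writerows(cursor)
    conn.close()
    return out.getvalue(), rejected


# Hypothetical raw input, as it might arrive from a plain-text dump.
raw = [("A", "1.5"), ("A", "2.5"), ("B", "n/a"), ("B", "4.0")]
csv_text, rejected = run_pipeline(raw)
print(csv_text)
print("rejected:", rejected)
```

The resulting CSV text could then be written to a file and read into a numpy
array for plotting. Returning the rejected rows is a deliberate choice: it
makes the validation step something that can be checked later.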

I confess to my shame that I have never learned to use a spreadsheet for
any but the simplest things. It's just so darn complicated ;)

Chuck