[Matrix-SIG] Re: a full-blown interactive data analysis environment

Joe Harrington jh@oobleck.tn.cornell.edu
Mon, 8 Feb 1999 16:55:50 -0500

Thanks to Travis Oliphant for urging me on.  I've been dealing with
yet another (and another after that!) IDL core dump that had stopped
my work.  Reminds me why I'm doing this!  This is in response to the
posts of 19Jan99...

I appreciate the sentiments of Perry Greenfield and Paul DuBois, but I
think they misinterpret what I am saying, or are missing a crucial
point.  Of course we should write our science analysis code in the
interactive languages; that's why we're here in the first place!
Nobody is suggesting we go back to C for that.  That code is usually
either fire-and-forget, or is only run in a few places, or both.  It's
only when transportability is the key issue that the logic swings the
other way, and I'd cite IDL in astronomy as a prime piece of evidence
in favor of not repeating that mistake yet again.  IDL is a terrible
language.  The syntax is awful.  It costs a thousand dollars for their
cheapest license (try teaching a class on that!).  The bug fix time is
months, and all the other arguments against non-free software apply:
the developers respond too slowly (see above), you can't fix it
yourself if it's broken, and if you implement in it, only people with a
license to their product can run your code.

My goal here is to get a core of standard routines that are used in
many fields of science into a coherent package with good docs, and to
integrate it initially with Python.  The point is that then *any*
language, now or in the future, can be a "real" science language, just
by interfacing to the package.  At ADASS IV, we heard from a Sun
presenter about how Java would soon rule the world and solve all our
problems.  When I asked about support for numerics, he said to an
audience of several hundred astronomy programmers, "uhhh..use
Fortran".  With this package and a little work, they could easily
extend their language to cover numerics, and do it well.  So could
Guile, Perl, and a host of others.  Without such a package, language
implementors look at the job of doing it themselves, and give up.

Fortunately, it's not a matter (as Perry and Paul say or imply) of
"expecting developers to confine themselves", it's a matter of finding
developers who are interested in this approach.  We don't have to
worry about a "middle layer" of interpreted code from the developers.
That's a winning strategy for science analysis projects, but a losing
one for anything that is to survive long-term (meaning decades).  I'm
in full agreement with points a and d of Paul DuBois's posting, and
the whole reason I'm trying to start this effort here is that I think
it will strengthen NumPy to the point where it can become a usable
language for the general scientist, and that it's the best available
in terms of language capability.  Marble is better than salt for
building monuments, though salt is easier to work and tastes better.

Such a project is significant.  However, gathering and integrating is
not more than twice as large a project as writing everything from
scratch in the current interactive language.  And it only needs to be
done once, ever.  I'm committed to creating a coherent package of
widely-used numerical methods implemented as compiled-language
routines and interfaced eventually to multiple languages.  If another
path is chosen, someone else will have to take on the task of
coordinating it.  Assuming that enough others are interested in this
approach, here's what I see as the next steps:

	-Determine how and where to organize ourselves, and set those
         resources up (mailing lists, web sites, etc.).

	-Figure out how best to lay out the umbrella package, and what
         a "leaf" package looks like.  The key item here is to make it
         flexible enough that when we have future interactive
         environments, we can just modify the install programs and
         have them create the environment around that new interpreter.

	-Identify the components we need in the first release
         (numerics: at least FFT, interpolation, simple fitting, a
         Romberg integrator; graphics: at least plotting, image
         display, color table manipulation, cursor readback, and some
         form of widget).

	-Distribute the work of implementing the previous two items
         and of integration and testing.

Looking at the first item, I'd be interested in a very quick list of
software efforts we think have been successful that have been
organized by volunteers working a few hours a week.  Gimp springs to
mind, and perhaps some of the window managers like Afterstep.  Has
anyone been involved in such an effort?  Did they do anything unique
that helped them out?

For the second item, I see the following needs:

documentation with images, formulae, and hyperlinks
	It seems reasonable to make whatever we use generate HTML,
	since all platforms support it.  Do people think Texinfo will
	stand the test of time?  Its ability to handle formulae is

Windows and Mac support
	Who has any experience building software here?  Can we use
	make/autoconf with these systems, and do they work well?  How
	far from g++ are the various C++ compilers on the Mac and
	PC?  Is g++ a good thing to standardize upon?  Do we
	need to pick a compiler and implement for it, or has C++
	become standard enough not to worry?

"leaf" packages
	Each should have its own docs, code, examples, test scripts,
	and wrapper generation files.  Code and docs get built and
	installed into a location that might be shared with code,
	docs, and examples from other packages, for assembly into an
	intermediate collection of related packages (a "branch"?).
	This lets us distribute precompiled bite-sized package updates
	without recompiling or reinstalling the whole thing.
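
To make the "leaf" idea concrete, here is a minimal sketch in Python of
a check that a leaf package carries all five parts listed above.  The
subdirectory names (docs, src, examples, tests, wrappers) and the
function name missing_parts are assumptions made up for illustration;
no layout has been agreed upon.

```python
import os
import tempfile

# Hypothetical layout: these subdirectory names are assumptions for
# illustration, not an agreed standard for leaf packages.
REQUIRED_PARTS = ("docs", "src", "examples", "tests", "wrappers")


def missing_parts(leaf_dir, required=REQUIRED_PARTS):
    """Return the required subdirectories that leaf_dir lacks, in order."""
    return [name for name in required
            if not os.path.isdir(os.path.join(leaf_dir, name))]


# Usage: a leaf package that so far contains only docs and code.
with tempfile.TemporaryDirectory() as leaf:
    for name in ("docs", "src"):
        os.mkdir(os.path.join(leaf, name))
    demo_missing = missing_parts(leaf)
    print(demo_missing)
```

An install program could run a check like this on each leaf before
assembling it into a larger collection, and refuse incomplete ones.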

There's lots more under this item, of course, but this should be
enough to get a discussion started.  

For the third item, what do people see as *crucial* for a first
release?  In my view, the first release should define the structure we
will be implementing into and provide several useful packages that can
get people started with some basic tasks.  It should have good docs,
and generally carry the aura of something professional, not slapdash.
I'd rather it be small and elegant than large and sloppy, as it's
easier to grow larger than to fix a large amount of sloppiness, given
the tendency of net programmers to jump on successful bandwagons.
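
As an illustration of how small one of the first-release numerics
components can be, here is a sketch of a Romberg integrator in plain
Python.  It is only a reference sketch: the package proposed here would
implement such routines in a compiled language and wrap them, and the
function name, argument names, and defaults below are assumptions, not
a proposed interface.

```python
import math


def romberg(f, a, b, max_levels=10, tol=1e-10):
    """Integrate f over [a, b] with Romberg's method.

    Romberg's method applies Richardson extrapolation to a sequence of
    trapezoid-rule estimates, each using twice as many intervals.
    """
    h = b - a
    # prev holds the previous row of the Romberg table.
    prev = [0.5 * h * (f(a) + f(b))]
    for i in range(1, max_levels + 1):
        h *= 0.5
        # Refine the trapezoid estimate using only the new midpoints.
        n_new = 2 ** (i - 1)
        midpoint_sum = sum(f(a + (2 * k + 1) * h) for k in range(n_new))
        row = [0.5 * prev[0] + h * midpoint_sum]
        # Extrapolate: each column cancels the next error term.
        for j in range(1, i + 1):
            factor = 4.0 ** j
            row.append((factor * row[j - 1] - prev[j - 1]) / (factor - 1.0))
        # Stop when successive diagonal entries agree.
        if abs(row[-1] - prev[-1]) <= tol * max(1.0, abs(row[-1])):
            return row[-1]
        prev = row
    # Not converged within max_levels; return the best estimate so far.
    return prev[-1]


# Usage: the integral of sin(x) from 0 to pi is exactly 2.
area = romberg(math.sin, 0.0, math.pi)
print(area)
```

A routine this short is easy to document and test well, which is the
kind of first impression argued for above.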

Joe Harrington
326 Space Sciences Building
Cornell University
Ithaca, NY 14853-6801
(607) 255-5913 office
(607) 255-9002 fax
jh@alum.mit.edu (permanent)