Re: Meta: too many numerical libraries doing the same thing?

Yes, this issue has been raised here before. It was the main conclusion of Paul Barrett's and my BOF session at ADASS 5 years ago (see our report at http://oobleck.astro.cornell.edu/jh/ast/papers/idae96.ps). The main problems are that we scientists are too individualistic to get organized around a single library, too pushed by job pressures to commit much concentrated time to it ourselves, and too poor to pay the architects, coders, doc writers, testers, etc. to write it for us.
Socially, we *want* to reinvent the wheel, because we want to be riding on our own wheels. Once we are riding reasonably well for our own needs, our interest and commitment vanish. We're off to write the next paper.
Following that conference, I took a poll on this list looking for help to implement the library. About half a dozen people responded that they could put in up to 10 hours a week, which in my experience isn't enough once things get hard and attrition sets in. Nonetheless, Paul and I proposed to the NASA Astrophysics Data Analysis Program to hire some people to write it, but we were turned down. We proposed the idea to the head of the High Energy Astrophysics group at NASA Goddard, and he agreed -- as long as what we were really doing was writing software for his group's special needs. The frustrating thing is how many hundreds of astronomy projects hire people to do their 10% of this problem, and how unwilling they are to pool resources to do the general problem.
A few of the volunteers who answered my query to this list have gone on to do SciPy, to their credit, but I don't see them moving in the direction we outlined. Still, they have the capacity to do it right in Python and in compiled code written explicitly for Python. They won't solve the general problem, but they may solve the first problem, namely getting a data analysis environment that is OSS and as good as IDL et al. in terms of end-to-end functionality, completeness, and documentation.
I like the notion that the present list is for designing and building the underlying language capabilities into Python, and for getting them standardized, tested, and included in the main Python distribution. It is also a good place for debating the merits of different implementations of particular functionality. That leaves the job of building coherent end-user data analysis packages (which necessarily have to pick one routine to be called "fft", one device-independent graphics subsystem, etc.) to application groups like SciPy. There can be more than one of these, if that's necessary, but they should all use the same underlying numerical language capability. I hope that the application groups from several array-based OSS languages will someday get together and collaborate on an ueberlibrary of numerical and graphics routines (the latter being the real sticking point) that are easily wrapped by most languages. That seems backwards, but I think the social reality is that that's the way it is going to be, if it ever happens at all.
--jh--

There is more to this issue than meets the eye, both technically and historically. For numerical algorithms to be available independent of language, they would have to be packaged as components such as COM objects. While there is research in this field, nobody knows whether it can be done in a way that is efficient enough.
For a given language like C, C++, Eiffel or Fortran used as the speed-demon base for wrapping up in Python, there are some difficult technical issues. Reusable numerical software needs context to operate, and there is no decent way to supply that context in a non-object-oriented language. Geoff Furnish wrote a good paper about the issue for C++ showing the way to truly reusable libraries in that language, and recent improvements in Eiffel make it easier to do there now. In C or Fortran you simply can't do it. (Note that Eiffel or C++ versions of some NAG routines typically have methods with one or two arguments, while the C or Fortran ones have 15 or more; a routine is not reusable if you have to understand that many arguments just to try it. There are also important issues with regard to error handling and memory.)
The second issue is algorithmic: most scientists do NOT know the right algorithms to use, and the ones they do use are often inferior. The good algorithms are for the most part in commercial libraries and in the numerical analysis literature, where they were written by numerical analysts. Often the code from both sources is unavailable for free use, in the wrong language, and/or wretched. The commercial libraries also exist because some companies have requirements for fiduciary responsibility; in effect, they need a guarantor of the software to show that they have not carelessly depended on software of unknown quality.
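The argument-count point can be made concrete with a minimal Python sketch. It is not taken from NAG or from Furnish's paper; the class `Bisection` and its attributes are hypothetical, chosen only to show the algorithm's context (tolerance, iteration cap, diagnostics) living in an object with sensible defaults, so that trying the routine takes one call with a function and a bracket, and a bad call raises an exception instead of returning a status code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Bisection:
    """Hypothetical bisection root finder (not from NAG or any real library).

    The algorithm's context (tolerance, iteration limit, diagnostics) lives
    in the object, so the call that does the work needs only the function
    and a bracketing interval; every control has a usable default.
    """

    xtol: float = 1e-12      # stop once the bracket is narrower than this
    maxiter: int = 200       # hard cap on the number of halvings
    iterations: int = 0      # optional output, set by solve()
    converged: bool = False  # optional output, set by solve()

    def solve(self, f: Callable[[float], float], a: float, b: float) -> float:
        fa, fb = f(a), f(b)
        if fa * fb > 0:
            # Errors surface as exceptions rather than an integer status flag.
            raise ValueError("f(a) and f(b) must have opposite signs")
        self.iterations, self.converged = 0, False
        while self.iterations < self.maxiter:
            self.iterations += 1
            m = 0.5 * (a + b)
            fm = f(m)
            if fa * fm <= 0:   # sign change in [a, m]: keep the left half
                b = m
            else:              # sign change in [m, b]: keep the right half
                a, fa = m, fm
            if abs(b - a) < self.xtol:
                self.converged = True
                break
        return 0.5 * (a + b)
```

With the defaults, `Bisection().solve(lambda x: x * x - 2.0, 0.0, 2.0)` returns approximately 1.41421356; a caller who needs tighter control sets `xtol` or `maxiter` once on the object rather than threading them through every call.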
In short, computer scientists are not going to be able to write such a library without an army of numerical analysts familiar with the literature, and the numerical analysts aren't going to write it unless they are OO-experienced, which almost all of them aren't, so far. When most people discuss mathematical software, they think of the leaves of the call tree. In fact the most useful mathematical software, in the sense that it incorporates the most expertise, is middleware such as ODE solvers, integrators, root finders, etc. Such an algorithm will have many controls, optional outputs, etc., and that requires a library-wide design motif. I thus feel there are perfectly good reasons not to expect such a library soon. The Python community could do a good OO design using what is available (such as LAPACK), but we haven't -- all the contributions are functional.
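What a "library-wide design motif" for middleware might mean can be sketched in Python, under stated assumptions: two unrelated algorithms (Newton's method and an iterated trapezoidal rule, picked only for brevity) share one hypothetical `Controls` block and one hypothetical `Result` record, so a user who has learned one solver's controls and optional outputs already knows the other's. None of these names come from LAPACK or any existing Python package.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

Func = Callable[[float], float]


@dataclass
class Controls:
    """Hypothetical control block shared by every solver in the library."""
    tol: float = 1e-10   # convergence tolerance (meaning documented per solver)
    maxiter: int = 100   # iteration cap


@dataclass
class Result:
    """Hypothetical result record: the answer plus diagnostics that every
    solver reports in the same way."""
    value: float
    converged: bool
    niter: int   # iterations actually performed
    nfev: int    # number of calls to f


def newton(f: Func, df: Func, x0: float, ctl: Optional[Controls] = None) -> Result:
    """Root of f by Newton's method; converged when |step| < ctl.tol."""
    ctl = ctl if ctl is not None else Controls()
    x, nfev = x0, 0
    for k in range(1, ctl.maxiter + 1):
        fx, dfx = f(x), df(x)
        nfev += 1
        if dfx == 0.0:              # flat spot: no Newton step is possible
            return Result(x, False, k, nfev)
        step = fx / dfx
        x -= step
        if abs(step) < ctl.tol:
            return Result(x, True, k, nfev)
    return Result(x, False, ctl.maxiter, nfev)


def trapezoid(f: Func, a: float, b: float, ctl: Optional[Controls] = None) -> Result:
    """Integral of f over [a, b] by the trapezoidal rule, halving the step
    until successive estimates differ by less than ctl.tol."""
    ctl = ctl if ctl is not None else Controls()
    n, h = 1, b - a                   # number of panels, panel width
    total = 0.5 * (f(a) + f(b))       # running sum of weighted samples
    nfev = 2
    old = total * h
    for k in range(1, ctl.maxiter + 1):
        h *= 0.5
        # The new nodes are the midpoints of the n existing panels.
        total += sum(f(a + (2 * i + 1) * h) for i in range(n))
        nfev += n
        n *= 2
        new = total * h
        if abs(new - old) < ctl.tol:
            return Result(new, True, k, nfev)
        old = new
    return Result(old, False, ctl.maxiter, nfev)


if __name__ == "__main__":
    print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0))
    print(trapezoid(math.sin, 0.0, math.pi, Controls(tol=1e-8)))
```

Because both solvers report `converged`, `niter` and `nfev` identically, calling code can inspect either one the same way. The trapezoidal rule doubles its work on every pass, so a real library would want a much smaller `maxiter` for it than for Newton's method; deciding such per-solver defaults is part of the shared design.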

participants (2)
-
Joe Harrington
-
Paul F. Dubois