I wanted to send an update to this list regarding the meeting at
Berkeley that I attended. A lot of good discussions took place at the
meeting that should stimulate broader feedback. Personally, I had far
more to discuss before I had to leave, and so I hope that the
discussions can continue.
I have been trying to understand why, with an increasing number of
scientific users of Python, relatively few people seem to want to
contribute to scipy regularly, let alone become active developers.
There are lots of people who seem to identify problems (though very
often vague ones), but not many who seem able (either through time or
interest constraints) to actually contribute code, documentation, or
infrastructure.
Scipy is an open source project and relies on the self-selection process
of open source contributors.
It would seem that while the scipy conference demonstrates a continuing
and even increasing use of Python for scientific computing, not as many
of these users are scipy devotees. Why?
I think the answers come down to a few issues, which I will attempt to
address with proposals.
1) Plotting -- scipy's plotting wasn't good enough (we knew that) and
the promised solution (chaco) took too long to emerge as a simple
replacement. While the elements were all there for chaco to work, very
few people knew that and nobody stepped up to take chaco to the level
that matplotlib, for example, has reached in terms of cross-gui
applicability and user-interface usability.
Proposal:
Incorporate matplotlib as part of the scipy framework (replacing plt).
Chaco is not there anymore and the other two plotting solutions could
stay as backward compatible but not progressing solutions. I have not
talked to John about this, though I would like to. I think if some
other packaging issues are addressed we might be able to get John to
agree.
2) Installation problems -- I'm not completely clear on what the
"installation problems" really are. I hear people talk about them, but
Pearu has made significant strides to improve installation, so I'm not
sure what precise issues remain. Yes, installing ATLAS can be a pain,
but scipy doesn't require it. Yes, Fortran support can be a pain, but
if you use g77 then it isn't a big deal. The reality, though, is that
there is this perception of installation trouble and it must be based on
something. Let's find out what it is. Please speak up, users of the
world!
Proposal (just an idea to start discussion):
Subdivide scipy into several super packages that install cleanly but can
also be installed separately. Implement a CPAN-or-yum-like repository
and query system for installing scientific packages.
Base package:
scipy_core -- this super package should be easy to install (no Fortran)
and should essentially be old Numeric. It was discussed at Berkeley
that very likely Numeric3 should just be included here. I think this
package should also include plotting, weave, scipy_distutils, and even
f2py.
Some of these could live in dual namespaces (i.e. both weave and
scipy.weave are available on install).
scipy.traits
scipy.weave (weave)
scipy.plt (matplotlib)
scipy.numeric (Numeric3 -- uses atlas when installed later)
scipy.f2py
scipy.distutils
scipy.fft
scipy.linalg? (include something like lapack-lite for basic but slow
functionality; installing the improved package replaces this with
atlas usage)
scipy.stats
scipy.util (everything else currently in scipy_core)
scipy.testing (testing facilities)
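To make the dual-namespace idea (both weave and scipy.weave available on install) a bit more concrete, here is a minimal sketch in present-day Python. It is not anything that exists in scipy: the helper alias_module and the module names demo_scipy and demo_weave are made up for illustration. The only mechanism it relies on is that Python's import system checks sys.modules before searching the disk.

```python
import sys
import types

def alias_module(module, alias):
    """Make an already-loaded module importable under a second dotted name.

    Hypothetical helper, for illustration only.
    """
    sys.modules[alias] = module
    parent_name, _, child_name = alias.rpartition(".")
    parent = sys.modules.get(parent_name) if parent_name else None
    if parent is not None:
        # Also expose the module as an attribute of its new parent package.
        setattr(parent, child_name, module)
    return module

# Stand-ins for a real parent package and a real standalone package.
demo_scipy = types.ModuleType("demo_scipy")
demo_scipy.__path__ = []          # a __path__ attribute marks it as a package
sys.modules["demo_scipy"] = demo_scipy

demo_weave = types.ModuleType("demo_weave")
demo_weave.inline = lambda code: "compiled: " + code
sys.modules["demo_weave"] = demo_weave

# After this call both import spellings resolve to the same module object.
alias_module(demo_weave, "demo_scipy.weave")

import demo_weave as top_level
import demo_scipy.weave as nested
```

Because both names point at one module object, there is only one copy of any module-level state, which is probably what we would want for something like weave.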
Each of these should be a separate package, installable and distributable
separately (though there may be co-dependencies so that scipy.plt would
have to be distributed with scipy).
Libraries (each separately installable).
scipy.lib -- there should be several sub-packages that could live under
here. This is simply raw code with basic wrappers (kind of like a /usr/lib)
scipy.lib.lapack -- installation also updates narray and linalg
(hooks to do that)
scipy.lib.blas -- installation updates narray and linalg
scipy.lib.quadpack
etc...
Extra sub-packages: named in a hierarchy to be determined and probably
each dependent on a variety of scipy-sub-packages.
I haven't fleshed this thing out yet as you can tell. I'm mainly
talking publicly to spur discussion. The basic idea is that we should
force ourselves to distribute scipy in separate packages. This would
force us to implement a yum-or-CPAN-like package repository, so that we
define the interface by which an additional module could be developed
by someone, even maintained separately (with a different license), and
simply inserted into an intelligent point under the scipy infrastructure.
It would also allow installation/compilation issues to be handled on a
more per-module basis so that difficult ones could be noted. I think this
would also help interested people get some of the enthought stuff put
into the scipy hierarchy as well.
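As a starting point for discussion, the core of a yum-or-CPAN-like installer is little more than working out an install order from declared dependencies. Here is a rough sketch of that step. The function install_order and the INDEX table are hypothetical, and the dependencies listed between the packages are assumptions made up for the example, not decisions anyone has made.

```python
def install_order(target, index):
    """Return the packages needed for target, dependencies first.

    index maps a package name to the list of names it depends on.
    """
    order = []
    in_progress = set()
    finished = set()

    def visit(name):
        if name in finished:
            return
        if name in in_progress:
            raise ValueError("dependency cycle involving " + name)
        if name not in index:
            raise KeyError("unknown package: " + name)
        in_progress.add(name)
        for dependency in index[name]:
            visit(dependency)
        in_progress.discard(name)
        finished.add(name)
        order.append(name)

    visit(target)
    return order

# Assumed metadata, for illustration only.
INDEX = {
    "scipy_core": [],
    "scipy.lib.blas": ["scipy_core"],
    "scipy.lib.lapack": ["scipy_core", "scipy.lib.blas"],
    "scipy.lib.quadpack": ["scipy_core"],
}

order = install_order("scipy.lib.lapack", INDEX)
```

A separately maintained module would then only need to publish its name and its dependency list to be installable next to everything else.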
Thoughts and comments (and even half-working code) welcomed and
encouraged...
-Travis O.