[Numpy-discussion] Future directions for SciPy in light of meeting at Berkeley

Tue Mar 8 23:34:07 EST 2005

I wanted to send an update to this list regarding the meeting at 
Berkeley that I attended.  A lot of good discussions took place at the 
meeting that should stimulate wider feedback.  Personally, I had far 
more to discuss before I had to leave, and so I hope that the 
discussions can continue.

I have been trying to understand why, with an increasing number of 
scientific users of Python, relatively few people actually seem to want 
to contribute to scipy regularly, let alone become active developers.
There are lots of people who identify problems (though very 
often vague ones), but not many who seem able (because of constraints 
on either time or interest) to actually contribute code, documentation, 
or infrastructure.

Scipy is an open source project and relies on the self-selection process 
of open source contributors.

It would seem that while the scipy conference demonstrates a continuing 
and even increasing use of Python for scientific computing,  not as many 
of these users are scipy devotees.   Why?

I think the answers come down to a few issues, each of which I will 
try to address with a proposal.

1) Plotting -- scipy's plotting wasn't good enough (we knew that) and 
the promised solution (chaco) took too long to emerge as a simple 
replacement.  While the elements were all there for chaco to work, very 
few people knew that and nobody stepped up to take chaco to the level 
that matplotlib, for example, has reached in terms of cross-gui 
applicability and user-interface usability.

Proposal:  

Incorporate matplotlib as part of the scipy framework (replacing plt).  
Chaco is not there anymore and the other two plotting solutions could 
stay for backward compatibility without being developed further.  I have not 
talked to John about this, though I would like to.   I think if some 
other packaging issues are addressed we might be able to get John to 
agree.  

2) Installation problems -- I'm not completely clear on what the 
"installation problems" really are.  I hear people talk about them, but 
Pearu has made significant strides to improve installation, so I'm not 
sure what precise issues remain.  Yes, installing ATLAS can be a pain, 
but scipy doesn't require it.  Yes, Fortran support can be a pain, but 
if you use g77 then it isn't a big deal.  The reality, though, is that 
there is this perception of installation trouble and it must be based on 
something.   Let's find out what it is.  Please speak up users of the 
world!!!!

Proposal (just an idea to start discussion):

Subdivide scipy into several super packages that install cleanly but can 
also be installed separately.  Implement a CPAN-or-yum-like repository 
and query system for installing scientific packages.

Base package:

scipy_core -- this super package should be easy to install (no Fortran) 
and should essentially be old Numeric.  It was discussed at Berkeley 
that Numeric3 should very likely just be included here.  I think this 
package should also include plotting, weave, scipy_distutils, and even 
f2py.  

Some of these could live in dual namespaces (i.e. both weave and 
scipy.weave are available on install).
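
To make the dual-namespace idea concrete, here is a minimal sketch, 
assuming the simplest possible mechanism: registering the same module 
object under two names in sys.modules.  The names demo_scipy and 
demo_weave are stand-ins invented for this example (not real packages), 
and a real install might well prefer a thin top-level shim package 
instead.

```python
import sys
import types


def register_alias(canonical_name, alias_name):
    """Make an already-loaded module importable under a second name."""
    module = sys.modules[canonical_name]
    sys.modules[alias_name] = module
    return module


# Build stand-in modules so the sketch runs without anything installed.
# "demo_scipy" and "demo_weave" are hypothetical names for illustration.
pkg = types.ModuleType("demo_scipy")
pkg.__path__ = []  # an (empty) __path__ marks the module as a package
weave = types.ModuleType("demo_scipy.weave")
weave.inline = lambda code: "compiled: " + code
pkg.weave = weave
sys.modules["demo_scipy"] = pkg
sys.modules["demo_scipy.weave"] = weave

# Expose the sub-package under a top-level name as well.
register_alias("demo_scipy.weave", "demo_weave")

import demo_weave
import demo_scipy.weave

# Both names refer to one module object, so state is shared.
print(demo_weave is demo_scipy.weave)
```

Because both names point at the same object, there is only one copy of 
any module-level state, which is the main thing to get right here.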

scipy.traits
scipy.weave      (weave)
scipy.plt        (matplotlib)
scipy.numeric    (Numeric3 -- uses atlas when installed later)
scipy.f2py
scipy.distutils
scipy.fft
scipy.linalg?    (include something like lapack-lite for basic but slow
                 functionality; installation of improved package
                 replaces this with atlas usage)
scipy.stats
scipy.util       (everything else currently in scipy_core)
scipy.testing    (testing facilities)

Each of these should be a separate package, installable and distributable 
separately (though there may be co-dependencies, so that scipy.plt would 
have to be distributed with scipy).

Libraries (each separately installable).

scipy.lib -- there should be several sub-packages that could live under 
here.  This is simply raw code with basic wrappers (kind of like a /usr/lib)

scipy.lib.lapack    -- installation also updates narray and linalg
                       (hooks to do that)
scipy.lib.blas      -- installation updates narray and linalg
scipy.lib.quadpack

etc...
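
One way the "hooks" mentioned for scipy.lib.lapack and scipy.lib.blas 
might work is a small registry that library packages write into when 
they are installed or first imported.  This is only a sketch under that 
assumption; register, lookup, and the det2_* functions are names 
invented for the example.

```python
# Registry mapping a routine name to (priority, implementation).
# All names in this block are hypothetical, for illustration only.
_IMPLEMENTATIONS = {}


def register(name, func, priority=0):
    """Record func for name unless a higher-priority one exists."""
    current = _IMPLEMENTATIONS.get(name)
    if current is None or priority > current[0]:
        _IMPLEMENTATIONS[name] = (priority, func)


def lookup(name):
    """Return the best implementation registered so far."""
    return _IMPLEMENTATIONS[name][1]


def det2_lite(m):
    """Baseline 2x2 determinant shipped with the base package."""
    (a, b), (c, d) = m
    return a * d - b * c


def det2_optimized(m):
    """Stand-in for a routine backed by an optimized library."""
    (a, b), (c, d) = m
    return a * d - b * c


# The base package registers its slow version first.
register("det2", det2_lite, priority=0)
before = lookup("det2")

# Later, a separately installed library package runs its hook.
register("det2", det2_optimized, priority=10)
after = lookup("det2")

print(before.__name__, "->", after.__name__)
```

Higher-level code would only ever call lookup("det2"), so it picks up 
the better routine without being reinstalled itself.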

Extra sub-packages: named in a hierarchy to be determined and probably 
each dependent on a variety of scipy-sub-packages.

I haven't fleshed this thing out yet as you can tell.  I'm mainly 
talking publicly to spur discussion.  The basic idea is that we should 
force ourselves to distribute scipy in separate packages.  This would 
force us to implement a yum-or-CPAN-like package repository, so that we 
define the interface as to how an additional module could be developed 
by someone, even maintained separately (with a different license), and 
simply inserted at a sensible point under the scipy infrastructure.
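
As a starting point for discussion, the query-and-install side of such 
a repository could be sketched as below.  The INDEX contents, including 
every dependency listed, are made up for illustration, and query and 
install_order are hypothetical names; a real index would be fetched 
from a server rather than hard-coded.

```python
# Hypothetical package index: name -> list of required packages.
# The dependencies shown are illustrative, not actual requirements.
INDEX = {
    "scipy.numeric": [],
    "scipy.lib.blas": [],
    "scipy.lib.lapack": ["scipy.lib.blas"],
    "scipy.linalg": ["scipy.numeric"],
    "scipy.plt": ["scipy.numeric"],
    "scipy.stats": ["scipy.numeric", "scipy.linalg"],
}


def query(prefix, index):
    """List known packages whose names start with prefix."""
    return sorted(name for name in index if name.startswith(prefix))


def install_order(package, index):
    """Return package and its dependencies, dependencies first."""
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return  # already scheduled
        if name not in index:
            raise KeyError("unknown package: " + name)
        if name in visiting:
            raise ValueError("dependency cycle at " + name)
        visiting.add(name)
        for dep in index[name]:
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(package)
    return order


print(query("scipy.lib", INDEX))
print(install_order("scipy.stats", INDEX))
```

Each separately maintained module would then only need to publish its 
name and its list of requirements to plug into the scheme.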

It would also allow installation/compilation issues to be handled on a 
more per-module basis so that difficult ones could be noted.  I think 
this would also help interested people get some of the enthought stuff 
put into the scipy hierarchy as well.

Thoughts and comments (and even half-working code) welcomed and 
encouraged...

-Travis O.