- cluster : low maintenance cost, small. not sure about usage, quality.
I think cluster overlaps with scikits-learn quite a bit. It basically contains K-means vector-quantization code whose functionality I suspect already exists in scikits-learn. I would recommend deprecating and removing it while pointing people to scikits-learn for equivalent functionality (or moving it to scikits-learn).
- ndimage : difficult one. hard to understand code, may not see much development either way.
This overlaps with scikits-image but has quite a bit of useful functionality on its own. The package is fairly mature and just needs maintenance.
- spatial : kdtree is widely used, of good quality. low maintenance cost.
Good to hear maintenance cost is low.
- odr : quite small, low cost to keep in core. pretty much done as far as I can tell.
Agreed.
- maxentropy : is deprecated, will disappear.
Great.
- signal : not in great shape, could be viable independent package. On the other hand, if scikits-signal takes off and those developers take care to improve and build on scipy.signal when possible, that's OK too.
What are the needs of this package? What needs to be fixed or improved? It is a broad field, and I could see fixing up scipy.signal with a few simple algorithms (filter design, for example) and then pushing a separate package for more advanced signal-processing algorithms. This sounds fine to me. It looks like I can focus some attention on scipy.signal, then, as it was one of the areas I was most interested in originally.
- weave : no point spending any effort on it. keep for backwards compatibility only, direct people to Cython instead.
Agreed. Any way we can deprecate this for SciPy 1.0?
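For readers weighing the cluster question above: scipy.cluster.vq pairs a vector-quantization step (assigning each observation to its nearest code-book entry, as `scipy.cluster.vq.vq` does) with a K-means routine that refines the code book (`scipy.cluster.vq.kmeans`), and scikit-learn's `sklearn.cluster.KMeans` covers the same ground. The following is only a minimal pure-Python sketch of that technique (Lloyd's algorithm), not SciPy's implementation. The helper names `assign_codes` and `lloyd_kmeans` are hypothetical, and initializing from the first k observations is an assumption made to keep the example deterministic.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_codes(obs: Sequence[Point], codebook: Sequence[Point]) -> List[int]:
    """Vector quantization: index of the nearest code-book entry for each observation."""
    return [min(range(len(codebook)), key=lambda j: sq_dist(p, codebook[j])) for p in obs]


def lloyd_kmeans(obs: Sequence[Point], k: int, iters: int = 20) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper: plain Lloyd iterations, initialized from the first k observations."""
    codebook = [tuple(p) for p in obs[:k]]
    labels = assign_codes(obs, codebook)
    for _ in range(iters):
        # Update step: move each code to the mean of the observations assigned to it.
        new_codebook = []
        for j in range(k):
            members = [p for p, lab in zip(obs, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_codebook.append(tuple(sum(m[d] for m in members) / len(members) for d in range(dim)))
            else:
                new_codebook.append(codebook[j])  # keep an empty cluster's code unchanged
        # Assignment step: re-quantize every observation against the updated code book.
        new_labels = assign_codes(obs, new_codebook)
        codebook = new_codebook
        if new_labels == labels:
            break  # assignments stopped changing, so the iteration has converged
        labels = new_labels
    return codebook, labels


data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
codes, labels = lloyd_kmeans(data, k=2)
# labels -> [0, 1, 0, 0, 1, 1]; codes are the two cluster means
```

The split into a quantization step and a refinement step mirrors how the vq module separates `vq` from `kmeans`, which is the part that overlaps with scikit-learn.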
Overall, I don't see many viable independent packages there. So here's an alternative to spending a lot of effort on reorganizing the package structure:
1. Formulate a coherent vision of what in principle belongs in scipy (current modules + what's missing).
O.K., so SciPy should contain "basic" modules that many different kinds of analysis need, so that it can serve as a dependency for more advanced packages. This is somewhat vague, of course.
What do others think is missing? Off the top of my head: basic wavelets (primarily the discrete wavelet transform, DWT) and more complete interpolation strategies (I'd like to finish the basic interpolation approaches I started a while ago). Originally, I used GAMS (NIST's Guide to Available Mathematical Software) as an "overview" of the kinds of things needed in SciPy. Are there other relevant taxonomies these days?
2. Focus on making it easier to contribute to scipy. There are many ways to do this; having more accessible developer docs, having a list of "easy fixes", adding info to tickets on how to get started on the reported issues, etc. We can learn a lot from Sympy and IPython here.
Definitely!
3. Recognize that quality of code and especially documentation is important, and fill the main gaps.
Is there a write-up of recognized gaps here that we can start with?
4. Deprecate sub-modules that don't belong in scipy (anymore), and remove them for scipy 1.0. I think that this applies only to maxentropy and weave.
I think it also applies to cluster as described above.
5. Find a clear (group of) maintainer(s) for each sub-module. For people familiar with one module, responding to tickets and pull requests for that module would not cost so much time.
Is there a list where this is kept?
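To make the "basic wavelets" item from the reply to point 1 concrete, here is a minimal sketch of a single-level discrete wavelet transform using the Haar wavelet, the simplest choice. The names `haar_dwt` and `haar_idwt` are hypothetical and not a proposed SciPy API. The sketch assumes an even-length input and applies no boundary extension. A real module would presumably also need other wavelet families, multiple decomposition levels, and signal-extension modes.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Single-level orthonormal Haar DWT; returns (approximation, detail) coefficients."""
    if len(x) % 2:
        raise ValueError("this sketch assumes an even-length input")
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]  # scaled pairwise sums
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]  # scaled pairwise differences
    return approx, detail


def haar_idwt(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Inverse of haar_dwt: rebuild each pair of samples from one (approx, detail) pair."""
    s = 1.0 / math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([s * (a + d), s * (a - d)])
    return out


signal = [4.0, 6.0, 10.0, 12.0]
cA, cD = haar_dwt(signal)
# cA is roughly [7.07, 15.56]; cD is roughly [-1.41, -1.41]
restored = haar_idwt(cA, cD)  # equal to signal up to floating-point rounding
```

Because the 1/sqrt(2) scaling makes the transform orthonormal, the inverse is just the transposed operation, which keeps a "basic" DWT small enough to sit alongside the other core modules.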
In my opinion, spending effort on improving code/documentation quality and attracting new developers (those go hand in hand) instead of reorganizing will have both more impact and be more beneficial for our users.
Agreed. Thanks for the feedback.
Best,
-Travis